Showing 1–2 of 2 results for author: Namba, H
-
Thresholding Data Shapley for Data Cleansing Using Multi-Armed Bandits
Authors:
Hiroyuki Namba,
Shota Horiguchi,
Masaki Hamamoto,
Masashi Egi
Abstract:
Data cleansing aims to improve model performance by removing a set of harmful instances from the training dataset. Data Shapley is a common, theoretically guaranteed method for evaluating the contribution of each instance to model performance; however, it requires training on all subsets of the training data, which is computationally expensive. In this paper, we propose an iterative method to quickly identify a subset of instances with low data Shapley values by using the thresholding bandit algorithm. We provide a theoretical guarantee that the proposed method can accurately select harmful instances if a sufficiently large number of iterations is conducted. Empirical evaluation using various models and datasets demonstrated that the proposed method improved computational speed while maintaining model performance.
Submitted 12 February, 2024;
originally announced February 2024.
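The abstract's remark that Data Shapley needs every subset of the training data can be illustrated with a brute-force sketch. This is not the paper's thresholding bandit method; it is only the exact Shapley definition applied to a toy problem. The names `exact_shapley`, `toy_utility`, and `contrib`, the additive utility, and the threshold of zero are all assumptions made for illustration; in practice the utility would be the validation performance of a model retrained on each subset.

```python
from itertools import combinations
from math import factorial


def exact_shapley(n, utility):
    """Exact Shapley value of each of n instances.

    `utility` maps a frozenset of instance indices to a score.
    Every subset is evaluated, so the cost grows as 2**n.
    """
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for subsets of size k that exclude i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                values[i] += weight * (utility(s | {i}) - utility(s))
    return values


# Hypothetical per-instance contributions; instance 2 hurts the score.
contrib = [0.3, 0.2, -0.1]


def toy_utility(subset):
    # Assumed additive utility standing in for "train a model, score it".
    return sum(contrib[j] for j in subset)


shapley = exact_shapley(len(contrib), toy_utility)
# Flag instances whose value falls below an assumed threshold of zero.
harmful = [i for i, v in enumerate(shapley) if v < 0.0]
```

With an additive utility each Shapley value equals the instance's own contribution, which makes the toy case easy to check by hand. The exponential loop over subsets is the cost that motivates faster approaches such as the one proposed in the paper.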
-
Shortest (A+B)-path packing via hafnian
Authors:
Hiroshi Hirai,
Hiroyuki Namba
Abstract:
Björklund and Husfeldt developed a randomized polynomial time algorithm to solve the shortest two disjoint paths problem. Their algorithm is based on computation of permanents modulo 4 and the isolation lemma. In this paper, we consider the following generalization of the shortest two disjoint paths problem, and develop a similar algebraic algorithm. The shortest perfect $(A+B)$-path packing problem is: given an undirected graph $G$ and two disjoint node subsets $A,B$ with even cardinalities, find a shortest set of $|A|/2+|B|/2$ disjoint paths, each with both ends in $A$ or both ends in $B$. In addition to its NP-hardness, we prove that this problem can be solved in randomized polynomial time if $|A|+|B|$ is fixed. Our algorithm basically follows the framework of Björklund and Husfeldt but uses a new technique: computation of the hafnian modulo $2^k$ combined with Gallai's reduction from $T$-paths to matchings. We also generalize our technique to solve other path packing problems, and discuss its limitations.
Submitted 13 June, 2017; v1 submitted 26 March, 2016;
originally announced March 2016.
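For readers unfamiliar with the hafnian mentioned in the abstract: for a symmetric matrix of even order it is the sum, over all perfect matchings of the index set, of the product of the matched entries. The sketch below evaluates that definition directly, with an optional reduction modulo a given integer. It is a brute-force illustration in exponential time, not the paper's algorithm; the function name `hafnian`, its `modulus` parameter, and the example graphs are assumptions.

```python
def hafnian(matrix, modulus=None):
    """Hafnian of a symmetric matrix, straight from the definition.

    Sums the product of matched entries over all perfect matchings
    of the indices. If `modulus` is given, results are reduced
    modulo that integer. Runs in exponential time.
    """
    n = len(matrix)
    if n % 2 == 1:
        return 0  # an odd number of indices has no perfect matching

    def rec(remaining):
        if not remaining:
            return 1  # empty product for the empty matching
        first, rest = remaining[0], remaining[1:]
        total = 0
        # Pair the first unmatched index with each possible partner.
        for idx, partner in enumerate(rest):
            w = matrix[first][partner]
            if w == 0:
                continue
            total += w * rec(rest[:idx] + rest[idx + 1:])
        return total if modulus is None else total % modulus

    return rec(tuple(range(n)))


# Adjacency matrix of the complete graph K4 (assumed example).
k4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]

# Adjacency matrix of the 4-cycle 0-1-2-3-0 (assumed example).
c4 = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]

k4_matchings = hafnian(k4)
c4_matchings = hafnian(c4)
k4_mod4 = hafnian(k4, modulus=4)
```

For a 0/1 adjacency matrix the hafnian counts the perfect matchings of the graph, which is the link to matchings that the abstract alludes to.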