-
Distributionally Robust Safe Sample Screening
Authors:
Hiroyuki Hanada,
Aoyama Tatsuya,
Akahane Satoshi,
Tomonari Tanaka,
Yoshito Okura,
Yu Inatsu,
Noriaki Hashimoto,
Shion Takeno,
Taro Murayama,
Hanju Lee,
Shinya Kojima,
Ichiro Takeuchi
Abstract:
In this study, we propose a machine learning method called Distributionally Robust Safe Sample Screening (DRSSS). DRSSS aims to identify unnecessary training samples, even when the distribution of the training samples changes in the future. To achieve this, we effectively combine the distributionally robust (DR) paradigm, which aims to enhance model robustness against variations in data distributi…
▽ More
In this study, we propose a machine learning method called Distributionally Robust Safe Sample Screening (DRSSS). DRSSS aims to identify unnecessary training samples, even when the distribution of the training samples changes in the future. To achieve this, we effectively combine the distributionally robust (DR) paradigm, which aims to enhance model robustness against variations in data distribution, with the safe sample screening (SSS), which identifies unnecessary training samples prior to model training. Since we need to consider an infinite number of scenarios regarding changes in the distribution, we applied SSS because it does not require model training after the change of the distribution. In this paper, we employed the covariate shift framework to represent the distribution of training samples and reformulated the DR covariate-shift problem as a weighted empirical risk minimization problem, where the weights are subject to uncertainty within a predetermined range. By extending the existing SSS technique to accommodate this weight uncertainty, the DRSSS method is capable of reliably identifying unnecessary samples under any future distribution within a specified range. We provide a theoretical guarantee for the DRSSS method and validate its performance through numerical experiments on both synthetic and real-world datasets.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Distributionally Robust Safe Screening
Authors:
Hiroyuki Hanada,
Satoshi Akahane,
Tatsuya Aoyama,
Tomonari Tanaka,
Yoshito Okura,
Yu Inatsu,
Noriaki Hashimoto,
Taro Murayama,
Lee Hanju,
Shinya Kojima,
Ichiro Takeuchi
Abstract:
In this study, we propose a method Distributionally Robust Safe Screening (DRSS), for identifying unnecessary samples and features within a DR covariate shift setting. This method effectively combines DR learning, a paradigm aimed at enhancing model robustness against variations in data distribution, with safe screening (SS), a sparse optimization technique designed to identify irrelevant samples…
▽ More
In this study, we propose a method Distributionally Robust Safe Screening (DRSS), for identifying unnecessary samples and features within a DR covariate shift setting. This method effectively combines DR learning, a paradigm aimed at enhancing model robustness against variations in data distribution, with safe screening (SS), a sparse optimization technique designed to identify irrelevant samples and features prior to model training. The core concept of the DRSS method involves reformulating the DR covariate-shift problem as a weighted empirical risk minimization problem, where the weights are subject to uncertainty within a predetermined range. By extending the SS technique to accommodate this weight uncertainty, the DRSS method is capable of reliably identifying unnecessary samples and features under any future distribution within a specified range. We provide a theoretical guarantee of the DRSS method and validate its performance through numerical experiments on both synthetic and real-world datasets.
△ Less
Submitted 25 April, 2024;
originally announced April 2024.
-
Efficient Nonparametric Tensor Decomposition for Binary and Count Data
Authors:
Zerui Tao,
Toshihisa Tanaka,
Qibin Zhao
Abstract:
In numerous applications, binary reactions or event counts are observed and stored within high-order tensors. Tensor decompositions (TDs) serve as a powerful tool to handle such high-dimensional and sparse data. However, many traditional TDs are explicitly or implicitly designed based on the Gaussian distribution, which is unsuitable for discrete data. Moreover, most TDs rely on predefined multi-l…
▽ More
In numerous applications, binary reactions or event counts are observed and stored within high-order tensors. Tensor decompositions (TDs) serve as a powerful tool to handle such high-dimensional and sparse data. However, many traditional TDs are explicitly or implicitly designed based on the Gaussian distribution, which is unsuitable for discrete data. Moreover, most TDs rely on predefined multi-linear structures, such as CP and Tucker formats. Therefore, they may not be effective enough to handle complex real-world datasets. To address these issues, we propose ENTED, an \underline{E}fficient \underline{N}onparametric \underline{TE}nsor \underline{D}ecomposition for binary and count tensors. Specifically, we first employ a nonparametric Gaussian process (GP) to replace traditional multi-linear structures. Next, we utilize the \pg augmentation which provides a unified framework to establish conjugate models for binary and count distributions. Finally, to address the computational issue of GPs, we enhance the model by incorporating sparse orthogonal variational inference of inducing points, which offers a more effective covariance approximation within GPs and stochastic natural gradient updates for nonparametric models. We evaluate our model on several real-world tensor completion tasks, considering binary and count datasets. The results manifest both better performance and computational advantages of the proposed model.
△ Less
Submitted 15 January, 2024;
originally announced January 2024.
-
Label Smoothing is Robustification against Model Misspecification
Authors:
Ryoya Yamasaki,
Toshiyuki Tanaka
Abstract:
Label smoothing (LS) adopts smoothed targets in classification tasks. For example, in binary classification, instead of the one-hot target $(1,0)^\top$ used in conventional logistic regression (LR), LR with LS (LSLR) uses the smoothed target $(1-\fracα{2},\fracα{2})^\top$ with a smoothing level $α\in(0,1)$, which causes squeezing of values of the logit. Apart from the common regularization-based i…
▽ More
Label smoothing (LS) adopts smoothed targets in classification tasks. For example, in binary classification, instead of the one-hot target $(1,0)^\top$ used in conventional logistic regression (LR), LR with LS (LSLR) uses the smoothed target $(1-\fracα{2},\fracα{2})^\top$ with a smoothing level $α\in(0,1)$, which causes squeezing of values of the logit. Apart from the common regularization-based interpretation of LS that leads to an inconsistent probability estimator, we regard LSLR as modifying the loss function and consistent estimator for probability estimation. In order to study the significance of each of these two modifications by LSLR, we introduce a modified LSLR (MLSLR) that uses the same loss function as LSLR and the same consistent estimator as LR, while not squeezing the logits. For the loss function modification, we theoretically show that MLSLR with a larger smoothing level has lower efficiency with correctly-specified models, while it exhibits higher robustness against model misspecification than LR. Also, for the modification of the probability estimator, an experimental comparison between LSLR and MLSLR showed that this modification and squeezing of the logits in LSLR have negative effects on the probability estimation and classification performance. The understanding of the properties of LS provided by these comparisons allows us to propose MLSLR as an improvement over LSLR.
△ Less
Submitted 15 May, 2023;
originally announced May 2023.
-
Convergence Analysis of Mean Shift
Authors:
Ryoya Yamasaki,
Toshiyuki Tanaka
Abstract:
The mean shift (MS) algorithm seeks a mode of the kernel density estimate (KDE). This study presents a convergence guarantee of the mode estimate sequence generated by the MS algorithm and an evaluation of the convergence rate, under fairly mild conditions, with the help of the argument concerning the Łojasiewicz inequality. Our findings extend existing ones covering analytic kernels and the Epane…
▽ More
The mean shift (MS) algorithm seeks a mode of the kernel density estimate (KDE). This study presents a convergence guarantee of the mode estimate sequence generated by the MS algorithm and an evaluation of the convergence rate, under fairly mild conditions, with the help of the argument concerning the Łojasiewicz inequality. Our findings extend existing ones covering analytic kernels and the Epanechnikov kernel. Those are significant in that they cover the biweight kernel, which is optimal among non-negative kernels in terms of the asymptotic statistical efficiency for the KDE-based mode estimation.
△ Less
Submitted 7 November, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
-
Optimal Kernel for Kernel-Based Modal Statistical Methods
Authors:
Ryoya Yamasaki,
Toshiyuki Tanaka
Abstract:
Kernel-based modal statistical methods include mode estimation, regression, and clustering. Estimation accuracy of these methods depends on the kernel used as well as the bandwidth. We study effect of the selection of the kernel function to the estimation accuracy of these methods. In particular, we theoretically show a (multivariate) optimal kernel that minimizes its analytically-obtained asympto…
▽ More
Kernel-based modal statistical methods include mode estimation, regression, and clustering. Estimation accuracy of these methods depends on the kernel used as well as the bandwidth. We study effect of the selection of the kernel function to the estimation accuracy of these methods. In particular, we theoretically show a (multivariate) optimal kernel that minimizes its analytically-obtained asymptotic error criterion when using an optimal bandwidth, among a certain kernel class defined via the number of its sign changes.
△ Less
Submitted 19 April, 2023;
originally announced April 2023.
-
Style Feature Extraction Using Contrastive Conditioned Variational Autoencoders with Mutual Information Constraints
Authors:
Suguru Yasutomi,
Toshihisa Tanaka
Abstract:
Extracting fine-grained features such as styles from unlabeled data is crucial for data analysis. Unsupervised methods such as variational autoencoders (VAEs) can extract styles that are usually mixed with other features. Conditional VAEs (CVAEs) can isolate styles using class labels; however, there are no established methods to extract only styles using unlabeled data. In this paper, we propose a…
▽ More
Extracting fine-grained features such as styles from unlabeled data is crucial for data analysis. Unsupervised methods such as variational autoencoders (VAEs) can extract styles that are usually mixed with other features. Conditional VAEs (CVAEs) can isolate styles using class labels; however, there are no established methods to extract only styles using unlabeled data. In this paper, we propose a CVAE-based method that extracts style features using only unlabeled data. The proposed model consists of a contrastive learning (CL) part that extracts style-independent features and a CVAE part that extracts style features. The CL model learns representations independent of data augmentation, which can be viewed as a perturbation in styles, in a self-supervised manner. Considering the style-independent features from the pretrained CL model as a condition, the CVAE learns to extract only styles. Additionally, we introduce a constraint based on mutual information between the CL and VAE features to prevent the CVAE from ignoring the condition. Experiments conducted using two simple datasets, MNIST and an original dataset based on Google Fonts, demonstrate that the proposed method can efficiently extract style features. Further experiments using real-world natural image datasets were also conducted to illustrate the method's extendability.
△ Less
Submitted 16 March, 2023; v1 submitted 3 February, 2023;
originally announced March 2023.
-
Aggregated Multi-output Gaussian Processes with Knowledge Transfer Across Domains
Authors:
Yusuke Tanaka,
Toshiyuki Tanaka,
Tomoharu Iwata,
Takeshi Kurashima,
Maya Okawa,
Yasunori Akagi,
Hiroyuki Toda
Abstract:
Aggregate data often appear in various fields such as socio-economics and public security. The aggregate data are associated not with points but with supports (e.g., spatial regions in a city). Since the supports may have various granularities depending on attributes (e.g., poverty rate and crime rate), modeling such data is not straightforward. This article offers a multi-output Gaussian process…
▽ More
Aggregate data often appear in various fields such as socio-economics and public security. The aggregate data are associated not with points but with supports (e.g., spatial regions in a city). Since the supports may have various granularities depending on attributes (e.g., poverty rate and crime rate), modeling such data is not straightforward. This article offers a multi-output Gaussian process (MoGP) model that infers functions for attributes using multiple aggregate datasets of respective granularities. In the proposed model, the function for each attribute is assumed to be a dependent GP modeled as a linear mixing of independent latent GPs. We design an observation model with an aggregation process for each attribute; the process is an integral of the GP over the corresponding support. We also introduce a prior distribution of the mixing weights, which allows a knowledge transfer across domains (e.g., cities) by sharing the prior. This is advantageous in such a situation where the spatially aggregated dataset in a city is too coarse to interpolate; the proposed model can still make accurate predictions of attributes by utilizing aggregate datasets in other cities. The inference of the proposed model is based on variational Bayes, which enables one to learn the model parameters using the aggregate datasets from multiple domains. The experiments demonstrate that the proposed model outperforms in the task of refining coarse-grained aggregate data on real-world datasets: Time series of air pollutants in Bei**g and various kinds of spatial datasets from New York City and Chicago.
△ Less
Submitted 24 June, 2022;
originally announced June 2022.
-
Assessment of design and analysis frameworks for on-farm experimentation through a simulation study of wheat yield in Japan
Authors:
Takashi S. T. Tanaka
Abstract:
On-farm experiments can provide farmers with information on more efficient crop management in their own fields. Developments in precision agricultural technologies, such as yield monitoring and variable-rate application technology, allow farmers to implement on-farm experiments. Research frameworks including the experimental design and the statistical analysis method strongly influences the precis…
▽ More
On-farm experiments can provide farmers with information on more efficient crop management in their own fields. Developments in precision agricultural technologies, such as yield monitoring and variable-rate application technology, allow farmers to implement on-farm experiments. Research frameworks including the experimental design and the statistical analysis method strongly influences the precision of the experiment. Conventional statistical approaches (e.g., ordinary least squares regression) may not be appropriate for on-farm experiments because they are not capable of accurately accounting for the underlying spatial variation in a particular response variable (e.g., yield data). The effects of experimental designs and statistical approaches on type I error rates and estimation accuracy were explored through a simulation study hypothetically conducted on experiments in three wheat fields in Japan. Isotropic and anisotropic spatial linear mixed models were established for comparison with ordinary least squares regression models. The repeated designs were not sufficient to reduce both the risk of a type I error and the estimation bias on their own. A combination of a repeated design and an anisotropic model is sometimes required to improve the precision of the experiments. Model selection should be performed to determine whether the anisotropic model is required for analysis of any specific field. The anisotropic model had larger standard errors than the other models, especially when the estimates had large biases. This finding highlights an advantage of anisotropic models since they enable experimenters to cautiously consider the reliability of the estimates when they have a large bias.
△ Less
Submitted 15 March, 2021; v1 submitted 27 April, 2020;
originally announced April 2020.
-
YuruGAN: Yuru-Chara Mascot Generator Using Generative Adversarial Networks With Clustering Small Dataset
Authors:
Yuki Hagiwara,
Toshihisa Tanaka
Abstract:
A yuru-chara is a mascot character created by local governments and companies for publicizing information on areas and products. Because it takes various costs to create a yuruchara, the utilization of machine learning techniques such as generative adversarial networks (GANs) can be expected. In recent years, it has been reported that the use of class conditions in a dataset for GANs training stab…
▽ More
A yuru-chara is a mascot character created by local governments and companies for publicizing information on areas and products. Because it takes various costs to create a yuruchara, the utilization of machine learning techniques such as generative adversarial networks (GANs) can be expected. In recent years, it has been reported that the use of class conditions in a dataset for GANs training stabilizes learning and improves the quality of the generated images. However, it is difficult to apply class conditional GANs when the amount of original data is small and when a clear class is not given, such as a yuruchara image. In this paper, we propose a class conditional GAN based on clustering and data augmentation. Specifically, first, we performed clustering based on K-means++ on the yuru-chara image dataset and converted it into a class conditional dataset. Next, data augmentation was performed on the class conditional dataset so that the amount of data was increased five times. In addition, we built a model that incorporates ResBlock and self-attention into a network based on class conditional GAN and trained the class conditional yuru-chara dataset. As a result of evaluating the generated images, the effect on the generated images by the difference of the clustering method was confirmed.
△ Less
Submitted 17 April, 2020;
originally announced April 2020.
-
Kernel Selection for Modal Linear Regression: Optimal Kernel and IRLS Algorithm
Authors:
Ryoya Yamasaki,
Toshiyuki Tanaka
Abstract:
Modal linear regression (MLR) is a method for obtaining a conditional mode predictor as a linear model. We study kernel selection for MLR from two perspectives: "which kernel achieves smaller error?" and "which kernel is computationally efficient?". First, we show that a Biweight kernel is optimal in the sense of minimizing an asymptotic mean squared error of a resulting MLR parameter. This result…
▽ More
Modal linear regression (MLR) is a method for obtaining a conditional mode predictor as a linear model. We study kernel selection for MLR from two perspectives: "which kernel achieves smaller error?" and "which kernel is computationally efficient?". First, we show that a Biweight kernel is optimal in the sense of minimizing an asymptotic mean squared error of a resulting MLR parameter. This result is derived from our refined analysis of an asymptotic statistical behavior of MLR. Secondly, we provide a kernel class for which iteratively reweighted least-squares algorithm (IRLS) is guaranteed to converge, and especially prove that IRLS with an Epanechnikov kernel terminates in a finite number of iterations. Simulation studies empirically verified that using a Biweight kernel provides good estimation accuracy and that using an Epanechnikov kernel is computationally efficient. Our results improve MLR of which existing studies often stick to a Gaussian kernel and modal EM algorithm specialized for it, by providing guidelines of kernel selection.
△ Less
Submitted 29 January, 2020;
originally announced January 2020.
-
Spatially Aggregated Gaussian Processes with Multivariate Areal Outputs
Authors:
Yusuke Tanaka,
Toshiyuki Tanaka,
Tomoharu Iwata,
Takeshi Kurashima,
Maya Okawa,
Yasunori Akagi,
Hiroyuki Toda
Abstract:
We propose a probabilistic model for inferring the multivariate function from multiple areal data sets with various granularities. Here, the areal data are observed not at location points but at regions. Existing regression-based models can only utilize the sufficiently fine-grained auxiliary data sets on the same domain (e.g., a city). With the proposed model, the functions for respective areal d…
▽ More
We propose a probabilistic model for inferring the multivariate function from multiple areal data sets with various granularities. Here, the areal data are observed not at location points but at regions. Existing regression-based models can only utilize the sufficiently fine-grained auxiliary data sets on the same domain (e.g., a city). With the proposed model, the functions for respective areal data sets are assumed to be a multivariate dependent Gaussian process (GP) that is modeled as a linear mixing of independent latent GPs. Sharing of latent GPs across multiple areal data sets allows us to effectively estimate the spatial correlation for each areal data set; moreover it can easily be extended to transfer learning across multiple domains. To handle the multivariate areal data, we design an observation model with a spatial aggregation process for each areal data set, which is an integral of the mixed GP over the corresponding region. By deriving the posterior GP, we can predict the data value at any location point by considering the spatial correlations and the dependences between areal data sets, simultaneously. Our experiments on real-world data sets demonstrate that our model can 1) accurately refine coarse-grained areal data, and 2) offer performance improvements by using the areal data sets from multiple domains.
△ Less
Submitted 7 January, 2020; v1 submitted 18 July, 2019;
originally announced July 2019.
-
Efficient logic architecture in training gradient boosting decision tree for high-performance and edge computing
Authors:
Takuya Tanaka,
Ryosuke Kasahara,
Daishiro Kobayashi
Abstract:
This study proposes a logic architecture for the high-speed and power efficiently training of a gradient boosting decision tree model of binary classification. We implemented the proposed logic architecture on an FPGA and compared training time and power efficiency with three general GBDT software libraries using CPU and GPU. The training speed of the logic architecture on the FPGA was 26-259 time…
▽ More
This study proposes a logic architecture for the high-speed and power efficiently training of a gradient boosting decision tree model of binary classification. We implemented the proposed logic architecture on an FPGA and compared training time and power efficiency with three general GBDT software libraries using CPU and GPU. The training speed of the logic architecture on the FPGA was 26-259 times faster than the software libraries. The power efficiency of the logic architecture was 90-1,104 times higher than the software libraries. The results show that the logic architecture suits for high-performance and edge computing.
△ Less
Submitted 19 December, 2018;
originally announced December 2018.
-
Refining Coarse-grained Spatial Data using Auxiliary Spatial Data Sets with Various Granularities
Authors:
Yusuke Tanaka,
Tomoharu Iwata,
Toshiyuki Tanaka,
Takeshi Kurashima,
Maya Okawa,
Hiroyuki Toda
Abstract:
We propose a probabilistic model for refining coarse-grained spatial data by utilizing auxiliary spatial data sets. Existing methods require that the spatial granularities of the auxiliary data sets are the same as the desired granularity of target data. The proposed model can effectively make use of auxiliary data sets with various granularities by hierarchically incorporating Gaussian processes.…
▽ More
We propose a probabilistic model for refining coarse-grained spatial data by utilizing auxiliary spatial data sets. Existing methods require that the spatial granularities of the auxiliary data sets are the same as the desired granularity of target data. The proposed model can effectively make use of auxiliary data sets with various granularities by hierarchically incorporating Gaussian processes. With the proposed model, a distribution for each auxiliary data set on the continuous space is modeled using a Gaussian process, where the representation of uncertainty considers the levels of granularity. The fine-grained target data are modeled by another Gaussian process that considers both the spatial correlation and the auxiliary data sets with their uncertainty. We integrate the Gaussian process with a spatial aggregation process that transforms the fine-grained target data into the coarse-grained target data, by which we can infer the fine-grained target Gaussian process from the coarse-grained data. Our model is designed such that the inference of model parameters based on the exact marginal likelihood is possible, in which the variables of fine-grained target and auxiliary data are analytically integrated out. Our experiments on real-world spatial data sets demonstrate the effectiveness of the proposed model.
△ Less
Submitted 17 July, 2019; v1 submitted 21 September, 2018;
originally announced September 2018.
-
Generalized Gaussian Kernel Adaptive Filtering
Authors:
Tomoya Wada,
Kosuke Fukumori,
Toshihisa Tanaka,
Simone Fiori
Abstract:
The present paper proposes generalized Gaussian kernel adaptive filtering, where the kernel parameters are adaptive and data-driven. The Gaussian kernel is parametrized by a center vector and a symmetric positive definite (SPD) precision matrix, which is regarded as a generalization of the scalar width parameter. These parameters are adaptively updated on the basis of a proposed least-square-type…
▽ More
The present paper proposes generalized Gaussian kernel adaptive filtering, where the kernel parameters are adaptive and data-driven. The Gaussian kernel is parametrized by a center vector and a symmetric positive definite (SPD) precision matrix, which is regarded as a generalization of the scalar width parameter. These parameters are adaptively updated on the basis of a proposed least-square-type rule to minimize the estimation error. The main contribution of this paper is to establish update rules for precision matrices on the SPD manifold in order to keep their symmetric positive-definiteness. Different from conventional kernel adaptive filters, the proposed regressor is a superposition of Gaussian kernels with all different parameters, which makes such regressor more flexible. The kernel adaptive filtering algorithm is established together with a l1-regularized least squares to avoid overfitting and the increase of dimensionality of the dictionary. Experimental results confirm the validity of the proposed method.
△ Less
Submitted 25 April, 2018;
originally announced April 2018.
-
Parametric Return Density Estimation for Reinforcement Learning
Authors:
Tetsuro Morimura,
Masashi Sugiyama,
Hisashi Kashima,
Hirotaka Hachiya,
Toshiyuki Tanaka
Abstract:
Most conventional Reinforcement Learning (RL) algorithms aim to optimize decision-making rules in terms of the expected returns. However, especially for risk management purposes, other risk-sensitive criteria such as the value-at-risk or the expected shortfall are sometimes preferred in real applications. Here, we describe a parametric method for estimating density of the returns, which allows us…
▽ More
Most conventional Reinforcement Learning (RL) algorithms aim to optimize decision-making rules in terms of the expected returns. However, especially for risk management purposes, other risk-sensitive criteria such as the value-at-risk or the expected shortfall are sometimes preferred in real applications. Here, we describe a parametric method for estimating density of the returns, which allows us to handle various criteria in a unified manner. We first extend the Bellman equation for the conditional expected return to cover a conditional probability density of the returns. Then we derive an extension of the TD-learning algorithm for estimating the return densities in an unknown environment. As test instances, several parametric density estimation algorithms are presented for the Gaussian, Laplace, and skewed Laplace distributions. We show that these algorithms lead to risk-sensitive as well as robust RL paradigms through numerical experiments.
△ Less
Submitted 15 March, 2012;
originally announced March 2012.