-
Meta-learning for Positive-unlabeled Classification
Authors:
Atsutoshi Kumagai,
Tomoharu Iwata,
Yasuhiro Fujiwara
Abstract:
We propose a meta-learning method for positive and unlabeled (PU) classification, which improves the performance of binary classifiers obtained from only PU data in unseen target tasks. PU learning is an important problem since PU data naturally arise in real-world applications such as outlier detection and information retrieval. Existing PU learning methods require many PU data, but sufficient da…
▽ More
We propose a meta-learning method for positive and unlabeled (PU) classification, which improves the performance of binary classifiers obtained from only PU data in unseen target tasks. PU learning is an important problem since PU data naturally arise in real-world applications such as outlier detection and information retrieval. Existing PU learning methods require many PU data, but sufficient data are often unavailable in practice. The proposed method minimizes the test classification risk after the model is adapted to PU data by using related tasks that consist of positive, negative, and unlabeled data. We formulate the adaptation as an estimation problem of the Bayes optimal classifier, which is an optimal classifier to minimize the classification risk. The proposed method embeds each instance into a task-specific space using neural networks. With the embedded PU data, the Bayes optimal classifier is estimated through density-ratio estimation of PU densities, whose solution is obtained as a closed-form solution. The closed-form solution enables us to efficiently and effectively minimize the test classification risk. We empirically show that the proposed method outperforms existing methods with one synthetic and three real-world datasets.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Deep Positive-Unlabeled Anomaly Detection for Contaminated Unlabeled Data
Authors:
Hiroshi Takahashi,
Tomoharu Iwata,
Atsutoshi Kumagai,
Yuuki Yamanaka
Abstract:
Semi-supervised anomaly detection, which aims to improve the performance of the anomaly detector by using a small amount of anomaly data in addition to unlabeled data, has attracted attention. Existing semi-supervised approaches assume that unlabeled data are mostly normal. They train the anomaly detector to minimize the anomaly scores for the unlabeled data, and to maximize those for the anomaly…
▽ More
Semi-supervised anomaly detection, which aims to improve the performance of the anomaly detector by using a small amount of anomaly data in addition to unlabeled data, has attracted attention. Existing semi-supervised approaches assume that unlabeled data are mostly normal. They train the anomaly detector to minimize the anomaly scores for the unlabeled data, and to maximize those for the anomaly data. However, in practice, the unlabeled data are often contaminated with anomalies. This weakens the effect of maximizing the anomaly scores for anomalies, and prevents us from improving the detection performance. To solve this problem, we propose the positive-unlabeled autoencoder, which is based on positive-unlabeled learning and the anomaly detector such as the autoencoder. With our approach, we can approximate the anomaly scores for normal data using the unlabeled and anomaly data. Therefore, without the labeled normal data, we can train the anomaly detector to minimize the anomaly scores for normal data, and to maximize those for the anomaly data. In addition, our approach is applicable to various anomaly detectors such as the DeepSVDD. Experiments on various datasets show that our approach achieves better detection performance than existing approaches.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Neural Operators Meet Energy-based Theory: Operator Learning for Hamiltonian and Dissipative PDEs
Authors:
Yusuke Tanaka,
Takaharu Yaguchi,
Tomoharu Iwata,
Naonori Ueda
Abstract:
The operator learning has received significant attention in recent years, with the aim of learning a map** between function spaces. Prior works have proposed deep neural networks (DNNs) for learning such a map**, enabling the learning of solution operators of partial differential equations (PDEs). However, these works still struggle to learn dynamics that obeys the laws of physics. This paper…
▽ More
The operator learning has received significant attention in recent years, with the aim of learning a map** between function spaces. Prior works have proposed deep neural networks (DNNs) for learning such a map**, enabling the learning of solution operators of partial differential equations (PDEs). However, these works still struggle to learn dynamics that obeys the laws of physics. This paper proposes Energy-consistent Neural Operators (ENOs), a general framework for learning solution operators of PDEs that follows the energy conservation or dissipation law from observed solution trajectories. We introduce a novel penalty function inspired by the energy-based theory of physics for training, in which the energy functional is modeled by another DNN, allowing one to bias the outputs of the DNN-based solution operators to ensure energetic consistency without explicit PDEs. Experiments on multiple physical systems show that ENO outperforms existing DNN models in predicting solutions from data, especially in super-resolution settings.
△ Less
Submitted 14 February, 2024;
originally announced February 2024.
-
Meta-Learning for Neural Network-based Temporal Point Processes
Authors:
Yoshiaki Takimoto,
Yusuke Tanaka,
Tomoharu Iwata,
Maya Okawa,
Hideaki Kim,
Hiroyuki Toda,
Takeshi Kurashima
Abstract:
Human activities generate various event sequences such as taxi trip records, bike-sharing pick-ups, crime occurrence, and infectious disease transmission. The point process is widely used in many applications to predict such events related to human activities. However, point processes present two problems in predicting events related to human activities. First, recent high-performance point proces…
▽ More
Human activities generate various event sequences such as taxi trip records, bike-sharing pick-ups, crime occurrence, and infectious disease transmission. The point process is widely used in many applications to predict such events related to human activities. However, point processes present two problems in predicting events related to human activities. First, recent high-performance point process models require the input of sufficient numbers of events collected over a long period (i.e., long sequences) for training, which are often unavailable in realistic situations. Second, the long-term predictions required in real-world applications are difficult. To tackle these problems, we propose a novel meta-learning approach for periodicity-aware prediction of future events given short sequences. The proposed method first embeds short sequences into hidden representations (i.e., task representations) via recurrent neural networks for creating predictions from short sequences. It then models the intensity of the point process by monotonic neural networks (MNNs), with the input being the task representations. We transfer the prior knowledge learned from related tasks and can improve event prediction given short sequences of target tasks. We design the MNNs to explicitly take temporal periodic patterns into account, contributing to improved long-term prediction performance. Experiments on multiple real-world datasets demonstrate that the proposed method has higher prediction performance than existing alternatives.
△ Less
Submitted 28 January, 2024;
originally announced January 2024.
-
Meta-learning to Calibrate Gaussian Processes with Deep Kernels for Regression Uncertainty Estimation
Authors:
Tomoharu Iwata,
Atsutoshi Kumagai
Abstract:
Although Gaussian processes (GPs) with deep kernels have been successfully used for meta-learning in regression tasks, its uncertainty estimation performance can be poor. We propose a meta-learning method for calibrating deep kernel GPs for improving regression uncertainty estimation performance with a limited number of training data. The proposed method meta-learns how to calibrate uncertainty us…
▽ More
Although Gaussian processes (GPs) with deep kernels have been successfully used for meta-learning in regression tasks, its uncertainty estimation performance can be poor. We propose a meta-learning method for calibrating deep kernel GPs for improving regression uncertainty estimation performance with a limited number of training data. The proposed method meta-learns how to calibrate uncertainty using data from various tasks by minimizing the test expected calibration error, and uses the knowledge for unseen tasks. We design our model such that the adaptation and calibration for each task can be performed without iterative procedures, which enables effective meta-learning. In particular, a task-specific uncalibrated output distribution is modeled by a GP with a task-shared encoder network, and it is transformed to a calibrated one using a cumulative density function of a task-specific Gaussian mixture model (GMM). By integrating the GP and GMM into our neural network-based model, we can meta-learn model parameters in an end-to-end fashion. Our experiments demonstrate that the proposed method improves uncertainty estimation performance while kee** high regression performance compared with the existing methods using real-world datasets in few-shot settings.
△ Less
Submitted 13 December, 2023;
originally announced December 2023.
-
Meta-learning of semi-supervised learning from tasks with heterogeneous attribute spaces
Authors:
Tomoharu Iwata,
Atsutoshi Kumagai
Abstract:
We propose a meta-learning method for semi-supervised learning that learns from multiple tasks with heterogeneous attribute spaces. The existing semi-supervised meta-learning methods assume that all tasks share the same attribute space, which prevents us from learning with a wide variety of tasks. With the proposed method, the expected test performance on tasks with a small amount of labeled data…
▽ More
We propose a meta-learning method for semi-supervised learning that learns from multiple tasks with heterogeneous attribute spaces. The existing semi-supervised meta-learning methods assume that all tasks share the same attribute space, which prevents us from learning with a wide variety of tasks. With the proposed method, the expected test performance on tasks with a small amount of labeled data is improved with unlabeled data as well as data in various tasks, where the attribute spaces are different among tasks. The proposed method embeds labeled and unlabeled data simultaneously in a task-specific space using a neural network, and the unlabeled data's labels are estimated by adapting classification or regression models in the embedding space. For the neural network, we develop variable-feature self-attention layers, which enable us to find embeddings of data with different attribute spaces with a single neural network by considering interactions among examples, attributes, and labels. Our experiments on classification and regression datasets with heterogeneous attribute spaces demonstrate that our proposed method outperforms the existing meta-learning and semi-supervised learning methods.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Meta-learning of Physics-informed Neural Networks for Efficiently Solving Newly Given PDEs
Authors:
Tomoharu Iwata,
Yusuke Tanaka,
Naonori Ueda
Abstract:
We propose a neural network-based meta-learning method to efficiently solve partial differential equation (PDE) problems. The proposed method is designed to meta-learn how to solve a wide variety of PDE problems, and uses the knowledge for solving newly given PDE problems. We encode a PDE problem into a problem representation using neural networks, where governing equations are represented by coef…
▽ More
We propose a neural network-based meta-learning method to efficiently solve partial differential equation (PDE) problems. The proposed method is designed to meta-learn how to solve a wide variety of PDE problems, and uses the knowledge for solving newly given PDE problems. We encode a PDE problem into a problem representation using neural networks, where governing equations are represented by coefficients of a polynomial function of partial derivatives, and boundary conditions are represented by a set of point-condition pairs. We use the problem representation as an input of a neural network for predicting solutions, which enables us to efficiently predict problem-specific solutions by the forwarding process of the neural network without updating model parameters. To train our model, we minimize the expected error when adapted to a PDE problem based on the physics-informed neural network framework, by which we can evaluate the error even when solutions are unknown. We demonstrate that our proposed method outperforms existing methods in predicting solutions of PDE problems.
△ Less
Submitted 20 October, 2023;
originally announced October 2023.
-
Explanation-based Training with Differentiable Insertion/Deletion Metric-aware Regularizers
Authors:
Yuya Yoshikawa,
Tomoharu Iwata
Abstract:
The quality of explanations for the predictions made by complex machine learning predictors is often measured using insertion and deletion metrics, which assess the faithfulness of the explanations, i.e., how accurately the explanations reflect the predictor's behavior. To improve the faithfulness, we propose insertion/deletion metric-aware explanation-based optimization (ID-ExpO), which optimizes…
▽ More
The quality of explanations for the predictions made by complex machine learning predictors is often measured using insertion and deletion metrics, which assess the faithfulness of the explanations, i.e., how accurately the explanations reflect the predictor's behavior. To improve the faithfulness, we propose insertion/deletion metric-aware explanation-based optimization (ID-ExpO), which optimizes differentiable predictors to improve both the insertion and deletion scores of the explanations while maintaining their predictive accuracy. Because the original insertion and deletion metrics are non-differentiable with respect to the explanations and directly unavailable for gradient-based optimization, we extend the metrics so that they are differentiable and use them to formalize insertion and deletion metric-based regularizers. Our experimental results on image and tabular datasets show that the deep neural network-based predictors that are fine-tuned using ID-ExpO enable popular post-hoc explainers to produce more faithful and easier-to-interpret explanations while maintaining high predictive accuracy. The code is available at https://github.com/yuyay/idexpo.
△ Less
Submitted 11 March, 2024; v1 submitted 19 October, 2023;
originally announced October 2023.
-
Information-theoretic Analysis of Test Data Sensitivity in Uncertainty
Authors:
Futoshi Futami,
Tomoharu Iwata
Abstract:
Bayesian inference is often utilized for uncertainty quantification tasks. A recent analysis by Xu and Raginsky 2022 rigorously decomposed the predictive uncertainty in Bayesian inference into two uncertainties, called aleatoric and epistemic uncertainties, which represent the inherent randomness in the data-generating process and the variability due to insufficient data, respectively. They analyz…
▽ More
Bayesian inference is often utilized for uncertainty quantification tasks. A recent analysis by Xu and Raginsky 2022 rigorously decomposed the predictive uncertainty in Bayesian inference into two uncertainties, called aleatoric and epistemic uncertainties, which represent the inherent randomness in the data-generating process and the variability due to insufficient data, respectively. They analyzed those uncertainties in an information-theoretic way, assuming that the model is well-specified and treating the model's parameters as latent variables. However, the existing information-theoretic analysis of uncertainty cannot explain the widely believed property of uncertainty, known as the sensitivity between the test and training data. It implies that when test data are similar to training data in some sense, the epistemic uncertainty should become small. In this work, we study such uncertainty sensitivity using our novel decomposition method for the predictive uncertainty. Our analysis successfully defines such sensitivity using information-theoretic quantities. Furthermore, we extend the existing analysis of Bayesian meta-learning and show the novel sensitivities among tasks for the first time.
△ Less
Submitted 23 July, 2023;
originally announced July 2023.
-
Meta-learning for heterogeneous treatment effect estimation with closed-form solvers
Authors:
Tomoharu Iwata,
Yoichi Chikahara
Abstract:
This article proposes a meta-learning method for estimating the conditional average treatment effect (CATE) from a few observational data. The proposed method learns how to estimate CATEs from multiple tasks and uses the knowledge for unseen tasks. In the proposed method, based on the meta-learner framework, we decompose the CATE estimation problem into sub-problems. For each sub-problem, we formu…
▽ More
This article proposes a meta-learning method for estimating the conditional average treatment effect (CATE) from a few observational data. The proposed method learns how to estimate CATEs from multiple tasks and uses the knowledge for unseen tasks. In the proposed method, based on the meta-learner framework, we decompose the CATE estimation problem into sub-problems. For each sub-problem, we formulate our estimation models using neural networks with task-shared and task-specific parameters. With our formulation, we can obtain optimal task-specific parameters in a closed form that are differentiable with respect to task-shared parameters, making it possible to perform effective meta-learning. The task-shared parameters are trained such that the expected CATE estimation performance in few-shot settings is improved by minimizing the difference between a CATE estimated with a large amount of data and one estimated with just a few data. Our experimental results demonstrate that our method outperforms the existing meta-learning approaches and CATE estimation methods.
△ Less
Submitted 18 May, 2023;
originally announced May 2023.
-
Modeling Nonlinear Dynamics in Continuous Time with Inductive Biases on Decay Rates and/or Frequencies
Authors:
Tomoharu Iwata,
Yoshinobu Kawahara
Abstract:
We propose a neural network-based model for nonlinear dynamics in continuous time that can impose inductive biases on decay rates and/or frequencies. Inductive biases are helpful for training neural networks especially when training data are small. The proposed model is based on the Koopman operator theory, where the decay rate and frequency information is used by restricting the eigenvalues of th…
▽ More
We propose a neural network-based model for nonlinear dynamics in continuous time that can impose inductive biases on decay rates and/or frequencies. Inductive biases are helpful for training neural networks especially when training data are small. The proposed model is based on the Koopman operator theory, where the decay rate and frequency information is used by restricting the eigenvalues of the Koopman operator that describe linear evolution in a Koopman space. We use neural networks to find an appropriate Koopman space, which are trained by minimizing multi-step forecasting and backcasting errors using irregularly sampled time-series data. Experiments on various time-series datasets demonstrate that the proposed method achieves higher forecasting performance given a single short training sequence than the existing methods.
△ Less
Submitted 26 December, 2022;
originally announced December 2022.
-
Linear Embedding-based High-dimensional Batch Bayesian Optimization without Reconstruction Map**s
Authors:
Shuhei A. Horiguchi,
Tomoharu Iwata,
Taku Tsuzuki,
Yosuke Ozawa
Abstract:
The optimization of high-dimensional black-box functions is a challenging problem. When a low-dimensional linear embedding structure can be assumed, existing Bayesian optimization (BO) methods often transform the original problem into optimization in a low-dimensional space. They exploit the low-dimensional structure and reduce the computational burden. However, we reveal that this approach could…
▽ More
The optimization of high-dimensional black-box functions is a challenging problem. When a low-dimensional linear embedding structure can be assumed, existing Bayesian optimization (BO) methods often transform the original problem into optimization in a low-dimensional space. They exploit the low-dimensional structure and reduce the computational burden. However, we reveal that this approach could be limited or inefficient in exploring the high-dimensional space mainly due to the biased reconstruction of the high-dimensional queries from the low-dimensional queries. In this paper, we investigate a simple alternative approach: tackling the problem in the original high-dimensional space using the information from the learned low-dimensional structure. We provide a theoretical analysis of the exploration ability. Furthermore, we show that our method is applicable to batch optimization problems with thousands of dimensions without any computational difficulty. We demonstrate the effectiveness of our method on high-dimensional benchmarks and a real-world function.
△ Less
Submitted 2 November, 2022;
originally announced November 2022.
-
Active Learning for Regression with Aggregated Outputs
Authors:
Tomoharu Iwata
Abstract:
Due to the privacy protection or the difficulty of data collection, we cannot observe individual outputs for each instance, but we can observe aggregated outputs that are summed over multiple instances in a set in some real-world applications. To reduce the labeling cost for training regression models for such aggregated data, we propose an active learning method that sequentially selects sets to…
▽ More
Due to the privacy protection or the difficulty of data collection, we cannot observe individual outputs for each instance, but we can observe aggregated outputs that are summed over multiple instances in a set in some real-world applications. To reduce the labeling cost for training regression models for such aggregated data, we propose an active learning method that sequentially selects sets to be labeled to improve the predictive performance with fewer labeled sets. For the selection measurement, the proposed method uses the mutual information, which quantifies the reduction of the uncertainty of the model parameters by observing the aggregated output. With Bayesian linear basis functions for modeling outputs given an input, which include approximated Gaussian processes and neural networks, we can efficiently calculate the mutual information in a closed form. With the experiments using various datasets, we demonstrate that the proposed method achieves better predictive performance with fewer labeled sets than existing methods.
△ Less
Submitted 3 October, 2022;
originally announced October 2022.
-
Data-driven End-to-end Learning of Pole Placement Control for Nonlinear Dynamics via Koopman Invariant Subspaces
Authors:
Tomoharu Iwata,
Yoshinobu Kawahara
Abstract:
We propose a data-driven method for controlling the frequency and convergence rate of black-box nonlinear dynamical systems based on the Koopman operator theory. With the proposed method, a policy network is trained such that the eigenvalues of a Koopman operator of controlled dynamics are close to the target eigenvalues. The policy network consists of a neural network to find a Koopman invariant…
▽ More
We propose a data-driven method for controlling the frequency and convergence rate of black-box nonlinear dynamical systems based on the Koopman operator theory. With the proposed method, a policy network is trained such that the eigenvalues of a Koopman operator of controlled dynamics are close to the target eigenvalues. The policy network consists of a neural network to find a Koopman invariant subspace, and a pole placement module to adjust the eigenvalues of the Koopman operator. Since the policy network is differentiable, we can train it in an end-to-end fashion using reinforcement learning. We demonstrate that the proposed method achieves better performance than model-free reinforcement learning and model-based control with system identification.
△ Less
Submitted 16 August, 2022;
originally announced August 2022.
-
Predicting Opinion Dynamics via Sociologically-Informed Neural Networks
Authors:
Maya Okawa,
Tomoharu Iwata
Abstract:
Opinion formation and propagation are crucial phenomena in social networks and have been extensively studied across several disciplines. Traditionally, theoretical models of opinion dynamics have been proposed to describe the interactions between individuals (i.e., social interaction) and their impact on the evolution of collective opinions. Although these models can incorporate sociological and p…
▽ More
Opinion formation and propagation are crucial phenomena in social networks and have been extensively studied across several disciplines. Traditionally, theoretical models of opinion dynamics have been proposed to describe the interactions between individuals (i.e., social interaction) and their impact on the evolution of collective opinions. Although these models can incorporate sociological and psychological knowledge on the mechanisms of social interaction, they demand extensive calibration with real data to make reliable predictions, requiring much time and effort. Recently, the widespread use of social media platforms provides new paradigms to learn deep learning models from a large volume of social media data. However, these methods ignore any scientific knowledge about the mechanism of social interaction. In this work, we present the first hybrid method called Sociologically-Informed Neural Network (SINN), which integrates theoretical models and social media data by transporting the concepts of physics-informed neural networks (PINNs) from natural science (i.e., physics) into social science (i.e., sociology and social psychology). In particular, we recast theoretical models as ordinary differential equations (ODEs). Then we train a neural network that simultaneously approximates the data and conforms to the ODEs that represent the social scientific knowledge. In addition, we extend PINNs by integrating matrix factorization and a language model to incorporate rich side information (e.g., user profiles) and structural knowledge (e.g., cluster structure of the social interaction network). Moreover, we develop an end-to-end training procedure for SINN, which involves Gumbel-Softmax approximation to include stochastic mechanisms of social interaction. Extensive experiments on real-world and synthetic datasets show SINN outperforms six baseline methods in predicting opinion dynamics.
△ Less
Submitted 7 July, 2022;
originally announced July 2022.
-
Aggregated Multi-output Gaussian Processes with Knowledge Transfer Across Domains
Authors:
Yusuke Tanaka,
Toshiyuki Tanaka,
Tomoharu Iwata,
Takeshi Kurashima,
Maya Okawa,
Yasunori Akagi,
Hiroyuki Toda
Abstract:
Aggregate data often appear in various fields such as socio-economics and public security. The aggregate data are associated not with points but with supports (e.g., spatial regions in a city). Since the supports may have various granularities depending on attributes (e.g., poverty rate and crime rate), modeling such data is not straightforward. This article offers a multi-output Gaussian process…
▽ More
Aggregate data often appear in various fields such as socio-economics and public security. The aggregate data are associated not with points but with supports (e.g., spatial regions in a city). Since the supports may have various granularities depending on attributes (e.g., poverty rate and crime rate), modeling such data is not straightforward. This article offers a multi-output Gaussian process (MoGP) model that infers functions for attributes using multiple aggregate datasets of respective granularities. In the proposed model, the function for each attribute is assumed to be a dependent GP modeled as a linear mixing of independent latent GPs. We design an observation model with an aggregation process for each attribute; the process is an integral of the GP over the corresponding support. We also introduce a prior distribution of the mixing weights, which allows a knowledge transfer across domains (e.g., cities) by sharing the prior. This is advantageous in such a situation where the spatially aggregated dataset in a city is too coarse to interpolate; the proposed model can still make accurate predictions of attributes by utilizing aggregate datasets in other cities. The inference of the proposed model is based on variational Bayes, which enables one to learn the model parameters using the aggregate datasets from multiple domains. The experiments demonstrate that the proposed model outperforms in the task of refining coarse-grained aggregate data on real-world datasets: Time series of air pollutants in Bei**g and various kinds of spatial datasets from New York City and Chicago.
△ Less
Submitted 24 June, 2022;
originally announced June 2022.
-
Meta-learning for Out-of-Distribution Detection via Density Estimation in Latent Space
Authors:
Tomoharu Iwata,
Atsutoshi Kumagai
Abstract:
Many neural network-based out-of-distribution (OoD) detection methods have been proposed. However, they require many training data for each target task. We propose a simple yet effective meta-learning method to detect OoD with small in-distribution data in a target task. With the proposed method, the OoD detection is performed by density estimation in a latent space. A neural network shared among…
▽ More
Many neural network-based out-of-distribution (OoD) detection methods have been proposed. However, they require many training data for each target task. We propose a simple yet effective meta-learning method to detect OoD with small in-distribution data in a target task. With the proposed method, the OoD detection is performed by density estimation in a latent space. A neural network shared among all tasks is used to flexibly map instances in the original space to the latent space. The neural network is meta-learned such that the expected OoD detection performance is improved by using various tasks that are different from the target tasks. This meta-learning procedure enables us to obtain appropriate representations in the latent space for OoD detection. For density estimation, we use a Gaussian mixture model (GMM) with full covariance for each class. We can adapt the GMM parameters to in-distribution data in each task in a closed form by maximizing the likelihood. Since the closed form solution is differentiable, we can meta-learn the neural network efficiently with a stochastic gradient descent method by incorporating the solution into the meta-learning objective function. In experiments using six datasets, we demonstrate that the proposed method achieves better performance than existing meta-learning and OoD detection methods.
△ Less
Submitted 19 June, 2022;
originally announced June 2022.
-
Excess risk analysis for epistemic uncertainty with application to variational inference
Authors:
Futoshi Futami,
Tomoharu Iwata,
Naonori Ueda,
Issei Sato,
Masashi Sugiyama
Abstract:
Bayesian deep learning plays an important role especially for its ability evaluating epistemic uncertainty (EU). Due to computational complexity issues, approximation methods such as variational inference (VI) have been used in practice to obtain posterior distributions and their generalization abilities have been analyzed extensively, for example, by PAC-Bayesian theory; however, little analysis…
▽ More
Bayesian deep learning plays an important role especially for its ability evaluating epistemic uncertainty (EU). Due to computational complexity issues, approximation methods such as variational inference (VI) have been used in practice to obtain posterior distributions and their generalization abilities have been analyzed extensively, for example, by PAC-Bayesian theory; however, little analysis exists on EU, although many numerical experiments have been conducted on it. In this study, we analyze the EU of supervised learning in approximate Bayesian inference by focusing on its excess risk. First, we theoretically show the novel relations between generalization error and the widely used EU measurements, such as the variance and mutual information of predictive distribution, and derive their convergence behaviors. Next, we clarify how the objective function of VI regularizes the EU. With this analysis, we propose a new objective function for VI that directly controls the prediction performance and the EU based on the PAC-Bayesian theory. Numerical experiments show that our algorithm significantly improves the EU evaluation over the existing VI methods.
△ Less
Submitted 11 October, 2022; v1 submitted 2 June, 2022;
originally announced June 2022.
-
AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider
Authors:
C. Fanelli,
Z. Papandreou,
K. Suresh,
J. K. Adkins,
Y. Akiba,
A. Albataineh,
M. Amaryan,
I. C. Arsene,
C. Ayerbe Gayoso,
J. Bae,
X. Bai,
M. D. Baker,
M. Bashkanov,
R. Bellwied,
F. Benmokhtar,
V. Berdnikov,
J. C. Bernauer,
F. Bock,
W. Boeglin,
M. Borysova,
E. Brash,
P. Brindza,
W. J. Briscoe,
M. Brooks,
S. Bueltmann
, et al. (258 additional authors not shown)
Abstract:
The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to…
▽ More
The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to leverage Artificial Intelligence (AI) already starting from the design and R&D phases. The EIC Comprehensive Chromodynamics Experiment (ECCE) is a consortium that proposed a detector design based on a 1.5T solenoid. The EIC detector proposal review concluded that the ECCE design will serve as the reference design for an EIC detector. Herein we describe a comprehensive optimization of the ECCE tracker using AI. The work required a complex parametrization of the simulated detector system. Our approach dealt with an optimization problem in a multidimensional design space driven by multiple objectives that encode the detector performance, while satisfying several mechanical constraints. We describe our strategy and show results obtained for the ECCE tracking system. The AI-assisted design is agnostic to the simulation framework and can be extended to other sub-detectors or to a system of sub-detectors to further optimize the performance of the EIC detector.
△ Less
Submitted 19 May, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.
-
Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model
Authors:
Keisuke Kinoshita,
Marc Delcroix,
Tomoharu Iwata
Abstract:
Speaker diarization has been investigated extensively as an important central task for meeting analysis. Recent trend shows that integration of end-to-end neural (EEND)-and clustering-based diarization is a promising approach to handle realistic conversational data containing overlapped speech with an arbitrarily large number of speakers, and achieved state-of-the-art results on various tasks. How…
▽ More
Speaker diarization has been investigated extensively as an important central task for meeting analysis. Recent trend shows that integration of end-to-end neural (EEND)-and clustering-based diarization is a promising approach to handle realistic conversational data containing overlapped speech with an arbitrarily large number of speakers, and achieved state-of-the-art results on various tasks. However, the approaches proposed so far have not realized {\it tight} integration yet, because the clustering employed therein was not optimal in any sense for clustering the speaker embeddings estimated by the EEND module. To address this problem, this paper introduces a {\it trainable} clustering algorithm into the integration framework, by deep-unfolding a non-parametric Bayesian model called the infinite Gaussian mixture model (iGMM). Specifically, the speaker embeddings are optimized during training such that it better fits iGMM clustering, based on a novel clustering loss based on Adjusted Rand Index (ARI). Experimental results based on CALLHOME data show that the proposed approach outperforms the conventional approach in terms of diarization error rate (DER), especially by substantially reducing speaker confusion errors, that indeed reflects the effectiveness of the proposed iGMM integration.
△ Less
Submitted 14 February, 2022;
originally announced February 2022.
-
Training Deep Models to be Explained with Fewer Examples
Authors:
Tomoharu Iwata,
Yuya Yoshikawa
Abstract:
Although deep models achieve high predictive performance, it is difficult for humans to understand the predictions they made. Explainability is important for real-world applications to justify their reliability. Many example-based explanation methods have been proposed, such as representer point selection, where an explanation model defined by a set of training examples is used for explaining a pr…
▽ More
Although deep models achieve high predictive performance, it is difficult for humans to understand the predictions they made. Explainability is important for real-world applications to justify their reliability. Many example-based explanation methods have been proposed, such as representer point selection, where an explanation model defined by a set of training examples is used for explaining a prediction model. For improving the interpretability, reducing the number of examples in the explanation model is important. However, the explanations with fewer examples can be unfaithful since it is difficult to approximate prediction models well by such example-based explanation models. The unfaithful explanations mean that the predictions by the explainable model are different from those by the prediction model. We propose a method for training deep models such that their predictions are faithfully explained by explanation models with a small number of examples. We train the prediction and explanation models simultaneously with a sparse regularizer for reducing the number of examples. The proposed method can be incorporated into any neural network-based prediction models. Experiments using several datasets demonstrate that the proposed method improves faithfulness while kee** the predictive performance.
△ Less
Submitted 7 December, 2021;
originally announced December 2021.
-
Evacuation Shelter Scheduling Problem
Authors:
Hitoshi Shimizu,
Hirohiko Suwa,
Tomoharu Iwata,
Akinori Fu**o,
Hiroshi Sawada,
Keiichi Yasumoto
Abstract:
Evacuation shelters, which are urgently required during natural disasters, are designed to minimize the burden of evacuation on human survivors. However, the larger the scale of the disaster, the more costly it becomes to operate shelters. When the number of evacuees decreases, the operation costs can be reduced by moving the remaining evacuees to other shelters and closing shelters as quickly as…
▽ More
Evacuation shelters, which are urgently required during natural disasters, are designed to minimize the burden of evacuation on human survivors. However, the larger the scale of the disaster, the more costly it becomes to operate shelters. When the number of evacuees decreases, the operation costs can be reduced by moving the remaining evacuees to other shelters and closing shelters as quickly as possible. On the other hand, relocation between shelters imposes a huge emotional burden on evacuees. In this study, we formulate the "Evacuation Shelter Scheduling Problem," which allocates evacuees to shelters in such a way to minimize the movement costs of the evacuees and the operation costs of the shelters. Since it is difficult to solve this quadratic programming problem directly, we show its transformation into a 0-1 integer programming problem. In addition, such a formulation struggles to calculate the burden of relocating them from historical data because no payments are actually made. To solve this issue, we propose a method that estimates movement costs based on the numbers of evacuees and shelters during an actual disaster. Simulation experiments with records from the Kobe earthquake (Great Hanshin-Awaji Earthquake) showed that our proposed method reduced operation costs by 33.7 million dollars: 32%.
△ Less
Submitted 26 November, 2021;
originally announced November 2021.
-
End-to-End Learning of Deep Kernel Acquisition Functions for Bayesian Optimization
Authors:
Tomoharu Iwata
Abstract:
For Bayesian optimization (BO) on high-dimensional data with complex structure, neural network-based kernels for Gaussian processes (GPs) have been used to learn flexible surrogate functions by the high representation power of deep learning. However, existing methods train neural networks by maximizing the marginal likelihood, which do not directly improve the BO performance. In this paper, we pro…
▽ More
For Bayesian optimization (BO) on high-dimensional data with complex structure, neural network-based kernels for Gaussian processes (GPs) have been used to learn flexible surrogate functions by the high representation power of deep learning. However, existing methods train neural networks by maximizing the marginal likelihood, which do not directly improve the BO performance. In this paper, we propose a meta-learning method for BO with neural network-based kernels that minimizes the expected gap between the true optimum value and the best value found by BO. We model a policy, which takes the current evaluated data points as input and outputs the next data point to be evaluated, by a neural network, where neural network-based kernels, GPs, and mutual information-based acquisition functions are used as its layers. With our model, the neural network-based kernel is trained to be appropriate for the acquisition function by backpropagating the gap through the acquisition function and GP. Our model is trained by a reinforcement learning framework from multiple tasks. Since the neural network is shared across different tasks, we can gather knowledge on BO from multiple training tasks, and use the knowledge for unseen test tasks. In experiments using three text document datasets, we demonstrate that the proposed method achieves better BO performance than the existing methods.
△ Less
Submitted 31 October, 2021;
originally announced November 2021.
-
Few-shot Learning for Unsupervised Feature Selection
Authors:
Atsutoshi Kumagai,
Tomoharu Iwata,
Yasuhiro Fujiwara
Abstract:
We propose a few-shot learning method for unsupervised feature selection, which is a task to select a subset of relevant features in unlabeled data. Existing methods usually require many instances for feature selection. However, sufficient instances are often unavailable in practice. The proposed method can select a subset of relevant features in a target task given a few unlabeled target instance…
▽ More
We propose a few-shot learning method for unsupervised feature selection, which is a task to select a subset of relevant features in unlabeled data. Existing methods usually require many instances for feature selection. However, sufficient instances are often unavailable in practice. The proposed method can select a subset of relevant features in a target task given a few unlabeled target instances by training with unlabeled instances in multiple source tasks. Our model consists of a feature selector and decoder. The feature selector outputs a subset of relevant features taking a few unlabeled instances as input such that the decoder can reconstruct the original features of unseen instances from the selected ones. The feature selector uses the Concrete random variables to select features via gradient descent. To encode task-specific properties from a few unlabeled instances to the model, the Concrete random variables and decoder are modeled using permutation-invariant neural networks that take a few unlabeled instances as input. Our model is trained by minimizing the expected test reconstruction error given a few unlabeled instances that is calculated with datasets in source tasks. We experimentally demonstrate that the proposed method outperforms existing feature selection methods.
△ Less
Submitted 1 July, 2021;
originally announced July 2021.
-
Meta-Learning for Relative Density-Ratio Estimation
Authors:
Atsutoshi Kumagai,
Tomoharu Iwata,
Yasuhiro Fujiwara
Abstract:
The ratio of two probability densities, called a density-ratio, is a vital quantity in machine learning. In particular, a relative density-ratio, which is a bounded extension of the density-ratio, has received much attention due to its stability and has been used in various applications such as outlier detection and dataset comparison. Existing methods for (relative) density-ratio estimation (DRE)…
▽ More
The ratio of two probability densities, called a density-ratio, is a vital quantity in machine learning. In particular, a relative density-ratio, which is a bounded extension of the density-ratio, has received much attention due to its stability and has been used in various applications such as outlier detection and dataset comparison. Existing methods for (relative) density-ratio estimation (DRE) require many instances from both densities. However, sufficient instances are often unavailable in practice. In this paper, we propose a meta-learning method for relative DRE, which estimates the relative density-ratio from a few instances by using knowledge in related datasets. Specifically, given two datasets that consist of a few instances, our model extracts the datasets' information by using neural networks and uses it to obtain instance embeddings appropriate for the relative DRE. We model the relative density-ratio by a linear model on the embedded space, whose global optimum solution can be obtained as a closed-form solution. The closed-form solution enables fast and effective adaptation to a few instances, and its differentiability enables us to train our model such that the expected test error for relative DRE can be explicitly minimized after adapting to a few instances. We empirically demonstrate the effectiveness of the proposed method by using three problems: relative DRE, dataset comparison, and outlier detection.
△ Less
Submitted 1 July, 2021;
originally announced July 2021.
-
Meta-learning for Matrix Factorization without Shared Rows or Columns
Authors:
Tomoharu Iwata
Abstract:
We propose a method that meta-learns a knowledge on matrix factorization from various matrices, and uses the knowledge for factorizing unseen matrices. The proposed method uses a neural network that takes a matrix as input, and generates prior distributions of factorized matrices of the given matrix. The neural network is meta-learned such that the expected imputation error is minimized when the f…
▽ More
We propose a method that meta-learns a knowledge on matrix factorization from various matrices, and uses the knowledge for factorizing unseen matrices. The proposed method uses a neural network that takes a matrix as input, and generates prior distributions of factorized matrices of the given matrix. The neural network is meta-learned such that the expected imputation error is minimized when the factorized matrices are adapted to each matrix by a maximum a posteriori (MAP) estimation. We use a gradient descent method for the MAP estimation, which enables us to backpropagate the expected imputation error through the gradient descent steps for updating neural network parameters since each gradient descent step is written in a closed form and is differentiable. The proposed method can meta-learn from matrices even when their rows and columns are not shared, and their sizes are different from each other. In our experiments with three user-item rating datasets, we demonstrate that our proposed method can impute the missing values from a limited number of observations in unseen matrices after being trained with different matrices.
△ Less
Submitted 29 June, 2021;
originally announced June 2021.
-
Loss function based second-order Jensen inequality and its application to particle variational inference
Authors:
Futoshi Futami,
Tomoharu Iwata,
Naonori Ueda,
Issei Sato,
Masashi Sugiyama
Abstract:
Bayesian model averaging, obtained as the expectation of a likelihood function by a posterior distribution, has been widely used for prediction, evaluation of uncertainty, and model selection. Various approaches have been developed to efficiently capture the information in the posterior distribution; one such approach is the optimization of a set of models simultaneously with interaction to ensure…
▽ More
Bayesian model averaging, obtained as the expectation of a likelihood function by a posterior distribution, has been widely used for prediction, evaluation of uncertainty, and model selection. Various approaches have been developed to efficiently capture the information in the posterior distribution; one such approach is the optimization of a set of models simultaneously with interaction to ensure the diversity of the individual models in the same way as ensemble learning. A representative approach is particle variational inference (PVI), which uses an ensemble of models as an empirical approximation for the posterior distribution. PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models. However, despite its promising performance, a theoretical understanding of this repulsion and its association with the generalization ability remains unclear. In this paper, we tackle this problem in light of PAC-Bayesian analysis. First, we provide a new second-order Jensen inequality, which has the repulsion term based on the loss function. Thanks to the repulsion term, it is tighter than the standard Jensen inequality. Then, we derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models. Finally, we derive a new PVI that optimizes the generalization error bound directly. Numerical experiments demonstrate that the performance of the proposed PVI compares favorably with existing methods in the experiment.
△ Less
Submitted 9 June, 2021; v1 submitted 9 June, 2021;
originally announced June 2021.
-
Dynamic Hawkes Processes for Discovering Time-evolving Communities' States behind Diffusion Processes
Authors:
Maya Okawa,
Tomoharu Iwata,
Yusuke Tanaka,
Hiroyuki Toda,
Takeshi Kurashima,
Hisashi Kashima
Abstract:
Sequences of events including infectious disease outbreaks, social network activities, and crimes are ubiquitous and the data on such events carry essential information about the underlying diffusion processes between communities (e.g., regions, online user groups). Modeling diffusion processes and predicting future events are crucial in many applications including epidemic control, viral marketin…
▽ More
Sequences of events including infectious disease outbreaks, social network activities, and crimes are ubiquitous and the data on such events carry essential information about the underlying diffusion processes between communities (e.g., regions, online user groups). Modeling diffusion processes and predicting future events are crucial in many applications including epidemic control, viral marketing, and predictive policing. Hawkes processes offer a central tool for modeling the diffusion processes, in which the influence from the past events is described by the triggering kernel. However, the triggering kernel parameters, which govern how each community is influenced by the past events, are assumed to be static over time. In the real world, the diffusion processes depend not only on the influences from the past, but also the current (time-evolving) states of the communities, e.g., people's awareness of the disease and people's current interests. In this paper, we propose a novel Hawkes process model that is able to capture the underlying dynamics of community states behind the diffusion processes and predict the occurrences of events based on the dynamics. Specifically, we model the latent dynamic function that encodes these hidden dynamics by a mixture of neural networks. Then we design the triggering kernel using the latent dynamic function and its integral. The proposed method, termed DHP (Dynamic Hawkes Processes), offers a flexible way to learn complex representations of the time-evolving communities' states, while at the same time it allows to computing the exact likelihood, which makes parameter learning tractable. Extensive experiments on four real-world event datasets show that DHP outperforms five widely adopted methods for event prediction.
△ Less
Submitted 6 June, 2021; v1 submitted 24 May, 2021;
originally announced May 2021.
-
Few-shot Learning for Topic Modeling
Authors:
Tomoharu Iwata
Abstract:
Topic models have been successfully used for analyzing text documents. However, with existing topic models, many documents are required for training. In this paper, we propose a neural network-based few-shot learning method that can learn a topic model from just a few documents. The neural networks in our model take a small number of documents as inputs, and output topic model priors. The proposed…
▽ More
Topic models have been successfully used for analyzing text documents. However, with existing topic models, many documents are required for training. In this paper, we propose a neural network-based few-shot learning method that can learn a topic model from just a few documents. The neural networks in our model take a small number of documents as inputs, and output topic model priors. The proposed method trains the neural networks such that the expected test likelihood is improved when topic model parameters are estimated by maximizing the posterior probability using the priors based on the EM algorithm. Since each step in the EM algorithm is differentiable, the proposed method can backpropagate the loss through the EM algorithm to train the neural networks. The expected test likelihood is maximized by a stochastic gradient descent method using a set of multiple text corpora with an episodic training framework. In our experiments, we demonstrate that the proposed method achieves better perplexity than existing methods using three real-world text document sets.
△ Less
Submitted 18 April, 2021;
originally announced April 2021.
-
Meta-learning representations for clustering with infinite Gaussian mixture models
Authors:
Tomoharu Iwata
Abstract:
For better clustering performance, appropriate representations are critical. Although many neural network-based metric learning methods have been proposed, they do not directly train neural networks to improve clustering performance. We propose a meta-learning method that train neural networks for obtaining representations such that clustering performance improves when the representations are clus…
▽ More
For better clustering performance, appropriate representations are critical. Although many neural network-based metric learning methods have been proposed, they do not directly train neural networks to improve clustering performance. We propose a meta-learning method that train neural networks for obtaining representations such that clustering performance improves when the representations are clustered by the variational Bayesian (VB) inference with an infinite Gaussian mixture model. The proposed method can cluster unseen unlabeled data using knowledge meta-learned with labeled data that are different from the unlabeled data. For the objective function, we propose a continuous approximation of the adjusted Rand index (ARI), by which we can evaluate the clustering performance from soft clustering assignments. Since the approximated ARI and the VB inference procedure are differentiable, we can backpropagate the objective function through the VB inference procedure to train the neural networks. With experiments using text and image data sets, we demonstrate that our proposed method has a higher adjusted Rand index than existing methods do.
△ Less
Submitted 28 February, 2021;
originally announced March 2021.
-
Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection
Authors:
Tomoharu Iwata,
Atsutoshi Kumagai
Abstract:
Neural network-based anomaly detection methods have shown to achieve high performance. However, they require a large amount of training data for each task. We propose a neural network-based meta-learning method for supervised anomaly detection. The proposed method improves the anomaly detection performance on unseen tasks, which contains a few labeled normal and anomalous instances, by meta-traini…
▽ More
Neural network-based anomaly detection methods have shown to achieve high performance. However, they require a large amount of training data for each task. We propose a neural network-based meta-learning method for supervised anomaly detection. The proposed method improves the anomaly detection performance on unseen tasks, which contains a few labeled normal and anomalous instances, by meta-training with various datasets. With a meta-learning framework, quick adaptation to each task and its effective backpropagation are important since the model is trained by the adaptation for each epoch. Our model enables them by formulating adaptation as a generalized eigenvalue problem with one-class classification; its global optimum solution is obtained, and the solver is differentiable. We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods on various datasets.
△ Less
Submitted 28 February, 2021;
originally announced March 2021.
-
Meta-Learning for Koopman Spectral Analysis with Short Time-series
Authors:
Tomoharu Iwata,
Yoshinobu Kawahara
Abstract:
Koopman spectral analysis has attracted attention for nonlinear dynamical systems since we can analyze nonlinear dynamics with a linear regime by embedding data into a Koopman space by a nonlinear function. For the analysis, we need to find appropriate embedding functions. Although several neural network-based methods have been proposed for learning embedding functions, existing methods require lo…
▽ More
Koopman spectral analysis has attracted attention for nonlinear dynamical systems since we can analyze nonlinear dynamics with a linear regime by embedding data into a Koopman space by a nonlinear function. For the analysis, we need to find appropriate embedding functions. Although several neural network-based methods have been proposed for learning embedding functions, existing methods require long time-series for training neural networks. This limitation prohibits performing Koopman spectral analysis in applications where only short time-series are available. In this paper, we propose a meta-learning method for estimating embedding functions from unseen short time-series by exploiting knowledge learned from related but different time-series. With the proposed method, a representation of a given short time-series is obtained by a bidirectional LSTM for extracting its properties. The embedding function of the short time-series is modeled by a neural network that depends on the time-series representation. By sharing the LSTM and neural networks across multiple time-series, we can learn common knowledge from different time-series while modeling time-series-specific embedding functions with the time-series representation. Our model is trained such that the expected test prediction error is minimized with the episodic training framework. We experimentally demonstrate that the proposed method achieves better performance in terms of eigenvalue estimation and future prediction than existing methods.
△ Less
Submitted 9 February, 2021;
originally announced February 2021.
-
Adversarial Training Makes Weight Loss Landscape Sharper in Logistic Regression
Authors:
Masanori Yamada,
Sekitoshi Kanai,
Tomoharu Iwata,
Tomokatsu Takahashi,
Yuki Yamanaka,
Hiroshi Takahashi,
Atsutoshi Kumagai
Abstract:
Adversarial training is actively studied for learning robust models against adversarial examples. A recent study finds that adversarially trained models degenerate generalization performance on adversarial examples when their weight loss landscape, which is loss changes with respect to weights, is sharp. Unfortunately, it has been experimentally shown that adversarial training sharpens the weight…
▽ More
Adversarial training is actively studied for learning robust models against adversarial examples. A recent study finds that adversarially trained models degenerate generalization performance on adversarial examples when their weight loss landscape, which is loss changes with respect to weights, is sharp. Unfortunately, it has been experimentally shown that adversarial training sharpens the weight loss landscape, but this phenomenon has not been theoretically clarified. Therefore, we theoretically analyze this phenomenon in this paper. As a first step, this paper proves that adversarial training with the L2 norm constraints sharpens the weight loss landscape in the linear logistic regression model. Our analysis reveals that the sharpness of the weight loss landscape is caused by the noise aligned in the direction of increasing the loss, which is used in adversarial training. We theoretically and experimentally confirm that the weight loss landscape becomes sharper as the magnitude of the noise of adversarial training increases in the linear logistic regression model. Moreover, we experimentally confirm the same phenomena in ResNet18 with softmax as a more general case.
△ Less
Submitted 4 February, 2021;
originally announced February 2021.
-
Neural Dynamic Mode Decomposition for End-to-End Modeling of Nonlinear Dynamics
Authors:
Tomoharu Iwata,
Yoshinobu Kawahara
Abstract:
Koopman spectral analysis has attracted attention for understanding nonlinear dynamical systems by which we can analyze nonlinear dynamics with a linear regime by lifting observations using a nonlinear function. For analysis, we need to find an appropriate lift function. Although several methods have been proposed for estimating a lift function based on neural networks, the existing methods train…
▽ More
Koopman spectral analysis has attracted attention for understanding nonlinear dynamical systems by which we can analyze nonlinear dynamics with a linear regime by lifting observations using a nonlinear function. For analysis, we need to find an appropriate lift function. Although several methods have been proposed for estimating a lift function based on neural networks, the existing methods train neural networks without spectral analysis. In this paper, we propose neural dynamic mode decomposition, in which neural networks are trained such that the forecast error is minimized when the dynamics is modeled based on spectral decomposition in the lifted space. With our proposed method, the forecast error is backpropagated through the neural networks and the spectral decomposition, enabling end-to-end learning of Koopman spectral analysis. When information is available on the frequencies or the growth rates of the dynamics, the proposed method can exploit it as regularizers for training. We also propose an extension of our approach when observations are influenced by exogenous control time-series. Our experiments demonstrate the effectiveness of our proposed method in terms of eigenvalue estimation and forecast performance.
△ Less
Submitted 11 December, 2020;
originally announced December 2020.
-
Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora
Authors:
Takashi Wada,
Tomoharu Iwata,
Yuji Matsumoto,
Timothy Baldwin,
Jey Han Lau
Abstract:
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus (e.g. a few hundred sentence pairs). Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence. Through sharing model parameters among different languages, our model jointly trains the word embeddings in a…
▽ More
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus (e.g. a few hundred sentence pairs). Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence. Through sharing model parameters among different languages, our model jointly trains the word embeddings in a common cross-lingual space. We also propose to combine word and subword embeddings to make use of orthographic similarities across different languages. We base our experiments on real-world data from endangered languages, namely Yongning Na, Shipibo-Konibo, and Griko. Our experiments on bilingual lexicon induction and word alignment tasks show that our model outperforms existing methods by a large margin for most language pairs. These results demonstrate that, contrary to common belief, an encoder-decoder translation model is beneficial for learning cross-lingual representations even in extremely low-resource conditions. Furthermore, our model also works well on high-resource conditions, achieving state-of-the-art performance on a German-English word-alignment task.
△ Less
Submitted 19 October, 2021; v1 submitted 27 October, 2020;
originally announced October 2020.
-
Meta-Active Learning for Node Response Prediction in Graphs
Authors:
Tomoharu Iwata
Abstract:
Meta-learning is an important approach to improve machine learning performance with a limited number of observations for target tasks. However, when observations are unbalancedly obtained, it is difficult to improve the performance even with meta-learning methods. In this paper, we propose an active learning method for meta-learning on node response prediction tasks in attributed graphs, where nod…
▽ More
Meta-learning is an important approach to improve machine learning performance with a limited number of observations for target tasks. However, when observations are unbalancedly obtained, it is difficult to improve the performance even with meta-learning methods. In this paper, we propose an active learning method for meta-learning on node response prediction tasks in attributed graphs, where nodes to observe are selected to improve performance with as few observed nodes as possible. With the proposed method, we use models based on graph convolutional neural networks for both predicting node responses and selecting nodes, by which we can predict responses and select nodes even for graphs with unseen response variables. The response prediction model is trained by minimizing the expected test error. The node selection model is trained by maximizing the expected error reduction with reinforcement learning. We demonstrate the effectiveness of the proposed method with 11 types of road congestion prediction tasks.
△ Less
Submitted 11 October, 2020;
originally announced October 2020.
-
Few-shot Learning for Spatial Regression
Authors:
Tomoharu Iwata,
Yusuke Tanaka
Abstract:
We propose a few-shot learning method for spatial regression. Although Gaussian processes (GPs) have been successfully used for spatial regression, they require many observations in the target task to achieve a high predictive performance. Our model is trained using spatial datasets on various attributes in various regions, and predicts values on unseen attributes in unseen regions given a few obs…
▽ More
We propose a few-shot learning method for spatial regression. Although Gaussian processes (GPs) have been successfully used for spatial regression, they require many observations in the target task to achieve a high predictive performance. Our model is trained using spatial datasets on various attributes in various regions, and predicts values on unseen attributes in unseen regions given a few observed data. With our model, a task representation is inferred from given small data using a neural network. Then, spatial values are predicted by neural networks with a GP framework, in which task-specific properties are controlled by the task representations. The GP framework allows us to analytically obtain predictions that are adapted to small data. By using the adapted predictions in the objective function, we can train our model efficiently and effectively so that the test predictive performance improves when adapted to newly given small data. In our experiments, we demonstrate that the proposed method achieves better predictive performance than existing meta-learning methods using spatial datasets.
△ Less
Submitted 9 October, 2020;
originally announced October 2020.
-
Few-shot Learning for Time-series Forecasting
Authors:
Tomoharu Iwata,
Atsutoshi Kumagai
Abstract:
Time-series forecasting is important for many applications. Forecasting models are usually trained using time-series data in a specific target task. However, sufficient data in the target task might be unavailable, which leads to performance degradation. In this paper, we propose a few-shot learning method that forecasts a future value of a time-series in a target task given a few time-series in t…
▽ More
Time-series forecasting is important for many applications. Forecasting models are usually trained using time-series data in a specific target task. However, sufficient data in the target task might be unavailable, which leads to performance degradation. In this paper, we propose a few-shot learning method that forecasts a future value of a time-series in a target task given a few time-series in the target task. Our model is trained using time-series data in multiple training tasks that are different from target tasks. Our model uses a few time-series to build a forecasting function based on a recurrent neural network with an attention mechanism. With the attention mechanism, we can retrieve useful patterns in a small number of time-series for the current situation. Our model is trained by minimizing an expected test error of forecasting next timestep values. We demonstrate the effectiveness of the proposed method using 90 time-series datasets.
△ Less
Submitted 29 September, 2020;
originally announced September 2020.
-
Gaussian Process Regression with Local Explanation
Authors:
Yuya Yoshikawa,
Tomoharu Iwata
Abstract:
Gaussian process regression (GPR) is a fundamental model used in machine learning. Owing to its accurate prediction with uncertainty and versatility in handling various data structures via kernels, GPR has been successfully used in various applications. However, in GPR, how the features of an input contribute to its prediction cannot be interpreted. Herein, we propose GPR with local explanation, w…
▽ More
Gaussian process regression (GPR) is a fundamental model used in machine learning. Owing to its accurate prediction with uncertainty and versatility in handling various data structures via kernels, GPR has been successfully used in various applications. However, in GPR, how the features of an input contribute to its prediction cannot be interpreted. Herein, we propose GPR with local explanation, which reveals the feature contributions to the prediction of each sample, while maintaining the predictive performance of GPR. In the proposed model, both the prediction and explanation for each sample are performed using an easy-to-interpret locally linear model. The weight vector of the locally linear model is assumed to be generated from multivariate Gaussian process priors. The hyperparameters of the proposed models are estimated by maximizing the marginal likelihood. For a new test sample, the proposed model can predict the values of its target variable and weight vector, as well as their uncertainties, in a closed form. Experimental results on various benchmark datasets verify that the proposed model can achieve predictive performance comparable to those of GPR and superior to that of existing interpretable models, and can achieve higher interpretability than them, both quantitatively and qualitatively.
△ Less
Submitted 2 December, 2020; v1 submitted 3 July, 2020;
originally announced July 2020.
-
Probabilistic Optimal Transport based on Collective Graphical Models
Authors:
Yasunori Akagi,
Yusuke Tanaka,
Tomoharu Iwata,
Takeshi Kurashima,
Hiroyuki Toda
Abstract:
Optimal Transport (OT) is being widely used in various fields such as machine learning and computer vision, as it is a powerful tool for measuring the similarity between probability distributions and histograms. In previous studies, OT has been defined as the minimum cost to transport probability mass from one probability distribution to another. In this study, we propose a new framework in which…
▽ More
Optimal Transport (OT) is being widely used in various fields such as machine learning and computer vision, as it is a powerful tool for measuring the similarity between probability distributions and histograms. In previous studies, OT has been defined as the minimum cost to transport probability mass from one probability distribution to another. In this study, we propose a new framework in which OT is considered as a maximum a posteriori (MAP) solution of a probabilistic generative model. With the proposed framework, we show that OT with entropic regularization is equivalent to maximizing a posterior probability of a probabilistic model called Collective Graphical Model (CGM), which describes aggregated statistics of multiple samples generated from a graphical model. Interpreting OT as a MAP solution of a CGM has the following two advantages: (i) We can calculate the discrepancy between noisy histograms by modeling noise distributions. Since various distributions can be used for noise modeling, it is possible to select the noise distribution flexibly to suit the situation. (ii) We can construct a new method for interpolation between histograms, which is an important application of OT. The proposed method allows for intuitive modeling based on the probabilistic interpretations, and a simple and efficient estimation algorithm is available. Experiments using synthetic and real-world spatio-temporal population datasets show the effectiveness of the proposed interpolation method.
△ Less
Submitted 15 June, 2020;
originally announced June 2020.
-
Neural Generators of Sparse Local Linear Models for Achieving both Accuracy and Interpretability
Authors:
Yuya Yoshikawa,
Tomoharu Iwata
Abstract:
For reliability, it is important that the predictions made by machine learning methods are interpretable by human. In general, deep neural networks (DNNs) can provide accurate predictions, although it is difficult to interpret why such predictions are obtained by DNNs. On the other hand, interpretation of linear models is easy, although their predictive performance would be low since real-world da…
▽ More
For reliability, it is important that the predictions made by machine learning methods are interpretable by human. In general, deep neural networks (DNNs) can provide accurate predictions, although it is difficult to interpret why such predictions are obtained by DNNs. On the other hand, interpretation of linear models is easy, although their predictive performance would be low since real-world data is often intrinsically non-linear. To combine both the benefits of the high predictive performance of DNNs and high interpretability of linear models into a single model, we propose neural generators of sparse local linear models (NGSLLs). The sparse local linear models have high flexibility as they can approximate non-linear functions. The NGSLL generates sparse linear weights for each sample using DNNs that take original representations of each sample (e.g., word sequence) and their simplified representations (e.g., bag-of-words) as input. By extracting features from the original representations, the weights can contain rich information to achieve high predictive performance. Additionally, the prediction is interpretable because it is obtained by the inner product between the simplified representations and the sparse weights, where only a small number of weights are selected by our gate module in the NGSLL. In experiments with real-world datasets, we demonstrate the effectiveness of the NGSLL quantitatively and qualitatively by evaluating prediction performance and visualizing generated weights on image and text classification tasks.
△ Less
Submitted 13 March, 2020;
originally announced March 2020.
-
Semi-supervised Anomaly Detection on Attributed Graphs
Authors:
Atsutoshi Kumagai,
Tomoharu Iwata,
Yasuhiro Fujiwara
Abstract:
We propose a simple yet effective method for detecting anomalous instances on an attribute graph with label information of a small number of instances. Although with standard anomaly detection methods it is usually assumed that instances are independent and identically distributed, in many real-world applications, instances are often explicitly connected with each other, resulting in so-called att…
▽ More
We propose a simple yet effective method for detecting anomalous instances on an attribute graph with label information of a small number of instances. Although with standard anomaly detection methods it is usually assumed that instances are independent and identically distributed, in many real-world applications, instances are often explicitly connected with each other, resulting in so-called attributed graphs. The proposed method embeds nodes (instances) on the attributed graph in the latent space by taking into account their attributes as well as the graph structure based on graph convolutional networks (GCNs). To learn node embeddings specialized for anomaly detection, in which there is a class imbalance due to the rarity of anomalies, the parameters of a GCN are trained to minimize the volume of a hypersphere that encloses the node embeddings of normal instances while embedding anomalous ones outside the hypersphere. This enables us to detect anomalies by simply calculating the distances between the node embeddings and hypersphere center. The proposed method can effectively propagate label information on a small amount of nodes to unlabeled ones by taking into account the node's attributes, graph structure, and class imbalance. In experiments with five real-world attributed graph datasets, we demonstrate that the proposed method achieves better performance than various existing anomaly detection methods.
△ Less
Submitted 27 February, 2020;
originally announced February 2020.
-
Efficient Transfer Bayesian Optimization with Auxiliary Information
Authors:
Tomoharu Iwata,
Takuma Otsuka
Abstract:
We propose an efficient transfer Bayesian optimization method, which finds the maximum of an expensive-to-evaluate black-box function by using data on related optimization tasks. Our method uses auxiliary information that represents the task characteristics to effectively transfer knowledge for estimating a distribution over target functions. In particular, we use a Gaussian process, in which the…
▽ More
We propose an efficient transfer Bayesian optimization method, which finds the maximum of an expensive-to-evaluate black-box function by using data on related optimization tasks. Our method uses auxiliary information that represents the task characteristics to effectively transfer knowledge for estimating a distribution over target functions. In particular, we use a Gaussian process, in which the mean and covariance functions are modeled with neural networks that simultaneously take both the auxiliary information and feature vectors as input. With a neural network mean function, we can estimate the target function even without evaluations. By using the neural network covariance function, we can extract nonlinear correlation among feature vectors that are shared across related tasks. Our Gaussian process-based formulation not only enables an analytic calculation of the posterior distribution but also swiftly adapts the target function to observations. Our method is also advantageous because the computational costs scale linearly with the number of source tasks. Through experiments using a synthetic dataset and datasets for finding the optimal pedestrian traffic regulations and optimal machine learning algorithms, we demonstrate that our method identifies the optimal points with fewer target function evaluations than existing methods.
△ Less
Submitted 17 September, 2019;
originally announced September 2019.
-
Anomaly Detection with Inexact Labels
Authors:
Tomoharu Iwata,
Machiko Toyoda,
Shotaro Tora,
Naonori Ueda
Abstract:
We propose a supervised anomaly detection method for data with inexact anomaly labels, where each label, which is assigned to a set of instances, indicates that at least one instance in the set is anomalous. Although many anomaly detection methods have been proposed, they cannot handle inexact anomaly labels. To measure the performance with inexact anomaly labels, we define the inexact AUC, which…
▽ More
We propose a supervised anomaly detection method for data with inexact anomaly labels, where each label, which is assigned to a set of instances, indicates that at least one instance in the set is anomalous. Although many anomaly detection methods have been proposed, they cannot handle inexact anomaly labels. To measure the performance with inexact anomaly labels, we define the inexact AUC, which is our extension of the area under the ROC curve (AUC) for inexact labels. The proposed method trains an anomaly score function so that the smooth approximation of the inexact AUC increases while anomaly scores for non-anomalous instances become low. We model the anomaly score function by a neural network-based unsupervised anomaly detection method, e.g., autoencoders. The proposed method performs well even when only a small number of inexact labels are available by incorporating an unsupervised anomaly detection mechanism with inexact AUC maximization. Using various datasets, we experimentally demonstrate that our proposed method improves the anomaly detection performance with inexact anomaly labels, and outperforms existing unsupervised and supervised anomaly detection and multiple instance learning methods.
△ Less
Submitted 10 September, 2019;
originally announced September 2019.
-
Spatially Aggregated Gaussian Processes with Multivariate Areal Outputs
Authors:
Yusuke Tanaka,
Toshiyuki Tanaka,
Tomoharu Iwata,
Takeshi Kurashima,
Maya Okawa,
Yasunori Akagi,
Hiroyuki Toda
Abstract:
We propose a probabilistic model for inferring the multivariate function from multiple areal data sets with various granularities. Here, the areal data are observed not at location points but at regions. Existing regression-based models can only utilize the sufficiently fine-grained auxiliary data sets on the same domain (e.g., a city). With the proposed model, the functions for respective areal d…
▽ More
We propose a probabilistic model for inferring the multivariate function from multiple areal data sets with various granularities. Here, the areal data are observed not at location points but at regions. Existing regression-based models can only utilize the sufficiently fine-grained auxiliary data sets on the same domain (e.g., a city). With the proposed model, the functions for respective areal data sets are assumed to be a multivariate dependent Gaussian process (GP) that is modeled as a linear mixing of independent latent GPs. Sharing of latent GPs across multiple areal data sets allows us to effectively estimate the spatial correlation for each areal data set; moreover it can easily be extended to transfer learning across multiple domains. To handle the multivariate areal data, we design an observation model with a spatial aggregation process for each areal data set, which is an integral of the mixed GP over the corresponding region. By deriving the posterior GP, we can predict the data value at any location point by considering the spatial correlations and the dependences between areal data sets, simultaneously. Our experiments on real-world data sets demonstrate that our model can 1) accurately refine coarse-grained areal data, and 2) offer performance improvements by using the areal data sets from multiple domains.
△ Less
Submitted 7 January, 2020; v1 submitted 18 July, 2019;
originally announced July 2019.
-
Deep Mixture Point Processes: Spatio-temporal Event Prediction with Rich Contextual Information
Authors:
Maya Okawa,
Tomoharu Iwata,
Takeshi Kurashima,
Yusuke Tanaka,
Hiroyuki Toda,
Naonori Ueda
Abstract:
Predicting when and where events will occur in cities, like taxi pick-ups, crimes, and vehicle collisions, is a challenging and important problem with many applications in fields such as urban planning, transportation optimization and location-based marketing. Though many point processes have been proposed to model events in a continuous spatio-temporal space, none of them allow for the considerat…
▽ More
Predicting when and where events will occur in cities, like taxi pick-ups, crimes, and vehicle collisions, is a challenging and important problem with many applications in fields such as urban planning, transportation optimization and location-based marketing. Though many point processes have been proposed to model events in a continuous spatio-temporal space, none of them allow for the consideration of the rich contextual factors that affect event occurrence, such as weather, social activities, geographical characteristics, and traffic. In this paper, we propose \textsf{DMPP} (Deep Mixture Point Processes), a point process model for predicting spatio-temporal events with the use of rich contextual information; a key advance is its incorporation of the heterogeneous and high-dimensional context available in image and text data. Specifically, we design the intensity of our point process model as a mixture of kernels, where the mixture weights are modeled by a deep neural network. This formulation allows us to automatically learn the complex nonlinear effects of the contextual factors on event occurrence. At the same time, this formulation makes analytical integration over the intensity, which is required for point process estimation, tractable. We use real-world data sets from different domains to demonstrate that DMPP has better predictive performance than existing methods.
△ Less
Submitted 21 June, 2019;
originally announced June 2019.
-
Supervised Anomaly Detection based on Deep Autoregressive Density Estimators
Authors:
Tomoharu Iwata,
Yuki Yamanaka
Abstract:
We propose a supervised anomaly detection method based on neural density estimators, where the negative log likelihood is used for the anomaly score. Density estimators have been widely used for unsupervised anomaly detection. By the recent advance of deep learning, the density estimation performance has been greatly improved. However, the neural density estimators cannot exploit anomaly label inf…
▽ More
We propose a supervised anomaly detection method based on neural density estimators, where the negative log likelihood is used for the anomaly score. Density estimators have been widely used for unsupervised anomaly detection. By the recent advance of deep learning, the density estimation performance has been greatly improved. However, the neural density estimators cannot exploit anomaly label information, which would be valuable for improving the anomaly detection performance. The proposed method effectively utilizes the anomaly label information by training the neural density estimator so that the likelihood of normal instances is maximized and the likelihood of anomalous instances is lower than that of the normal instances. We employ an autoregressive model for the neural density estimator, which enables us to calculate the likelihood exactly. With the experiments using 16 datasets, we demonstrate that the proposed method improves the anomaly detection performance with a few labeled anomalous instances, and achieves better performance than existing unsupervised and supervised anomaly detection methods.
△ Less
Submitted 12 April, 2019;
originally announced April 2019.
-
Autoencoding Binary Classifiers for Supervised Anomaly Detection
Authors:
Yuki Yamanaka,
Tomoharu Iwata,
Hiroshi Takahashi,
Masanori Yamada,
Sekitoshi Kanai
Abstract:
We propose the Autoencoding Binary Classifiers (ABC), a novel supervised anomaly detector based on the Autoencoder (AE). There are two main approaches in anomaly detection: supervised and unsupervised. The supervised approach accurately detects the known anomalies included in training data, but it cannot detect the unknown anomalies. Meanwhile, the unsupervised approach can detect both known and u…
▽ More
We propose the Autoencoding Binary Classifiers (ABC), a novel supervised anomaly detector based on the Autoencoder (AE). There are two main approaches in anomaly detection: supervised and unsupervised. The supervised approach accurately detects the known anomalies included in training data, but it cannot detect the unknown anomalies. Meanwhile, the unsupervised approach can detect both known and unknown anomalies that are located away from normal data points. However, it does not detect known anomalies as accurately as the supervised approach. Furthermore, even if we have labeled normal data points and anomalies, the unsupervised approach cannot utilize these labels. The ABC is a probabilistic binary classifier that effectively exploits the label information, where normal data points are modeled using the AE as a component. By maximizing the likelihood, the AE in the proposed ABC is trained to minimize the reconstruction error for normal data points, and to maximize it for known anomalies. Since our approach becomes able to reconstruct the normal data points accurately and fails to reconstruct the known and unknown anomalies, it can accurately discriminate both known and unknown anomalies from normal data points. Experimental results show that the ABC achieves higher detection performance than existing supervised and unsupervised methods.
△ Less
Submitted 26 March, 2019;
originally announced March 2019.
-
Finding Appropriate Traffic Regulations via Graph Convolutional Networks
Authors:
Tomoharu Iwata,
Takuma Otsuka,
Hitoshi Shimizu,
Hiroshi Sawada,
Futoshi Naya,
Naonori Ueda
Abstract:
Appropriate traffic regulations, e.g. planned road closure, are important in congested events. Crowd simulators have been used to find appropriate regulations by simulating multiple scenarios with different regulations. However, this approach requires multiple simulation runs, which are time-consuming. In this paper, we propose a method to learn a function that outputs regulation effects given the…
▽ More
Appropriate traffic regulations, e.g. planned road closure, are important in congested events. Crowd simulators have been used to find appropriate regulations by simulating multiple scenarios with different regulations. However, this approach requires multiple simulation runs, which are time-consuming. In this paper, we propose a method to learn a function that outputs regulation effects given the current traffic situation as inputs. If the function is learned using the training data of many simulation runs in advance, we can obtain an appropriate regulation efficiently by bypassing simulations for the current situation. We use the graph convolutional networks for modeling the function, which enable us to find regulations even for unseen areas. With the proposed method, we construct a graph for each area, where a node represents a road, and an edge represents the road connection. By running crowd simulations with various regulations on various areas, we generate traffic situations and regulation effects. The graph convolutional networks are trained to output the regulation effects given the graph with the traffic situation information as inputs. With experiments using real-world road networks and a crowd simulator, we demonstrate that the proposed method can find a road to close that reduces the average time needed to reach the destination.
△ Less
Submitted 23 October, 2018;
originally announced October 2018.
-
Unsupervised Object Matching for Relational Data
Authors:
Tomoharu Iwata,
Naonori Ueda
Abstract:
We propose an unsupervised object matching method for relational data, which finds matchings between objects in different relational datasets without correspondence information. For example, the proposed method matches documents in different languages in multi-lingual document-word networks without dictionaries nor alignment information. The proposed method assumes that each object has latent vect…
▽ More
We propose an unsupervised object matching method for relational data, which finds matchings between objects in different relational datasets without correspondence information. For example, the proposed method matches documents in different languages in multi-lingual document-word networks without dictionaries nor alignment information. The proposed method assumes that each object has latent vectors, and the probability of neighbor objects is modeled by the inner-product of the latent vectors, where the neighbors are generated by short random walks over the relations. The latent vectors are estimated by maximizing the likelihood of the neighbors for each dataset. The estimated latent vectors contain hidden structural information of each object in the given relational dataset. Then, the proposed method linearly projects the latent vectors for all the datasets onto a common latent space shared across all datasets by matching the distributions while preserving the structural information. The projection matrix is estimated by minimizing the distance between the latent vector distributions with an orthogonality regularizer. To represent the distributions effectively, we use the kernel embedding of distributions that hold high-order moment information about a distribution as an element in a reproducing kernel Hilbert space, which enables us to calculate the distance between the distributions without density estimation. The structural information encoded in the latent vectors are preserved by using the orthogonality regularizer. We demonstrate the effectiveness of the proposed method with experiments using real-world multi-lingual document-word relational datasets and multiple user-item relational datasets.
△ Less
Submitted 26 December, 2018; v1 submitted 8 October, 2018;
originally announced October 2018.