Search | arXiv e-print repository

arXiv:2405.19704 [pdf, other]

Enhancing Sufficient Dimension Reduction via Hellinger Correlation

Authors: Seungbeom Hong, Ilmun Kim, Jun Song

Abstract: In this work, we develop a new theory and method for sufficient dimension reduction (SDR) in single-index models, where SDR is a sub-field of supervised dimension reduction based on conditional independence. Our work is primarily motivated by the recent introduction of the Hellinger correlation as a dependency measure. Utilizing this measure, we develop a method capable of effectively detecting the dimension reduction subspace, complete with theoretical justification. Through extensive numerical experiments, we demonstrate that our proposed method significantly enhances and outperforms existing SDR methods. This improvement is largely attributed to our proposed method's deeper understanding of data dependencies and the refinement of existing SDR techniques.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2404.03272 [pdf, other]

Cryptographic Hardness of Score Estimation

Authors: Min Jae Song

Abstract: We show that $L^2$-accurate score estimation, in the absence of strong assumptions on the data distribution, is computationally hard even when sample complexity is polynomial in the relevant problem parameters. Our reduction builds on the result of Chen et al. (ICLR 2023), who showed that the problem of generating samples from an unknown data distribution reduces to $L^2$-accurate score estimation. Our hard-to-estimate distributions are the "Gaussian pancakes" distributions, originally due to Diakonikolas et al. (FOCS 2017), which have been shown to be computationally indistinguishable from the standard Gaussian under widely believed hardness assumptions from lattice-based cryptography (Bruna et al., STOC 2021; Gupte et al., FOCS 2022).

Submitted 4 April, 2024; originally announced April 2024.

Comments: 28 pages

arXiv:2403.13118 [pdf, other]

Modal Analysis of Spatiotemporal Data via Multivariate Gaussian Process Regression

Authors: Jiwoo Song, Daning Huang

Abstract: Modal analysis has become an essential tool to understand the coherent structure of complex flows. The classical modal analysis methods, such as dynamic mode decomposition (DMD) and spectral proper orthogonal decomposition (SPOD), rely on a sufficient amount of data that is regularly sampled in time. However, often one needs to deal with sparse, temporally irregular data, e.g., due to experimental measurements and simulation algorithms. To overcome the limitations of data scarcity and irregular sampling, we propose a novel modal analysis technique using multi-variate Gaussian process regression (MVGPR). We first establish the connection between MVGPR and the existing modal analysis techniques, DMD and SPOD, from a linear system identification perspective. Next, leveraging this connection, we develop an MVGPR-based modal analysis technique that addresses the aforementioned limitations. The capability of MVGPR is endowed by its judiciously designed kernel structure for the correlation function, which is derived from the assumed linear dynamics. Subsequently, the proposed MVGPR method is benchmarked against DMD and SPOD on a range of examples, from academic and synthesized data to unsteady airfoil aerodynamics. The results demonstrate MVGPR as a promising alternative to classical modal analysis methods, especially in the scenario of scarce and temporally irregular data.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 43 pages, 35 figures

arXiv:2402.15734 [pdf, other]

Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning

Authors: Wuyang Chen, Jialin Song, Pu Ren, Shashank Subramanian, Dmitriy Morozov, Michael W. Mahoney

Abstract: Recent years have witnessed the promise of coupling machine learning methods and physical domain-specific insights for solving scientific problems based on partial differential equations (PDEs). However, being data-intensive, these methods still require a large amount of PDE data. This reintroduces the need for expensive numerical PDE solutions, partially undermining the original goal of avoiding these expensive simulations. In this work, seeking data efficiency, we design unsupervised pretraining for PDE operator learning. To reduce the need for training data with heavy simulation costs, we mine unlabeled PDE data without simulated solutions, and pretrain neural operators with physics-inspired reconstruction-based proxy tasks. To improve out-of-distribution performance, we further assist neural operators in flexibly leveraging in-context learning methods, without incurring extra training costs or designs. Extensive empirical evaluations on a diverse set of PDEs demonstrate that our method is highly data-efficient, more generalizable, and even outperforms conventional vision-pretrained models.

Submitted 13 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

arXiv:2310.13911 [pdf, other]

Multilevel Matrix Factor Model

Authors: Yuteng Zhang, Yongchang Hui, Junrong Song, Shurong Zheng

Abstract: Large-scale matrix data has been widely discovered and continuously studied in various fields recently. Considering the multi-level factor structure and utilizing the matrix structure, we propose a multilevel matrix factor model with both global and local factors. The global factors can affect all matrix time series, whereas the local factors are only allowed to act within each specific matrix time series. The estimation procedures can consistently estimate the factor loadings and determine the number of factors. We establish the asymptotic properties of the estimators. A simulation is presented to illustrate the performance of the proposed estimation method. We utilize the model to analyze eight indicators across 200 stocks from ten distinct industries, demonstrating the empirical utility of our proposed approach.

Submitted 21 October, 2023; originally announced October 2023.

Comments: 47 pages, 22 figures

arXiv:2310.10232 [pdf, other]

Efficient seismic reliability and fragility analysis of lifeline networks using subset simulation

Authors: Dongkyu Lee, Ziqi Wang, Junho Song

Abstract: Various simulation-based and analytical methods have been developed to evaluate the seismic fragilities of individual structures. However, a community's seismic safety and resilience are substantially affected by network reliability, determined not only by component fragilities but also by network topology and commodity/information flows. However, seismic reliability analyses of networks often encounter significant challenges due to complex network topologies, interdependencies among ground motions, and low failure probabilities. This paper proposes to overcome these challenges by a variance-reduction method for network fragility analysis using subset simulation. The binary network limit-state function in the subset simulation is reformulated into more informative piecewise continuous functions. The proposed limit-state functions quantify the proximity of each sample to a potential network failure domain, thereby enabling the construction of specialized intermediate failure events, which can be utilized in subset simulation and other sequential Monte Carlo approaches. Moreover, by discovering an implicit connection between intermediate failure events and seismic intensity, we propose a technique to obtain the entire network fragility curve with a single execution of specialized subset simulation. Numerical examples demonstrate that the proposed method can effectively evaluate system-level fragility for large-scale networks.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.04861 [pdf, other]

Uncovering hidden geometry in Transformers via disentangling position and context

Authors: Jiajun Song, Yiqiao Zhong

Abstract: Transformers are widely used to extract semantic meanings from input tokens, yet they usually operate as black-box models. In this paper, we present a simple yet informative decomposition of hidden states (or embeddings) of trained transformers into interpretable components. For any layer, embedding vectors of input sequence samples are represented by a tensor $\boldsymbol{h} \in \mathbb{R}^{C \times T \times d}$. Given embedding vector $\boldsymbol{h}_{c,t} \in \mathbb{R}^d$ at sequence position $t \le T$ in a sequence (or context) $c \le C$, extracting the mean effects yields the decomposition \[ \boldsymbol{h}_{c,t} = \boldsymbol{\mu} + \mathbf{pos}_t + \mathbf{ctx}_c + \mathbf{resid}_{c,t} \] where $\boldsymbol{\mu}$ is the global mean vector, $\mathbf{pos}_t$ and $\mathbf{ctx}_c$ are the mean vectors across contexts and across positions respectively, and $\mathbf{resid}_{c,t}$ is the residual vector. For popular transformer architectures and diverse text datasets, empirically we find pervasive mathematical structure: (1) $(\mathbf{pos}_t)_{t}$ forms a low-dimensional, continuous, and often spiral shape across layers, (2) $(\mathbf{ctx}_c)_c$ shows clear cluster structure that falls into context topics, and (3) $(\mathbf{pos}_t)_{t}$ and $(\mathbf{ctx}_c)_c$ are mutually nearly orthogonal. We argue that smoothness is pervasive and beneficial to transformers trained on languages, and our decomposition leads to improved model interpretability.

Submitted 3 February, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

Comments: 38 pages, 34 figures
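
The decomposition stated in the abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `decompose_hidden_states`, the random test tensor, and the choice to center `pos` and `ctx` by subtracting the global mean are assumptions made so that the four terms sum back to the original tensor.

```python
import numpy as np

def decompose_hidden_states(h):
    """Split h of shape (C, T, d) into mu + pos_t + ctx_c + resid_{c,t}.

    Assumption: pos and ctx are centered by the global mean, as in a
    two-way mean decomposition; the abstract does not spell this out.
    """
    mu = h.mean(axis=(0, 1))          # global mean vector, shape (d,)
    pos = h.mean(axis=0) - mu         # mean across contexts, shape (T, d)
    ctx = h.mean(axis=1) - mu         # mean across positions, shape (C, d)
    # Whatever the three mean terms do not explain is the residual.
    resid = h - mu - pos[None, :, :] - ctx[:, None, :]
    return mu, pos, ctx, resid

# Hypothetical stand-in for real transformer embeddings: C=4, T=6, d=3.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 6, 3))

mu, pos, ctx, resid = decompose_hidden_states(h)
reconstructed = mu + pos[None, :, :] + ctx[:, None, :] + resid
```

With this centering, `pos` averages to zero over positions, `ctx` averages to zero over contexts, and `resid` averages to zero along both axes, so each component carries a distinct part of the variation.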

arXiv:2307.13371 [pdf, other]

Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation

Authors: Fengxue Zhang, Jialin Song, James Bowden, Alexander Ladd, Yisong Yue, Thomas A. Desautels, Yuxin Chen

Abstract: We study Bayesian optimization (BO) in high-dimensional and non-stationary scenarios. Existing algorithms for such scenarios typically require extensive hyperparameter tuning, which limits their practical effectiveness. We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest (ROI) as a superlevel-set of a nonparametric probabilistic model such as a Gaussian process (GP). Our approach is easy to tune, and is able to focus on local regions of the optimization space that can be tackled by existing BO methods. The key idea is to use two probabilistic models: a coarse GP to identify the ROI, and a localized GP for optimization within the ROI. We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO without ROI filtering. We demonstrate empirically the effectiveness of BALLET on both synthetic and real-world optimization tasks.

Submitted 25 July, 2023; originally announced July 2023.

arXiv:2305.04391 [pdf, other]

A Variational Perspective on Solving Inverse Problems with Diffusion Models

Authors: Morteza Mardani, Jiaming Song, Jan Kautz, Arash Vahdat

Abstract: Diffusion models have emerged as a key pillar of foundation models in visual domains. One of their critical applications is to universally solve different downstream inverse tasks via a single diffusion prior without re-training for each task. Most inverse tasks can be formulated as inferring a posterior distribution over data (e.g., a full image) given a measurement (e.g., a masked image). This is however challenging in diffusion models since the nonlinear and iterative nature of the diffusion process renders the posterior intractable. To cope with this challenge, we propose a variational approach that by design seeks to approximate the true posterior distribution. We show that our approach naturally leads to regularization by denoising diffusion process (RED-Diff) where denoisers at different timesteps concurrently impose different structural constraints over the image. To gauge the contribution of denoisers from different timesteps, we propose a weighting mechanism based on signal-to-noise-ratio (SNR). Our approach provides a new variational perspective for solving inverse problems with diffusion models, allowing us to formulate sampling as stochastic optimization, where one can simply apply off-the-shelf solvers with lightweight iterates. Our experiments for image restoration tasks such as inpainting and superresolution demonstrate the strengths of our method compared with state-of-the-art sampling-based diffusion models.

Submitted 29 September, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

arXiv:2304.13836 [pdf, other]

On Pitfalls of $\textit{RemOve-And-Retrain}$: Data Processing Inequality Perspective

Authors: Junhwa Song, Keumgang Cha, Junghoon Seo

Abstract: Approaches for appraising feature importance approximations, alternatively referred to as attribution methods, have been established across an extensive array of contexts. The development of resilient techniques for performance benchmarking constitutes a critical concern in the sphere of explainable deep learning. This study scrutinizes the dependability of the RemOve-And-Retrain (ROAR) procedure, which is prevalently employed for gauging the performance of feature importance estimates. The insights gleaned from our theoretical foundation and empirical investigations reveal that attributions containing less information about the decision function may yield superior results in ROAR benchmarks, contradicting the original intent of ROAR. This occurrence is similarly observed in the recently introduced variant RemOve-And-Debias (ROAD), and we posit a persistent pattern of blurriness bias in ROAR attribution metrics. Our findings serve as a warning against indiscriminate use of ROAR metrics. The code is available as open source.

Submitted 10 May, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

Comments: Code: https://github.com/SIAnalytics/roar

arXiv:2304.06252 [pdf, other]

doi 10.1016/j.strusafe.2023.102404

Adaptive active subspace-based metamodeling for high-dimensional reliability analysis

Authors: Jungho Kim, Ziqi Wang, Junho Song

Abstract: To address the challenges of reliability analysis in high-dimensional probability spaces, this paper proposes a new metamodeling method that couples active subspace, heteroscedastic Gaussian process, and active learning. The active subspace is leveraged to identify low-dimensional salient features of a high-dimensional computational model. A surrogate computational model is built in the low-dimensional feature space by a heteroscedastic Gaussian process. Active learning adaptively guides the surrogate model training toward the critical region that significantly contributes to the failure probability. A critical trait of the proposed method is that the three main ingredients (active subspace, heteroscedastic Gaussian process, and active learning) are coupled to adaptively optimize the feature space mapping in conjunction with the surrogate modeling. This coupling empowers the proposed method to accurately solve nontrivial high-dimensional reliability problems via low-dimensional surrogate modeling. Finally, numerical examples of a high-dimensional nonlinear function and structural engineering applications are investigated to verify the performance of the proposed method.

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2302.07400 [pdf, other]

Score-based Diffusion Models in Function Space

Authors: Jae Hyun Lim, Nikola B. Kovachki, Ricardo Baptista, Christopher Beckham, Kamyar Azizzadenesheli, Jean Kossaifi, Vikram Voleti, Jiaming Song, Karsten Kreis, Jan Kautz, Christopher Pal, Arash Vahdat, Anima Anandkumar

Abstract: Diffusion models have recently emerged as a powerful framework for generative modeling. They consist of a forward process that perturbs input data with Gaussian white noise and a reverse process that learns a score function to generate samples by denoising. Despite their tremendous success, they are mostly formulated on finite-dimensional spaces, e.g. Euclidean, limiting their applications to many domains where the data has a functional form such as in scientific computing and 3D geometric data analysis. In this work, we introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space. In DDOs, the forward process perturbs input functions gradually using a Gaussian process. The generative process is formulated by integrating a function-valued Langevin dynamic. Our approach requires an appropriate notion of the score for the perturbed data distribution, which we obtain by generalizing denoising score matching to function spaces that can be infinite-dimensional. We show that the corresponding discretized algorithm generates accurate samples at a fixed cost that is independent of the data resolution. We theoretically and numerically verify the applicability of our approach on a set of problems, including generating solutions to the Navier-Stokes equation viewed as the push-forward distribution of forcings from a Gaussian Random Field (GRF).

Submitted 22 November, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: 52 pages

MSC Class: 46B09 (Primary); 60J22 (Secondary) ACM Class: I.2.6; J.2

arXiv:2210.15651 [pdf, other]

Learning Single-Index Models with Shallow Neural Networks

Authors: Alberto Bietti, Joan Bruna, Clayton Sanford, Min Jae Song

Abstract: Single-index models are a class of functions given by an unknown univariate ``link'' function applied to an unknown one-dimensional projection of the input. These models are particularly relevant in high dimension, when the data might present low-dimensional structure that learning algorithms should adapt to. While several statistical aspects of this model, such as the sample complexity of recovering the relevant (one-dimensional) subspace, are well-understood, they rely on tailored algorithms that exploit the specific structure of the target function. In this work, we introduce a natural class of shallow neural networks and study its ability to learn single-index models via gradient flow. More precisely, we consider shallow networks in which biases of the neurons are frozen at random initialization. We show that the corresponding optimization landscape is benign, which in turn leads to generalization guarantees that match the near-optimal sample complexity of dedicated semi-parametric methods.

Submitted 27 October, 2022; originally announced October 2022.

Comments: 76 pages. To appear at NeurIPS 2022
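
The definition of a single-index model given in the abstract above (a univariate link applied to a one-dimensional projection of the input) can be sketched in a few lines of NumPy. The function name `single_index`, the choice of `np.tanh` as the link, and the random direction `w` are hypothetical and chosen only for illustration; the paper treats the link and the direction as unknown.

```python
import numpy as np

def single_index(x, w, link):
    """Evaluate f(x) = link(<w, x>) for each row of x."""
    return link(x @ w)

rng = np.random.default_rng(0)
d = 5
w = rng.normal(size=d)
w /= np.linalg.norm(w)        # unit-norm projection direction (assumed)
link = np.tanh                # hypothetical link function

x = rng.normal(size=(8, d))
y = single_index(x, w, link)

# Perturb every input by a vector orthogonal to w: the output should not
# change, because f depends on x only through the projection <w, x>.
delta = rng.normal(size=(8, d))
delta -= np.outer(delta @ w, w)   # remove the component along w
y_shifted = single_index(x + delta, w, link)
```

Comparing `y` and `y_shifted` shows the low-dimensional structure directly: moving the inputs within the (d-1)-dimensional subspace orthogonal to `w` leaves the function values unchanged up to floating-point error.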

arXiv:2206.13035 [pdf, other]

A General Recipe for Likelihood-free Bayesian Optimization

Authors: Jiaming Song, Lantao Yu, Willie Neiswanger, Stefano Ermon

Abstract: The acquisition function, a critical component in Bayesian optimization (BO), can often be written as the expectation of a utility function under a surrogate model. However, to ensure that acquisition functions are tractable to optimize, restrictions must be placed on the surrogate model and utility function. To extend BO to a broader class of models and utilities, we propose likelihood-free BO (LFBO), an approach based on likelihood-free inference. LFBO directly models the acquisition function without having to separately perform inference with a probabilistic surrogate model. We show that computing the acquisition function in LFBO can be reduced to optimizing a weighted classification problem, where the weights correspond to the utility being chosen. By choosing the utility function for expected improvement (EI), LFBO outperforms various state-of-the-art black-box optimization methods on several real-world optimization problems. LFBO can also effectively leverage composite structures of the objective function, which further improves its regret by several orders of magnitude.

Submitted 6 October, 2022; v1 submitted 26 June, 2022; originally announced June 2022.

Comments: ICML 2022. This version fixes a typo in eq 33

arXiv:2206.05253 [pdf, other]

Rethinking Spatial Invariance of Convolutional Networks for Object Counting

Authors: Zhi-Qi Cheng, Qi Dai, Hong Li, JingKuan Song, Xiao Wu, Alexander G. Hauptmann

Abstract: Previous work generally believes that improving the spatial invariance of convolutional networks is the key to object counting. However, after verifying several mainstream counting networks, we surprisingly found that overly strict pixel-level spatial invariance causes overfitting to noise in the density map generation. In this paper, we try to use locally connected Gaussian kernels to replace the original convolution filter to estimate the spatial position in the density map. The purpose of this is to allow the feature extraction process to potentially stimulate the density map generation process to overcome the annotation noise. Inspired by previous work, we propose a low-rank approximation accompanied with translation invariance to favorably implement the approximation of massive Gaussian convolution. Our work points to a new direction for follow-up research, which should investigate how to properly relax the overly strict pixel-level spatial invariance for object counting. We evaluate our methods on 4 mainstream object counting networks (i.e., MCNN, CSRNet, SANet, and ResNet-50). Extensive experiments were conducted on 7 popular benchmarks for 3 applications (i.e., crowd, vehicle, and plant counting). Experimental results show that our methods significantly outperform other state-of-the-art methods and achieve promising learning of the spatial position of objects.

Submitted 18 August, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

Comments: Accepted to CVPR 2022, Code: https://github.com/zhiqic/Rethinking-Counting

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2112.03898 [pdf, ps, other]

Lattice-Based Methods Surpass Sum-of-Squares in Clustering

Authors: Ilias Zadik, Min Jae Song, Alexander S. Wein, Joan Bruna

Abstract: Clustering is a fundamental primitive in unsupervised learning which gives rise to a rich class of computationally-challenging inference tasks. In this work, we focus on the canonical task of clustering d-dimensional Gaussian mixtures with unknown (and possibly degenerate) covariance. Recent works (Ghosh et al. '20; Mao, Wein '21; Davis, Diaz, Wang '21) have established lower bounds against the class of low-degree polynomial methods and the sum-of-squares (SoS) hierarchy for recovering certain hidden structures planted in Gaussian clustering instances. Prior work on many similar inference tasks portends that such lower bounds strongly suggest the presence of an inherent statistical-to-computational gap for clustering, that is, a parameter regime where the clustering task is statistically possible but no polynomial-time algorithm succeeds. One special case of the clustering task we consider is equivalent to the problem of finding a planted hypercube vector in an otherwise random subspace. We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap, even though the aforementioned low-degree and SoS lower bounds continue to apply in this case. To achieve this, we give a polynomial-time algorithm based on the Lenstra--Lenstra--Lovasz lattice basis reduction method which achieves the statistically-optimal sample complexity of d+1 samples.
This result extends the class of problems whose conjectured statistical-to-computational gaps can be "closed" by "brittle" polynomial-time algorithms, highlighting the crucial but subtle role of noise in the onset of statistical-to-computational gaps.

Submitted 7 January, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: Added a new tight information-theoretic lower bound for label recovery

arXiv:2107.03502 [pdf, other]

CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation

Authors: Yusuke Tashiro, Jiaming Song, Yang Song, Stefano Ermon

Abstract: The imputation of missing values in time series has many applications in healthcare and finance. While autoregressive models are natural candidates for time series imputation, score-based diffusion models have recently outperformed existing counterparts including autoregressive models in many tasks such as image generation and audio synthesis, and would be promising for time series imputation. In this paper, we propose Conditional Score-based Diffusion models for Imputation (CSDI), a novel time series imputation method that utilizes score-based diffusion models conditioned on observed data. Unlike existing score-based approaches, the conditional diffusion model is explicitly trained for imputation and can exploit correlations between observed values. On healthcare and environmental data, CSDI improves by 40-65% over existing probabilistic imputation methods on popular performance metrics. In addition, deterministic imputation by CSDI reduces the error by 5-20% compared to the state-of-the-art deterministic imputation methods. Furthermore, CSDI can also be applied to time series interpolation and probabilistic forecasting, and is competitive with existing baselines. The code is available at https://github.com/ermongroup/CSDI.

Submitted 27 October, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

Comments: NeurIPS 2021

arXiv:2107.02146 [pdf, other]

doi 10.1371/journal.pone.0265940

Multivariate functional group sparse regression: functional predictor selection

Authors: Ali Mahzarnia, Jun Song

Abstract: In this paper, we propose methods for functional predictor selection and the estimation of smooth functional coefficients simultaneously in a scalar-on-function regression problem under high-dimensional multivariate functional data setting. In particular, we develop two methods for functional group-sparse regression under a generic Hilbert space of infinite dimension. We show the convergence of algorithms and the consistency of the estimation and the selection (oracle property) under infinite-dimensional Hilbert spaces. Simulation studies show the effectiveness of the methods in both the selection and the estimation of functional coefficients. The applications to the functional magnetic resonance imaging (fMRI) reveal the regions of the human brain related to ADHD and IQ.

Submitted 8 July, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: The R package that is developed for this paper is available at GitHub. See https://github.com/Ali-Mahzarnia/MFSGrp

arXiv:2106.10744 [pdf, other]

On the Cryptographic Hardness of Learning Single Periodic Neurons

Authors: Min Jae Song, Ilias Zadik, Joan Bruna

Abstract: We show a simple reduction which demonstrates the cryptographic hardness of learning a single periodic neuron over isotropic Gaussian distributions in the presence of noise. More precisely, our reduction shows that any polynomial-time algorithm (not necessarily gradient-based) for learning such functions under small noise implies a polynomial-time quantum algorithm for solving worst-case lattice problems, whose hardness forms the foundation of lattice-based cryptography. Our core hard family of functions, which are well-approximated by one-layer neural networks, take the general form of a univariate periodic function applied to an affine projection of the data. These functions have appeared in previous seminal works which demonstrate their hardness against gradient-based (Shamir'18), and Statistical Query (SQ) algorithms (Song et al.'17). We show that if (polynomially) small noise is added to the labels, the intractability of learning these functions applies to all polynomial-time algorithms, beyond gradient-based and SQ algorithms, under the aforementioned cryptographic assumptions. Moreover, we demonstrate the necessity of noise in the hardness result by designing a polynomial-time algorithm for learning certain families of such functions under exponentially small adversarial noise. Our proposed algorithm is not a gradient-based or an SQ algorithm, but is rather based on the celebrated Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm. Furthermore, in the absence of noise, this algorithm can be directly applied to solve CLWE detection (Bruna et al.'21) and phase retrieval with an optimal sample complexity of $d+1$ samples. In the former case, this improves upon the quadratic-in-$d$ sample complexity required in (Bruna et al.'21).

Submitted 16 September, 2021; v1 submitted 20 June, 2021; originally announced June 2021.

Comments: 64 pages. Added more references, and a proof of the sample complexity lower bound

arXiv:2106.09246 [pdf, other]

Federated CycleGAN for Privacy-Preserving Image-to-Image Translation

Authors: Joonyoung Song, Jong Chul Ye

Abstract: Unsupervised image-to-image translation methods such as CycleGAN learn to convert images from one domain to another using unpaired training data sets from different domains. Unfortunately, these approaches still require centrally collected unpaired records, potentially violating privacy and security issues. Although the recent federated learning (FL) allows a neural network to be trained without data exchange, the basic assumption of the FL is that all clients have their own training data from a similar domain, which is different from our image-to-image translation scenario in which each client has images from its unique domain and the goal is to learn image translation between different domains without accessing the target domain data. To address this, here we propose a novel federated CycleGAN architecture that can learn image translation in an unsupervised manner while maintaining the data privacy. Specifically, our approach arises from a novel observation that CycleGAN loss can be decomposed into the sum of client specific local objectives that can be evaluated using only their data. This local objective decomposition allows multiple clients to participate in federated CycleGAN training without sacrificing performance. Furthermore, our method employs novel switchable generator and discriminator architecture using Adaptive Instance Normalization (AdaIN) that significantly reduces the bandwidth requirement of the federated learning. Our experimental results on various unsupervised image translation tasks show that our federated CycleGAN provides performance comparable to the non-federated counterpart.

Submitted 17 June, 2021; originally announced June 2021.

arXiv:2010.12810 [pdf, other]

Autoregressive Score Matching

Authors: Chenlin Meng, Lantao Yu, Yang Song, Jiaming Song, Stefano Ermon

Abstract: Autoregressive models use chain rule to define a joint probability distribution as a product of conditionals. These conditionals need to be normalized, imposing constraints on the functional families that can be used. To increase flexibility, we propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores), which need not be normalized. To train AR-CSM, we introduce a new divergence between distributions named Composite Score Matching (CSM). For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training. Compared to previous score matching algorithms, our method is more scalable to high dimensional data and more stable to optimize. We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.

Submitted 24 October, 2020; originally announced October 2020.

Comments: NeurIPS 2020

arXiv:2010.09808 [pdf, other]

Imitation with Neural Density Models

Authors: Kuno Kim, Akshat Jindal, Yang Song, Jiaming Song, Yanan Sui, Stefano Ermon

Abstract: We propose a new framework for Imitation Learning (IL) via density estimation of the expert's occupancy measure followed by Maximum Occupancy Entropy Reinforcement Learning (RL) using the density as a reward. Our approach maximizes a non-adversarial model-free RL objective that provably lower bounds reverse Kullback-Leibler divergence between occupancy measures of the expert and imitator. We present a practical IL algorithm, Neural Density Imitation (NDI), which obtains state-of-the-art demonstration efficiency on benchmark control tasks.

Submitted 19 October, 2020; originally announced October 2020.

arXiv:2009.07368 [pdf, other]

Evaluating representations by the complexity of learning low-loss predictors

Authors: William F. Whitney, Min Jae Song, David Brandfonbrener, Jaan Altosaar, Kyunghyun Cho

Abstract: We consider the problem of evaluating representations of data for use in solving a downstream task. We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest, and introduce two methods, surplus description length (SDL) and $\varepsilon$ sample complexity ($\varepsilon$SC). In contrast to prior methods, which measure the amount of information about the optimal predictor that is present in a specific amount of data, our methods measure the amount of information needed from the data to recover an approximation of the optimal predictor up to a specified tolerance. We present a framework to compare these methods based on plotting the validation loss versus evaluation dataset size (the "loss-data" curve). Existing measures, such as mutual information and minimum description length probes, correspond to slices and integrals along the data axis of the loss-data curve, while ours correspond to slices and integrals along the loss axis. We provide experiments on real data to compare the behavior of each of these methods over datasets of varying size along with a high performance open source library for representation evaluation at https://github.com/willwhitney/reprieve.

Submitted 5 February, 2021; v1 submitted 15 September, 2020; originally announced September 2020.

arXiv:2008.09643 [pdf, ps, other]

Privacy Preserving Recalibration under Domain Shift

Authors: Rachel Luo, Shengjia Zhao, Jiaming Song, Jonathan Kuck, Stefano Ermon, Silvio Savarese

Abstract: Classifiers deployed in high-stakes real-world applications must output calibrated confidence scores, i.e. their predicted probabilities should reflect empirical frequencies. Recalibration algorithms can greatly improve a model's probability estimates; however, existing algorithms are not applicable in real-world situations where the test data follows a different distribution from the training data, and privacy preservation is paramount (e.g. protecting patient records). We introduce a framework that abstracts out the properties of recalibration problems under differential privacy constraints. This framework allows us to adapt existing recalibration algorithms to satisfy differential privacy while remaining effective for domain-shift situations. Guided by our framework, we also design a novel recalibration algorithm, accuracy temperature scaling, that outperforms prior work on private datasets. In an extensive empirical study, we find that our algorithm improves calibration on domain-shift benchmarks under the constraints of differential privacy. On the 15 highest severity perturbations of the ImageNet-C dataset, our method achieves a median ECE of 0.029, over 2x better than the next best recalibration method and almost 5x better than without recalibration.

Submitted 21 August, 2020; originally announced August 2020.

arXiv:2008.07902 [pdf, other]

Bayesian geoacoustic inversion using mixture density network

Authors: Guoli Wu, Hefeng Dong, Junqiang Song, Jingya Zhang

Abstract: Bayesian geoacoustic inversion problems are conventionally solved by Markov chain Monte Carlo methods or their variants, which are computationally expensive. This paper extends the classic Bayesian geoacoustic inversion framework by deriving important geoacoustic statistics of Bayesian geoacoustic inversion from the multidimensional posterior probability density (PPD) using the mixture density network (MDN) theory. These statistics make it convenient to train the network directly on the whole parameter space and get the multidimensional PPD of model parameters. The present approach provides a much more efficient way to solve geoacoustic inversion problems in Bayesian inference framework. The network is trained on a simulated dataset of surface-wave dispersion curves with shear-wave velocities as labels and tested on both synthetic and real data cases. The results show that the network gives reliable predictions and has good generalization performance on unseen data. Once trained, the network can rapidly (within seconds) give a fully probabilistic solution which is comparable to Monte Carlo methods. It provides a promising approach for real-time inversion.

Submitted 16 January, 2021; v1 submitted 18 August, 2020; originally announced August 2020.

arXiv:2008.02365 [pdf, other]

Sequential change point test in the presence of outliers: the density power divergence based approach

Authors: Junmo Song

Abstract: In this study, we consider a problem of monitoring parameter changes particularly in the presence of outliers. To propose a sequential procedure that is robust against outliers, we use the density power divergence to derive a detector and stopping time that make up our procedure. We first investigate the asymptotic properties of our sequential procedure for i.i.d. sequences, and then extend the proposed procedure to stationary time series models, where we provide a set of sufficient conditions under which the proposed procedure has an asymptotically controlled size and consistency in power. As an application, our procedure is applied to the GARCH models. We demonstrate the validity and robustness of the proposed procedure through a simulation study. Finally, two real data analyses are provided to illustrate the usefulness of the proposed sequential procedure.

Submitted 27 June, 2021; v1 submitted 5 August, 2020; originally announced August 2020.

arXiv:2007.09852 [pdf, other]

Multi-label Contrastive Predictive Coding

Authors: Jiaming Song, Stefano Ermon

Abstract: Variational mutual information (MI) estimators are widely used in unsupervised representation learning methods such as contrastive predictive coding (CPC). A lower bound on MI can be obtained from a multi-class classification problem, where a critic attempts to distinguish a positive sample drawn from the underlying joint distribution from $(m-1)$ negative samples drawn from a suitable proposal distribution. Using this approach, MI estimates are bounded above by $\log m$, and could thus severely underestimate unless $m$ is very large. To overcome this limitation, we introduce a novel estimator based on a multi-label classification problem, where the critic needs to jointly identify multiple positive samples at the same time. We show that using the same amount of negative samples, multi-label CPC is able to exceed the $\log m$ bound, while still being a valid lower bound of mutual information. We demonstrate that the proposed approach is able to lead to better mutual information estimation, gain empirical improvements in unsupervised representation learning, and beat a current state-of-the-art knowledge distillation method over 10 out of 13 tasks.

Submitted 2 December, 2020; v1 submitted 19 July, 2020; originally announced July 2020.

Comments: Post camera-ready version. Reorganized the theorems in the last version as corollaries of more general theorems

arXiv:2007.00295 [pdf, ps, other]

Belief Propagation Neural Networks

Authors: Jonathan Kuck, Shuvam Chakraborty, Hao Tang, Rachel Luo, Jiaming Song, Ashish Sabharwal, Stefano Ermon

Abstract: Learned neural solvers have successfully been used to solve combinatorial optimization and decision problems. More general counting variants of these problems, however, are still largely solved with hand-crafted solvers. To bridge this gap, we introduce belief propagation neural networks (BPNNs), a class of parameterized operators that operate on factor graphs and generalize Belief Propagation (BP). In its strictest form, a BPNN layer (BPNN-D) is a learned iterative operator that provably maintains many of the desirable properties of BP for any choice of the parameters. Empirically, we show that by training, BPNN-D learns to perform the task better than the original BP: it converges 1.7x faster on Ising models while providing tighter bounds. On challenging model counting problems, BPNNs compute estimates hundreds of times faster than state-of-the-art handcrafted methods, while returning an estimate of comparable quality.

Submitted 1 July, 2020; originally announced July 2020.

arXiv:2006.07815 [pdf, other]

Optimistic Distributionally Robust Policy Optimization

Authors: Jun Song, Chaoyue Zhao

Abstract: Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), as the widely employed policy based reinforcement learning (RL) methods, are prone to converge to a sub-optimal solution as they limit the policy representation to a particular parametric distribution class. To address this issue, we develop an innovative Optimistic Distributionally Robust Policy Optimization (ODRPO) algorithm, which effectively utilizes Optimistic Distributionally Robust Optimization (DRO) approach to solve the trust region constrained optimization problem without parameterizing the policies. Our algorithm improves TRPO and PPO with a higher sample efficiency and a better performance of the final policy while attaining the learning stability. Moreover, it achieves a globally optimal policy update that is not promised in the prevailing policy based RL algorithms. Experiments across tabular domains and robotic locomotion tasks demonstrate the effectiveness of our approach.

Submitted 14 June, 2020; originally announced June 2020.

arXiv:2006.07363 [pdf, other]

Analysis, Design, and Generalization of Electrochemical Impedance Spectroscopy (EIS) Inversion Algorithms

Authors: Surya Effendy, Juhyun Song, Martin Z. Bazant

Abstract: We introduce a framework for analyzing and designing EIS inversion algorithms. Our framework stems from the observation of four features common to well-defined EIS inversion algorithms, namely (1) the representation of unknown distributions, (2) the minimization of a metric of error to estimate parameters arising from the chosen representation, subject to constraints on (3) the complexity control parameters, and (4) a means for choosing optimal control parameter values. These features must be present to overcome the ill-posed nature of EIS inversion problems. We review three established EIS inversion algorithms to illustrate the pervasiveness of these features, and show the utility of the framework by resolving ambiguities concerning three more algorithms. Our framework is then used to design the generalized EIS inversion (gEISi) algorithm, which uses Gaussian basis function representation, modality control parameter, and cross-validation for choosing the optimal control parameter value. The gEISi algorithm is applicable to the generalized EIS inversion problem, which allows for a wider range of underlying models. We also considered the construction of credible intervals for distributions arising from the algorithm. The algorithm is able to accurately reproduce distributions which have been difficult to obtain using existing algorithms. It is provided gratis on the repository https://github.com/suryaeff/gEISi.git.

Submitted 12 June, 2020; originally announced June 2020.

Comments: 46 pages, to be submitted to the Journal of the Electrochemical Society

arXiv:2005.09595 [pdf, other]

Continuous LWE

Authors: Joan Bruna, Oded Regev, Min Jae Song, Yi Tang

Abstract: We introduce a continuous analogue of the Learning with Errors (LWE) problem, which we name CLWE. We give a polynomial-time quantum reduction from worst-case lattice problems to CLWE, showing that CLWE enjoys similar hardness guarantees to those of LWE. Alternatively, our result can also be seen as opening new avenues of (quantum) attacks on lattice problems. Our work resolves an open problem regarding the computational complexity of learning mixtures of Gaussians without separability assumptions (Diakonikolas 2016, Moitra 2018). As an additional motivation, (a slight variant of) CLWE was considered in the context of robust machine learning (Diakonikolas et al. FOCS 2017), where hardness in the statistical query (SQ) model was shown; our work addresses the open question regarding its computational hardness (Bubeck et al. ICML 2019).

Submitted 24 October, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

Comments: 29 pages

arXiv:2004.00422 [pdf, other]

A General Large Neighborhood Search Framework for Solving Integer Linear Programs

Authors: Jialin Song, Ravi Lanka, Yisong Yue, Bistra Dilkina

Abstract: This paper studies a strategy for data-driven algorithm design for large-scale combinatorial optimization problems that can leverage existing state-of-the-art solvers in general purpose ways. The goal is to arrive at new approaches that can reliably outperform existing solvers in wall-clock time. We focus on solving integer programs, and ground our approach in the large neighborhood search (LNS) paradigm, which iteratively chooses a subset of variables to optimize while leaving the remainder fixed. The appeal of LNS is that it can easily use any existing solver as a subroutine, and thus can inherit the benefits of carefully engineered heuristic or complete approaches and their software implementations. We show that one can learn a good neighborhood selector using imitation and reinforcement learning techniques. Through an extensive empirical validation in bounded-time optimization, we demonstrate that our LNS framework can significantly outperform state-of-the-art commercial solvers such as Gurobi.

Submitted 22 December, 2020; v1 submitted 29 March, 2020; originally announced April 2020.

Comments: NeurIPS 2020

arXiv:2003.03463 [pdf, other]

Training Deep Energy-Based Models with f-Divergence Minimization

Authors: Lantao Yu, Yang Song, Jiaming Song, Stefano Ermon

Abstract: Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging because of the intractable partition function. They are typically trained via maximum likelihood, using contrastive divergence to approximate the gradient of the KL divergence between data and model distribution. While KL divergence has many desirable properties, other f-divergences have shown advantages in training implicit density generative models such as generative adversarial networks. In this paper, we propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence. We introduce a corresponding optimization algorithm and prove its local convergence property with non-linear dynamical systems theory. Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.

Submitted 20 July, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

Comments: ICML 2020

arXiv:2003.02205 [pdf, other]

Probabilistic Performance-Pattern Decomposition (PPPD): analysis framework and applications to stochastic mechanical systems

Authors: Ziqi Wang, Marco Broccardo, Junho Song

Abstract: Since the early 1900s, numerous research efforts have been devoted to developing quantitative solutions to stochastic mechanical systems. In general, the problem is perceived as solved when a complete or partial probabilistic description on the quantity of interest (QoI) is determined. However, in the presence of complex system behavior, there is a critical need to go beyond mere probabilistic descriptions. In fact, to gain a full understanding of the system, it is crucial to extract physical characterizations from the probabilistic structure of the QoI, especially when the QoI solution is obtained in a data-driven fashion. Motivated by this perspective, the paper proposes a framework to obtain structuralized characterizations on behaviors of stochastic systems. The framework is named Probabilistic Performance-Pattern Decomposition (PPPD). PPPD analysis aims to decompose complex response behaviors, conditional to a prescribed performance state, into meaningful patterns in the space of system responses, and to investigate how the patterns are triggered in the space of basic random variables. To illustrate the application of PPPD, the paper studies three numerical examples: 1) an illustrative example with hypothetical stochastic processes input and output; 2) a stochastic Lorenz system with periodic as well as chaotic behaviors; and 3) a simplified shear-building model subjected to a stochastic ground motion excitation.

Submitted 4 March, 2020; originally announced March 2020.

Comments: Autoencoder, clustering, diffusion map, manifold learning, Monte Carlo simulation, pattern recognition, stochastic dynamics, uncertainty quantification. 44 Pages

arXiv:2003.01941 [pdf, other]

Gaussianization Flows

Authors: Chenlin Meng, Yang Song, Jiaming Song, Stefano Ermon

Abstract: Iterative Gaussianization is a fixed-point iteration procedure that can transform any continuous random vector into a Gaussian one. Based on iterative Gaussianization, we propose a new type of normalizing flow model that enables both efficient computation of likelihoods and efficient inversion for sample generation. We demonstrate that these models, named Gaussianization flows, are universal approximators for continuous probability distributions under some regularity conditions. Because of this guaranteed expressivity, they can capture multimodal target distributions without compromising the efficiency of sample generation. Experimentally, we show that Gaussianization flows achieve better or comparable performance on several tabular datasets compared to other efficiently invertible flow models such as Real NVP, Glow and FFJORD. In particular, Gaussianization flows are easier to initialize, demonstrate better robustness with respect to different transformations of the training data, and generalize better on small training sets.

Submitted 4 March, 2020; originally announced March 2020.

Comments: AISTATS 2020

arXiv:2003.00638 [pdf, other]

Permutation Invariant Graph Generation via Score-Based Generative Modeling

Authors: Chenhao Niu, Yang Song, Jiaming Song, Shengjia Zhao, Aditya Grover, Stefano Ermon

Abstract: Learning generative models for graph-structured data is challenging because graphs are discrete, combinatorial, and the underlying data distribution is invariant to the ordering of nodes. However, most of the existing generative models for graphs are not invariant to the chosen ordering, which might lead to an undesirable bias in the learned distribution. To address this difficulty, we propose a permutation invariant approach to modeling graphs, using the recent framework of score-based generative modeling. In particular, we design a permutation equivariant, multi-channel graph neural network to model the gradient of the data distribution at the input graph (a.k.a., the score function). This permutation equivariant model of gradients implicitly defines a permutation invariant distribution for graphs. We train this graph neural network with score matching and sample from it with annealed Langevin dynamics. In our experiments, we first demonstrate the capacity of this new architecture in learning discrete graph algorithms. For graph generation, we find that our learning approach achieves better or comparable results to existing models on benchmark datasets.

Submitted 1 March, 2020; originally announced March 2020.

Comments: 14 pages, AISTATS 2020

arXiv:2002.10689 [pdf, other]

A Theory of Usable Information Under Computational Constraints

Authors: Yilun Xu, Shengjia Zhao, Jiaming Song, Russell Stewart, Stefano Ermon

Abstract: We propose a new framework for reasoning about information in complex systems. Our foundation is based on a variational extension of Shannon's information theory that takes into account the modeling power and computational constraints of the observer. The resulting \emph{predictive $\mathcal{V}$-information} encompasses mutual information and other notions of informativeness such as the coefficient of determination. Unlike Shannon's mutual information and in violation of the data processing inequality, $\mathcal{V}$-information can be created through computation. This is consistent with deep neural networks extracting hierarchies of progressively more informative features in representation learning. Additionally, we show that by incorporating computational constraints, $\mathcal{V}$-information can be reliably estimated from data even in high dimensions with PAC-style guarantees. Empirically, we demonstrate predictive $\mathcal{V}$-information is more effective than mutual information for structure learning and fair representation learning.

Submitted 25 February, 2020; originally announced February 2020.

Comments: ICLR 2020 (Talk)

arXiv:2002.09847 [pdf, other]

Unsupervised Denoising for Satellite Imagery using Wavelet Subband CycleGAN

Authors: Joonyoung Song, Jae-Heon Jeong, Dae-Soon Park, Hyun-Ho Kim, Doo-Chun Seo, Jong Chul Ye

Abstract: Multi-spectral satellite imaging sensors acquire various spectral band images such as red (R), green (G), blue (B), near-infrared (N), etc. Thanks to the unique spectroscopic property of each spectral band with respective to the objects on the ground, multi-spectral satellite imagery can be used for various geological survey applications. Unfortunately, image artifacts from imaging sensor noises often affect the quality of scenes and have negative impacts on the applications of satellite imagery. Recently, deep learning approaches have been extensively explored for the removal of noises in satellite imagery. Most deep learning denoising methods, however, follow a supervised learning scheme, which requires matched noisy image and clean image pairs that are difficult to collect in real situations. In this paper, we propose a novel unsupervised multispectral denoising method for satellite imagery using wavelet subband cycle-consistent adversarial network (WavCycleGAN). The proposed method is based on unsupervised learning scheme using adversarial loss and cycle-consistency loss to overcome the lack of paired data. Moreover, in contrast to the standard image domain cycleGAN, we introduce a wavelet subband domain learning scheme for effective denoising without sacrificing high frequency components such as edges and detail information. Experimental results for the removal of vertical stripe and wave noises in satellite imaging sensors demonstrate that the proposed method effectively removes noises and preserves important high frequency features of satellite images.

Submitted 23 February, 2020; originally announced February 2020.

arXiv:2002.04997 [pdf, ps, other]

PCNN: Pattern-based Fine-Grained Regular Pruning towards Optimizing CNN Accelerators

Authors: Zhanhong Tan, Jiebo Song, Xiaolong Ma, Sia-Huat Tan, Hongyang Chen, Yuanqing Miao, Yifu Wu, Shaokai Ye, Yanzhi Wang, Dehui Li, Kaisheng Ma

Abstract: Weight pruning is a powerful technique to realize model compression. We propose PCNN, a fine-grained regular 1D pruning method. A novel index format called Sparsity Pattern Mask (SPM) is presented to encode the sparsity in PCNN. Leveraging SPM with limited pruning patterns and non-zero sequences with equal length, PCNN can be efficiently employed in hardware. Evaluated on VGG-16 and ResNet-18, our PCNN achieves the compression rate up to 8.4X with only 0.2% accuracy loss. We also implement a pattern-aware architecture in 55nm process, achieving up to 9.0X speedup and 28.39 TOPS/W efficiency with only 3.1% on-chip memory overhead of indices.

Submitted 14 June, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

Comments: 6 pages, DAC 2020 accepted paper

arXiv:1912.11006 [pdf, other]

Data-Free Adversarial Distillation

Authors: Gongfan Fang, Jie Song, Chengchao Shen, Xinchao Wang, Da Chen, Mingli Song

Abstract: Knowledge Distillation (KD) has made remarkable progress in the last few years and become a popular paradigm for model compression and knowledge transfer. However, almost all existing KD algorithms are data-driven, i.e., relying on a large amount of original training data or alternative data, which is usually unavailable in real-world scenarios. In this paper, we devote ourselves to this challenging problem and propose a novel adversarial distillation mechanism to craft a compact student model without any real-world data. We introduce a model discrepancy to quantificationally measure the difference between student and teacher models and construct an optimizable upper bound. In our work, the student and the teacher jointly act the role of the discriminator to reduce this discrepancy, when a generator adversarially produces some "hard samples" to enlarge it. Extensive experiments demonstrate that the proposed data-free method yields comparable performance to existing data-driven methods. More strikingly, our approach can be directly extended to semantic segmentation, which is more complicated than classification, and our approach achieves state-of-the-art results. Code and pretrained models are available at https://github.com/VainF/Data-Free-Adversarial-Distillation.

Submitted 2 March, 2020; v1 submitted 23 December, 2019; originally announced December 2019.

arXiv:1911.09274 [pdf, other]

Computer Model Emulation with High-Dimensional Functional Output in Large-Scale Observing System Uncertainty Experiments

Authors: Pulong Ma, Anirban Mondal, Bledar Konomi, Jonathan Hobbs, Joon Song, Emily Kang

Abstract: Observing system uncertainty experiments (OSUEs) have been recently proposed as a cost-effective way to perform probabilistic assessment of retrievals for NASA's Orbiting Carbon Observatory-2 (OCO-2) mission. One important component in the OCO-2 retrieval algorithm is a full-physics forward model that describes the mathematical relationship between atmospheric variables such as carbon dioxide and radiances measured by the remote sensing instrument. This forward model is complicated and computationally expensive but large-scale OSUEs require evaluation of this model numerous times, which makes it infeasible for comprehensive experiments. To tackle this issue, we develop a statistical emulator to facilitate large-scale OSUEs in the OCO-2 mission with independent emulation. Within each distinct spectral band, the emulator represents radiances output at irregular wavelengths via a linear combination of basis functions and random coefficients. These random coefficients are then modeled with nearest-neighbor Gaussian processes with built-in input dimension reduction via active subspace. The proposed emulator reduces dimensionality in both input space and output space, so that fast computation is achieved within a fully Bayesian inference framework. Validation experiments demonstrate that this emulator outperforms other competing statistical methods and a reduced order model that approximates the full-physics forward model.

Submitted 2 November, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: 45 pages

arXiv:1910.09779 [pdf, other]

Bridging the Gap Between $f$-GANs and Wasserstein GANs

Authors: Jiaming Song, Stefano Ermon

Abstract: Generative adversarial networks (GANs) have enjoyed much success in learning high-dimensional distributions. Learning objectives approximately minimize an $f$-divergence ($f$-GANs) or an integral probability metric (Wasserstein GANs) between the model and the data distribution using a discriminator. Wasserstein GANs enjoy superior empirical performance, but in $f$-GANs the discriminator can be interpreted as a density ratio estimator which is necessary in some GAN applications. In this paper, we bridge the gap between $f$-GANs and Wasserstein GANs (WGANs). First, we list two constraints over variational $f$-divergence estimation objectives that preserves the optimal solution. Next, we minimize over a Lagrangian relaxation of the constrained objective, and show that it generalizes critic objectives of both $f$-GAN and WGAN. Based on this generalization, we propose a novel practical objective, named KL-Wasserstein GAN (KL-WGAN). We demonstrate empirical success of KL-WGAN on synthetic datasets and real-world image generation benchmarks, and achieve state-of-the-art FID scores on CIFAR10 image generation.

Submitted 17 June, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

Comments: updated for ICML camera ready version

arXiv:1910.09115 [pdf, other]

Unsupervised Out-of-Distribution Detection with Batch Normalization

Authors: Jiaming Song, Yang Song, Stefano Ermon

Abstract: Likelihood from a generative model is a natural statistic for detecting out-of-distribution (OoD) samples. However, generative models have been shown to assign higher likelihood to OoD samples compared to ones from the training distribution, preventing simple threshold-based detection rules. We demonstrate that OoD detection fails even when using more sophisticated statistics based on the likelihoods of individual samples. To address these issues, we propose a new method that leverages batch normalization. We argue that batch normalization for generative models challenges the traditional i.i.d. data assumption and changes the corresponding maximum likelihood objective. Based on this insight, we propose to exploit in-batch dependencies for OoD detection. Empirical results suggest that this leads to more robust detection for high-dimensional images.

Submitted 20 October, 2019; originally announced October 2019.

arXiv:1910.06222 [pdf, other]

Understanding the Limitations of Variational Mutual Information Estimators

Authors: Jiaming Song, Stefano Ermon

Abstract: Variational approaches based on neural networks are showing promise for estimating mutual information (MI) between high dimensional variables. However, they can be difficult to use in practice due to poorly understood bias/variance tradeoffs. We theoretically show that, under some conditions, estimators such as MINE exhibit variance that could grow exponentially with the true amount of underlying MI. We also empirically demonstrate that existing estimators fail to satisfy basic self-consistency properties of MI, such as data processing and additivity under independence. Based on a unified perspective of variational approaches, we develop a new estimator that focuses on variance reduction. Empirical results on standard benchmark tasks demonstrate that our proposed estimator exhibits improved bias-variance trade-offs on standard benchmark tasks.

Submitted 24 March, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

Comments: Fixed some typos, credit to Yilun Xu

arXiv:1910.00105 [pdf, other]

Domain Adaptive Imitation Learning

Authors: Kuno Kim, Yihong Gu, Jiaming Song, Shengjia Zhao, Stefano Ermon

Abstract: We study the question of how to imitate tasks across domains with discrepancies such as embodiment, viewpoint, and dynamics mismatch. Many prior works require paired, aligned demonstrations and an additional RL step that requires environment interactions. However, paired, aligned demonstrations are seldom obtainable and RL procedures are expensive. We formalize the Domain Adaptive Imitation Learning (DAIL) problem, which is a unified framework for imitation learning in the presence of viewpoint, embodiment, and dynamics mismatch. Informally, DAIL is the process of learning how to perform a task optimally, given demonstrations of the task in a distinct domain. We propose a two step approach to DAIL: alignment followed by adaptation. In the alignment step we execute a novel unsupervised MDP alignment algorithm, Generative Adversarial MDP Alignment (GAMA), to learn state and action correspondences from \emph{unpaired, unaligned} demonstrations. In the adaptation step we leverage the correspondences to zero-shot imitate tasks across domains. To describe when DAIL is feasible via alignment and adaptation, we introduce a theory of MDP alignability. We experimentally evaluate GAMA against baselines in embodiment, viewpoint, and dynamics mismatch scenarios where aligned demonstrations don't exist and show the effectiveness of our approach.

Submitted 18 July, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

Comments: ICML 2020

arXiv:1908.11466 [pdf, ps, other]

A robust approach for testing parameter change in Poisson autoregressive models

Authors: Jiwon Kang, Junmo Song

Abstract: Parameter change test has been an important issue in time series analysis. The problem has also been actively explored in the field of integer-valued time series, but the testing in the presence of outliers has not yet been extensively investigated. This study considers the problem of testing for parameter change in Poisson autoregressive models particularly when observations are contaminated by outliers. To lessen the impact of outliers on testing procedure, we propose a test based on the density power divergence, which is introduced by Basu et al. (Biometrika, 1998), and derive its limiting null distribution. Monte Carlo simulation results demonstrate validity and strong robustness of the proposed test.

Submitted 29 August, 2019; originally announced August 2019.

Comments: 15 pages

arXiv:1908.07307 [pdf, other]

Investigation of wind pressures on tall building under interference effects using machine learning techniques

Authors: Gang Hu, Lingbo Liu, Dacheng Tao, Jie Song, K. C. S. Kwok

Abstract: Interference effects of tall buildings have attracted numerous studies due to the boom of clusters of tall buildings in megacities. To fully understand the interference effects of buildings, it often requires a substantial amount of wind tunnel tests. Limited wind tunnel tests that only cover part of interference scenarios are unable to fully reveal the interference effects. This study used machine learning techniques to resolve the conflicting requirement between limited wind tunnel tests that produce unreliable results and a completed investigation of the interference effects that is costly and time-consuming. Four machine learning models including decision tree, random forest, XGBoost, generative adversarial networks (GANs), were trained based on 30% of a dataset to predict both mean and fluctuating pressure coefficients on the principal building. The GANs model exhibited the best performance in predicting these pressure coefficients. A number of GANs models were then trained based on different portions of the dataset ranging from 10% to 90%. It was found that the GANs model based on 30% of the dataset is capable of predicting both mean and fluctuating pressure coefficients under unseen interference conditions accurately. By using this GANs model, 70% of the wind tunnel test cases can be saved, largely alleviating the cost of this kind of wind tunnel testing study.

Submitted 20 August, 2019; originally announced August 2019.

Comments: 15 pages, 14 figures

arXiv:1907.13220 [pdf, other]

Multi-Agent Adversarial Inverse Reinforcement Learning

Authors: Lantao Yu, Jiaming Song, Stefano Ermon

Abstract: Reinforcement learning agents are prone to undesired behaviors due to reward mis-specification. Finding a set of reward functions to properly guide agent behaviors is particularly challenging in multi-agent scenarios. Inverse reinforcement learning provides a framework to automatically acquire suitable reward functions from expert demonstrations. Its extension to multi-agent settings, however, is difficult due to the more complex notions of rational behaviors. In this paper, we propose MA-AIRL, a new framework for multi-agent inverse reinforcement learning, which is effective and scalable for Markov games with high-dimensional state-action space and unknown dynamics. We derive our algorithm based on a new solution concept and maximum pseudolikelihood estimation within an adversarial reward learning framework. In the experiments, we demonstrate that MA-AIRL can recover reward functions that are highly correlated with ground truth ones, and significantly outperforms prior methods in terms of policy imitation.

Submitted 30 July, 2019; originally announced July 2019.

Comments: ICML 2019

arXiv:1907.04484 [pdf, other]

Co-training for Policy Learning

Authors: Jialin Song, Ravi Lanka, Yisong Yue, Masahiro Ono

Abstract: We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.

Submitted 2 July, 2019; originally announced July 2019.

Comments: UAI 2019, oral presentation

Showing 1–50 of 94 results for author: Song, J