Showing 1–14 of 14 results for author: Chang, P

  1. arXiv:2405.19681

    stat.ML cs.LG stat.CO

    Bayesian Online Natural Gradient (BONG)

    Authors: Matt Jones, Peter Chang, Kevin Murphy

    Abstract: We propose a novel approach to sequential Bayesian inference based on variational Bayes. The key insight is that, in the online setting, we do not need to add the KL term to regularize to the prior (which comes from the posterior at the previous timestep); instead we can optimize just the expected log-likelihood, performing a single step of natural gradient descent starting at the prior predictive…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 41 pages, 11 figures

  2. arXiv:2404.03828

    cs.LG cs.AI stat.ML

    Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

    Authors: Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Robin Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu

    Abstract: We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating outlier-efficient associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an out…

    Submitted 26 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted at ICML 2024; v2 updated to camera-ready version; Code available at https://github.com/MAGICS-LAB/OutEffHop; Models are on Hugging Face: https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f

  3. arXiv:2403.10929

    stat.ML cs.LG

    Function-space Parameterization of Neural Networks for Sequential Learning

    Authors: Aidan Scannell, Riccardo Mereu, Paul Chang, Ella Tamir, Joni Pajarinen, Arno Solin

    Abstract: Sequential learning paradigms pose challenges for gradient-based deep learning due to difficulties incorporating new data and retaining prior knowledge. While Gaussian processes elegantly tackle these problems, they struggle with scalability and handling rich inputs, such as images. To address these issues, we introduce a technique that converts neural networks from weight space to function space,…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 29 pages, 8 figures, Published in The Twelfth International Conference on Learning Representations

  4. arXiv:2401.16571

    stat.ME stat.AP stat.ML

    Individualized Multi-Treatment Response Curves Estimation using RBF-net with Shared Neurons

    Authors: Peter Chang, Arkaprava Roy

    Abstract: Heterogeneous treatment effect estimation is an important problem in precision medicine. Specific interests lie in identifying the differential effect of different treatments based on some external covariates. We propose a novel non-parametric treatment effect estimation method in a multi-treatment setting. Our non-parametric modeling of the response curves relies on radial basis function (RBF)-ne…

    Submitted 8 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 22 pages (not including references), with 4 pages of appendices; 8 tables and 1 figure in the main paper; 2 tables and 2 figures in the appendices

  5. arXiv:2309.02195

    stat.ML cs.LG

    Sparse Function-space Representation of Neural Networks

    Authors: Aidan Scannell, Riccardo Mereu, Paul Chang, Ella Tamir, Joni Pajarinen, Arno Solin

    Abstract: Deep neural networks (NNs) are known to lack uncertainty estimates and struggle to incorporate new data. We present a method that mitigates these issues by converting NNs from weight space to function space, via a dual parameterization. Importantly, the dual parameterization enables us to formulate a sparse representation that captures information from the entire data set. This offers a compact an…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted to ICML 2023 Workshop on Duality for Modern Machine Learning, Honolulu, Hawaii, USA. 4 pages, 2 figures, 1 table

  6. arXiv:2306.03566

    cs.LG stat.ML

    Memory-Based Dual Gaussian Processes for Sequential Learning

    Authors: Paul E. Chang, Prakhar Verma, S. T. John, Arno Solin, Mohammad Emtiyaz Khan

    Abstract: Sequential learning with Gaussian processes (GPs) is challenging when access to past data is limited, for example, in continual and active learning. In such cases, errors can accumulate over time due to inaccuracies in the posterior, hyperparameters, and inducing points, making accurate learning challenging. Here, we present a method to keep all such errors in check using the recently proposed dua…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: International Conference on Machine Learning (ICML) 2023

  7. arXiv:2305.19535

    stat.ML cs.LG

    Low-rank extended Kalman filtering for online learning of neural networks from streaming data

    Authors: Peter G. Chang, Gerardo Durán-Martín, Alexander Y Shestopaloff, Matt Jones, Kevin Murphy

    Abstract: We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream. The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior precision matrix, which gives a cost per step which is linear in the number of model parameters. In…

    Submitted 27 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Journal ref: COLLAS conference 2023

  8. arXiv:2211.01053

    cs.LG stat.ML

    Fantasizing with Dual GPs in Bayesian Optimization and Active Learning

    Authors: Paul E. Chang, Prakhar Verma, ST John, Victor Picheny, Henry Moss, Arno Solin

    Abstract: Gaussian processes (GPs) are the main surrogate functions used for sequential modelling such as Bayesian Optimization and Active Learning. Their drawbacks are poor scaling with data and the need to run an optimization loop when using a non-Gaussian likelihood. In this paper, we focus on 'fantasizing' batch acquisition functions that need the ability to condition on new fantasized data computationa…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: In the 2022 NeurIPS Workshop on Gaussian Processes, Spatiotemporal Modeling, and Decision-making Systems

  9. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  10. arXiv:2111.03412

    cs.LG stat.ML

    Dual Parameterization of Sparse Variational Gaussian Processes

    Authors: Vincent Adam, Paul E. Chang, Mohammad Emtiyaz Khan, Arno Solin

    Abstract: Sparse variational Gaussian process (SVGP) methods are a common choice for non-conjugate Gaussian process inference because of their computational benefits. In this paper, we improve their computational efficiency by using a dual parameterization where each data example is assigned dual parameters, similarly to site parameters used in expectation propagation. Our dual parameterization speeds up in…

    Submitted 19 January, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Advances in Neural Information Processing Systems (NeurIPS 2021)

  11. arXiv:2105.02211

    q-fin.TR q-fin.CP stat.CO

    Simulation and estimation of a point-process market-model with a matching engine

    Authors: Ivan Jericevich, Patrick Chang, Tim Gebbie

    Abstract: The extent to which a matching engine can cloud the modelling of underlying order submission and management processes in a financial market remains an unanswered concern with regards to market models. Here we consider a 10-variate Hawkes process with simple rules to simulate common order types which are submitted to a matching engine. Hawkes processes can be used to model the time and order of eve…

    Submitted 17 August, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

    Comments: 19 pages, 33 figures

  12. arXiv:2007.05994

    stat.ML cs.LG

    State Space Expectation Propagation: Efficient Inference Schemes for Temporal Gaussian Processes

    Authors: William J. Wilkinson, Paul E. Chang, Michael Riis Andersen, Arno Solin

    Abstract: We formulate approximate Bayesian inference in non-conjugate temporal and spatio-temporal Gaussian process models as a simple parameter update rule applied during Kalman smoothing. This viewpoint encompasses most inference schemes, including expectation propagation (EP), the classical (Extended, Unscented, etc.) Kalman smoothers, and variational inference. We provide a unifying perspective on thes…

    Submitted 12 July, 2020; originally announced July 2020.

    Comments: Accepted to International Conference on Machine Learning (ICML) 2020

  13. arXiv:2007.04731

    cs.LG stat.ML

    Fast Variational Learning in State-Space Gaussian Process Models

    Authors: Paul E. Chang, William J. Wilkinson, Mohammad Emtiyaz Khan, Arno Solin

    Abstract: Gaussian process (GP) regression with 1D inputs can often be performed in linear time via a stochastic differential equation formulation. However, for non-Gaussian likelihoods, this requires application of approximate inference methods which can make the implementation difficult, e.g., expectation propagation can be numerically unstable and variational inference can be computationally inefficient.…

    Submitted 17 July, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: To appear in MLSP 2020

  14. arXiv:2003.02842

    q-fin.CP q-fin.TR stat.CO

    Malliavin-Mancino estimators implemented with non-uniform fast Fourier transforms

    Authors: Patrick Chang, Etienne Pienaar, Tim Gebbie

    Abstract: We implement and test kernel averaging Non-Uniform Fast Fourier Transform (NUFFT) methods to enhance the performance of correlation and covariance estimation on asynchronously sampled event-data using the Malliavin-Mancino Fourier estimator. The methods are benchmarked for Dirichlet and Fejér Fourier basis kernels. We consider test cases formed from Geometric Brownian motions to replicate synchron…

    Submitted 10 November, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: 29 pages, 15 figures, 3 tables, 10 algorithms, link to our supporting Julia code: https://github.com/CHNPAT005/PCEPTG-MM-NUFFT; v3: Accepted submitted version for SISC

    MSC Class: Primary 62G08; 65T04; secondary 62P08

    Journal ref: SIAM J. Sci. Comput., 2020, 42(6), B1378 - B1403