Showing 1–47 of 47 results for author: Domke, J

  1. arXiv:2405.19747  [pdf, other]

    cs.LG stat.ML

    Understanding and mitigating difficulties in posterior predictive evaluation

    Authors: Abhinav Agrawal, Justin Domke

    Abstract: Predictive posterior densities (PPDs) are of interest in approximate Bayesian inference. Typically, these are estimated by simple Monte Carlo (MC) averages using samples from the approximate posterior. We observe that the signal-to-noise ratio (SNR) of such estimators can be extremely low. An analysis for exact inference reveals SNR decays exponentially as there is an increase in (a) the mismatch… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  2. arXiv:2310.17009  [pdf, other]

    stat.ME cs.LG stat.CO

    Simulation-based stacking

    Authors: Yuling Yao, Bruno Régaldo-Saint Blancard, Justin Domke

    Abstract: Simulation-based inference has been popular for amortized Bayesian computation. It is typical to have more than one posterior approximation, from different inference algorithms, different architectures, or simply the randomness of initialization and stochastic gradients. With a consistency guarantee, we present a general posterior stacking framework to make use of all available approximations. Our… ▽ More

    Submitted 29 February, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: Published at International Conference on Artificial Intelligence and Statistics (AISTATS) 2024

  3. arXiv:2310.03742  [pdf, other]

    cs.NI

    A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network

    Authors: Nils Blach, Maciej Besta, Daniele De Sensi, Jens Domke, Hussein Harake, Shigang Li, Patrick Iff, Marek Konieczny, Kartik Lakhotia, Ales Kubicek, Marcel Ferrari, Fabrizio Petrini, Torsten Hoefler

    Abstract: Novel low-diameter network topologies such as Slim Fly (SF) offer significant cost and power advantages over the established Fat Tree, Clos, or Dragonfly. To spearhead the adoption of low-diameter networks, we design, implement, deploy, and evaluate the first real-world SF installation. We focus on deployment, management, and operational aspects of our test cluster with 200 servers and carefully a… ▽ More

    Submitted 21 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Journal ref: Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI '24) Santa Clara, CA, USA April 16-18, 2024

  4. arXiv:2306.03638  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Provable convergence guarantees for black-box variational inference

    Authors: Justin Domke, Guillaume Garrigos, Robert Gower

    Abstract: Black-box variational inference is widely used in situations where there is no proof that its stochastic optimization succeeds. We suggest this is due to a theoretical gap in existing stochastic optimization proofs: namely the challenge of gradient estimators with unusual noise bounds, and a composite non-smooth objective. For dense Gaussian variational families, we observe that existing gradient… ▽ More

    Submitted 21 December, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

    Comments: Accepted at NeurIPS 2023

  5. arXiv:2305.14593  [pdf, other]

    stat.ML cs.LG stat.CO

    Discriminative calibration: Check Bayesian computation from simulations and flexible classifier

    Authors: Yuling Yao, Justin Domke

    Abstract: To check the accuracy of Bayesian computations, it is common to use rank-based simulation-based calibration (SBC). However, SBC has drawbacks: The test statistic is somewhat ad-hoc, interactions are difficult to examine, multiple testing is a challenge, and the resulting p-value is not a divergence metric. We propose to replace the marginal rank test with a flexible classification approach that le… ▽ More

    Submitted 27 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Published at Neural Information Processing Systems (NeurIPS 2023)

  6. arXiv:2304.06803  [pdf, other]

    cs.LG math.OC stat.ML

    Sample Average Approximation for Black-Box VI

    Authors: Javier Burroni, Justin Domke, Daniel Sheldon

    Abstract: We present a novel approach for black-box VI that bypasses the difficulties of stochastic gradient ascent, including the task of selecting step-sizes. Our approach involves using a sequence of sample average approximation (SAA) problems. SAA approximates the solution of stochastic optimization problems by transforming them into deterministic ones. We use quasi-Newton methods and line search to sol… ▽ More

    Submitted 17 May, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

  7. arXiv:2302.13918  [pdf, other]

    cs.LG stat.ML

    U-Statistics for Importance-Weighted Variational Inference

    Authors: Javier Burroni, Kenta Takatsu, Justin Domke, Daniel Sheldon

    Abstract: We propose the use of U-statistics to reduce variance for gradient estimation in importance-weighted variational inference. The key observation is that, given a base gradient estimator that requires $m > 1$ samples and a total of $n > m$ samples to be used for estimation, lower variance is achieved by averaging the base estimator on overlapping batches of size $m$ than disjoint batches, as current… ▽ More

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: Accepted at Transactions on Machine Learning Research (TMLR)

  8. arXiv:2301.02432  [pdf, other]

    cs.DC cs.AR cs.CY cs.LG cs.SI

    Myths and Legends in High-Performance Computing

    Authors: Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Torsten Hoefler

    Abstract: In this thought-provoking article, we discuss certain myths and legends that are folklore among members of the high-performance computing community. We gathered these myths from conversations at conferences and meetings, product advertisements, papers, and other communications such as tweets, blogs, and news articles within and beyond our community. We believe they represent the zeitgeist of the c… ▽ More

    Submitted 24 October, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

  9. arXiv:2210.07290  [pdf, other]

    cs.LG stat.ML

    Joint control variate for faster black-box variational inference

    Authors: Xi Wang, Tomas Geffner, Justin Domke

    Abstract: Black-box variational inference performance is sometimes hindered by the use of gradient estimators with high variance. This variance comes from two sources of randomness: Data subsampling and Monte Carlo sampling. While existing control variates only address Monte Carlo noise, and incremental gradient methods typically only address data subsampling, we propose a new "joint" control variate that j… ▽ More

    Submitted 8 March, 2024; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Published in the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)

  10. arXiv:2208.07743  [pdf, ps, other]

    cs.LG

    Langevin Diffusion Variational Inference

    Authors: Tomas Geffner, Justin Domke

    Abstract: Many methods that build powerful variational distributions based on unadjusted Langevin transitions exist. Most of these were developed using a wide range of different approaches and techniques. Unfortunately, the lack of a unified analysis and derivation makes developing new methods and reasoning about existing ones a challenging task. We address this giving a single analysis that unifies and gen… ▽ More

    Submitted 23 March, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

  11. arXiv:2207.00257  [pdf, other]

    cs.PL cs.DC

    High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

    Authors: William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko

    Abstract: While parallelism remains the main source of performance, architectural implementations and programming models change with each new hardware generation, often leading to costly application re-engineering. Most tools for performance portability require manual and costly application porting to yet another programming model. We propose an alternative approach that automatically translates programs… ▽ More

    Submitted 1 July, 2022; originally announced July 2022.

  12. arXiv:2204.07336  [pdf, ps, other]

    cs.DC

    Preparing for the Future -- Rethinking Proxy Apps

    Authors: Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Ray Bair, Andrew A. Chien, Jeffrey S. Vetter, John Shalf

    Abstract: A considerable amount of research and engineering went into designing proxy applications, which represent common high-performance computing workloads, to co-design and evaluate the current generation of supercomputers, e.g., RIKEN's Supercomputer Fugaku, ANL's Aurora, or ORNL's Frontier. This process was necessary to standardize the procurement while avoiding duplicated effort at each HPC center t… ▽ More

    Submitted 15 April, 2022; originally announced April 2022.

  13. arXiv:2204.02235  [pdf, other]

    cs.DC

    At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads

    Authors: Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka

    Abstract: Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities in future HPC-focused processors, particularly by 3D-stacked SRAM. First, we propose a method… ▽ More

    Submitted 16 October, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

  14. arXiv:2203.04432  [pdf, other]

    cs.LG stat.ML

    Variational Inference with Locally Enhanced Bounds for Hierarchical Models

    Authors: Tomas Geffner, Justin Domke

    Abstract: Hierarchical models represent a challenging setting for inference algorithms. MCMC methods struggle to scale to large models with many local variables and observations, and variational inference (VI) may fail to provide accurate approximations due to the use of simple variational families. Some variational methods (e.g. importance weighted VI) integrate Monte Carlo methods to give better accuracy,… ▽ More

    Submitted 25 July, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Presented at ICML 2022

    MSC Class: 68T99

  15. arXiv:2111.03144  [pdf, other]

    cs.LG stat.ML

    Amortized Variational Inference for Simple Hierarchical Models

    Authors: Abhinav Agrawal, Justin Domke

    Abstract: It is difficult to use subsampling with variational inference in hierarchical models since the number of local latent variables scales with the dataset. Thus, inference in hierarchical models remains a challenge at large scale. It is helpful to use a variational family with structure matching the posterior, but optimization is still slow due to the huge number of local distributions. Instead, this… ▽ More

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: Neural Information Processing Systems (NeurIPS) 2021

  16. arXiv:2110.11466  [pdf, other]

    cs.LG cs.DC

    MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems

    Authors: Steven Farrell, Murali Emani, Jacob Balma, Lukas Drescher, Aleksandr Drozd, Andreas Fink, Geoffrey Fox, David Kanter, Thorsten Kurth, Peter Mattson, Dawei Mu, Amit Ruhela, Kento Sato, Koichi Shirahata, Tsuguchika Tabaru, Aristeidis Tsaris, Jan Balewski, Ben Cumming, Takumi Danjo, Jens Domke, Takaaki Fukai, Naoto Fukumoto, Tatsuya Fukushi, Balazs Gerofi, Takumi Honda, et al. (18 additional authors not shown)

    Abstract: Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair and effective benchmarking of machine learning appli… ▽ More

    Submitted 26 October, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

  17. arXiv:2109.15134  [pdf, other]

    stat.ML cs.LG

    Variational Marginal Particle Filters

    Authors: Jinlin Lai, Justin Domke, Daniel Sheldon

    Abstract: Variational inference for state space models (SSMs) is known to be hard in general. Recent works focus on deriving variational objectives for SSMs from unbiased sequential Monte Carlo estimators. We reveal that the marginal particle filter is obtained from sequential Monte Carlo by applying Rao-Blackwellization operations, which sacrifices the trajectory information for reduced variance and differ… ▽ More

    Submitted 14 March, 2022; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: Accepted to AISTATS 2022

  18. arXiv:2107.07157  [pdf, other]

    cs.DC cs.PF

    A64FX -- Your Compiler You Must Decide!

    Authors: Jens Domke

    Abstract: The current number one of the TOP500 list, Supercomputer Fugaku, has demonstrated that CPU-only HPC systems aren't dead and CPUs can be used for more than just being the host controller for a discrete accelerators. While the specifications of the chip and overall system architecture, and benchmarks submitted to various lists, like TOP500 and Green500, etc., are clearly highlighting the potential,… ▽ More

    Submitted 2 August, 2021; v1 submitted 15 July, 2021; originally announced July 2021.

  19. arXiv:2107.04150  [pdf, other]

    cs.LG stat.ML

    MCMC Variational Inference via Uncorrected Hamiltonian Annealing

    Authors: Tomas Geffner, Justin Domke

    Abstract: Given an unnormalized target distribution we want to obtain approximate samples from it and a tight lower bound on its (log) normalization constant log Z. Annealed Importance Sampling (AIS) with Hamiltonian MCMC is a powerful method that can be used to do this. Its main drawback is that it uses non-differentiable transition kernels, which makes tuning its many parameters hard. We propose a framewo… ▽ More

    Submitted 30 October, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

    Comments: Published at NeurIPS (2021)

    MSC Class: 68T99

    Journal ref: NeurIPS (2021)

  20. arXiv:2105.06587  [pdf, other]

    cs.LG stat.ML

    Empirical Evaluation of Biased Methods for Alpha Divergence Minimization

    Authors: Tomas Geffner, Justin Domke

    Abstract: In this paper we empirically evaluate biased methods for alpha-divergence minimization. In particular, we focus on how the bias affects the final solutions found, and how this depends on the dimensionality of the problem. We find that (i) solutions returned by these methods appear to be strongly biased towards minimizers of the traditional "exclusive" KL-divergence, KL(q||p), and (ii) in high dime… ▽ More

    Submitted 13 May, 2021; originally announced May 2021.

    MSC Class: 62F15

  21. arXiv:2103.01030  [pdf, other]

    cs.LG stat.ML

    An Easy to Interpret Diagnostic for Approximate Inference: Symmetric Divergence Over Simulations

    Authors: Justin Domke

    Abstract: It is important to estimate the errors of probabilistic inference algorithms. Existing diagnostics for Markov chain Monte Carlo methods assume inference is asymptotically exact, and are not appropriate for approximate methods like variational inference or Laplace's method. This paper introduces a diagnostic based on repeatedly simulating datasets from the prior and performing inference on each. Th… ▽ More

    Submitted 25 February, 2021; originally announced March 2021.

  22. arXiv:2010.14373  [pdf, other]

    cs.DC

    Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?

    Authors: Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka

    Abstract: Matrix engines or units, in different forms and affinities, are becoming a reality in modern processors; CPUs and otherwise. The current and dominant algorithmic approach to Deep Learning merits the commercial investments in these units, and deduced from the No.1 benchmark in supercomputing, namely High Performance Linpack, one would expect an awakened enthusiasm by the HPC community, too. Hence… ▽ More

    Submitted 27 February, 2021; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: IEEE International Parallel and Distributed Processing Symposium 2021 (IPDPS'21)

  23. arXiv:2010.09541  [pdf, other]

    stat.ML cs.LG

    On the Difficulty of Unbiased Alpha Divergence Minimization

    Authors: Tomas Geffner, Justin Domke

    Abstract: Several approximate inference algorithms have been proposed to minimize an alpha-divergence between an approximating distribution and a target distribution. Many of these algorithms introduce bias, the magnitude of which becomes problematic in high dimensions. Other algorithms are unbiased. These often seem to suffer from high variance, but little is rigorously known. In this work we study unbiase… ▽ More

    Submitted 26 October, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

    Comments: ICML 2021

    MSC Class: 62F15

    Journal ref: ICML 2021

  24. arXiv:2008.11421  [pdf, other]

    cs.DC cs.LG

    Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA

    Authors: Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka

    Abstract: The dedicated memory of hardware accelerators can be insufficient to store all weights and/or intermediate states of large deep learning models. Although model parallelism is a viable approach to reduce the memory pressure issue, significant modification of the source code and considerations for algorithms are required. An alternative solution is to use out-of-core methods instead of, or in additi… ▽ More

    Submitted 26 August, 2020; originally announced August 2020.

    Comments: ACM/IEEE Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'20)

  25. arXiv:2007.14634  [pdf, other]

    cs.LG stat.ML

    Approximation Based Variance Reduction for Reparameterization Gradients

    Authors: Tomas Geffner, Justin Domke

    Abstract: Flexible variational distributions improve variational inference but are harder to optimize. In this work we present a control variate that is applicable for any reparameterizable distribution with known mean and covariance matrix, e.g. Gaussians with any covariance structure. The control variate is based on a quadratic approximation of the model, and its parameters are set using a double-descent… ▽ More

    Submitted 23 October, 2020; v1 submitted 29 July, 2020; originally announced July 2020.

    Comments: Neural Information Processing Systems (NeurIPS 2020)

    MSC Class: 68T37

  26. arXiv:2007.03776  [pdf, other]

    cs.NI cs.DC cs.PF

    High-Performance Routing with Multipathing and Path Diversity in Ethernet and HPC Networks

    Authors: Maciej Besta, Jens Domke, Marcel Schneider, Marek Konieczny, Salvatore Di Girolamo, Timo Schneider, Ankit Singla, Torsten Hoefler

    Abstract: The recent line of research into topology design focuses on lowering network diameter. Many low-diameter topologies such as Slim Fly or Jellyfish that substantially reduce cost, power consumption, and latency have been proposed. A key challenge in realizing the benefits of these topologies is routing. On one hand, these networks provide shorter path lengths than established topologies such as Clos… ▽ More

    Submitted 29 October, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

    Journal ref: IEEE Transactions on Parallel and Distributed Systems (TPDS), 2021

  27. arXiv:2006.10343  [pdf, other]

    cs.LG stat.ML

    Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization

    Authors: Abhinav Agrawal, Daniel Sheldon, Justin Domke

    Abstract: Recent research has seen several advances relevant to black-box VI, but the current state of automatic posterior inference is unclear. One such advance is the use of normalizing flows to define flexible posterior densities for deep latent variable models. Another direction is the integration of Monte-Carlo methods to serve two purposes; first, to obtain tighter variational objectives for optimizat… ▽ More

    Submitted 23 October, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Neural Information Processing Systems (NeurIPS) 2020

  28. arXiv:2004.04628  [pdf, other]

    cs.DC

    White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing

    Authors: Roman Iakymchuk, Daichi Mukunoki, Artur Podobas, Fabienne Jézéquel, Toshiyuki Imamura, Norihisa Fujita, Jens Huthmann, Shuhei Kudo, Yiyu Tan, Jens Domke, Kai Torben Ohlhus, Takeshi Fukaya, Takeo Hoshi, Yuki Murakami, Maho Nakata, Takeshi Ogita, Kentaro Sano, Taisuke Boku

    Abstract: In numerical computations, precision of floating-point computations is a key factor to determine the performance (speed and energy-efficiency) as well as the reliability (accuracy and reproducibility). However, precision generally plays a contrary role for both. Therefore, the ultimate concept for maximizing both at the same time is the minimal-precision computing through precision-tuning, which a… ▽ More

    Submitted 11 April, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

    Report number: hal-02536316

  29. arXiv:2001.09771  [pdf, ps, other]

    cs.LG stat.ML

    Moment-Matching Conditions for Exponential Families with Conditioning or Hidden Data

    Authors: Justin Domke

    Abstract: Maximum likelihood learning with exponential families leads to moment-matching of the sufficient statistics, a classic result. This can be generalized to conditional exponential families and/or when there are hidden data. This document gives a first-principles explanation of these generalized moment-matching conditions, along with a self-contained derivation.

    Submitted 7 January, 2020; originally announced January 2020.

  30. arXiv:1911.01894  [pdf, other]

    cs.LG stat.ML

    A Rule for Gradient Estimator Selection, with an Application to Variational Inference

    Authors: Tomas Geffner, Justin Domke

    Abstract: Stochastic gradient descent (SGD) is the workhorse of modern machine learning. Sometimes, there are many different potential gradient estimators that can be used. When so, choosing the one with the best tradeoff between cost and variance is important. This paper analyzes the convergence rates of SGD as a function of time, rather than iterations. This results in a simple rule to select the estimato… ▽ More

    Submitted 5 November, 2019; originally announced November 2019.

    Comments: 18 pages, preliminary work. International Conference on Artificial Intelligence and Statistics. 2020

    MSC Class: 68T99

  31. arXiv:1908.04970  [pdf, other]

    cs.LG stat.ML

    Thompson Sampling with Approximate Inference

    Authors: My Phan, Yasin Abbasi-Yadkori, Justin Domke

    Abstract: We study the effects of approximate inference on the performance of Thompson sampling in the $k$-armed bandit problems. Thompson sampling is a successful algorithm for online decision-making but requires posterior inference, which often must be approximated in practice. We show that even small constant inference error (in $α$-divergence) can lead to poor performance (linear regret) due to under-ex… ▽ More

    Submitted 14 January, 2020; v1 submitted 14 August, 2019; originally announced August 2019.

  32. arXiv:1906.10115  [pdf, other]

    cs.LG stat.ML

    Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation

    Authors: Justin Domke, Daniel Sheldon

    Abstract: Recent work in variational inference (VI) uses ideas from Monte Carlo estimation to tighten the lower bounds on the log-likelihood that are used as objectives. However, there is no systematic understanding of how optimizing different objectives relates to approximating the posterior distribution. Developing such a connection is important if the ideas are to be applied to inference-i.e., applicatio… ▽ More

    Submitted 7 January, 2020; v1 submitted 24 June, 2019; originally announced June 2019.

    Comments: Neural Information Processing Systems (NeurIPS) 2019

  33. arXiv:1906.08241  [pdf, other]

    cs.LG stat.ML

    Provable Gradient Variance Guarantees for Black-Box Variational Inference

    Authors: Justin Domke

    Abstract: Recent variational inference methods use stochastic gradient estimators whose variance is not well understood. Theoretical guarantees for these estimators are important to understand when these methods will or will not work. This paper gives bounds for the common "reparameterization" estimators when the target is smooth and the variational family is a location-scale distribution. These bounds are… ▽ More

    Submitted 27 October, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

    Comments: Neural Information Processing Systems (NeurIPS) 2019

  34. arXiv:1901.08431  [pdf, other]

    cs.LG stat.ML

    Provable Smoothness Guarantees for Black-Box Variational Inference

    Authors: Justin Domke

    Abstract: Black-box variational inference tries to approximate a complex target distribution though a gradient-based optimization of the parameters of a simpler distribution. Provable convergence guarantees require structural properties of the objective. This paper shows that for location-scale family approximations, if the target is M-Lipschitz smooth, then so is the objective, if the entropy is excluded.… ▽ More

    Submitted 14 August, 2020; v1 submitted 24 January, 2019; originally announced January 2019.

    Comments: International Conference on Machine Learning (ICML) 2020

  35. arXiv:1810.12482  [pdf, other]

    cs.LG stat.ML

    Using Large Ensembles of Control Variates for Variational Inference

    Authors: Tomas Geffner, Justin Domke

    Abstract: Variational inference is increasingly being addressed with stochastic optimization. In this setting, the gradient's variance plays a crucial role in the optimization procedure, since high variance gradients lead to poor convergence. A popular approach used to reduce gradient's variance involves the use of control variates. Despite the good results obtained, control variates developed for variation… ▽ More

    Submitted 22 October, 2020; v1 submitted 29 October, 2018; originally announced October 2018.

    Comments: Neural Information Processing Systems (NIPS 2018)

    MSC Class: 68T99

  36. arXiv:1810.09330  [pdf, ps, other]

    cs.DC

    Double-precision FPUs in High-Performance Computing: an Embarrassment of Riches?

    Authors: Jens Domke, Kazuaki Matsumura, Mohamed Wahib, Haoyu Zhang, Keita Yashima, Toshiki Tsuchikawa, Yohei Tsuji, Artur Podobas, Satoshi Matsuoka

    Abstract: Among the (uncontended) common wisdom in High-Performance Computing (HPC) is the applications' need for large amount of double-precision support in hardware. Hardware manufacturers, the TOP500 list, and (rarely revisited) legacy software have without doubt followed and contributed to this view. In this paper, we challenge that wisdom, and we do so by exhaustively comparing a large number of HPC… ▽ More

    Submitted 25 March, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: IEEE International Parallel and Distributed Processing Symposium 2019

  37. arXiv:1808.09034  [pdf, other]

    cs.LG stat.ML

    Importance Weighting and Variational Inference

    Authors: Justin Domke, Daniel Sheldon

    Abstract: Recent work used importance sampling ideas for better variational bounds on likelihoods. We clarify the applicability of these ideas to pure probabilistic inference, by showing the resulting Importance Weighted Variational Inference (IWVI) technique is an instance of augmented variational inference, thus identifying the looseness in previous work. Experiments confirm IWVI's practicality for probab… ▽ More

    Submitted 26 October, 2018; v1 submitted 27 August, 2018; originally announced August 2018.

    Comments: Neural Information Processing Systems (NIPS) 2018

  38. arXiv:1805.07785  [pdf, other]

    stat.ML cs.CV cs.LG

    Conditional Inference in Pre-trained Variational Autoencoders via Cross-coding

    Authors: Ga Wu, Justin Domke, Scott Sanner

    Abstract: Variational Autoencoders (VAEs) are a popular generative model, but one in which conditional inference can be challenging. If the decomposition into query and evidence variables is fixed, conditional VAEs provide an attractive solution. To support arbitrary queries, one is generally reduced to Markov Chain Monte Carlo sampling methods that can suffer from long mixing times. In this paper, we propo… ▽ More

    Submitted 3 October, 2018; v1 submitted 20 May, 2018; originally announced May 2018.

    Comments: 8 pages main content, 4 pages appendix

  39. arXiv:1706.06529  [pdf, other]

    cs.LG stat.ML

    A Divergence Bound for Hybrids of MCMC and Variational Inference and an Application to Langevin Dynamics and SGVI

    Authors: Justin Domke

    Abstract: Two popular classes of methods for approximate inference are Markov chain Monte Carlo (MCMC) and variational inference. MCMC tends to be accurate if run for a long enough time, while variational inference tends to give better approximations at shorter time horizons. However, the amount of time needed for MCMC to exceed the performance of variational methods can be quite high, motivating more fine-… ▽ More

    Submitted 20 June, 2017; originally announced June 2017.

    Comments: International Conference on Machine Learning (ICML) 2017

  40. arXiv:1510.00087  [pdf, other]

    cs.LG cs.AI stat.ML

    Clamping Improves TRW and Mean Field Approximations

    Authors: Adrian Weller, Justin Domke

    Abstract: We examine the effect of clamping variables for approximate inference in undirected graphical models with pairwise relationships and discrete variables. For any number of variable labels, we demonstrate that clamping and summing approximate sub-partition functions can lead only to a decrease in the partition function estimate for TRW, and an increase for the naive mean field method, in each case g… ▽ More

    Submitted 30 September, 2015; originally announced October 2015.

  41. arXiv:1509.08992  [pdf, ps, other]

    cs.LG stat.ML

    Maximum Likelihood Learning With Arbitrary Treewidth via Fast-Mixing Parameter Sets

    Authors: Justin Domke

    Abstract: Inference is typically intractable in high-treewidth undirected graphical models, making maximum likelihood learning a challenge. One way to overcome this is to restrict parameters to a tractable set, most typically the set of tree-structured parameters. This paper explores an alternative notion of a tractable set, namely a set of "fast-mixing parameters" where Markov chain Monte Carlo (MCMC) infe… ▽ More

    Submitted 30 October, 2015; v1 submitted 29 September, 2015; originally announced September 2015.

    Comments: Advances in Neural Information Processing Systems 2015

  42. arXiv:1411.1119  [pdf, other

    cs.LG stat.ML

    Projecting Markov Random Field Parameters for Fast Mixing

    Authors: Xianghang Liu, Justin Domke

    Abstract: Markov chain Monte Carlo (MCMC) algorithms are simple and extremely powerful techniques to sample from almost arbitrary distributions. The flaw in practice is that it can take a large and/or unknown amount of time to converge to the stationary distribution. This paper gives sufficient conditions to guarantee that univariate Gibbs sampling on Markov Random Fields (MRFs) will be fast mixing, in a pr…

    Submitted 11 November, 2014; v1 submitted 4 November, 2014; originally announced November 2014.

    Comments: Neural Information Processing Systems 2014

  43. arXiv:1407.2710  [pdf, other

    cs.LG stat.ML

    Finito: A Faster, Permutable Incremental Gradient Method for Big Data Problems

    Authors: Aaron J. Defazio, Tibério S. Caetano, Justin Domke

    Abstract: Recent advances in optimization theory have shown that smooth strongly convex finite sums can be minimized faster than by treating them as a black box "batch" problem. In this work we introduce a new method in this class with a theoretical convergence rate four times faster than existing methods, for sums with sufficiently many terms. This method is also amenable to a sampling without replacement…

    Submitted 10 July, 2014; originally announced July 2014.

    Journal ref: International Conference on Machine Learning 2014

  44. arXiv:1407.0754  [pdf, ps, other

    cs.LG stat.ML

    Structured Learning via Logistic Regression

    Authors: Justin Domke

    Abstract: A successful approach to structured learning is to write the learning objective as a joint function of linear parameters and inference messages, and iterate between updates to each. This paper observes that if the inference problem is "smoothed" through the addition of entropy terms, for fixed messages, the learning objective reduces to a traditional (non-structured) logistic regression problem wi…

    Submitted 2 July, 2014; originally announced July 2014.

    Comments: Advances in Neural Information Processing Systems 2013

  45. arXiv:1407.0749  [pdf, ps, other

    cs.LG stat.ML

    Projecting Ising Model Parameters for Fast Mixing

    Authors: Justin Domke, Xianghang Liu

    Abstract: Inference in general Ising models is difficult, due to high treewidth making tree-based algorithms intractable. Moreover, when interactions are strong, Gibbs sampling may take exponential time to converge to the stationary distribution. We present an algorithm to project Ising model parameters onto a parameter set that is guaranteed to be fast mixing, under several divergences. We find that Gibbs…

    Submitted 8 October, 2014; v1 submitted 2 July, 2014; originally announced July 2014.

    Comments: Advances in Neural Information Processing Systems 2013

  46. Learning Graphical Model Parameters with Approximate Marginal Inference

    Authors: Justin Domke

    Abstract: Likelihood-based learning of graphical models faces challenges of computational complexity and robustness to model mis-specification. This paper studies methods that fit parameters directly to maximize a measure of the accuracy of predicted marginals, taking into account both model and inference approximations at training time. Experiments on imaging problems suggest marginalization-based learning…

    Submitted 14 January, 2013; originally announced January 2013.

    Comments: To Appear, IEEE Transactions on Pattern Analysis and Machine Intelligence

    ACM Class: I.2.6; I.4.8

  47. arXiv:1206.3247  [pdf

    cs.LG stat.ML

    Learning Convex Inference of Marginals

    Authors: Justin Domke

    Abstract: Graphical models trained using maximum likelihood are a common tool for probabilistic inference of marginal distributions. However, this approach suffers difficulties when either the inference process or the model is approximate. In this paper, the inference process is first defined to be the minimization of a convex function, inspired by free energy approximations. Learning is then done directly…

    Submitted 13 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)

    Report number: UAI-P-2008-PG-137-144