
Showing 1–24 of 24 results for author: Huszar, F

  1. arXiv:2406.14302  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning

    Authors: Patrik Reizinger, Siyuan Guo, Ferenc Huszár, Bernhard Schölkopf, Wieland Brendel

    Abstract: Identifying latent representations or causal structures is important for good generalization and downstream task performance. However, both fields have been developed rather independently. We observe that several methods in both representation and causal structure learning rely on the same data-generating process (DGP), namely, exchangeable but not i.i.d. (independent and identically distributed)…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2405.18836  [pdf, other]

    stat.ME cs.LG

    Do Finetti: On Causal Effects for Exchangeable Data

    Authors: Siyuan Guo, Chi Zhang, Karthika Mohan, Ferenc Huszár, Bernhard Schölkopf

    Abstract: We study causal effect estimation in a setting where the data are not i.i.d. (independent and identically distributed). We focus on exchangeable data satisfying an assumption of independent causal mechanisms. Traditional causal effect estimation frameworks, e.g., relying on structural causal models and do-calculus, are typically limited to i.i.d. data and do not extend to more general exchangeable…

    Submitted 29 May, 2024; originally announced May 2024.

  3. arXiv:2405.01964  [pdf, other]

    stat.ML cs.LG

    Position: Understanding LLMs Requires More Than Statistical Generalization

    Authors: Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, Anna Kerekes, Wieland Brendel, Ferenc Huszár

    Abstract: The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statist…

    Submitted 17 June, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted as a position paper at ICML2024, Code: https://github.com/rpatrik96/llm-non-identifiability

  4. arXiv:2210.10452  [pdf, other]

    stat.ML cs.LG

    Rethinking Sharpness-Aware Minimization as Variational Inference

    Authors: Szilvia Ujváry, Zsigmond Telek, Anna Kerekes, Anna Mészáros, Ferenc Huszár

    Abstract: Sharpness-aware minimization (SAM) aims to improve the generalisation of gradient-based learning by seeking out flat minima. In this work, we establish connections between SAM and Mean-Field Variational Inference (MFVI) of neural network parameters. We show that both these methods have interpretations as optimizing notions of flatness, and when using the reparametrisation trick, they both boil dow…

    Submitted 19 October, 2022; originally announced October 2022.

  5. arXiv:2203.15756  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Causal de Finetti: On the Identification of Invariant Causal Structure in Exchangeable Data

    Authors: Siyuan Guo, Viktor Tóth, Bernhard Schölkopf, Ferenc Huszár

    Abstract: Constraint-based causal discovery methods leverage conditional independence tests to infer causal relationships in a wide variety of applications. Just as the majority of machine learning methods, existing work focuses on studying independent and identically distributed data. However, it is known that even with infinite i.i.d. data, constraint-based methods can only identify causal…

    Submitted 24 May, 2024; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: camera-ready NeurIPS 2023

  6. arXiv:2111.11542  [pdf, other]

    stat.ML cs.LG

    Depth Without the Magic: Inductive Bias of Natural Gradient Descent

    Authors: Anna Kerekes, Anna Mészáros, Ferenc Huszár

    Abstract: In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories, giving rise to a surprising range of meaningful inductive biases: identifying sparse classifiers or reconstructing low-rank matrices without explicit regularization. This implicit regularization has been hypothesised to be a contributing factor to good generalization in deep lear…

    Submitted 22 November, 2021; originally announced November 2021.

  7. arXiv:2008.00727  [pdf]

    cs.LG cs.AI cs.NE stat.ML

    Deep Bayesian Bandits: Exploring in Online Personalized Recommendations

    Authors: Dalin Guo, Sofia Ira Ktena, Ferenc Huszar, Pranay Kumar Myana, Wenzhe Shi, Alykhan Tejani

    Abstract: Recommender systems trained in a continuous learning fashion are plagued by the feedback loop problem, also known as algorithmic bias. This causes a newly trained model to act greedily and favor items that have already been engaged by users. This behavior is particularly harmful in personalised ads recommendations, as it can also cause new campaigns to remain unexplored. Exploration aims to addres…

    Submitted 3 August, 2020; originally announced August 2020.

  8. arXiv:2007.14523  [pdf, other]

    cs.SI cs.LG stat.ML

    Model Size Reduction Using Frequency Based Double Hashing for Recommender Systems

    Authors: Caojin Zhang, Yicun Liu, Yuanpu Xie, Sofia Ira Ktena, Alykhan Tejani, Akshay Gupta, Pranay Kumar Myana, Deepak Dilipkumar, Suvadip Paul, Ikuhiro Ihara, Prasang Upadhyaya, Ferenc Huszar, Wenzhe Shi

    Abstract: Deep Neural Networks (DNNs) with sparse input features have been widely used in recommender systems in industry. These models have large memory requirements and need a huge amount of training data. The large model size usually entails a cost, in the range of millions of dollars, for storage and communication with the inference services. In this paper, we propose a hybrid hashing method to combine…

    Submitted 28 July, 2020; originally announced July 2020.

    Comments: Paper is accepted to RecSys 2020

  9. arXiv:1907.06558  [pdf, other]

    stat.ML cs.LG

    Addressing Delayed Feedback for Continuous Training with Neural Networks in CTR prediction

    Authors: Sofia Ira Ktena, Alykhan Tejani, Lucas Theis, Pranay Kumar Myana, Deepak Dilipkumar, Ferenc Huszar, Steven Yoo, Wenzhe Shi

    Abstract: One of the challenges in display advertising is that the distribution of features and click through rate (CTR) can exhibit large shifts over time due to seasonality, changes to ad campaigns and other factors. The predominant strategy to keep up with these shifts is to train predictive models continuously, on fresh data, in order to prevent them from becoming stale. However, in many ad systems posi…

    Submitted 23 April, 2021; v1 submitted 15 July, 2019; originally announced July 2019.

    Comments: Accepted at RecSys '19

  10. arXiv:1807.02175  [pdf]

    stat.AP eess.IV

    Adaptive Paired-Comparison Method for Subjective Video Quality Assessment on Mobile Devices

    Authors: Katherine Storrs, Sebastiaan Van Leuven, Steve Kojder, Lucas Theis, Ferenc Huszár

    Abstract: To effectively evaluate subjective visual quality in weakly-controlled environments, we propose an Adaptive Paired Comparison method based on particle filtering. As our approach requires each sample to be rated only once, the test time compared to regular paired comparison can be reduced. The method works with non-experts and improves reliability compared to MOS and DS-MOS methods.

    Submitted 5 July, 2018; originally announced July 2018.

    Journal ref: Picture Coding Symposium, 2018

  11. arXiv:1802.07535  [pdf, other]

    stat.ML

    BRUNO: A Deep Recurrent Model for Exchangeable Data

    Authors: Iryna Korshunova, Jonas Degrave, Ferenc Huszár, Yarin Gal, Arthur Gretton, Joni Dambre

    Abstract: We present a novel model architecture which leverages deep learning tools to perform exact Bayesian inference on sets of high dimensional, complex observations. Our model is provably exchangeable, meaning that the joint distribution over observations is invariant under permutation: this property lies at the heart of Bayesian inference. The model does not require variational approximations to train…

    Submitted 16 October, 2018; v1 submitted 21 February, 2018; originally announced February 2018.

    Comments: NIPS 2018

  12. arXiv:1801.05787  [pdf, other]

    cs.CV stat.ML

    Faster gaze prediction with dense networks and Fisher pruning

    Authors: Lucas Theis, Iryna Korshunova, Alykhan Tejani, Ferenc Huszár

    Abstract: Predicting human fixations from images has recently seen large improvements by leveraging deep representations which were pretrained for object recognition. However, as we show in this paper, these networks are highly overparameterized for the task of fixation prediction. We first present a simple yet principled greedy pruning method which we call Fisher pruning. Through a combination of knowledge…

    Submitted 9 July, 2018; v1 submitted 17 January, 2018; originally announced January 2018.

  13. On Quadratic Penalties in Elastic Weight Consolidation

    Authors: Ferenc Huszár

    Abstract: Elastic weight consolidation (EWC, Kirkpatrick et al, 2017) is a novel algorithm designed to safeguard against catastrophic forgetting in neural networks. EWC can be seen as an approximation to Laplace propagation (Eskin et al, 2004), and this view is consistent with the motivation given by Kirkpatrick et al (2017). In this note, I present an extended derivation that covers the case when there are…

    Submitted 11 December, 2017; originally announced December 2017.

  14. arXiv:1703.00395  [pdf, other]

    stat.ML cs.CV

    Lossy Image Compression with Compressive Autoencoders

    Authors: Lucas Theis, Wenzhe Shi, Andrew Cunningham, Ferenc Huszár

    Abstract: We propose a new approach to the problem of optimizing autoencoders for lossy image compression. New media formats, changing hardware technology, as well as diverse requirements and content types create a need for compression algorithms which are more flexible than existing codecs. Autoencoders have the potential to address this need, but are difficult to optimize directly due to the inherent non-…

    Submitted 1 March, 2017; originally announced March 2017.

  15. arXiv:1702.08235  [pdf, other]

    stat.ML cs.LG

    Variational Inference using Implicit Distributions

    Authors: Ferenc Huszár

    Abstract: Generative adversarial networks (GANs) have given us a great tool to fit implicit generative models to data. Implicit distributions are ones we can sample from easily, and take derivatives of samples with respect to model parameters. These models are highly expressive and we argue they can prove just as useful for variational inference (VI) as they are for generative modelling. Several papers have…

    Submitted 27 February, 2017; originally announced February 2017.

  16. arXiv:1610.04490  [pdf, other]

    cs.CV cs.LG stat.ML

    Amortised MAP Inference for Image Super-resolution

    Authors: Casper Kaae Sønderby, Jose Caballero, Lucas Theis, Wenzhe Shi, Ferenc Huszár

    Abstract: Image super-resolution (SR) is an underdetermined inverse problem, where a large number of plausible high-resolution images can explain the same downsampled image. Most current single image SR methods use empirical risk minimisation, often with a pixel-wise mean squared error (MSE) loss. However, the outputs from such methods tend to be blurry, over-smoothed and generally appear implausible. A mor…

    Submitted 21 February, 2017; v1 submitted 14 October, 2016; originally announced October 2016.

  17. arXiv:1609.05158  [pdf, other]

    cs.CV stat.ML

    Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

    Authors: Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, Zehan Wang

    Abstract: Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolut…

    Submitted 23 September, 2016; v1 submitted 16 September, 2016; originally announced September 2016.

    Comments: CVPR 2016 paper with updated affiliations and supplemental material, fixed typo in equation 4

  18. arXiv:1609.04802  [pdf, other]

    cs.CV stat.ML

    Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

    Authors: Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi

    Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. R…

    Submitted 25 May, 2017; v1 submitted 15 September, 2016; originally announced September 2016.

    Comments: 19 pages, 15 figures, 2 tables, accepted for oral presentation at CVPR, main paper + some supplementary material

  19. arXiv:1511.05101  [pdf, other]

    stat.ML cs.AI cs.IT cs.LG

    How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?

    Authors: Ferenc Huszár

    Abstract: Modern applications and progress in deep learning research have created renewed interest for generative models of text and of images. However, even today it is unclear what objective functions one should use to train and evaluate these models. In this paper we present two contributions. Firstly, we present a critique of scheduled sampling, a state-of-the-art training method that contributed to t…

    Submitted 16 November, 2015; originally announced November 2015.

  20. arXiv:1408.2049   

    cs.LG stat.ML

    Optimally-Weighted Herding is Bayesian Quadrature

    Authors: Ferenc Huszar, David Duvenaud

    Abstract: Herding and kernel herding are deterministic methods of choosing samples which summarise a probability distribution. A related task is choosing samples for estimating integrals using Bayesian quadrature. We show that the criterion minimised when selecting samples in kernel herding is equivalent to the posterior variance in Bayesian quadrature. We then show that sequential Bayesian quadrature can b…

    Submitted 13 July, 2016; v1 submitted 9 August, 2014; originally announced August 2014.

    Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012). This copy was withdrawn since it's a duplicate of arXiv:1204.1664

    Report number: UAI-P-2012-PG-377-386

  21. arXiv:1204.1664  [pdf, other]

    stat.ML math.NA

    Optimally-Weighted Herding is Bayesian Quadrature

    Authors: Ferenc Huszár, David Duvenaud

    Abstract: Herding and kernel herding are deterministic methods of choosing samples which summarise a probability distribution. A related task is choosing samples for estimating integrals using Bayesian quadrature. We show that the criterion minimised when selecting samples in kernel herding is equivalent to the posterior variance in Bayesian quadrature. We then show that sequential Bayesian quadrature can b…

    Submitted 15 July, 2016; v1 submitted 7 April, 2012; originally announced April 2012.

    Comments: Accepted as an oral presentation at Uncertainty in Artificial Intelligence 2012. Updated to fix several typos

    ACM Class: G.1.4

  22. arXiv:1112.5745  [pdf, other]

    stat.ML cs.LG

    Bayesian Active Learning for Classification and Preference Learning

    Authors: Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, Máté Lengyel

    Abstract: Information theoretic active learning has been widely studied for probabilistic models. For simple regression an optimal myopic policy is easily tractable. However, for other tasks and with more complex models, such as classification with nonparametric models, the optimal solution is harder to compute. Current approaches make approximations to achieve tractability. We propose an approach that expr…

    Submitted 24 December, 2011; originally announced December 2011.

  23. Adaptive Bayesian Quantum Tomography

    Authors: Ferenc Huszár, Neil M. T. Houlsby

    Abstract: In this letter we revisit the problem of optimal design of quantum tomographic experiments. In contrast to previous approaches where an optimal set of measurements is decided in advance of the experiment, we allow for measurements to be adaptively and efficiently re-optimised depending on data collected so far. We develop an adaptive statistical framework based on Bayesian inference and Shannon's…

    Submitted 5 October, 2011; v1 submitted 5 July, 2011; originally announced July 2011.

    Comments: 4 pages, 3 figures, updated references, clarified exposition

  24. arXiv:1103.1761  [pdf, other]

    stat.ML stat.ME

    A Kernel Approach to Tractable Bayesian Nonparametrics

    Authors: Ferenc Huszár, Simon Lacoste-Julien

    Abstract: Inference in popular nonparametric Bayesian models typically relies on sampling or other approximations. This paper presents a general methodology for constructing novel tractable nonparametric Bayesian methods by applying the kernel trick to inference in a parametric Bayesian model. For example, Gaussian process regression can be derived this way from Bayesian linear regression. Despite the succe…

    Submitted 12 August, 2011; v1 submitted 9 March, 2011; originally announced March 2011.

    Comments: acknowledgements added to previous version, content otherwise unchanged