Showing 1–25 of 25 results for author: Mozer, M

Searching in archive stat.
  1. arXiv:2301.11790

    cs.CV cs.LG stat.ML

    Leveraging the Third Dimension in Contrastive Learning

    Authors: Sumukh Aithal, Anirudh Goyal, Alex Lamb, Yoshua Bengio, Michael Mozer

    Abstract: Self-Supervised Learning (SSL) methods operate on unlabeled data to learn robust representations useful for downstream tasks. Most SSL methods rely on augmentations obtained by transforming the 2D image pixel map. These augmentations ignore the fact that biological vision takes place in an immersive three-dimensional, temporally contiguous environment, and that low-level biological vision relies h…

    Submitted 27 January, 2023; originally announced January 2023.

  2. arXiv:2204.04875

    stat.ML cs.LG

    Learning to Induce Causal Structure

    Authors: Nan Rosemary Ke, Silvia Chiappa, Jane Wang, Anirudh Goyal, Jorg Bornschein, Melanie Rey, Theophane Weber, Matthew Botvinic, Michael Mozer, Danilo Jimenez Rezende

    Abstract: The fundamental challenge in causal induction is to infer the underlying graph structure given observational and/or interventional data. Most existing causal induction algorithms operate by generating candidate graphs and evaluating them using either score-based methods (including continuous optimization) or independence tests. In our work, we instead treat the inference process as a black box and…

    Submitted 7 October, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

  3. arXiv:2109.05675

    cs.CV cs.LG stat.ML

    Online Unsupervised Learning of Visual Representations and Categories

    Authors: Mengye Ren, Tyler R. Scott, Michael L. Iuzzolino, Michael C. Mozer, Richard Zemel

    Abstract: Real world learning scenarios involve a nonstationary distribution of classes with sequential dependencies among the samples, in contrast to the standard machine learning formulation of drawing samples independently from a fixed, typically uniform distribution. Furthermore, real world interactions demand learning on-the-fly from few or no class labels. In this work, we propose an unsupervised mode…

    Submitted 28 May, 2022; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: Technical report, 32 pages

  4. arXiv:2109.02429

    stat.ML cs.LG

    Learning Neural Causal Models with Active Interventions

    Authors: Nino Scherrer, Olexa Bilaniuk, Yashas Annadani, Anirudh Goyal, Patrick Schwab, Bernhard Schölkopf, Michael C. Mozer, Yoshua Bengio, Stefan Bauer, Nan Rosemary Ke

    Abstract: Discovering causal structures from data is a challenging inference problem of fundamental importance in all areas of science. The appealing properties of neural networks have recently led to a surge of interest in differentiable neural network-based methods for learning causal structures from data. So far, differentiable causal discovery has focused on static datasets of observational or fixed int…

    Submitted 5 March, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

  5. arXiv:2107.00848

    stat.ML cs.LG

    Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning

    Authors: Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Yoshua Bengio, Michael Mozer, Christopher Pal

    Abstract: Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise that the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables,…

    Submitted 2 July, 2021; originally announced July 2021.

  6. arXiv:2103.01937

    cs.AI cs.LG stat.ML

    Neural Production Systems: Learning Rule-Governed Visual Dynamics

    Authors: Anirudh Goyal, Aniket Didolkar, Nan Rosemary Ke, Charles Blundell, Philippe Beaudoin, Nicolas Heess, Michael Mozer, Yoshua Bengio

    Abstract: Visual environments are structured, consisting of distinct objects or entities. These entities have properties -- both visible and latent -- that determine the manner in which they interact with one another. To partition images into entities, deep-learning researchers have proposed structural inductive biases such as slot-based architectures. To model interactions among entities, equivariant graph…

    Submitted 23 March, 2022; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: NeurIPS'21

  7. arXiv:2103.01197

    cs.LG cs.AI stat.ML

    Coordination Among Neural Modules Through a Shared Global Workspace

    Authors: Anirudh Goyal, Aniket Didolkar, Alex Lamb, Kartikeya Badola, Nan Rosemary Ke, Nasim Rahaman, Jonathan Binas, Charles Blundell, Michael Mozer, Yoshua Bengio

    Abstract: Deep learning has seen a movement away from representing examples with a monolithic hidden state towards a richly structured state. For example, Transformers segment by position, and object-centric architectures decompose images into entities. In all these architectures, interactions between different elements are modeled via pairwise interactions: Transformers make use of self-attention to incorp…

    Submitted 22 March, 2022; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: ICLR'22 accepted paper

  8. arXiv:2012.08668

    cs.LG cs.AI cs.CV stat.ML

    Mitigating Bias in Calibration Error Estimation

    Authors: Rebecca Roelofs, Nicholas Cain, Jonathon Shlens, Michael C. Mozer

    Abstract: For an AI system to be reliable, the confidence it expresses in its decisions must match its accuracy. To assess the degree of match, examples are typically binned by confidence and the per-bin mean confidence and accuracy are compared. Most research in calibration focuses on techniques to reduce this empirical measure of calibration error, ECE_bin. We instead focus on assessing statistical bias i…

    Submitted 10 February, 2022; v1 submitted 15 December, 2020; originally announced December 2020.

    Comments: To be published in AISTATS 2022. Code is available at https://github.com/google-research/google-research/tree/master/caltrain

  9. arXiv:2010.08012

    cs.LG stat.ML

    Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers

    Authors: Alex Lamb, Anirudh Goyal, Agnieszka Słowik, Michael Mozer, Philippe Beaudoin, Yoshua Bengio

    Abstract: Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than a particular part of the state which is most relevant for that module. Methods whi…

    Submitted 15 October, 2020; originally announced October 2020.

  10. arXiv:2007.04546

    cs.LG cs.CV stat.ML

    Wandering Within a World: Online Contextualized Few-Shot Learning

    Authors: Mengye Ren, Michael L. Iuzzolino, Michael C. Mozer, Richard S. Zemel

    Abstract: We aim to bridge the gap between typical human and machine-learning environments by extending the standard framework of few-shot learning to an online, continual setting. In this setting, episodes do not have separate training and testing phases, and instead models are evaluated online while learning novel classes. As in the real world, where the presence of spatiotemporal context helps us retriev…

    Submitted 22 April, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: ICLR 2021

  11. arXiv:2006.16981

    cs.LG cs.NE stat.ML

    Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

    Authors: Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio

    Abstract: Robust perception relies on both bottom-up and top-down signals. Bottom-up signals consist of what's directly observed through sensation. Top-down signals consist of beliefs and expectations based on past experience and short-term memory, such as how the phrase 'peanut butter and ...' will be completed. The optimal combination of bottom-up and top-down information remains an open question, but the…

    Submitted 15 November, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

    Comments: ICML 2020

  12. arXiv:2006.16225

    cs.LG stat.ML

    Object Files and Schemata: Factorizing Declarative and Procedural Knowledge in Dynamical Systems

    Authors: Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Sergey Levine, Charles Blundell, Yoshua Bengio, Michael Mozer

    Abstract: Modeling a structured, dynamic environment like a video game requires keeping track of the objects and their states (declarative knowledge) as well as predicting how objects behave (procedural knowledge). Black-box models with a monolithic hidden state often fail to apply procedural knowledge consistently and uniformly, i.e., they lack systematicity. For example, in a video game, correct prediction…

    Submitted 12 November, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: Type/Token Distinction in Deep learning Framework

  13. arXiv:2002.04193

    cs.LG stat.ML

    Compositional Embeddings for Multi-Label One-Shot Learning

    Authors: Zeqian Li, Michael C. Mozer, Jacob Whitehill

    Abstract: We present a compositional embedding framework that infers not just a single class per input image, but a set of classes, in the setting of one-shot learning. Specifically, we propose and evaluate several novel models consisting of (1) an embedding function f trained jointly with a "composition" function g that computes set union operations between the classes encoded in two embedding vectors; and…

    Submitted 13 November, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  14. arXiv:2002.03206

    cs.LG stat.ML

    Characterizing Structural Regularities of Labeled Data in Overparameterized Models

    Authors: Ziheng Jiang, Chiyuan Zhang, Kunal Talwar, Michael C. Mozer

    Abstract: Humans are accustomed to environments that contain both regularities and exceptions. For example, at most gas stations, one pays prior to pumping, but the occasional rural station does not accept payment in advance. Likewise, deep neural networks can generalize across instances that share common patterns or structures, yet have the capacity to memorize rare or irregular forms. We analyze how indiv…

    Submitted 15 June, 2021; v1 submitted 8 February, 2020; originally announced February 2020.

    Comments: 17 pages, 20 figures, ICML 2021

  15. arXiv:1910.01075

    stat.ML cs.AI cs.LG

    Learning Neural Causal Models from Unknown Interventions

    Authors: Nan Rosemary Ke, Olexa Bilaniuk, Anirudh Goyal, Stefan Bauer, Hugo Larochelle, Bernhard Schölkopf, Michael C. Mozer, Chris Pal, Yoshua Bengio

    Abstract: Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained from observational data alone. Interventional data provides much richer information about the underlying data-generating process. However, the…

    Submitted 23 August, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

  16. arXiv:1909.11702

    stat.ML cs.LG

    Stochastic Prototype Embeddings

    Authors: Tyler R. Scott, Karl Ridgeway, Michael C. Mozer

    Abstract: Supervised deep-embedding methods project inputs of a domain to a representational space in which same-class instances lie near one another and different-class instances lie far apart. We propose a probabilistic method that treats embeddings as random variables. Extending a state-of-the-art deterministic method, Prototypical Networks (Snell et al., 2017), our approach supposes the existence of a c…

    Submitted 25 September, 2019; originally announced September 2019.

    Comments: 15 pages, 8 figures

  17. arXiv:1906.03504

    cs.LG cs.NE stat.ML

    Convolutional Bipartite Attractor Networks

    Authors: Michael Iuzzolino, Yoram Singer, Michael C. Mozer

    Abstract: In human perception and cognition, a fundamental operation that brains perform is interpretation: constructing coherent neural states from noisy, incomplete, and intrinsically ambiguous evidence. The problem of interpretation is well matched to an early and often overlooked architecture, the attractor network---a recurrent neural net that performs constraint satisfaction, imputation of missing fea…

    Submitted 26 September, 2019; v1 submitted 8 June, 2019; originally announced June 2019.

  18. arXiv:1905.11382

    cs.LG cs.AI stat.ML

    State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations

    Authors: Alex Lamb, Jonathan Binas, Anirudh Goyal, Sandeep Subramanian, Ioannis Mitliagkas, Denis Kazakov, Yoshua Bengio, Michael C. Mozer

    Abstract: Machine learning promises methods that generalize well from finite labeled data. However, the brittleness of existing neural net approaches is revealed by notable failures, such as the existence of adversarial examples that are misclassified despite being nearly identical to a training example, or the inability of recurrent sequence-processing nets to stay on track without teacher forcing. We intr…

    Submitted 26 May, 2019; originally announced May 2019.

    Comments: ICML 2019 [full oral]. arXiv admin note: text overlap with arXiv:1805.08394

  19. arXiv:1905.10837

    cs.LG stat.ML

    Sequential mastery of multiple visual tasks: Networks naturally learn to learn and forget to forget

    Authors: Guy Davidson, Michael C. Mozer

    Abstract: We explore the behavior of a standard convolutional neural net in a continual-learning setting that introduces visual classification tasks sequentially and requires the net to master new tasks while preserving mastery of previously learned tasks. This setting corresponds to that which human learners face as they acquire domain expertise serially, for example, as an individual studies a textbook. T…

    Submitted 30 March, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

  20. arXiv:1903.01069

    cs.LG stat.ML

    Neural Networks Trained on Natural Scenes Exhibit Gestalt Closure

    Authors: Been Kim, Emily Reif, Martin Wattenberg, Samy Bengio, Michael C. Mozer

    Abstract: The Gestalt laws of perceptual organization, which describe how visual elements in an image are grouped and interpreted, have traditionally been thought of as innate despite their ecological validity. We use deep-learning methods to investigate whether natural scene statistics might be sufficient to derive the Gestalt laws. We examine the law of closure, which asserts that human visual perception…

    Submitted 29 June, 2020; v1 submitted 3 March, 2019; originally announced March 2019.

  21. arXiv:1902.04698

    stat.ML cs.AI cs.LG

    Identity Crisis: Memorization and Generalization under Extreme Overparameterization

    Authors: Chiyuan Zhang, Samy Bengio, Moritz Hardt, Michael C. Mozer, Yoram Singer

    Abstract: We study the interplay between memorization and generalization of overparameterized networks in the extreme case of a single training example and an identity-mapping task. We examine fully-connected and convolutional networks (FCN and CNN), both linear and nonlinear, initialized randomly and then trained to minimize the reconstruction error. The trained networks stereotypically take one of two for…

    Submitted 8 January, 2020; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: ICLR 2020

  22. arXiv:1810.00110

    cs.LG stat.ML

    Open-Ended Content-Style Recombination Via Leakage Filtering

    Authors: Karl Ridgeway, Michael C. Mozer

    Abstract: We consider visual domains in which a class label specifies the content of an image, and class-irrelevant properties that differentiate instances constitute the style. We present a domain-independent method that permits the open-ended recombination of style of one image with the content of another. Open ended simply means that the method generalizes to style and content not present in the training…

    Submitted 28 September, 2018; originally announced October 2018.

  23. arXiv:1809.03702

    cs.LG stat.ML

    Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

    Authors: Nan Rosemary Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Michael C. Mozer, Chris Pal, Yoshua Bengio

    Abstract: Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes c…

    Submitted 11 September, 2018; originally announced September 2018.

    Comments: To appear as a Spotlight presentation at NIPS 2018

  24. arXiv:1805.08402

    cs.LG stat.ML

    Adapted Deep Embeddings: A Synthesis of Methods for $k$-Shot Inductive Transfer Learning

    Authors: Tyler R. Scott, Karl Ridgeway, Michael C. Mozer

    Abstract: The focus in machine learning has branched beyond training classifiers on a single task to investigating how previously acquired knowledge in a source domain can be leveraged to facilitate learning in a related target domain, known as inductive transfer learning. Three active lines of research have independently explored transfer learning using neural networks. In weight transfer, a model trained…

    Submitted 27 October, 2018; v1 submitted 22 May, 2018; originally announced May 2018.

  25. arXiv:1802.05312

    cs.LG cs.AI stat.ML

    Learning Deep Disentangled Embeddings with the F-Statistic Loss

    Authors: Karl Ridgeway, Michael C. Mozer

    Abstract: Deep-embedding methods aim to discover representations of a domain that make explicit the domain's class structure and thereby support few-shot learning. Disentangling methods aim to make explicit compositional or factorial structure. We combine these two active but independent lines of research and propose a new paradigm suitable for both goals. We propose and evaluate a novel loss function based…

    Submitted 19 May, 2018; v1 submitted 14 February, 2018; originally announced February 2018.