Search | arXiv e-print repository

Learned feature representations are biased by complexity, learning order, position, and more

Authors: Andrew Kyle Lampinen, Stephanie C. Y. Chan, Katherine Hermann

Abstract: Representation learning, and interpreting learned representations, are key areas of focus in machine learning and neuroscience. Both fields generally use representations as a means to understand or improve a system's computations. In this work, however, we explore surprising dissociations between representation and computation that may pose challenges for such efforts. We create datasets in which we attempt to match the computational role that different features play, while manipulating other properties of the features or the data. We train various deep learning architectures to compute these multiple abstract features about their inputs. We find that their learned feature representations are systematically biased towards representing some features more strongly than others, depending upon extraneous properties such as feature complexity, the order in which features are learned, and the distribution of features over the inputs. For example, features that are simpler to compute or learned first tend to be represented more strongly and densely than features that are more complex or learned later, even if all features are learned equally well. We also explore how these biases are affected by architectures, optimizers, and training regimes (e.g., in transformers, features decoded earlier in the output sequence also tend to be represented more strongly). Our results help to characterize the inductive biases of gradient-based representation learning. These results also highlight a key challenge for interpretability, or for comparing the representations of models and brains: disentangling extraneous biases from the computationally important aspects of a system's internal representations.

Submitted 6 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

arXiv:2401.13306 [pdf, other]

POSTER: Towards Secure 5G Infrastructures for Production Systems

Authors: Martin Henze, Maximilian Ortmann, Thomas Vogt, Osman Ugus, Kai Hermann, Svenja Nohr, Zeren Lu, Sotiris Michaelides, Angela Massonet, Robert H. Schmitt

Abstract: To meet the requirements of modern production, industrial communication increasingly shifts from wired fieldbus to wireless 5G communication. Besides tremendous benefits, this shift introduces severe novel risks, ranging from limited reliability over new security vulnerabilities to a lack of accountability. To address these risks, we present approaches to (i) prevent attacks through authentication and redundant communication, (ii) detect anomalies and jamming, and (iii) respond to detected attacks through device exclusion and accountability measures.

Submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted to the poster session of the 22nd International Conference on Applied Cryptography and Network Security (ACNS 2024)

arXiv:2310.16228 [pdf, other]

On the Foundations of Shortcut Learning

Authors: Katherine L. Hermann, Hossein Mobahi, Thomas Fel, Michael C. Mozer

Abstract: Deep-learning models can extract a rich assortment of features from data. Which features a model uses depends not only on predictivity (how reliably a feature indicates train-set labels) but also on availability (how easily the feature can be extracted, or leveraged, from inputs). The literature on shortcut learning has noted examples in which models privilege one feature over another, for example texture over shape and image backgrounds over foreground objects. Here, we test hypotheses about which input properties are more available to a model, and systematically study how predictivity and availability interact to shape models' feature use. We construct a minimal, explicit generative framework for synthesizing classification datasets with two latent features that vary in predictivity and in factors we hypothesize to relate to availability, and quantify a model's shortcut bias: its over-reliance on the shortcut (more available, less predictive) feature at the expense of the core (less available, more predictive) feature. We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias. Our empirical findings are consistent with a theoretical account based on Neural Tangent Kernels. Finally, we study how models used in practice trade off predictivity and availability in naturalistic datasets, discovering availability manipulations which increase models' degree of shortcut bias. Taken together, these findings suggest that the propensity to learn shortcut features is a fundamental characteristic of deep nonlinear architectures warranting systematic study given its role in shaping how models solve tasks.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.13018 [pdf, other]

Getting aligned on representational alignment

Authors: Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine L. Hermann, Kerem Oktar, Klaus Greff, Martin N. Hebart, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas P. O'Connell, et al. (5 additional authors not shown)

Abstract: Biological and artificial information processing systems form representations that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the extent to which the representations formed by these diverse systems agree? Do similarities in representations then translate into similar behavior? How can a system's representations be modified to better match those of another system? These questions pertaining to the study of representational alignment are at the heart of some of the most active research areas in cognitive science, neuroscience, and machine learning. For example, cognitive scientists measure the representational alignment of multiple individuals to identify shared cognitive priors, neuroscientists align fMRI responses from multiple individuals into a shared representational space for group-level analyses, and ML researchers distill knowledge from teacher models into student models by increasing their alignment. Unfortunately, there is limited knowledge transfer between research communities interested in representational alignment, so progress in one field often ends up being rediscovered independently in another. Thus, greater cross-field communication would be advantageous. To improve communication between these fields, we propose a unifying framework that can serve as a common language between researchers studying representational alignment. We survey the literature from all three fields and demonstrate how prior work fits into this framework. Finally, we lay out open problems in representational alignment where progress can benefit all three of these fields. We hope that our work can catalyze cross-disciplinary collaboration and accelerate progress for all communities studying and developing information processing systems. We note that this is a working paper and encourage readers to reach out with their suggestions for future revisions.

Submitted 2 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: Working paper, changes to be made in upcoming revisions

arXiv:2307.13375 [pdf, other]

Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines

Authors: Alexander Jaus, Constantin Seibold, Kelsey Hermann, Alexandra Walter, Kristina Giske, Johannes Haubold, Jens Kleesiek, Rainer Stiefelhagen

Abstract: In this study, we present a method for generating automated anatomy segmentation datasets using a sequential process that involves nnU-Net-based pseudo-labeling and anatomy-guided pseudo-label refinement. By combining various fragmented knowledge bases, we generate a dataset of whole-body CT scans with 142 voxel-level labels for 533 volumes, providing comprehensive anatomical coverage which experts have approved. Our proposed procedure does not rely on manual annotation during the label aggregation stage. We examine its plausibility and usefulness using three complementary checks: human expert evaluation, which approved the dataset; a deep learning usefulness benchmark on the BTCV dataset, in which we achieve an 85% Dice score without using its training dataset; and medical validity checks. This evaluation procedure combines scalable automated checks with labor-intensive high-quality expert checks. Besides the dataset, we release our trained unified anatomical segmentation model capable of predicting 142 anatomical structures on CT data.

Submitted 25 July, 2023; originally announced July 2023.

Comments: 18 pages, 8 figures, 2 tables

arXiv:2307.06017 [pdf]

Nanoparticles with Cubic Symmetry: Classification of Polyhedral Shapes

Authors: Klaus E. Hermann

Abstract: The shape of crystalline nanoparticles (NP) can often be described by polyhedra with flat facet surfaces. Thus, structural studies of polyhedral bodies can help to describe geometric details of NPs. Here we consider compact polyhedra of cubic point symmetry Oh as simple models. Their surfaces are described by facets with normal vectors along selected directions (a, b, c) together with their symmetry equivalents forming a direction family {abc}. For given {abc} this yields generic polyhedra with up to 48 facets where we focus on polyhedra with facets of {abc} = {100}, {110}, and {111}, suggested for metal NPs with cubic lattices. The resulting generic polyhedra, cubic, rhombohedral, and octahedral, can serve for the description of non-generic polyhedra as intersections of corresponding generic species. Their structural properties are shown to be fully determined by only three structure parameters, facet distances R100, R110, and R111 of three types of facets. This provides a phase diagram to completely classify the corresponding Oh symmetry polyhedra. Structural properties of all polyhedra, such as shape, size, and facet geometries, are discussed in analytical and numerical detail with visualization of characteristic examples. The results may be used for respective nanoparticle simulations but also as a repository assisting the interpretation of structures of real compact nanoparticles observed by experiment.

Submitted 21 July, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: 40/56 pages, 18/9 figures (paper/supplement). arXiv admin note: text overlap with arXiv:2209.08919

arXiv:2306.04507 [pdf, other]

Improving neural network representations using human similarity judgments

Authors: Lukas Muttenthaler, Lorenz Linhardt, Jonas Dippel, Robert A. Vandermeulen, Katherine Hermann, Andrew K. Lampinen, Simon Kornblith

Abstract: Deep neural networks have reached human-level performance on many computer vision tasks. However, the objectives used to train these networks enforce only that similar images are embedded at similar locations in the representation space, and do not directly constrain the global structure of the resulting space. Here, we explore the impact of supervising this global structure by linearly aligning it with human similarity judgments. We find that a naive approach leads to large changes in local representational structure that harm downstream performance. Thus, we propose a novel method that aligns the global structure of representations while preserving their local structure. This global-local transform considerably improves accuracy across a variety of few-shot learning and anomaly detection tasks. Our results indicate that human visual representations are globally organized in a way that facilitates learning from few examples, and incorporating this global structure into neural network representations improves performance on downstream tasks.

Submitted 26 September, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: Published as a conference paper at NeurIPS 2023

arXiv:2303.17651 [pdf, other]

Self-Refine: Iterative Refinement with Self-Feedback

Authors: Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark

Abstract: Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an initial output using an LLM; then, the same LLM provides feedback for its output and uses it to refine itself, iteratively. Self-Refine does not require any supervised training data, additional training, or reinforcement learning, and instead uses a single LLM as the generator, refiner, and feedback provider. We evaluate Self-Refine across 7 diverse tasks, ranging from dialog response generation to mathematical reasoning, using state-of-the-art (GPT-3.5, ChatGPT, and GPT-4) LLMs. Across all evaluated tasks, outputs generated with Self-Refine are preferred by humans and automatic metrics over those generated with the same LLM using conventional one-step generation, improving by ~20% absolute on average in task performance. Our work demonstrates that even state-of-the-art LLMs like GPT-4 can be further improved at test time using our simple, standalone approach.

Submitted 25 May, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

Comments: Code, data, and demo at https://selfrefine.info/

arXiv:2209.08919 [pdf]

Compact Polyhedra of Cubic Symmetry: Geometrical Analysis and Classification

Authors: Klaus E. Hermann

Abstract: Compact polyhedra of cubic point symmetry Oh exhibit surfaces of planar sections (facets) characterized by normal vector families {abc} with up to 48 members each, compatible with Oh symmetry. We focus first on polyhedra confined by facets with normal vectors of one family, {100}, {110}, and {111}, separately. This yields generic polyhedra which serve for the definition of general compact polyhedra as intersections of the three generic species. Their structural properties, such as shape, size, volume, and surface facets, are found to be described by only three polyhedral structure parameters A, B, C. In addition, we examine compact polyhedra exhibiting facets defined by normal vectors of only one general {abc} family resulting in up to 48 facets. These polyhedra can be described by four polyhedral structure parameters Q, a, b, c. Geometrical properties of all polyhedra are discussed in analytical detail and substantiated by visualization of characteristic examples. Corresponding relationships can also be used to classify shapes and estimate sizes of real compact metal nanoparticles observed e.g. in electron microscopy experiments.

Submitted 19 September, 2022; originally announced September 2022.

Comments: 43 pages, 21 figures

arXiv:2207.05632 [pdf]

doi 10.1088/1538-3873/acb293

The Science Performance of JWST as Characterized in Commissioning

Authors: Jane Rigby, Marshall Perrin, Michael McElwain, Randy Kimble, Scott Friedman, Matt Lallo, René Doyon, Lee Feinberg, Pierre Ferruit, Alistair Glasse, Marcia Rieke, George Rieke, Gillian Wright, Chris Willott, Knicole Colon, Stefanie Milam, Susan Neff, Christopher Stark, Jeff Valenti, Jim Abell, Faith Abney, Yasin Abul-Huda, D. Scott Acton, Evan Adams, David Adler, et al. (601 additional authors not shown)

Abstract: This paper characterizes the actual science performance of the James Webb Space Telescope (JWST), as determined from the six-month commissioning period. We summarize the performance of the spacecraft, telescope, science instruments, and ground system, with an emphasis on differences from pre-launch expectations. Commissioning has made clear that JWST is fully capable of achieving the discoveries for which it was built. Moreover, almost across the board, the science performance of JWST is better than expected; in most cases, JWST will go deeper faster than expected. The telescope and instrument suite have demonstrated the sensitivity, stability, image quality, and spectral range that are necessary to transform our understanding of the cosmos through observations spanning from near-Earth asteroids to the most distant galaxies.

Submitted 10 April, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

Comments: 5th version as accepted to PASP; 31 pages, 18 figures; https://iopscience.iop.org/article/10.1088/1538-3873/acb293

Journal ref: PASP 135 048001 (2023)

arXiv:2201.07038 [pdf]

Polyhedral Metal Nanoparticles with Cubic Lattice: Theory of Structural Properties

Authors: Klaus E. Hermann

Abstract: We examine the structure of compact metal nanoparticles (NPs) forming polyhedral sections of face centered (fcc) and body centered (bcc) cubic lattices, which are confined by facets characterized by highly dense {100}, {110}, and {111} monolayers. Together with the constraint that the NPs exhibit the same point symmetry as the ideal cubic lattice, i.e. Oh, different types of generic NPs serve for the definition of general compact polyhedral cubic NPs. Their structural properties, such as shape, size, and surface facets, can be described by only three integer valued polyhedral NP parameters N, M, K. Corresponding analytical details are discussed with visualization of characteristic examples. While the overall NP shapes are quite similar between the different cubic lattice types, structural fine details differ. In particular, monolayer planes of adjacent NP facets can join at corners and edges which are not occupied by atoms of the ideal lattice. This gives rise to microfacets and narrow facet strips depending on the lattice type. The discussion illustrates the complexity of seemingly simple nanoparticles in a quantitative account. The geometric relationships of the model particles can also be used to classify shapes and estimate sizes of real compact metal nanoparticles observed by experiment.

Submitted 18 March, 2022; v1 submitted 18 January, 2022; originally announced January 2022.

Comments: 95 pages, 59 figures

arXiv:2101.04385 [pdf]

Structure and Morphology of Crystalline Metal Nanoparticles: Polyhedral Cubic Particles

Authors: Klaus E. Hermann

Abstract: We examine nanoparticles (NPs) forming polyhedral sections of the ideal cubic lattice, simple (sc), body centered (bcc), and face centered (fcc) cubic, which are confined by facets characterized by densest and second densest {h k l} monolayers of the lattice. Together with the constraint that the NPs exhibit the same point symmetry as the ideal cubic lattice, i.e. Oh, different types of generic NPs serve for the definition of general polyhedral cubic NPs. Their structural properties, such as shape, size, and surface facets, are discussed in analytical and numerical detail with visualization of characteristic examples. The geometric relationships of the model particles expressed by corresponding formulas and numerical tables can be used to estimate shapes and sizes of real compact metal nanoparticles observed by experiment.

Submitted 12 January, 2021; originally announced January 2021.

Comments: 58 pages, all figures (jpeg) included in the text

arXiv:2009.01418 [pdf, ps, other]

doi 10.1063/5.0028706

Limit theorems and soft edge of freezing random matrix models via dual orthogonal polynomials

Authors: Sergio Andraus, Kilian Hermann, Michael Voit

Abstract: $N$-dimensional Bessel and Jacobi processes describe interacting particle systems with $N$ particles and are related to $β$-Hermite, $β$-Laguerre, and $β$-Jacobi ensembles. For fixed $N$ there exist associated weak limit theorems (WLTs) in the freezing regime $β\to\infty$ in the $β$-Hermite and $β$-Laguerre case by Dumitriu and Edelman (2005) with explicit formulas for the covariance matrices $Σ_N$ in terms of the zeros of associated orthogonal polynomials. Recently, the authors derived these WLTs in a different way and computed $Σ_N^{-1}$ with formulas for the eigenvalues and eigenvectors of $Σ_N^{-1}$ and thus of $Σ_N$. In the present paper we use these data and the theory of finite dual orthogonal polynomials of de Boor and Saff to derive formulas for $Σ_N$ from $Σ_N^{-1}$ where, for $β$-Hermite and $β$-Laguerre ensembles, our formulas are simpler than those of Dumitriu and Edelman. We use these polynomials to derive asymptotic results for the soft edge in the freezing regime for $N\to\infty$ in terms of the Airy function. For $β$-Hermite ensembles, our limit expressions are different from those of Dumitriu and Edelman.

Submitted 29 June, 2021; v1 submitted 2 September, 2020; originally announced September 2020.

Comments: 32 pages, made small improvements and added references

MSC Class: Primary 60F05; Secondary 60B20; 70F10; 82C22; 33C45; 33C10; 60J60

Journal ref: J. Math. Phys. 62, 083303 (2021)

arXiv:2006.12433 [pdf, other]

What shapes feature representations? Exploring datasets, architectures, and training

Authors: Katherine L. Hermann, Andrew K. Lampinen

Abstract: In naturalistic learning problems, a model's input contains a wide range of features, some useful for the task at hand, and others not. Of the useful features, which ones does the model use? Of the task-irrelevant features, which ones does the model represent? Answers to these questions are important for understanding the basis of models' decisions, as well as for building models that learn versatile, adaptable representations useful beyond the original training task. We study these questions using synthetic datasets in which the task-relevance of input features can be controlled directly. We find that when two features redundantly predict the labels, the model preferentially represents one, and its preference reflects what was most linearly decodable from the untrained model. Over training, task-relevant features are enhanced, and task-irrelevant features are partially suppressed. Interestingly, in some cases, an easier, weakly predictive feature can suppress a more strongly predictive, but more difficult one. Additionally, models trained to recognize both easy and hard features learn representations most similar to models that use only the easy feature. Further, easy features lead to more consistent representations across model runs than do hard features. Finally, models have greater representational similarity to an untrained model than to models trained on a different task. Our results highlight the complex processes that determine which features a model represents.

Submitted 22 October, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

Comments: 22 pages

arXiv:1911.09071 [pdf, other]

The Origins and Prevalence of Texture Bias in Convolutional Neural Networks

Authors: Katherine L. Hermann, Ting Chen, Simon Kornblith

Abstract: Recent work has indicated that, unlike humans, ImageNet-trained CNNs tend to classify images by texture rather than by shape. How pervasive is this bias, and where does it come from? We find that, when trained on datasets of images with conflicting shape and texture, CNNs learn to classify by shape at least as easily as by texture. What factors, then, produce the texture bias in CNNs trained on ImageNet? Different unsupervised training objectives and different architectures have small but significant and largely independent effects on the level of texture bias. However, all objectives and architectures still lead to models that make texture-based classification decisions a majority of the time, even if shape information is decodable from their hidden representations. The effect of data augmentation is much larger. By taking less aggressive random crops at training time and applying simple, naturalistic augmentation (color distortion, noise, and blur), we train models that classify ambiguous images by shape a majority of the time, and outperform baselines on out-of-distribution test sets. Our results indicate that apparent differences in the way humans and ImageNet-trained CNNs process images may arise not primarily from differences in their internal workings, but from differences in the data that they see.

Submitted 3 November, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: NeurIPS'2020

arXiv:1905.07983 [pdf, ps, other]

doi 10.2140/tunis.2021.3.843

Limit theorems for Jacobi ensembles with large parameters

Authors: Kilian Hermann, Michael Voit

Abstract: Consider Jacobi random matrix ensembles with the distributions $$c_{k_1,k_2,k_3}\prod_{1\leq i< j \leq N}\left(x_j-x_i\right)^{k_3}\prod_{i=1}^N \left(1-x_i\right)^{\frac{k_1+k_2}{2}-\frac{1}{2}}\left(1+x_i\right)^{\frac{k_2}{2}-\frac{1}{2}} dx$$ of the eigenvalues on the alcoves $$A:=\{x\in\mathbb R^N| \> -1\leq x_1\le ...\le x_N\leq 1\}.$$ For $(k_1,k_2,k_3)=κ\cdot (a,b,1)$ with $a,b>0$ fixed, we derive a central limit theorem for the distributions above for $κ\to\infty$. The drift and the inverse of the limit covariance matrix are expressed in terms of the zeros of classical Jacobi polynomials. We also rewrite the CLT in trigonometric form and determine the eigenvalues and eigenvectors of the limit covariance matrices. These results are related to corresponding limits for $β$-Hermite and $β$-Laguerre ensembles for $β\to\infty$ by Dumitriu and Edelman and by Voit.

Submitted 15 October, 2020; v1 submitted 20 May, 2019; originally announced May 2019.

Comments: The presentation of the results is improved, and additional references are added

MSC Class: 60F05; 60B20; 70F10; 82C22; 33C45; 33C67

Journal ref: Tunisian J. Math. 3 (2021) 843-860

arXiv:1903.01292 [pdf, other]

The StreetLearn Environment and Dataset

Authors: Piotr Mirowski, Andras Banki-Horvath, Keith Anderson, Denis Teplyashin, Karl Moritz Hermann, Mateusz Malinowski, Matthew Koichi Grimes, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell

Abstract: Navigation is a rich and well-grounded problem domain that drives progress in many different areas of research: perception, planning, memory, exploration, and optimisation in particular. Historically these challenges have been separately considered and solutions built that rely on stationary datasets - for example, recorded trajectories through an environment. These datasets cannot be used for decision-making and reinforcement learning, however, and in general the perspective of navigation as an interactive learning task, where the actions and behaviours of a learning agent are learned simultaneously with the perception and planning, is relatively unsupported. Thus, existing navigation benchmarks generally rely on static datasets (Geiger et al., 2013; Kendall et al., 2015) or simulators (Beattie et al., 2016; Shah et al., 2018). To support and validate research in end-to-end navigation, we present StreetLearn: an interactive, first-person, partially-observed visual environment that uses Google Street View for its photographic content and broad coverage, and give performance baselines for a challenging goal-driven navigation task. The environment code, baseline agent code, and the dataset are available at http://streetlearn.cc

Submitted 4 March, 2019; originally announced March 2019.

Comments: 13 pages, 6 figures, 4 tables. arXiv admin note: text overlap with arXiv:1804.00168

arXiv:1903.00401 [pdf, other]

Learning To Follow Directions in Street View

Authors: Karl Moritz Hermann, Mateusz Malinowski, Piotr Mirowski, Andras Banki-Horvath, Keith Anderson, Raia Hadsell

Abstract: Navigating and understanding the real world remains a key challenge in machine learning and inspires a great variety of research in areas such as language grounding, planning, navigation and computer vision. We propose an instruction-following task that requires all of the above, and which combines the practicality of simulated environments with the challenges of ambiguous, noisy real world data. StreetNav is built on top of Google Street View and provides visually accurate environments representing real places. Agents are given driving instructions which they must learn to interpret in order to successfully navigate in this environment. Since humans equipped with driving instructions can readily navigate in previously unseen cities, we set a high bar and test our trained agents for similar cognitive capabilities. Although deep reinforcement learning (RL) methods are frequently evaluated only on data that closely follow the training distribution, our dataset extends to multiple cities and has a clean train/test separation. This allows for thorough testing of generalisation ability. This paper presents the StreetNav environment and tasks, models that establish strong baselines, and extensive analysis of the task and the trained agents.

Submitted 21 November, 2019; v1 submitted 1 March, 2019; originally announced March 2019.

Journal ref: AAAI 2020

arXiv:1807.01670 [pdf, other]

Encoding Spatial Relations from Natural Language

Authors: Tiago Ramalho, Tomáš Kočiský, Frederic Besse, S. M. Ali Eslami, Gábor Melis, Fabio Viola, Phil Blunsom, Karl Moritz Hermann

Abstract: Natural language processing has made significant inroads into learning the semantics of words through distributional approaches, however representations learnt via these methods fail to capture certain kinds of information implicit in the real world. In particular, spatial relations are encoded in a way that is inconsistent with human spatial reasoning and lacking invariance to viewpoint changes. We present a system capable of capturing the semantics of spatial relations such as behind, left of, etc from natural language. Our key contributions are a novel multi-modal objective based on generating images of scenes from their textual descriptions, and a new dataset on which to train it. We demonstrate that internal representations are robust to meaning preserving transformations of descriptions (paraphrase invariance), while viewpoint invariance is an emergent property of the system.

Submitted 5 July, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

arXiv:1805.09786 [pdf, other]

Hyperbolic Attention Networks

Authors: Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, Nando de Freitas

Abstract: We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact.

Submitted 24 May, 2018; originally announced May 2018.

arXiv:1805.09208 [pdf, other]

Pushing the bounds of dropout

Authors: Gábor Melis, Charles Blundell, Tomáš Kočiský, Karl Moritz Hermann, Chris Dyer, Phil Blunsom

Abstract: We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling. The family includes models that compute a power mean over the sampled dropout masks, and their less stochastic subvariants with tighter and higher lower bounds than the fully stochastic dropout objective. We argue that since the deterministic subvariant's bound is equal to its objective, and the highest amongst these models, the predominant view of it as a good approximation to MC averaging is misleading. Rather, deterministic dropout is the best available approximation to the true objective.

Submitted 27 September, 2018; v1 submitted 23 May, 2018; originally announced May 2018.

arXiv:1804.03984 [pdf, other]

Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input

Authors: Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, Stephen Clark

Abstract: The ability of algorithms to evolve or learn (compositional) communication protocols has traditionally been studied in the language evolution literature through the use of emergent communication tasks. Here we scale up this research by using contemporary deep learning methods and by training reinforcement-learning neural network agents on referential communication games. We extend previous work, in which agents were trained in symbolic environments, by developing agents which are able to learn from raw pixel data, a more challenging and realistic input representation. We find that the degree of structure found in the input data affects the nature of the emerged protocols, and thereby corroborate the hypothesis that structured compositional language is most likely to emerge when agents perceive the world as being structured.

Submitted 11 April, 2018; originally announced April 2018.

Comments: To appear at ICLR 2018

arXiv:1804.00168 [pdf, other]

Learning to Navigate in Cities Without a Map

Authors: Piotr Mirowski, Matthew Koichi Grimes, Mateusz Malinowski, Karl Moritz Hermann, Keith Anderson, Denis Teplyashin, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell

Abstract: Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support continuous self-localisation ("I am here") and a representation of the goal ("I am going there"). Building upon recent research that applies deep reinforcement learning to maze navigation problems, we present an end-to-end deep reinforcement learning approach that can be applied on a city scale. Recognising that successful navigation relies on integration of general policies with locale-specific knowledge, we propose a dual pathway architecture that allows locale-specific features to be encapsulated, while still enabling transfer to multiple cities. We present an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage, and demonstrate that our learning method allows agents to learn to navigate multiple cities and to traverse to target destinations that may be kilometres away. The project webpage http://streetlearn.cc contains a video summarising our research and showing the trained agent in diverse city environments and on the transfer task, the form to request the StreetLearn dataset and links to further resources. The StreetLearn environment code is available at https://github.com/deepmind/streetlearn

Submitted 9 January, 2019; v1 submitted 31 March, 2018; originally announced April 2018.

Comments: 17 pages, 16 figures, published at NeurIPS 2018

Journal ref: Neural Information Processing Systems 2018

arXiv:1802.01802 [pdf]

Interlocking mechanism between molecular gears attached to surfaces

Authors: Rundong Zhao, Yan-Ling Zhao, Fei Qi, Klaus Hermann, Rui-Qin Zhang, Michel A. Van Hove

Abstract: While molecular machines play an increasingly significant role in nanoscience research and applications, there remains a shortage of investigations and understanding of the molecular gear (cogwheel), which is an indispensable and fundamental component to drive a larger correlated molecular machine system. Employing ab initio calculations, we investigate model systems consisting of molecules adsorbed on metal or graphene surfaces, ranging from very simple triple-arm gears such as PF3 and NH3 to larger multi-arm gears based on carbon rings. We explore in detail the transmission of slow rotational motion from one gear to the next by these relatively simple molecules, so as to isolate and reveal the mechanisms of the relevant intermolecular interactions. Several characteristics of molecular gears are discussed, in particular the flexibility of the arms and the slipping and skipping between interlocking arms of adjacent gears, which differ from familiar macroscopic rigid gears. The underlying theoretical concepts suggest strongly that other analogous structures may also exhibit similar behavior which may inspire future exploration in designing large correlated molecular machines.

Submitted 6 February, 2018; originally announced February 2018.

arXiv:1712.07040 [pdf, other]

The NarrativeQA Reading Comprehension Challenge

Authors: Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette

Abstract: Reading comprehension (RC)---in contrast to information retrieval---requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecting answers using superficial information (e.g., local context similarity or global term frequency); they thus fail to test for the essential integrative aspect of RC. To encourage progress on deeper comprehension of language, we present a new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts. These tasks are designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience. We show that although humans solve the tasks easily, standard RC models struggle on the tasks presented here. We provide an analysis of the dataset and the challenges it presents.

Submitted 19 December, 2017; originally announced December 2017.

arXiv:1710.09867 [pdf, other]

Understanding Early Word Learning in Situated Artificial Agents

Authors: Felix Hill, Stephen Clark, Karl Moritz Hermann, Phil Blunsom

Abstract: Neural network-based systems can now learn to locate the referents of words and phrases in images, answer questions about visual scenes, and execute symbolic instructions as first-person actors in partially-observable worlds. To achieve this so-called grounded language learning, models must overcome challenges that infants face when learning their first words. While it is notable that models with no meaningful prior knowledge overcome these obstacles, researchers currently lack a clear understanding of how they do so, a problem that we attempt to address in this paper. For maximum control and generality, we focus on a simple neural network-based language learning agent, trained via policy-gradient methods, which can interpret single-word instructions in a simulated 3D world. Whilst the goal is not to explicitly model infant word learning, we take inspiration from experimental paradigms in developmental psychology and apply some of these to the artificial agent, exploring the conditions under which established human biases and learning effects emerge. We further propose a novel method for visualising semantic representations in the agent.

Submitted 1 October, 2019; v1 submitted 26 October, 2017; originally announced October 2017.

arXiv:1706.06551 [pdf, other]

Grounded Language Learning in a Simulated 3D World

Authors: Karl Moritz Hermann, Felix Hill, Simon Green, Fumin Wang, Ryan Faulkner, Hubert Soyer, David Szepesvari, Wojciech Marian Czarnecki, Max Jaderberg, Denis Teplyashin, Marcus Wainwright, Chris Apps, Demis Hassabis, Phil Blunsom

Abstract: We are increasingly surrounded by artificially intelligent technology that takes decisions and executes actions on our behalf. This creates a pressing need for general means to communicate with, instruct and guide artificial agents, with human language the most compelling means for such communication. To achieve this in a scalable fashion, agents must be able to relate language to the world and to actions; that is, their understanding of language must be grounded and embodied. However, learning grounded language is a notoriously challenging problem in artificial intelligence research. Here we present an agent that learns to interpret language in a simulated 3D environment where it is rewarded for the successful execution of written instructions. Trained via a combination of reinforcement and unsupervised learning, and beginning with minimal prior knowledge, the agent learns to relate linguistic symbols to emergent perceptual representations of its physical surroundings and to pertinent sequences of actions. The agent's comprehension of language extends beyond its prior experience, enabling it to apply familiar language to unfamiliar situations and to interpret entirely novel instructions. Moreover, the speed with which this agent learns new words increases as its semantic knowledge grows. This facility for generalising and bootstrapping semantic knowledge indicates the potential of the present approach for reconciling ambiguous natural language with the complexity of the physical world.

Submitted 26 June, 2017; v1 submitted 20 June, 2017; originally announced June 2017.

Comments: 16 pages, 8 figures

arXiv:1609.09315 [pdf, other]

Semantic Parsing with Semi-Supervised Sequential Autoencoders

Authors: Tomáš Kočiský, Gábor Melis, Edward Grefenstette, Chris Dyer, Wang Ling, Phil Blunsom, Karl Moritz Hermann

Abstract: We present a novel semi-supervised approach for sequence transduction and apply it to semantic parsing. The unsupervised component is based on a generative model in which latent sentences generate the unpaired logical forms. We apply this method to a number of semantic parsing tasks focusing on domains with limited access to labelled training data and extend those datasets with synthetically generated logical forms.

Submitted 29 September, 2016; originally announced September 2016.

arXiv:1607.00593 [pdf, ps, other]

doi 10.1039/C6CP05996A

Intramolecular Torque, an Indicator of the Internal Rotation Direction of Rotor Molecules and Similar Systems

Authors: Rui-Qin Zhang, Yan-Ling Zhao, Fei Qi, Klaus Hermann, Michel A. Van Hove

Abstract: Torque is ubiquitous in many molecular systems, including collisions, chemical reactions, vibrations, electronic excitations and especially rotor molecules. We present a straightforward theoretical method based on forces acting on atoms and obtained from atomistic quantum mechanics calculations, to quickly and qualitatively determine whether a molecule or sub-unit thereof has a tendency to rotation and, if so, around which axis and in which sense: clockwise or counterclockwise. The method also indicates which atoms, if any, are predominant in causing the rotation. Our computational approach can in general efficiently provide insights into the rotational ability of many molecules and help to theoretically screen or modify them in advance of experiments or before analyzing their rotational behavior in more detail with more extensive computations guided by the results from the torque approach. As an example, we demonstrate the effectiveness of the approach using a specific light-driven molecular rotary motor which was successfully synthesized and analyzed in prior experiments and simulations.

Submitted 3 July, 2016; originally announced July 2016.

Comments: 11 pages, 4 figures, 1 SI file

arXiv:1603.06744 [pdf, other]

Latent Predictor Networks for Code Generation

Authors: Wang Ling, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Andrew Senior, Fumin Wang, Phil Blunsom

Abstract: Many language generation tasks require the production of text conditioned on both structured and unstructured inputs. We present a novel neural network architecture which generates an output sequence conditioned on an arbitrary number of input functions. Crucially, our approach allows both the choice of conditioning context and the granularity of generation, for example characters or tokens, to be marginalised, thus permitting scalable and effective training. Using this framework, we address the problem of generating programming code from a mixed natural language and structured specification. We create two new data sets for this paradigm derived from the collectible trading card games Magic the Gathering and Hearthstone. On these, and a third preexisting corpus, we demonstrate that marginalising multiple predictors allows our model to outperform strong benchmarks.

Submitted 8 June, 2016; v1 submitted 22 March, 2016; originally announced March 2016.

arXiv:1509.06664 [pdf, other]

Reasoning about Entailment with Neural Attention

Authors: Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom

Abstract: While most approaches to automatically recognizing entailment relations have used classifiers employing hand engineered features derived from complex natural language processing pipelines, in practice their performance has been only slightly better than bag-of-word pair classifiers using only lexical similarity. The only attempt so far to build an end-to-end differentiable neural network for entailment failed to outperform such a simple similarity classifier. In this paper, we propose a neural model that reads two sentences to determine entailment using long short-term memory units. We extend this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases. Furthermore, we present a qualitative analysis of attention weights produced by this model, demonstrating such reasoning capabilities. On a large entailment dataset this model outperforms the previous best neural model and a classifier with engineered features by a substantial margin. It is the first generic end-to-end differentiable system that achieves state-of-the-art accuracy on a textual entailment dataset.

Submitted 1 March, 2016; v1 submitted 22 September, 2015; originally announced September 2015.

Comments: ICLR 2016 camera-ready, 9 pages, 10 figures (incl. subfigures)

MSC Class: 68T50

ACM Class: I.2.6; I.2.7

arXiv:1506.03340 [pdf, other]

Teaching Machines to Read and Comprehend

Authors: Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, Phil Blunsom

Abstract: Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.

Submitted 19 November, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

Comments: Appears in: Advances in Neural Information Processing Systems 28 (NIPS 2015). 14 pages, 13 figures

arXiv:1506.02516 [pdf, other]

Learning to Transduce with Unbounded Memory

Authors: Edward Grefenstette, Karl Moritz Hermann, Mustafa Suleyman, Phil Blunsom

Abstract: Recently, strong results have been demonstrated by Deep Recurrent Neural Networks on natural language transduction problems. In this paper we explore the representational power of these models using synthetic grammars designed to exhibit phenomena similar to those found in real transduction problems such as machine translation. These experiments lead us to propose new memory-based recurrent networks that implement continuously differentiable analogues of traditional data structures such as Stacks, Queues, and DeQues. We show that these architectures exhibit superior generalisation performance to Deep RNNs and are often able to learn the underlying generating algorithms in our transduction experiments.

Submitted 3 November, 2015; v1 submitted 8 June, 2015; originally announced June 2015.

Comments: 14 pages, 4 figures, NIPS 2015

MSC Class: 68T05

ACM Class: I.5.1; I.2.6; I.2.7

arXiv:1412.1632 [pdf, ps, other]

Deep Learning for Answer Sentence Selection

Authors: Lei Yu, Karl Moritz Hermann, Phil Blunsom, Stephen Pulman

Abstract: Answer sentence selection is the task of identifying sentences that contain the answer to a given question. This is an important problem in its own right as well as in the larger context of open domain question answering. We propose a novel approach to solving this task via means of distributed representations, and learn to match questions with answers by considering their semantic encoding. This contrasts prior work on this task, which typically relies on classifiers with large numbers of hand-crafted syntactic and semantic features and various external resources. Our approach does not require any feature engineering nor does it involve specialist linguistic data, making this model easily applicable to a wide range of domains and languages. Experimental results on a standard benchmark dataset from TREC demonstrate that---despite its simplicity---our model matches state of the art performance on the answer sentence selection task.

Submitted 4 December, 2014; originally announced December 2014.

Comments: 9 pages, accepted by NIPS deep learning workshop

arXiv:1411.3146 [pdf, other]

Distributed Representations for Compositional Semantics

Authors: Karl Moritz Hermann

Abstract: The mathematical representation of semantics is a key issue for Natural Language Processing (NLP). A lot of research has been devoted to finding ways of representing the semantics of individual words in vector spaces. Distributional approaches --- meaning distributed representations that exploit co-occurrence statistics of large corpora --- have proved popular and successful across a number of tasks. However, natural language usually comes in structures beyond the word level, with meaning arising not only from the individual words but also the structure they are contained in at the phrasal or sentential level. Modelling the compositional process by which the meaning of an utterance arises from the meaning of its parts is an equally fundamental task of NLP. This dissertation explores methods for learning distributed semantic representations and models for composing these into representations for larger linguistic units. Our underlying hypothesis is that neural models are a suitable vehicle for learning semantically rich representations and that such representations in turn are suitable vehicles for solving important tasks in natural language processing. The contribution of this thesis is a thorough evaluation of our hypothesis, as part of which we introduce several new approaches to representation learning and compositional semantics, as well as multiple state-of-the-art models which apply distributed semantic representations to various tasks in NLP.

Submitted 12 November, 2014; originally announced November 2014.

Comments: DPhil Thesis, University of Oxford, Submitted and accepted in 2014

arXiv:1405.0947 [pdf, other]

Learning Bilingual Word Representations by Marginalizing Alignments

Authors: Tomáš Kočiský, Karl Moritz Hermann, Phil Blunsom

Abstract: We present a probabilistic model that simultaneously learns alignments and distributed representations for bilingual data. By marginalizing over word alignments the model captures a larger semantic context than prior work relying on hard alignments. The advantage of this approach is demonstrated in a cross-lingual classification task, where we outperform the prior published state of the art.

Submitted 5 May, 2014; originally announced May 2014.

Comments: Proceedings of ACL 2014 (Short Papers)

arXiv:1404.7296 [pdf, other]

A Deep Architecture for Semantic Parsing

Authors: Edward Grefenstette, Phil Blunsom, Nando de Freitas, Karl Moritz Hermann

Abstract: Many successful approaches to semantic parsing build on top of the syntactic analysis of text, and make use of distributional representations or statistical models to match parses to ontology-specific queries. This paper presents a novel deep learning architecture which provides a semantic parsing system through the union of two neural models of language semantics. It allows for the generation of ontology-specific queries from natural language statements and questions without the need for parsing, which makes it especially suitable to grammatically malformed or syntactically atypical text, such as tweets, as well as permitting the development of semantic parsers for resource-poor languages.

Submitted 29 April, 2014; originally announced April 2014.

Comments: In Proceedings of the Semantic Parsing Workshop at ACL 2014 (forthcoming)

arXiv:1404.4641 [pdf, other]

Multilingual Models for Compositional Distributed Semantics

Authors: Karl Moritz Hermann, Phil Blunsom

Abstract: We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings. Our models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences. The models do not rely on word alignments or any syntactic information and are successfully applied to a number of diverse languages. We extend our approach to learn semantic representations at the document level, too. We evaluate these models on two cross-lingual document classification tasks, outperforming the prior state of the art. Through qualitative analysis and the study of pivoting effects we demonstrate that our representations are semantically plausible and can capture semantic relationships across languages without parallel data.

Submitted 17 April, 2014; originally announced April 2014.

Comments: Proceedings of ACL 2014 (Long papers)

arXiv:1312.6173 [pdf, other]

Multilingual Distributed Representations without Word Alignment

Authors: Karl Moritz Hermann, Phil Blunsom

Abstract: Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP. By overcoming data sparsity problems, as well as providing information about semantic relatedness which is not available in discrete representations, distributed representations have proven useful in many NLP tasks. Recent work has shown how compositional semantic representations can successfully be applied to a number of monolingual applications such as sentiment analysis. At the same time, there has been some initial success in work on learning shared word-level representations across languages. We combine these two approaches by proposing a method for learning distributed representations in a multilingual setup. Our model learns to assign similar embeddings to aligned sentences and dissimilar ones to sentences which are not aligned, while not requiring word alignments. We show that our representations are semantically informative and apply them to a cross-lingual document classification task where we outperform the previous state of the art. Further, by employing parallel corpora of multiple language pairs we find that our model learns representations that capture semantic relationships across languages for which no parallel data was used.

Submitted 20 March, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

Comments: To appear at ICLR 2014

arXiv:1306.2158 [pdf, other]

"Not not bad" is not "bad": A distributional account of negation

Authors: Karl Moritz Hermann, Edward Grefenstette, Phil Blunsom

Abstract: With the increasing empirical success of distributional models of compositional semantics, it is timely to consider the types of textual logic that such models are capable of capturing. In this paper, we address shortcomings in the ability of current models to capture logical operations such as negation. As a solution we propose a tripartite formulation for a continuous vector space representation of semantics and subsequently use this representation to develop a formal compositional notion of negation within such models.

Submitted 10 June, 2013; originally announced June 2013.

Comments: 9 pages, to appear in Proceedings of the 2013 Workshop on Continuous Vector Space Models and their Compositionality

MSC Class: 68T50 ACM Class: I.2.7

arXiv:mtrl-th/9609012 [pdf, ps, other]

doi 10.1142/S0218625X97001553

Theory of Alkali Induced Reconstruction of the Cu(100) Surface

Authors: S. Quassowski, K. Hermann

Abstract: LEED experiments show that Li adsorbed at Cu(100) surfaces at room temperature induces a (2x1) missing row substrate reconstruction while adsorption at lower temperatures, T=180 K, results in an unreconstructed Cu(100)+c(2x2)--Li overlayer structure. Substrate reconstruction has not been observed for Na nor for K adsorption. In order to study the specific reconstruction behavior of the Li adsorbate ab initio DFT calculations have been performed on Cu(100)+Ad, Ad = Li, Na, K systems at coverages Theta_Ad=0.25-0.5 with and without reconstruction. The calculations show that the (2x1) MR reconstructed surface lies energetically above the ideal (1x1) surface by 0.2 eV per unit cell. However, alkali binding is stronger in the MR geometry as compared to that of the ideal surface where the increase in bond strength becomes smaller in going from Li to Na to K. As a result, the MR reconstructed and the overlayer adsorbate systems are energetically very close for Cu(100)+Li while for Na and K the overlayer geometry is always favored.

Submitted 26 September, 1996; originally announced September 1996.

Comments: 8 pages, 5 figures, submitted to Surf. Rev. Lett
