-
Learned feature representations are biased by complexity, learning order, position, and more
Authors:
Andrew Kyle Lampinen,
Stephanie C. Y. Chan,
Katherine Hermann
Abstract:
Representation learning, and interpreting learned representations, are key areas of focus in machine learning and neuroscience. Both fields generally use representations as a means to understand or improve a system's computations. In this work, however, we explore surprising dissociations between representation and computation that may pose challenges for such efforts. We create datasets in which we attempt to match the computational role that different features play, while manipulating other properties of the features or the data. We train various deep learning architectures to compute these multiple abstract features about their inputs. We find that their learned feature representations are systematically biased towards representing some features more strongly than others, depending upon extraneous properties such as feature complexity, the order in which features are learned, and the distribution of features over the inputs. For example, features that are simpler to compute or learned first tend to be represented more strongly and densely than features that are more complex or learned later, even if all features are learned equally well. We also explore how these biases are affected by architectures, optimizers, and training regimes (e.g., in transformers, features decoded earlier in the output sequence also tend to be represented more strongly). Our results help to characterize the inductive biases of gradient-based representation learning. These results also highlight a key challenge for interpretability, or for comparing the representations of models and brains: disentangling extraneous biases from the computationally important aspects of a system's internal representations.
Submitted 6 June, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
POSTER: Towards Secure 5G Infrastructures for Production Systems
Authors:
Martin Henze,
Maximilian Ortmann,
Thomas Vogt,
Osman Ugus,
Kai Hermann,
Svenja Nohr,
Zeren Lu,
Sotiris Michaelides,
Angela Massonet,
Robert H. Schmitt
Abstract:
To meet the requirements of modern production, industrial communication increasingly shifts from wired fieldbus to wireless 5G communication. Besides tremendous benefits, this shift introduces severe novel risks, ranging from limited reliability over new security vulnerabilities to a lack of accountability. To address these risks, we present approaches to (i) prevent attacks through authentication and redundant communication, (ii) detect anomalies and jamming, and (iii) respond to detected attacks through device exclusion and accountability measures.
Submitted 24 January, 2024;
originally announced January 2024.
-
On the Foundations of Shortcut Learning
Authors:
Katherine L. Hermann,
Hossein Mobahi,
Thomas Fel,
Michael C. Mozer
Abstract:
Deep-learning models can extract a rich assortment of features from data. Which features a model uses depends not only on predictivity (how reliably a feature indicates train-set labels) but also on availability (how easily the feature can be extracted, or leveraged, from inputs). The literature on shortcut learning has noted examples in which models privilege one feature over another, for example texture over shape and image backgrounds over foreground objects. Here, we test hypotheses about which input properties are more available to a model, and systematically study how predictivity and availability interact to shape models' feature use. We construct a minimal, explicit generative framework for synthesizing classification datasets with two latent features that vary in predictivity and in factors we hypothesize to relate to availability, and quantify a model's shortcut bias: its over-reliance on the shortcut (more available, less predictive) feature at the expense of the core (less available, more predictive) feature. We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias. Our empirical findings are consistent with a theoretical account based on Neural Tangent Kernels. Finally, we study how models used in practice trade off predictivity and availability in naturalistic datasets, discovering availability manipulations which increase models' degree of shortcut bias. Taken together, these findings suggest that the propensity to learn shortcut features is a fundamental characteristic of deep nonlinear architectures warranting systematic study given its role in shaping how models solve tasks.
Submitted 24 October, 2023;
originally announced October 2023.
-
Getting aligned on representational alignment
Authors:
Ilia Sucholutsky,
Lukas Muttenthaler,
Adrian Weller,
Andi Peng,
Andreea Bobu,
Been Kim,
Bradley C. Love,
Erin Grant,
Iris Groen,
Jascha Achterberg,
Joshua B. Tenenbaum,
Katherine M. Collins,
Katherine L. Hermann,
Kerem Oktar,
Klaus Greff,
Martin N. Hebart,
Nori Jacoby,
Qiuyi Zhang,
Raja Marjieh,
Robert Geirhos,
Sherol Chen,
Simon Kornblith,
Sunayana Rane,
Talia Konkle,
Thomas P. O'Connell
, et al. (5 additional authors not shown)
Abstract:
Biological and artificial information processing systems form representations that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the extent to which the representations formed by these diverse systems agree? Do similarities in representations then translate into similar behavior? How can a system's representations be modified to better match those of another system? These questions pertaining to the study of representational alignment are at the heart of some of the most active research areas in cognitive science, neuroscience, and machine learning. For example, cognitive scientists measure the representational alignment of multiple individuals to identify shared cognitive priors, neuroscientists align fMRI responses from multiple individuals into a shared representational space for group-level analyses, and ML researchers distill knowledge from teacher models into student models by increasing their alignment. Unfortunately, there is limited knowledge transfer between research communities interested in representational alignment, so progress in one field often ends up being rediscovered independently in another. Thus, greater cross-field communication would be advantageous. To improve communication between these fields, we propose a unifying framework that can serve as a common language between researchers studying representational alignment. We survey the literature from all three fields and demonstrate how prior work fits into this framework. Finally, we lay out open problems in representational alignment where progress can benefit all three of these fields. We hope that our work can catalyze cross-disciplinary collaboration and accelerate progress for all communities studying and developing information processing systems. We note that this is a working paper and encourage readers to reach out with their suggestions for future revisions.
Submitted 2 November, 2023; v1 submitted 18 October, 2023;
originally announced October 2023.
-
Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines
Authors:
Alexander Jaus,
Constantin Seibold,
Kelsey Hermann,
Alexandra Walter,
Kristina Giske,
Johannes Haubold,
Jens Kleesiek,
Rainer Stiefelhagen
Abstract:
In this study, we present a method for generating automated anatomy segmentation datasets using a sequential process that involves nnU-Net-based pseudo-labeling and anatomy-guided pseudo-label refinement. By combining various fragmented knowledge bases, we generate a dataset of whole-body CT scans with $142$ voxel-level labels for 533 volumes providing comprehensive anatomical coverage which experts have approved. Our proposed procedure does not rely on manual annotation during the label aggregation stage. We examine its plausibility and usefulness using three complementary checks: Human expert evaluation which approved the dataset, a Deep Learning usefulness benchmark on the BTCV dataset in which we achieve 85% dice score without using its training dataset, and medical validity checks. This evaluation procedure combines scalable automated checks with labor-intensive high-quality expert checks. Besides the dataset, we release our trained unified anatomical segmentation model capable of predicting $142$ anatomical structures on CT data.
Submitted 25 July, 2023;
originally announced July 2023.
-
Nanoparticles with Cubic Symmetry: Classification of Polyhedral Shapes
Authors:
Klaus E. Hermann
Abstract:
The shape of crystalline nanoparticles (NP) can often be described by polyhedra with flat facet surfaces. Thus, structural studies of polyhedral bodies can help to describe geometric details of NPs. Here we consider compact polyhedra of cubic point symmetry Oh as simple models. Their surfaces are described by facets with normal vectors along selected directions (a, b, c) together with their symmetry equivalents forming a direction family {abc}. For given {abc} this yields generic polyhedra with up to 48 facets where we focus on polyhedra with facets of {abc} = {100}, {110}, and {111}, suggested for metal NPs with cubic lattices. The resulting generic polyhedra, cubic, rhombohedral, and octahedral, can serve for the description of non-generic polyhedra as intersections of corresponding generic species. Their structural properties are shown to be fully determined by only three structure parameters, facet distances R100, R110, and R111 of three types of facets. This provides a phase diagram to completely classify the corresponding Oh symmetry polyhedra. Structural properties of all polyhedra, such as shape, size, and facet geometries, are discussed in analytical and numerical detail with visualization of characteristic examples. The results may be used for respective nanoparticle simulations but also as a repository assisting the interpretation of structures of real compact nanoparticles observed by experiment.
Submitted 21 July, 2023; v1 submitted 12 July, 2023;
originally announced July 2023.
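As an illustration of the direction families {abc} mentioned in the abstract above, the following is a minimal sketch, not taken from the paper. It assumes the standard representation of the cubic point group Oh acting on a vector by all coordinate permutations combined with all sign changes (48 operations); the function name `direction_family` is hypothetical.

```python
from itertools import permutations, product


def direction_family(a, b, c):
    """Return all symmetry equivalents of (a, b, c) under O_h.

    Assumption: O_h is represented by the 48 signed permutations of the
    three coordinates (6 permutations times 8 sign choices).
    """
    members = set()
    for perm in permutations((a, b, c)):
        for signs in product((1, -1), repeat=3):
            members.add(tuple(s * x for s, x in zip(signs, perm)))
    return members


# Facet counts for the three families named in the abstract, plus a
# general direction with three distinct nonzero components.
for abc in [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 2, 3)]:
    print(abc, len(direction_family(*abc)))
```

The family sizes 6, 12, and 8 correspond to the faces of the cube, the rhombic dodecahedron, and the octahedron; a general direction reaches the upper limit of 48 facets quoted in the abstract.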
-
Improving neural network representations using human similarity judgments
Authors:
Lukas Muttenthaler,
Lorenz Linhardt,
Jonas Dippel,
Robert A. Vandermeulen,
Katherine Hermann,
Andrew K. Lampinen,
Simon Kornblith
Abstract:
Deep neural networks have reached human-level performance on many computer vision tasks. However, the objectives used to train these networks enforce only that similar images are embedded at similar locations in the representation space, and do not directly constrain the global structure of the resulting space. Here, we explore the impact of supervising this global structure by linearly aligning it with human similarity judgments. We find that a naive approach leads to large changes in local representational structure that harm downstream performance. Thus, we propose a novel method that aligns the global structure of representations while preserving their local structure. This global-local transform considerably improves accuracy across a variety of few-shot learning and anomaly detection tasks. Our results indicate that human visual representations are globally organized in a way that facilitates learning from few examples, and incorporating this global structure into neural network representations improves performance on downstream tasks.
Submitted 26 September, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Self-Refine: Iterative Refinement with Self-Feedback
Authors:
Aman Madaan,
Niket Tandon,
Prakhar Gupta,
Skyler Hallinan,
Luyu Gao,
Sarah Wiegreffe,
Uri Alon,
Nouha Dziri,
Shrimai Prabhumoye,
Yiming Yang,
Shashank Gupta,
Bodhisattwa Prasad Majumder,
Katherine Hermann,
Sean Welleck,
Amir Yazdanbakhsh,
Peter Clark
Abstract:
Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an initial output using an LLM; then, the same LLM provides feedback for its output and uses it to refine itself, iteratively. Self-Refine does not require any supervised training data, additional training, or reinforcement learning, and instead uses a single LLM as the generator, refiner, and feedback provider. We evaluate Self-Refine across 7 diverse tasks, ranging from dialog response generation to mathematical reasoning, using state-of-the-art (GPT-3.5, ChatGPT, and GPT-4) LLMs. Across all evaluated tasks, outputs generated with Self-Refine are preferred by humans and automatic metrics over those generated with the same LLM using conventional one-step generation, improving by ~20% absolute on average in task performance. Our work demonstrates that even state-of-the-art LLMs like GPT-4 can be further improved at test time using our simple, standalone approach.
Submitted 25 May, 2023; v1 submitted 30 March, 2023;
originally announced March 2023.
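The generate, feedback, refine loop described in the abstract above can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the callables `generate`, `feedback`, `refine`, and `is_done`, the stopping rule, and the `max_iters` cap are all assumptions, and the toy string functions merely stand in for calls to a single LLM.

```python
from typing import Callable


def self_refine(
    task: str,
    generate: Callable[[str], str],
    feedback: Callable[[str, str], str],
    refine: Callable[[str, str, str], str],
    is_done: Callable[[str], bool],
    max_iters: int = 4,  # assumed cap; the abstract does not specify one
) -> str:
    """Iteratively improve an output using feedback from the same model."""
    output = generate(task)  # initial draft
    for _ in range(max_iters):
        critique = feedback(task, output)  # model critiques its own output
        if is_done(critique):  # hypothetical stopping criterion
            break
        output = refine(task, output, critique)  # revise using the critique
    return output


# Toy stand-ins for the LLM roles (hypothetical, for demonstration only).
result = self_refine(
    "hello world",
    generate=lambda task: task.lower(),
    feedback=lambda task, out: "ok" if out[:1].isupper() else "capitalize the first letter",
    refine=lambda task, out, critique: out[:1].upper() + out[1:],
    is_done=lambda critique: critique == "ok",
)
print(result)
```

In the setting the abstract describes, all three roles would be served by one LLM with different prompts; passing them as separate callables here only keeps the sketch independent of any particular model API.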
-
Compact Polyhedra of Cubic Symmetry: Geometrical Analysis and Classification
Authors:
Klaus E. Hermann
Abstract:
Compact polyhedra of cubic point symmetry Oh exhibit surfaces of planar sections (facets) characterized by normal vector families {abc} with up to 48 members each, compatible with Oh symmetry. We focus first on polyhedra confined by facets with normal vectors of one family, {100}, {110}, and {111}, separately. This yields generic polyhedra which serve for the definition of general compact polyhedra as intersections of the three generic species. Their structural properties, such as shape, size, volume, and surface facets, are found to be described by only three polyhedral structure parameters A, B, C. In addition, we examine compact polyhedra exhibiting facets defined by normal vectors of only one general {abc} family resulting in up to 48 facets. These polyhedra can be described by four polyhedral structure parameters Q, a, b, c. Geometrical properties of all polyhedra are discussed in analytical detail and substantiated by visualization of characteristic examples. Corresponding relationships can also be used to classify shapes and estimate sizes of real compact metal nanoparticles observed e.g. in electron microscopy experiments.
Submitted 19 September, 2022;
originally announced September 2022.
-
The Science Performance of JWST as Characterized in Commissioning
Authors:
Jane Rigby,
Marshall Perrin,
Michael McElwain,
Randy Kimble,
Scott Friedman,
Matt Lallo,
René Doyon,
Lee Feinberg,
Pierre Ferruit,
Alistair Glasse,
Marcia Rieke,
George Rieke,
Gillian Wright,
Chris Willott,
Knicole Colon,
Stefanie Milam,
Susan Neff,
Christopher Stark,
Jeff Valenti,
Jim Abell,
Faith Abney,
Yasin Abul-Huda,
D. Scott Acton,
Evan Adams,
David Adler
, et al. (601 additional authors not shown)
Abstract:
This paper characterizes the actual science performance of the James Webb Space Telescope (JWST), as determined from the six month commissioning period. We summarize the performance of the spacecraft, telescope, science instruments, and ground system, with an emphasis on differences from pre-launch expectations. Commissioning has made clear that JWST is fully capable of achieving the discoveries for which it was built. Moreover, almost across the board, the science performance of JWST is better than expected; in most cases, JWST will go deeper faster than expected. The telescope and instrument suite have demonstrated the sensitivity, stability, image quality, and spectral range that are necessary to transform our understanding of the cosmos through observations spanning from near-earth asteroids to the most distant galaxies.
Submitted 10 April, 2023; v1 submitted 12 July, 2022;
originally announced July 2022.
-
Polyhedral Metal Nanoparticles with Cubic Lattice: Theory of Structural Properties
Authors:
Klaus E. Hermann
Abstract:
We examine the structure of compact metal nanoparticles (NPs) forming polyhedral sections of face centered (fcc) and body centered (bcc) cubic lattices, which are confined by facets characterized by highly dense {100}, {110}, and {111} monolayers. Together with the constraint that the NPs exhibit the same point symmetry as the ideal cubic lattice, i.e. Oh, different types of generic NPs serve for the definition of general compact polyhedral cubic NPs. Their structural properties, such as shape, size, and surface facets, can be described by only three integer valued polyhedral NP parameters N, M, K. Corresponding analytical details are discussed with visualization of characteristic examples. While the overall NP shapes are quite similar between the different cubic lattice types, structural fine details differ. In particular, monolayer planes of adjacent NP facets can join at corners and edges which are not occupied by atoms of the ideal lattice. This gives rise to microfacets and narrow facet strips depending on the lattice type. The discussion illustrates the complexity of seemingly simple nanoparticles in a quantitative account. The geometric relationships of the model particles can also be used to classify shapes and estimate sizes of real compact metal nanoparticles observed by experiment.
Submitted 18 March, 2022; v1 submitted 18 January, 2022;
originally announced January 2022.
-
Structure and Morphology of Crystalline Metal Nanoparticles: Polyhedral Cubic Particles
Authors:
Klaus E. Hermann
Abstract:
We examine nanoparticles (NPs) forming polyhedral sections of the ideal cubic lattice, simple (sc), body centered (bcc), and face centered (fcc) cubic, which are confined by facets characterized by densest and second densest {h k l} monolayers of the lattice. Together with the constraint that the NPs exhibit the same point symmetry as the ideal cubic lattice, i.e. Oh, different types of generic NPs serve for the definition of general polyhedral cubic NPs. Their structural properties, such as shape, size, and surface facets, are discussed in analytical and numerical detail with visualization of characteristic examples. The geometric relationships of the model particles expressed by corresponding formulas and numerical tables can be used to estimate shapes and sizes of real compact metal nanoparticles observed by experiment.
Submitted 12 January, 2021;
originally announced January 2021.
-
Limit theorems and soft edge of freezing random matrix models via dual orthogonal polynomials
Authors:
Sergio Andraus,
Kilian Hermann,
Michael Voit
Abstract:
$N$-dimensional Bessel and Jacobi processes describe interacting particle systems with $N$ particles and are related to $β$-Hermite, $β$-Laguerre, and $β$-Jacobi ensembles. For fixed $N$ there exist associated weak limit theorems (WLTs) in the freezing regime $β\to\infty$ in the $β$-Hermite and $β$-Laguerre case by Dumitriu and Edelman (2005) with explicit formulas for the covariance matrices $Σ_N$ in terms of the zeros of associated orthogonal polynomials. Recently, the authors derived these WLTs in a different way and computed $Σ_N^{-1}$ with formulas for the eigenvalues and eigenvectors of $Σ_N^{-1}$ and thus of $Σ_N$. In the present paper we use these data and the theory of finite dual orthogonal polynomials of de Boor and Saff to derive formulas for $Σ_N$ from $Σ_N^{-1}$ where, for $β$-Hermite and $β$-Laguerre ensembles, our formulas are simpler than those of Dumitriu and Edelman. We use these polynomials to derive asymptotic results for the soft edge in the freezing regime for $N\to\infty$ in terms of the Airy function. For $β$-Hermite ensembles, our limit expressions are different from those of Dumitriu and Edelman.
Submitted 29 June, 2021; v1 submitted 2 September, 2020;
originally announced September 2020.
-
What shapes feature representations? Exploring datasets, architectures, and training
Authors:
Katherine L. Hermann,
Andrew K. Lampinen
Abstract:
In naturalistic learning problems, a model's input contains a wide range of features, some useful for the task at hand, and others not. Of the useful features, which ones does the model use? Of the task-irrelevant features, which ones does the model represent? Answers to these questions are important for understanding the basis of models' decisions, as well as for building models that learn versatile, adaptable representations useful beyond the original training task. We study these questions using synthetic datasets in which the task-relevance of input features can be controlled directly. We find that when two features redundantly predict the labels, the model preferentially represents one, and its preference reflects what was most linearly decodable from the untrained model. Over training, task-relevant features are enhanced, and task-irrelevant features are partially suppressed. Interestingly, in some cases, an easier, weakly predictive feature can suppress a more strongly predictive, but more difficult one. Additionally, models trained to recognize both easy and hard features learn representations most similar to models that use only the easy feature. Further, easy features lead to more consistent representations across model runs than do hard features. Finally, models have greater representational similarity to an untrained model than to models trained on a different task. Our results highlight the complex processes that determine which features a model represents.
Submitted 22 October, 2020; v1 submitted 22 June, 2020;
originally announced June 2020.
-
The Origins and Prevalence of Texture Bias in Convolutional Neural Networks
Authors:
Katherine L. Hermann,
Ting Chen,
Simon Kornblith
Abstract:
Recent work has indicated that, unlike humans, ImageNet-trained CNNs tend to classify images by texture rather than by shape. How pervasive is this bias, and where does it come from? We find that, when trained on datasets of images with conflicting shape and texture, CNNs learn to classify by shape at least as easily as by texture. What factors, then, produce the texture bias in CNNs trained on ImageNet? Different unsupervised training objectives and different architectures have small but significant and largely independent effects on the level of texture bias. However, all objectives and architectures still lead to models that make texture-based classification decisions a majority of the time, even if shape information is decodable from their hidden representations. The effect of data augmentation is much larger. By taking less aggressive random crops at training time and applying simple, naturalistic augmentation (color distortion, noise, and blur), we train models that classify ambiguous images by shape a majority of the time, and outperform baselines on out-of-distribution test sets. Our results indicate that apparent differences in the way humans and ImageNet-trained CNNs process images may arise not primarily from differences in their internal workings, but from differences in the data that they see.
Submitted 3 November, 2020; v1 submitted 20 November, 2019;
originally announced November 2019.
-
Limit theorems for Jacobi ensembles with large parameters
Authors:
Kilian Hermann,
Michael Voit
Abstract:
Consider Jacobi random matrix ensembles with the distributions $$c_{k_1,k_2,k_3}\prod_{1\leq i< j \leq N}\left(x_j-x_i\right)^{k_3}\prod_{i=1}^N \left(1-x_i\right)^{\frac{k_1+k_2}{2}-\frac{1}{2}}\left(1+x_i\right)^{\frac{k_2}{2}-\frac{1}{2}} dx$$ of the eigenvalues on the alcoves $$A:=\{x\in\mathbb R^N| \> -1\leq x_1\le ...\le x_N\leq 1\}.$$ For $(k_1,k_2,k_3)=κ\cdot (a,b,1)$ with $a,b>0$ fixed, we derive a central limit theorem for the distributions above for $κ\to\infty$. The drift and the inverse of the limit covariance matrix are expressed in terms of the zeros of classical Jacobi polynomials. We also rewrite the CLT in trigonometric form and determine the eigenvalues and eigenvectors of the limit covariance matrices. These results are related to corresponding limits for $β$-Hermite and $β$-Laguerre ensembles for $β\to\infty$ by Dumitriu and Edelman and by Voit.
Submitted 15 October, 2020; v1 submitted 20 May, 2019;
originally announced May 2019.
-
The StreetLearn Environment and Dataset
Authors:
Piotr Mirowski,
Andras Banki-Horvath,
Keith Anderson,
Denis Teplyashin,
Karl Moritz Hermann,
Mateusz Malinowski,
Matthew Koichi Grimes,
Karen Simonyan,
Koray Kavukcuoglu,
Andrew Zisserman,
Raia Hadsell
Abstract:
Navigation is a rich and well-grounded problem domain that drives progress in many different areas of research: perception, planning, memory, exploration, and optimisation in particular. Historically these challenges have been separately considered and solutions built that rely on stationary datasets - for example, recorded trajectories through an environment. These datasets cannot be used for decision-making and reinforcement learning, however, and in general the perspective of navigation as an interactive learning task, where the actions and behaviours of a learning agent are learned simultaneously with the perception and planning, is relatively unsupported. Thus, existing navigation benchmarks generally rely on static datasets (Geiger et al., 2013; Kendall et al., 2015) or simulators (Beattie et al., 2016; Shah et al., 2018). To support and validate research in end-to-end navigation, we present StreetLearn: an interactive, first-person, partially-observed visual environment that uses Google Street View for its photographic content and broad coverage, and give performance baselines for a challenging goal-driven navigation task. The environment code, baseline agent code, and the dataset are available at http://streetlearn.cc
Submitted 4 March, 2019;
originally announced March 2019.
-
Learning To Follow Directions in Street View
Authors:
Karl Moritz Hermann,
Mateusz Malinowski,
Piotr Mirowski,
Andras Banki-Horvath,
Keith Anderson,
Raia Hadsell
Abstract:
Navigating and understanding the real world remains a key challenge in machine learning and inspires a great variety of research in areas such as language grounding, planning, navigation and computer vision. We propose an instruction-following task that requires all of the above, and which combines the practicality of simulated environments with the challenges of ambiguous, noisy real world data. StreetNav is built on top of Google Street View and provides visually accurate environments representing real places. Agents are given driving instructions which they must learn to interpret in order to successfully navigate in this environment. Since humans equipped with driving instructions can readily navigate in previously unseen cities, we set a high bar and test our trained agents for similar cognitive capabilities. Although deep reinforcement learning (RL) methods are frequently evaluated only on data that closely follow the training distribution, our dataset extends to multiple cities and has a clean train/test separation. This allows for thorough testing of generalisation ability. This paper presents the StreetNav environment and tasks, models that establish strong baselines, and extensive analysis of the task and the trained agents.
Submitted 21 November, 2019; v1 submitted 1 March, 2019;
originally announced March 2019.
-
Encoding Spatial Relations from Natural Language
Authors:
Tiago Ramalho,
Tomáš Kočiský,
Frederic Besse,
S. M. Ali Eslami,
Gábor Melis,
Fabio Viola,
Phil Blunsom,
Karl Moritz Hermann
Abstract:
Natural language processing has made significant inroads into learning the semantics of words through distributional approaches; however, representations learnt via these methods fail to capture certain kinds of information implicit in the real world. In particular, spatial relations are encoded in a way that is inconsistent with human spatial reasoning and lacking invariance to viewpoint changes. We present a system capable of capturing the semantics of spatial relations such as behind, left of, etc. from natural language. Our key contributions are a novel multi-modal objective based on generating images of scenes from their textual descriptions, and a new dataset on which to train it. We demonstrate that internal representations are robust to meaning preserving transformations of descriptions (paraphrase invariance), while viewpoint invariance is an emergent property of the system.
Submitted 5 July, 2018; v1 submitted 4 July, 2018;
originally announced July 2018.
-
Hyperbolic Attention Networks
Authors:
Caglar Gulcehre,
Misha Denil,
Mateusz Malinowski,
Ali Razavi,
Razvan Pascanu,
Karl Moritz Hermann,
Peter Battaglia,
Victor Bapst,
David Raposo,
Adam Santoro,
Nando de Freitas
Abstract:
We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact.
Submitted 24 May, 2018;
originally announced May 2018.
-
Pushing the bounds of dropout
Authors:
Gábor Melis,
Charles Blundell,
Tomáš Kočiský,
Karl Moritz Hermann,
Chris Dyer,
Phil Blunsom
Abstract:
We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling. The family includes models that compute a power mean over the sampled dropout masks, and their less stochastic subvariants with tighter and higher lower bounds than the fully stochastic dropout objective. We argue that since the deterministic subvariant's bound is equal to its objective, and the highest amongst these models, the predominant view of it as a good approximation to MC averaging is misleading. Rather, deterministic dropout is the best available approximation to the true objective.
Submitted 27 September, 2018; v1 submitted 23 May, 2018;
originally announced May 2018.
-
Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input
Authors:
Angeliki Lazaridou,
Karl Moritz Hermann,
Karl Tuyls,
Stephen Clark
Abstract:
The ability of algorithms to evolve or learn (compositional) communication protocols has traditionally been studied in the language evolution literature through the use of emergent communication tasks. Here we scale up this research by using contemporary deep learning methods and by training reinforcement-learning neural network agents on referential communication games. We extend previous work, in which agents were trained in symbolic environments, by developing agents which are able to learn from raw pixel data, a more challenging and realistic input representation. We find that the degree of structure found in the input data affects the nature of the emerged protocols, and thereby corroborate the hypothesis that structured compositional language is most likely to emerge when agents perceive the world as being structured.
Submitted 11 April, 2018;
originally announced April 2018.
-
Learning to Navigate in Cities Without a Map
Authors:
Piotr Mirowski,
Matthew Koichi Grimes,
Mateusz Malinowski,
Karl Moritz Hermann,
Keith Anderson,
Denis Teplyashin,
Karen Simonyan,
Koray Kavukcuoglu,
Andrew Zisserman,
Raia Hadsell
Abstract:
Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support continuous self-localisation ("I am here") and a representation of the goal ("I am going there"). Building upon recent research that applies deep reinforcement learning to maze navigation problems, we present an end-to-end deep reinforcement learning approach that can be applied on a city scale. Recognising that successful navigation relies on integration of general policies with locale-specific knowledge, we propose a dual pathway architecture that allows locale-specific features to be encapsulated, while still enabling transfer to multiple cities. We present an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage, and demonstrate that our learning method allows agents to learn to navigate multiple cities and to traverse to target destinations that may be kilometres away. The project webpage http://streetlearn.cc contains a video summarising our research and showing the trained agent in diverse city environments and on the transfer task, the form to request the StreetLearn dataset and links to further resources. The StreetLearn environment code is available at https://github.com/deepmind/streetlearn
Submitted 9 January, 2019; v1 submitted 31 March, 2018;
originally announced April 2018.
-
Interlocking mechanism between molecular gears attached to surfaces
Authors:
Rundong Zhao,
Yan-Ling Zhao,
Fei Qi,
Klaus Hermann,
Rui-Qin Zhang,
Michel A. Van Hove
Abstract:
While molecular machines play an increasingly significant role in nanoscience research and applications, there remains a shortage of investigations and understanding of the molecular gear (cogwheel), which is an indispensable and fundamental component to drive a larger correlated molecular machine system. Employing ab initio calculations, we investigate model systems consisting of molecules adsorbed on metal or graphene surfaces, ranging from very simple triple-arm gears such as PF3 and NH3 to larger multi-arm gears based on carbon rings. We explore in detail the transmission of slow rotational motion from one gear to the next by these relatively simple molecules, so as to isolate and reveal the mechanisms of the relevant intermolecular interactions. Several characteristics of molecular gears are discussed, in particular the flexibility of the arms and the slipping and skipping between interlocking arms of adjacent gears, which differ from familiar macroscopic rigid gears. The underlying theoretical concepts suggest strongly that other analogous structures may also exhibit similar behavior which may inspire future exploration in designing large correlated molecular machines.
Submitted 6 February, 2018;
originally announced February 2018.
-
The NarrativeQA Reading Comprehension Challenge
Authors:
Tomáš Kočiský,
Jonathan Schwarz,
Phil Blunsom,
Chris Dyer,
Karl Moritz Hermann,
Gábor Melis,
Edward Grefenstette
Abstract:
Reading comprehension (RC)---in contrast to information retrieval---requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecting answers using superficial information (e.g., local context similarity or global term frequency); they thus fail to test for the essential integrative aspect of RC. To encourage progress on deeper comprehension of language, we present a new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts. These tasks are designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience. We show that although humans solve the tasks easily, standard RC models struggle on the tasks presented here. We provide an analysis of the dataset and the challenges it presents.
Submitted 19 December, 2017;
originally announced December 2017.
-
Understanding Early Word Learning in Situated Artificial Agents
Authors:
Felix Hill,
Stephen Clark,
Karl Moritz Hermann,
Phil Blunsom
Abstract:
Neural network-based systems can now learn to locate the referents of words and phrases in images, answer questions about visual scenes, and execute symbolic instructions as first-person actors in partially-observable worlds. To achieve this so-called grounded language learning, models must overcome challenges that infants face when learning their first words. While it is notable that models with no meaningful prior knowledge overcome these obstacles, researchers currently lack a clear understanding of how they do so, a problem that we attempt to address in this paper. For maximum control and generality, we focus on a simple neural network-based language learning agent, trained via policy-gradient methods, which can interpret single-word instructions in a simulated 3D world. Whilst the goal is not to explicitly model infant word learning, we take inspiration from experimental paradigms in developmental psychology and apply some of these to the artificial agent, exploring the conditions under which established human biases and learning effects emerge. We further propose a novel method for visualising semantic representations in the agent.
Submitted 1 October, 2019; v1 submitted 26 October, 2017;
originally announced October 2017.
-
Grounded Language Learning in a Simulated 3D World
Authors:
Karl Moritz Hermann,
Felix Hill,
Simon Green,
Fumin Wang,
Ryan Faulkner,
Hubert Soyer,
David Szepesvari,
Wojciech Marian Czarnecki,
Max Jaderberg,
Denis Teplyashin,
Marcus Wainwright,
Chris Apps,
Demis Hassabis,
Phil Blunsom
Abstract:
We are increasingly surrounded by artificially intelligent technology that takes decisions and executes actions on our behalf. This creates a pressing need for general means to communicate with, instruct and guide artificial agents, with human language the most compelling means for such communication. To achieve this in a scalable fashion, agents must be able to relate language to the world and to actions; that is, their understanding of language must be grounded and embodied. However, learning grounded language is a notoriously challenging problem in artificial intelligence research. Here we present an agent that learns to interpret language in a simulated 3D environment where it is rewarded for the successful execution of written instructions. Trained via a combination of reinforcement and unsupervised learning, and beginning with minimal prior knowledge, the agent learns to relate linguistic symbols to emergent perceptual representations of its physical surroundings and to pertinent sequences of actions. The agent's comprehension of language extends beyond its prior experience, enabling it to apply familiar language to unfamiliar situations and to interpret entirely novel instructions. Moreover, the speed with which this agent learns new words increases as its semantic knowledge grows. This facility for generalising and bootstrapping semantic knowledge indicates the potential of the present approach for reconciling ambiguous natural language with the complexity of the physical world.
Submitted 26 June, 2017; v1 submitted 20 June, 2017;
originally announced June 2017.
-
Semantic Parsing with Semi-Supervised Sequential Autoencoders
Authors:
Tomáš Kočiský,
Gábor Melis,
Edward Grefenstette,
Chris Dyer,
Wang Ling,
Phil Blunsom,
Karl Moritz Hermann
Abstract:
We present a novel semi-supervised approach for sequence transduction and apply it to semantic parsing. The unsupervised component is based on a generative model in which latent sentences generate the unpaired logical forms. We apply this method to a number of semantic parsing tasks focusing on domains with limited access to labelled training data and extend those datasets with synthetically generated logical forms.
Submitted 29 September, 2016;
originally announced September 2016.
-
Intramolecular Torque, an Indicator of the Internal Rotation Direction of Rotor Molecules and Similar Systems
Authors:
Rui-Qin Zhang,
Yan-Ling Zhao,
Fei Qi,
Klaus Hermann,
Michel A. Van Hove
Abstract:
Torque is ubiquitous in many molecular systems, including collisions, chemical reactions, vibrations, electronic excitations and especially rotor molecules. We present a straightforward theoretical method based on forces acting on atoms and obtained from atomistic quantum mechanics calculations, to quickly and qualitatively determine whether a molecule or sub-unit thereof has a tendency to rotation and, if so, around which axis and in which sense: clockwise or counterclockwise. The method also indicates which atoms, if any, are predominant in causing the rotation. Our computational approach can in general efficiently provide insights into the rotational ability of many molecules and help to theoretically screen or modify them in advance of experiments or before analyzing their rotational behavior in more detail with more extensive computations guided by the results from the torque approach. As an example, we demonstrate the effectiveness of the approach using a specific light-driven molecular rotary motor which was successfully synthesized and analyzed in prior experiments and simulations.
Submitted 3 July, 2016;
originally announced July 2016.
-
Latent Predictor Networks for Code Generation
Authors:
Wang Ling,
Edward Grefenstette,
Karl Moritz Hermann,
Tomáš Kočiský,
Andrew Senior,
Fumin Wang,
Phil Blunsom
Abstract:
Many language generation tasks require the production of text conditioned on both structured and unstructured inputs. We present a novel neural network architecture which generates an output sequence conditioned on an arbitrary number of input functions. Crucially, our approach allows both the choice of conditioning context and the granularity of generation, for example characters or tokens, to be marginalised, thus permitting scalable and effective training. Using this framework, we address the problem of generating programming code from a mixed natural language and structured specification. We create two new data sets for this paradigm derived from the collectible trading card games Magic the Gathering and Hearthstone. On these, and a third preexisting corpus, we demonstrate that marginalising multiple predictors allows our model to outperform strong benchmarks.
Submitted 8 June, 2016; v1 submitted 22 March, 2016;
originally announced March 2016.
-
Reasoning about Entailment with Neural Attention
Authors:
Tim Rocktäschel,
Edward Grefenstette,
Karl Moritz Hermann,
Tomáš Kočiský,
Phil Blunsom
Abstract:
While most approaches to automatically recognizing entailment relations have used classifiers employing hand engineered features derived from complex natural language processing pipelines, in practice their performance has been only slightly better than bag-of-word pair classifiers using only lexical similarity. The only attempt so far to build an end-to-end differentiable neural network for entailment failed to outperform such a simple similarity classifier. In this paper, we propose a neural model that reads two sentences to determine entailment using long short-term memory units. We extend this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases. Furthermore, we present a qualitative analysis of attention weights produced by this model, demonstrating such reasoning capabilities. On a large entailment dataset this model outperforms the previous best neural model and a classifier with engineered features by a substantial margin. It is the first generic end-to-end differentiable system that achieves state-of-the-art accuracy on a textual entailment dataset.
Submitted 1 March, 2016; v1 submitted 22 September, 2015;
originally announced September 2015.
-
Teaching Machines to Read and Comprehend
Authors:
Karl Moritz Hermann,
Tomáš Kočiský,
Edward Grefenstette,
Lasse Espeholt,
Will Kay,
Mustafa Suleyman,
Phil Blunsom
Abstract:
Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.
Submitted 19 November, 2015; v1 submitted 10 June, 2015;
originally announced June 2015.
-
Learning to Transduce with Unbounded Memory
Authors:
Edward Grefenstette,
Karl Moritz Hermann,
Mustafa Suleyman,
Phil Blunsom
Abstract:
Recently, strong results have been demonstrated by Deep Recurrent Neural Networks on natural language transduction problems. In this paper we explore the representational power of these models using synthetic grammars designed to exhibit phenomena similar to those found in real transduction problems such as machine translation. These experiments lead us to propose new memory-based recurrent networks that implement continuously differentiable analogues of traditional data structures such as Stacks, Queues, and DeQues. We show that these architectures exhibit superior generalisation performance to Deep RNNs and are often able to learn the underlying generating algorithms in our transduction experiments.
Submitted 3 November, 2015; v1 submitted 8 June, 2015;
originally announced June 2015.
-
Deep Learning for Answer Sentence Selection
Authors:
Lei Yu,
Karl Moritz Hermann,
Phil Blunsom,
Stephen Pulman
Abstract:
Answer sentence selection is the task of identifying sentences that contain the answer to a given question. This is an important problem in its own right as well as in the larger context of open domain question answering. We propose a novel approach to solving this task via means of distributed representations, and learn to match questions with answers by considering their semantic encoding. This contrasts prior work on this task, which typically relies on classifiers with large numbers of hand-crafted syntactic and semantic features and various external resources. Our approach does not require any feature engineering nor does it involve specialist linguistic data, making this model easily applicable to a wide range of domains and languages. Experimental results on a standard benchmark dataset from TREC demonstrate that---despite its simplicity---our model matches state of the art performance on the answer sentence selection task.
Submitted 4 December, 2014;
originally announced December 2014.
-
Distributed Representations for Compositional Semantics
Authors:
Karl Moritz Hermann
Abstract:
The mathematical representation of semantics is a key issue for Natural Language Processing (NLP). A lot of research has been devoted to finding ways of representing the semantics of individual words in vector spaces. Distributional approaches --- meaning distributed representations that exploit co-occurrence statistics of large corpora --- have proved popular and successful across a number of tasks. However, natural language usually comes in structures beyond the word level, with meaning arising not only from the individual words but also the structure they are contained in at the phrasal or sentential level. Modelling the compositional process by which the meaning of an utterance arises from the meaning of its parts is an equally fundamental task of NLP.
This dissertation explores methods for learning distributed semantic representations and models for composing these into representations for larger linguistic units. Our underlying hypothesis is that neural models are a suitable vehicle for learning semantically rich representations and that such representations in turn are suitable vehicles for solving important tasks in natural language processing. The contribution of this thesis is a thorough evaluation of our hypothesis, as part of which we introduce several new approaches to representation learning and compositional semantics, as well as multiple state-of-the-art models which apply distributed semantic representations to various tasks in NLP.
Submitted 12 November, 2014;
originally announced November 2014.
-
Learning Bilingual Word Representations by Marginalizing Alignments
Authors:
Tomáš Kočiský,
Karl Moritz Hermann,
Phil Blunsom
Abstract:
We present a probabilistic model that simultaneously learns alignments and distributed representations for bilingual data. By marginalizing over word alignments the model captures a larger semantic context than prior work relying on hard alignments. The advantage of this approach is demonstrated in a cross-lingual classification task, where we outperform the prior published state of the art.
Submitted 5 May, 2014;
originally announced May 2014.
-
A Deep Architecture for Semantic Parsing
Authors:
Edward Grefenstette,
Phil Blunsom,
Nando de Freitas,
Karl Moritz Hermann
Abstract:
Many successful approaches to semantic parsing build on top of the syntactic analysis of text, and make use of distributional representations or statistical models to match parses to ontology-specific queries. This paper presents a novel deep learning architecture which provides a semantic parsing system through the union of two neural models of language semantics. It allows for the generation of ontology-specific queries from natural language statements and questions without the need for parsing, which makes it especially suitable for grammatically malformed or syntactically atypical text, such as tweets, as well as permitting the development of semantic parsers for resource-poor languages.
Submitted 29 April, 2014;
originally announced April 2014.
-
Multilingual Models for Compositional Distributed Semantics
Authors:
Karl Moritz Hermann,
Phil Blunsom
Abstract:
We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings. Our models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences. The models do not rely on word alignments or any syntactic information and are successfully applied to a number of diverse languages. We extend our approach to learn semantic representations at the document level, too. We evaluate these models on two cross-lingual document classification tasks, outperforming the prior state of the art. Through qualitative analysis and the study of pivoting effects we demonstrate that our representations are semantically plausible and can capture semantic relationships across languages without parallel data.
Submitted 17 April, 2014;
originally announced April 2014.
-
Multilingual Distributed Representations without Word Alignment
Authors:
Karl Moritz Hermann,
Phil Blunsom
Abstract:
Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP. By overcoming data sparsity problems, as well as providing information about semantic relatedness which is not available in discrete representations, distributed representations have proven useful in many NLP tasks. Recent work has shown how compositional semantic representations can successfully be applied to a number of monolingual applications such as sentiment analysis. At the same time, there has been some initial success in work on learning shared word-level representations across languages. We combine these two approaches by proposing a method for learning distributed representations in a multilingual setup. Our model learns to assign similar embeddings to aligned sentences and dissimilar ones to sentences which are not aligned, while not requiring word alignments. We show that our representations are semantically informative and apply them to a cross-lingual document classification task where we outperform the previous state of the art. Further, by employing parallel corpora of multiple language pairs, we find that our model learns representations that capture semantic relationships across languages for which no parallel data was used.
Submitted 20 March, 2014; v1 submitted 20 December, 2013;
originally announced December 2013.
-
"Not not bad" is not "bad": A distributional account of negation
Authors:
Karl Moritz Hermann,
Edward Grefenstette,
Phil Blunsom
Abstract:
With the increasing empirical success of distributional models of compositional semantics, it is timely to consider the types of textual logic that such models are capable of capturing. In this paper, we address shortcomings in the ability of current models to capture logical operations such as negation. As a solution we propose a tripartite formulation for a continuous vector space representation of semantics and subsequently use this representation to develop a formal compositional notion of negation within such models.
Submitted 10 June, 2013;
originally announced June 2013.
-
Theory of Alkali Induced Reconstruction of the Cu(100) Surface
Authors:
S. Quassowski,
K. Hermann
Abstract:
LEED experiments show that Li adsorbed at Cu(100) surfaces at room temperature induces a (2x1) missing row substrate reconstruction, while adsorption at lower temperatures, T=180 K, results in an unreconstructed Cu(100)+c(2x2)--Li overlayer structure. Substrate reconstruction has not been observed for Na or K adsorption. To study the specific reconstruction behavior of the Li adsorbate, ab initio DFT calculations have been performed on Cu(100)+Ad (Ad = Li, Na, K) systems at coverages Theta_Ad=0.25-0.5 with and without reconstruction. The calculations show that the (2x1) missing row (MR) reconstructed surface lies energetically above the ideal (1x1) surface by 0.2 eV per unit cell. However, alkali binding is stronger in the MR geometry than on the ideal surface, with the increase in bond strength becoming smaller in going from Li to Na to K. As a result, the MR reconstructed and the overlayer adsorbate systems are energetically very close for Cu(100)+Li, while for Na and K the overlayer geometry is always favored.
Submitted 26 September, 1996;
originally announced September 1996.