-
Controlling Chaos Using Edge Computing Hardware
Authors:
Robert M. Kent,
Wendson A. S. Barbosa,
Daniel J. Gauthier
Abstract:
Machine learning provides a data-driven approach for creating a digital twin of a system - a digital model used to predict the system behavior. Having an accurate digital twin can drive many applications, such as controlling autonomous systems. Often the size, weight, and power consumption of the digital twin or related controller must be minimized, ideally realized on embedded computing hardware that can operate without a cloud-computing connection. Here, we show that a nonlinear controller based on next-generation reservoir computing can tackle a difficult control problem: controlling a chaotic system to an arbitrary time-dependent state. The model is accurate, yet it is small enough to be evaluated on a field-programmable gate array typically found in embedded devices. Furthermore, the model only requires 25.0 $\pm$ 7.0 nJ per evaluation, well below other algorithms, even without systematic power optimization. Our work represents the first step in deploying efficient machine learning algorithms to the computing "edge."
Submitted 8 May, 2024;
originally announced June 2024.
-
Small jet engine reservoir computing digital twin
Authors:
C. J. Wright,
N. Biederman,
B. Gyovai,
D. J. Gauthier,
J. P. Wilhelm
Abstract:
Machine learning was applied to create a digital twin of a numerical simulation of a single-scroll jet engine. A similar model based on the insights gained from this numerical study was used to create a digital twin of a JetCat P100-RX jet engine using only experimental data. Engine data was collected from a custom sensor system measuring parameters such as thrust, exhaust gas temperature, shaft speed, weather conditions, etc. Data was gathered while the engine was placed under different test conditions by controlling shaft speed. The machine learning model was generated (trained) using a next-generation reservoir computer, a best-in-class machine learning algorithm for dynamical systems. Once the model was trained, it was used to predict behavior it had never seen with an accuracy of better than 1.8% when compared to the testing data.
Submitted 15 December, 2023;
originally announced December 2023.
-
Controlling Chaotic Maps using Next-Generation Reservoir Computing
Authors:
Robert M. Kent,
Wendson A. S. Barbosa,
Daniel J. Gauthier
Abstract:
In this work, we combine nonlinear system control techniques with next-generation reservoir computing, a best-in-class machine learning approach for predicting the behavior of dynamical systems. We demonstrate the performance of the controller in a series of control tasks for the chaotic Hénon map, including controlling the system between unstable fixed-points, stabilizing the system to higher order periodic orbits, and to an arbitrary desired state. We show that our controller succeeds in these tasks, requires only 10 data points for training, can control the system to a desired trajectory in a single iteration, and is robust to noise and modeling error.
Submitted 2 February, 2024; v1 submitted 7 July, 2023;
originally announced July 2023.
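For orientation, the chaotic Hénon map named in the abstract above can be iterated in a few lines. This is only a minimal sketch of the uncontrolled map and its fixed points, not the controller from the paper. The parameter values a = 1.4 and b = 0.3 are the commonly quoted ones and are an assumption here, since the abstract does not state them. The function names are hypothetical.

```python
import math

# Assumed standard Henon parameters (not given in the abstract).
A, B = 1.4, 0.3


def henon_step(x, y, a=A, b=B):
    """One iteration of the Henon map: (x, y) -> (1 - a*x^2 + y, b*x)."""
    return 1.0 - a * x * x + y, b * x


def henon_fixed_points(a=A, b=B):
    """Fixed points satisfy a*x^2 + (1 - b)*x - 1 = 0 with y = b*x."""
    disc = math.sqrt((1.0 - b) ** 2 + 4.0 * a)
    roots = [(-(1.0 - b) + sign * disc) / (2.0 * a) for sign in (1.0, -1.0)]
    return [(x, b * x) for x in roots]


def henon_trajectory(x0, y0, n, a=A, b=B):
    """Return the initial point followed by n iterates of the map."""
    points = [(x0, y0)]
    for _ in range(n):
        points.append(henon_step(*points[-1], a, b))
    return points
```

Calling `henon_fixed_points()` returns the two fixed points of the map for the assumed parameters; a control task of the kind the abstract describes would hold the system near such points.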
-
Probing self-supervised speech models for phonetic and phonemic information: a case study in aspiration
Authors:
Kinan Martin,
Jon Gauthier,
Canaan Breiss,
Roger Levy
Abstract:
Textless self-supervised speech models have grown in capabilities in recent years, but the nature of the linguistic information they encode has not yet been thoroughly examined. We evaluate the extent to which these models' learned representations align with basic representational distinctions made by humans, focusing on a set of phonetic (low-level) and phonemic (more abstract) contrasts instantiated in word-initial stops. We find that robust representations of both phonetic and phonemic distinctions emerge in early layers of these models' architectures, and are preserved in the principal components of deeper layer representations. Our analyses suggest two sources for this success: some can only be explained by the optimization of the models on speech data, while some can be attributed to these models' high-dimensional architectures. Our findings show that speech-trained HuBERT derives a low-noise and low-dimensional subspace corresponding to abstract phonological distinctions.
Submitted 9 June, 2023;
originally announced June 2023.
-
The neural dynamics of auditory word recognition and integration
Authors:
Jon Gauthier,
Roger Levy
Abstract:
Listeners recognize and integrate words in rapid and noisy everyday speech by combining expectations about upcoming content with incremental sensory evidence. We present a computational model of word recognition which formalizes this perceptual process in Bayesian decision theory. We fit this model to explain scalp EEG signals recorded as subjects passively listened to a fictional story, revealing both the dynamics of the online auditory word recognition process and the neural correlates of the recognition and integration of words.
The model reveals distinct neural processing of words depending on whether or not they can be quickly recognized. While all words trigger a neural response characteristic of probabilistic integration -- voltage modulations predicted by a word's surprisal in context -- these modulations are amplified for words which require more than roughly 150 ms of input to be recognized. We observe no difference in the latency of these neural responses according to words' recognition times. Our results are consistent with a two-part model of speech comprehension, combining an eager and rapid process of word recognition with a temporally independent process of word integration. However, we also developed alternative models of the scalp EEG signal not incorporating word recognition dynamics which showed similar performance improvements. We discuss potential future modeling steps which may help to separate these hypotheses.
Submitted 5 December, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Language model acceptability judgements are not always robust to context
Authors:
Koustuv Sinha,
Jon Gauthier,
Aaron Mueller,
Kanishka Misra,
Keren Fuentes,
Roger Levy,
Adina Williams
Abstract:
Targeted syntactic evaluations of language models ask whether models show stable preferences for syntactically acceptable content over minimal-pair unacceptable inputs. Most targeted syntactic evaluation datasets ask models to make these judgements with just a single context-free sentence as input. This does not match language models' training regime, in which input sentences are always highly contextualized by the surrounding corpus. This mismatch raises an important question: how robust are models' syntactic judgements in different contexts? In this paper, we investigate the stability of language models' performance on targeted syntactic evaluations as we vary properties of the input context: the length of the context, the types of syntactic phenomena it contains, and whether or not there are violations of grammaticality. We find that model judgements are generally robust when placed in randomly sampled linguistic contexts. However, they are substantially unstable for contexts containing syntactic structures matching those in the critical test content. Among all tested models (GPT-2 and five variants of OPT), we significantly improve models' judgements by providing contexts with matching syntactic structures, and conversely significantly worsen them using unacceptable contexts with matching but violated syntactic structures. This effect is amplified by the length of the context, except for unrelated inputs. We show that these changes in model performance are not explainable by simple features matching the context and the test inputs, such as lexical overlap and dependency overlap. This sensitivity to highly specific syntactic features of the context can only be explained by the models' implicit in-context learning abilities.
Submitted 17 December, 2022;
originally announced December 2022.
-
Learning unseen coexisting attractors
Authors:
Daniel J. Gauthier,
Ingo Fischer,
André Röhm
Abstract:
Reservoir computing is a machine learning approach that can generate a surrogate model of a dynamical system. It can learn the underlying dynamical system using fewer trainable parameters and hence smaller training data sets than competing approaches. Recently, a simpler formulation, known as next-generation reservoir computing, removes many algorithm metaparameters and identifies a well-performing traditional reservoir computer, thus simplifying training even further. Here, we study a particularly challenging problem of learning a dynamical system that has both disparate time scales and multiple co-existing dynamical states (attractors). We compare the next-generation and traditional reservoir computer using metrics quantifying the geometry of the ground-truth and forecasted attractors. For the studied four-dimensional system, the next-generation reservoir computing approach uses $\sim 1.7 \times$ less training data, requires $10^3 \times$ shorter `warm up' time, has fewer metaparameters, and has an $\sim 100\times$ higher accuracy in predicting the co-existing attractor characteristics in comparison to a traditional reservoir computer. Furthermore, we demonstrate that it predicts the basin of attraction with high accuracy. This work lends further support to the superior learning ability of this new machine learning algorithm for dynamical systems.
Submitted 28 July, 2022;
originally announced July 2022.
-
Mathematical Model of Strong Physically Unclonable Functions Based on Hybrid Boolean Networks
Authors:
Noeloikeau Charlot,
Daniel J. Gauthier,
Daniel Canaday,
Andrew Pomerance
Abstract:
We introduce a mathematical framework for simulating Hybrid Boolean Network (HBN) Physically Unclonable Functions (PUFs, HBN-PUFs). We verify that the model is able to reproduce the experimentally observed PUF statistics for uniqueness $\mu_{inter}$ and reliability $\mu_{intra}$ obtained from experiments of HBN-PUFs on Cyclone V FPGAs. Our results suggest that the HBN-PUF is a true `strong' PUF in the sense that its security properties depend exponentially on both the manufacturing variation and the challenge-response space. Our Python simulation methods are open-source and available at https://github.com/Noeloikeau/networkm.
Submitted 19 July, 2022;
originally announced July 2022.
-
Learning Spatiotemporal Chaos Using Next-Generation Reservoir Computing
Authors:
Wendson A. S. Barbosa,
Daniel J. Gauthier
Abstract:
Forecasting the behavior of high-dimensional dynamical systems using machine learning requires efficient methods to learn the underlying physical model. We demonstrate spatiotemporal chaos prediction using a machine learning architecture that, when combined with a next-generation reservoir computer, displays state-of-the-art performance while requiring a training time $10^3$ to $10^4$ times shorter and a training data set $\sim 10^2$ times smaller than other machine learning algorithms. We also take advantage of the translational symmetry of the model to further reduce the computational cost and training data, each by a factor of $\sim$10.
Submitted 30 August, 2022; v1 submitted 24 March, 2022;
originally announced March 2022.
-
Sensitivity of a Chaotic Logic Gate
Authors:
Noeloikeau Charlot,
Daniel J. Gauthier
Abstract:
Chaotic logic gates or `chaogates' are a promising mixed-signal approach to designing universal computers. However, chaotic systems are exponentially sensitive to small perturbations, and the effects of noise can cause chaotic computers to fail. Here, we examine the sensitivity of a simulated chaogate to noise and other parameter variations (such as differences in supply voltage). We find that the regions in parameter space corresponding to chaotic dynamics coincide with the regions of maximum error in the computation. Further, this error grows exponentially within 4-10 iterations of the chaotic map. As such, we discuss the fundamental limitations of chaotic computing, and suggest potential improvements. Our Python simulation methods are open-source and available at https://github.com/Noeloikeau/chaogate.
Submitted 15 February, 2022;
originally announced February 2022.
-
High-Resolution Waveform Capture Device on a Cyclone-V FPGA
Authors:
Noeloikeau Charlot,
Daniel J. Gauthier,
Andrew Pomerance
Abstract:
We introduce the waveform capture device (WCD), a flexible measurement system capable of recording complex digital signals on trillionth-of-a-second (ps) time scales. The WCD is implemented via modular code on an off-the-shelf field-programmable gate-array (FPGA, Intel/Altera Cyclone V), and incorporates both time-to-digital converter (TDC) and digital storage oscilloscope (DSO) functionality. The device captures a waveform by taking snapshots of a signal as it propagates down an ultra-fast transmission line known as a carry chain (CC). It is calibrated via a novel dynamic phase-shifting (DPS) method that requires substantially less data and resources than the state-of-the-art. Using DPS, we find the measurement resolution - or mean propagation delay from one CC element to the next - to be 4.91 +/- 0.04 ps (4.54 +/- 0.02 ps) for a pulse of logic high (low). Similarly, we find the single-shot precision - or mean error on the timing of the waveform - to be 29.52 ps (27.14 ps) for pulses of logic high (low). We verify these findings by reproducing commercial oscilloscope measurements of asynchronous ring-oscillators on FPGAs, finding the mean pulse width to be 0.240 +/- 0.002 ns per inverter gate. Finally, we present a careful analysis of design constraints, introduce a novel error correction algorithm, and sketch a simple extension to the analog domain. We also provide the Verilog code instantiating our design on an FPGA in an Appendix, and make our methods available as an open-source Python library at https://github.com/Noeloikeau/fpyga.
Submitted 16 August, 2021;
originally announced September 2021.
-
Model-free inference of unseen attractors: Reconstructing phase space features from a single noisy trajectory using reservoir computing
Authors:
André Röhm,
Daniel J. Gauthier,
Ingo Fischer
Abstract:
Reservoir computers are powerful tools for chaotic time series prediction. They can be trained to approximate phase space flows and can thus both predict future values to a high accuracy, as well as reconstruct the general properties of a chaotic attractor without requiring a model. In this work, we show that the ability to learn the dynamics of a complex system can be extended to systems with co-existing attractors, here a 4-dimensional extension of the well-known Lorenz chaotic system. We demonstrate that a reservoir computer can infer entirely unexplored parts of the phase space: a properly trained reservoir computer can predict the existence of attractors that were never approached during training and therefore are labelled as unseen. We provide examples where attractor inference is achieved after training solely on a single noisy trajectory.
Submitted 30 September, 2021; v1 submitted 6 August, 2021;
originally announced August 2021.
-
Next Generation Reservoir Computing
Authors:
Daniel J. Gauthier,
Erik Bollt,
Aaron Griffith,
Wendson A. S. Barbosa
Abstract:
Reservoir computing is a best-in-class machine learning algorithm for processing information generated by dynamical systems using observed time-series data. Importantly, it requires very small training data sets, uses linear optimization, and thus requires minimal computing resources. However, the algorithm uses randomly sampled matrices to define the underlying recurrent neural network and has a multitude of metaparameters that must be optimized. Recent results demonstrate the equivalence of reservoir computing to nonlinear vector autoregression, which requires no random matrices, fewer metaparameters, and provides interpretable results. Here, we demonstrate that nonlinear vector autoregression excels at reservoir computing benchmark tasks and requires even shorter training data sets and training time, heralding the next generation of reservoir computing.
Submitted 22 July, 2021; v1 submitted 14 June, 2021;
originally announced June 2021.
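The nonlinear vector autoregression idea described in the Next Generation Reservoir Computing abstract above (polynomial features of the observed state, fitted by linear optimization) can be illustrated with a small standard-library sketch. This is not the authors' implementation. It is a simplified illustration under several assumptions: the example system is the Hénon map with assumed parameters a = 1.4 and b = 0.3, the feature vector uses only the current state (no time-delay taps), and the ridge parameter and all function names are hypothetical.

```python
def henon_series(n, a=1.4, b=0.3, x=0.0, y=0.0):
    """Generate n points of the Henon map (assumed example system)."""
    out = []
    for _ in range(n):
        out.append((x, y))
        x, y = 1.0 - a * x * x + y, b * x
    return out


def nvar_features(state):
    """Constant, linear, and unique quadratic monomials of the state."""
    x, y = state
    return [1.0, x, y, x * x, x * y, y * y]


def solve_linear(m, rhs):
    """Solve m @ w = rhs by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - s) / aug[i][i]
    return w


def train_nvar(series, ridge=1e-8):
    """Ridge regression from features at step n to the state at step n+1."""
    feats = [nvar_features(s) for s in series[:-1]]
    targets = series[1:]
    d = len(feats[0])
    # Regularized Gram matrix: F^T F + ridge * I.
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    weights = []
    for k in range(2):  # one weight vector per output component
        rhs = [sum(f[i] * t[k] for f, t in zip(feats, targets))
               for i in range(d)]
        weights.append(solve_linear(gram, rhs))
    return weights


def predict(weights, state):
    """One-step prediction: linear readout of the polynomial features."""
    f = nvar_features(state)
    return tuple(sum(wi * fi for wi, fi in zip(w, f)) for w in weights)
```

For example, `predict(train_nvar(henon_series(200)), (0.3, 0.2))` should land close to the true next state, because the Hénon map is itself a quadratic polynomial and therefore lies within the span of the chosen features. Systems that are not exactly polynomial would generally need time-delay taps or higher-order terms.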
-
Reservoir Computing with Superconducting Electronics
Authors:
Graham E. Rowlands,
Minh-Hai Nguyen,
Guilhem J. Ribeill,
Andrew P. Wagner,
Luke C. G. Govia,
Wendson A. S. Barbosa,
Daniel J. Gauthier,
Thomas A. Ohki
Abstract:
The rapidity and low power consumption of superconducting electronics make them an ideal substrate for physical reservoir computing, which commandeers the computational power inherent to the evolution of a dynamical system for the purposes of performing machine learning tasks. We focus on a subset of superconducting circuits that exhibit soliton-like dynamics in simple transmission line geometries. With numerical simulations we demonstrate the effectiveness of these circuits in performing higher-order parity calculations and channel equalization at rates approaching 100 Gb/s. The availability of a proven superconducting logic scheme considerably simplifies the path to a fully integrated reservoir computing platform and makes superconducting reservoirs an enticing substrate for high rate signal processing applications.
Submitted 3 March, 2021;
originally announced March 2021.
-
Symmetry-Aware Reservoir Computing
Authors:
Wendson A. S. Barbosa,
Aaron Griffith,
Graham E. Rowlands,
Luke C. G. Govia,
Guilhem J. Ribeill,
Minh-Hai Nguyen,
Thomas A. Ohki,
Daniel J. Gauthier
Abstract:
We demonstrate that matching the symmetry properties of a reservoir computer (RC) to the data being processed dramatically increases its processing power. We apply our method to the parity task, a challenging benchmark problem that highlights inversion and permutation symmetries, and to a chaotic system inference task that presents an inversion symmetry rule. For the parity task, our symmetry-aware RC obtains zero error using an exponentially reduced neural network and training data, greatly speeding up the time to result and outperforming hand-crafted artificial neural networks. When both symmetries are respected, we find that the network size $N$ necessary to obtain zero error for 50 different RC instances scales linearly with the parity-order $n$. Moreover, some symmetry-aware RC instances perform a zero error classification with only $N=1$ for $n\leq7$. Furthermore, we show that a symmetry-aware RC only needs a training data set with size on the order of $(n+n/2)$ to obtain such performance, an exponential reduction in comparison to a regular RC which requires a training data set with size on the order of $n2^n$ to contain all $2^n$ possible $n$-bit-long sequences. For the inference task, we show that a symmetry-aware RC presents a normalized root-mean-square error three orders-of-magnitude smaller than regular RCs. For both tasks, our RC approach respects the symmetries by adjusting only the input and the output layers, and not by problem-based modifications to the neural network. We anticipate that generalizations of our procedure can be applied in information processing for problems with known symmetries.
Submitted 22 September, 2021; v1 submitted 30 January, 2021;
originally announced February 2021.
-
Model-Free Control of Dynamical Systems with Deep Reservoir Computing
Authors:
Daniel Canaday,
Andrew Pomerance,
Daniel J. Gauthier
Abstract:
We propose and demonstrate a nonlinear control method that can be applied to unknown, complex systems where the controller is based on a type of artificial neural network known as a reservoir computer. In contrast to many modern neural-network-based control techniques, which are robust to system uncertainties but require a model nonetheless, our technique requires no prior knowledge of the system and is thus model-free. Further, our approach does not require an initial system identification step, resulting in a relatively simple and efficient learning process. Reservoir computers are well-suited to the control problem because they require small training data sets and remarkably low training times. By iteratively training and adding layers of reservoir computers to the controller, a precise and efficient control law is identified quickly. With examples on both numerical and high-speed experimental systems, we demonstrate that our approach is capable of controlling highly complex dynamical systems that display deterministic chaos to nontrivial target trajectories.
Submitted 5 October, 2020;
originally announced October 2020.
-
On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior
Authors:
Ethan Gotlieb Wilcox,
Jon Gauthier,
Jennifer Hu,
Peng Qian,
Roger Levy
Abstract:
Human reading behavior is tuned to the statistics of natural language: the time it takes human subjects to read a word can be predicted from estimates of the word's probability in context. However, it remains an open question what computational architecture best characterizes the expectations deployed in real time by humans that determine the behavioral signatures of reading. Here we test over two dozen models, independently manipulating computational architecture and training dataset size, on how well their next-word expectations predict human reading time behavior on naturalistic text corpora. We find that across model architectures and training dataset sizes the relationship between word log-probability and reading time is (near-)linear. We next evaluate how features of these models determine their psychometric predictive power, or ability to predict human reading behavior. In general, the better a model's next-word expectations, the better its psychometric predictive power. However, we find nontrivial differences across model architectures. For any given perplexity, deep Transformer models and n-gram models generally show superior psychometric predictive power over LSTM or structurally supervised neural models, especially for eye movement data. Finally, we compare models' psychometric predictive power to the depth of their syntactic knowledge, as measured by a battery of syntactic generalization tests developed using methods from controlled psycholinguistic experiments. Once perplexity is controlled for, we find no significant relationship between syntactic knowledge and predictive power. These results suggest that different approaches may be required to best model human real-time language comprehension behavior in naturalistic reading versus behavior for controlled linguistic materials designed for targeted probing of syntactic knowledge.
Submitted 2 June, 2020;
originally announced June 2020.
-
A Systematic Assessment of Syntactic Generalization in Neural Language Models
Authors:
Jennifer Hu,
Jon Gauthier,
Peng Qian,
Ethan Wilcox,
Roger P. Levy
Abstract:
While state-of-the-art neural network models continue to achieve lower perplexity scores on language modeling benchmarks, it remains unknown whether optimizing for broad-coverage predictive performance leads to human-like syntactic knowledge. Furthermore, existing work has not provided a clear picture about the model properties required to produce proper syntactic generalizations. We present a systematic evaluation of the syntactic knowledge of neural language models, testing 20 combinations of model types and data sizes on a set of 34 English-language syntactic test suites. We find substantial differences in syntactic generalization performance by model architecture, with sequential models underperforming other architectures. Factorially manipulating model architecture and training dataset size (1M--40M words), we find that variability in syntactic generalization performance is substantially greater by architecture than by dataset size for the corpora tested in our experiments. Our results also reveal a dissociation between perplexity and syntactic generalization performance.
Submitted 22 May, 2020; v1 submitted 7 May, 2020;
originally announced May 2020.
-
Linking artificial and human neural representations of language
Authors:
Jon Gauthier,
Roger Levy
Abstract:
What information from an act of sentence understanding is robustly represented in the human brain? We investigate this question by comparing sentence encoding models on a brain decoding task, where the sentence that an experimental participant has seen must be predicted from the fMRI signal evoked by the sentence. We take a pre-trained BERT architecture as a baseline sentence encoding model and fine-tune it on a variety of natural language understanding (NLU) tasks, asking which lead to improvements in brain-decoding performance.
We find that none of the sentence encoding tasks tested yield significant increases in brain decoding performance. Through further task ablations and representational analyses, we find that tasks which produce syntax-light representations yield significant improvements in brain decoding performance. Our results constrain the space of NLU models that could best account for human neural representations of language, but also suggest limits on the possibility of decoding fine-grained syntactic information from fMRI human neuroimaging.
Submitted 2 October, 2019;
originally announced October 2019.
-
Forecasting Chaotic Systems with Very Low Connectivity Reservoir Computers
Authors:
Aaron Griffith,
Andrew Pomerance,
Daniel J. Gauthier
Abstract:
We explore the hyperparameter space of reservoir computers used for forecasting of the chaotic Lorenz '63 attractor with Bayesian optimization. We use a new measure of reservoir performance, designed to emphasize learning the global climate of the forecasted system rather than short-term prediction. We find that optimizing over this measure more quickly excludes reservoirs that fail to reproduce the climate. The results of optimization are surprising: the optimized parameters often specify a reservoir network with very low connectivity. Inspired by this observation, we explore reservoir designs with even simpler structure, and find well-performing reservoirs that have zero spectral radius and no recurrence. These simple reservoirs provide counterexamples to widely used heuristics in the field, and may be useful for hardware implementations of reservoir computers.
Submitted 15 November, 2019; v1 submitted 1 October, 2019;
originally announced October 2019.
-
Hybrid Boolean Networks as Physically Unclonable Functions
Authors:
Noeloikeau Charlot,
Daniel Canaday,
Andrew Pomerance,
Daniel J. Gauthier
Abstract:
We introduce a Physically Unclonable Function (PUF) based on an ultra-fast chaotic network known as a Hybrid Boolean Network (HBN) implemented on a field programmable gate array. The network, consisting of $N$ coupled asynchronous logic gates displaying dynamics on the sub-nanosecond time scale, acts as a `digital fingerprint' by amplifying small manufacturing variations during a period of transient chaos. In contrast to other PUF designs, we use both $N$-bits per challenge and obtain $N$-bits per response by considering challenges to be initial states of the $N$-node network and responses to be states captured during the subsequent chaotic transient. We find that the presence of chaos amplifies the frozen-in randomness due to manufacturing differences and that the extractable entropy is approximately $50\%$ of the maximum of $N2^{N}$ bits. We obtain PUF uniqueness and reliability metrics $μ_{inter}$ = 0.40$\pm$0.01 and $μ_{intra}$ = 0.05$\pm$0.00, respectively, for an $N=256$ network. These metrics correspond to an expected Hamming distance of 102.4 bits per response. Moreover, a simple cherry-picking scheme that discards noisy bits yields $μ_{intra} < 0.01$ while still retaining $\sim200$ bits/response (corresponding to a Hamming distance of $\sim80$ bits/response). In addition to characterizing the uniqueness and reliability, we demonstrate super-exponential scaling in the entropy up to $N=512$ and demonstrate that PUFmeter, a recent PUF analysis tool, is unable to model our PUF. Finally, we characterize the temperature variation of the HBN-PUF and propose future improvements.
Submitted 6 April, 2021; v1 submitted 29 July, 2019;
originally announced July 2019.
-
Does the brain represent words? An evaluation of brain decoding studies of language understanding
Authors:
Jon Gauthier,
Anna Ivanova
Abstract:
Language decoding studies have identified word representations which can be used to predict brain activity in response to novel words and sentences (Anderson et al., 2016; Pereira et al., 2018). The unspoken assumption of these studies is that, during processing, linguistic information is transformed into some shared semantic space, and those semantic representations are then used for a variety of linguistic and non-linguistic tasks. We claim that current studies vastly underdetermine the content of these representations, the algorithms which the brain deploys to produce and consume them, and the computational tasks which they are designed to solve. We illustrate this indeterminacy with an extension of the sentence-decoding experiment of Pereira et al. (2018), showing how standard evaluations fail to distinguish between language processing models which deploy different mechanisms and which are optimized to solve very different tasks. We conclude by suggesting changes to the brain decoding paradigm which can support stronger claims of neural representation.
Submitted 2 June, 2018;
originally announced June 2018.
-
Word learning and the acquisition of syntactic--semantic overhypotheses
Authors:
Jon Gauthier,
Roger Levy,
Joshua B. Tenenbaum
Abstract:
Children learning their first language face multiple problems of induction: how to learn the meanings of words, and how to build meaningful phrases from those words according to syntactic rules. We consider how children might solve these problems efficiently by solving them jointly, via a computational model that learns the syntax and semantics of multi-word utterances in a grounded reference game. We select a well-studied empirical case in which children are aware of patterns linking the syntactic and semantic properties of words --- that the properties picked out by base nouns tend to be related to shape, while prenominal adjectives tend to refer to other properties such as color. We show that children applying such inductive biases are accurately reflecting the statistics of child-directed speech, and that inducing similar biases in our computational model captures children's behavior in a classic adjective learning experiment. Our model incorporating such biases also demonstrates a clear data efficiency in learning, relative to a baseline model that learns without forming syntax-sensitive overhypotheses of word meaning. Thus solving a more complex joint inference problem may make the full problem of language acquisition easier, not harder.
Submitted 13 May, 2018;
originally announced May 2018.
-
Cortical-inspired image reconstruction via sub-Riemannian geometry and hypoelliptic diffusion
Authors:
Ugo Boscain,
Roman Chertovskih,
Jean-Paul Gauthier,
Dario Prandi,
Alexey Remizov
Abstract:
In this paper we review several algorithms for image inpainting based on the hypoelliptic diffusion naturally associated with a mathematical model of the primary visual cortex. In particular, we present one algorithm that does not exploit the information of where the image is corrupted, and others that do. While the first algorithm is able to reconstruct only images that our visual system is still capable of recognizing, we show that those of the second type completely transcend this limitation, providing state-of-the-art reconstructions in image inpainting. This can be interpreted as a validation of the fact that our visual cortex actually encodes the first type of algorithm.
Submitted 11 January, 2018;
originally announced January 2018.
-
An advanced active quenching circuit for ultra-fast quantum cryptography
Authors:
Mario Stipčević,
Bradley G. Christensen,
Paul G. Kwiat,
Daniel J. Gauthier
Abstract:
Commercial photon-counting modules based on actively quenched solid-state avalanche photodiode sensors are used in a wide variety of applications. Manufacturers characterize their detectors by specifying a small set of parameters, such as detection efficiency, dead time, dark counts rate, afterpulsing probability and single-photon arrival-time resolution (jitter). However, they usually do not specify the range of conditions over which these parameters are constant or present a sufficient description of the characterization process. In this work, we perform a few novel tests on two commercial detectors and identify an additional set of imperfections that must be specified to sufficiently characterize their behavior. These include rate-dependence of the dead time and jitter, detection delay shift, and "twilighting." We find that these additional non-ideal behaviors can lead to unexpected effects or strong deterioration of the performance of a system using these devices. We explain their origin by an in-depth analysis of the active quenching process. To mitigate the effects of these imperfections, a custom-built detection system is designed using a novel active quenching circuit. Its performance is compared against two commercial detectors in a fast quantum key distribution system with hyper-entangled photons and a random number generator.
Submitted 9 September, 2017; v1 submitted 20 June, 2017;
originally announced June 2017.
-
Are distributional representations ready for the real world? Evaluating word vectors for grounded perceptual meaning
Authors:
Li Lucy,
Jon Gauthier
Abstract:
Distributional word representation methods exploit word co-occurrences to build compact vector encodings of words. While these representations enjoy widespread use in modern natural language processing, it is unclear whether they accurately encode all necessary facets of conceptual meaning. In this paper, we evaluate how well these representations can predict perceptual and conceptual features of concrete concepts, drawing on two semantic norm datasets sourced from human participants. We find that several standard word representations fail to encode many salient perceptual features of concepts, and show that these deficits correlate with word-word similarity prediction errors. Our analyses provide motivation for grounded and embodied language learning approaches, which may help to remedy these deficits.
Submitted 31 May, 2017;
originally announced May 2017.
-
A semidiscrete version of the Citti-Petitot-Sarti model as a plausible model for anthropomorphic image reconstruction and pattern recognition
Authors:
Dario Prandi,
Jean-Paul Gauthier
Abstract:
In his beautiful book [66], Jean Petitot proposes a sub-Riemannian model for the primary visual cortex of mammals. This model is neurophysiologically justified. Further developments of this theory lead to efficient algorithms for image reconstruction, based upon the consideration of an associated hypoelliptic diffusion. The sub-Riemannian model of Petitot and Citti-Sarti (or certain of its improvements) is a left-invariant structure over the group $SE(2)$ of rototranslations of the plane. Here, we propose a semi-discrete version of this theory, leading to a left-invariant structure over the group $SE(2,N)$, restricting to a finite number of rotations. This apparently very simple group is in fact quite atypical: it is maximally almost periodic, which leads to much simpler harmonic analysis compared to $SE(2).$ Based upon this semi-discrete model, we improve on previous image-reconstruction algorithms and we develop a pattern-recognition theory that leads also to very efficient algorithms in practice.
Submitted 11 January, 2018; v1 submitted 10 April, 2017;
originally announced April 2017.
-
Generalized Fourier-Bessel operator and almost-periodic interpolation and approximation
Authors:
Jean-Paul Gauthier,
Dario Prandi
Abstract:
We consider functions $f$ of two real variables, given as trigonometric functions over a finite set $F$ of frequencies. This set is assumed to be closed under rotations in the frequency plane of angle $\frac{2kπ}{M}$ for some integer $M$. Firstly, we address the problem of evaluating these functions over a similar finite set $E$ in the space plane and, secondly, we address the problems of interpolating or approximating a function $g$ of two variables by such an $f$ over the grid $E.$ In particular, for this aim, we establish an abstract factorization theorem for the evaluation function, which is a key point for an efficient numerical solution to these problems. This result is based on the very special structure of the group $SE(2,N)$, subgroup of the group $SE(2)$ of motions of the plane corresponding to discrete rotations, which is a maximally almost periodic group.
Although the motivation of this paper comes from our previous works on biomimetic image reconstruction and pattern recognition, where these questions appear naturally, this topic is related to several classical problems: the FFT in polar coordinates, the non-uniform FFT, the evaluation of general trigonometric polynomials, and so on.
Submitted 23 November, 2016;
originally announced December 2016.
-
A Paradigm for Situated and Goal-Driven Language Learning
Authors:
Jon Gauthier,
Igor Mordatch
Abstract:
A distinguishing property of human intelligence is the ability to flexibly use language in order to communicate complex ideas with other humans in a variety of contexts. Research in natural language dialogue should focus on designing communicative agents which can integrate themselves into these contexts and productively collaborate with humans. In this abstract, we propose a general situated language learning paradigm which is designed to bring about robust language agents able to cooperate productively with humans.
Submitted 11 October, 2016;
originally announced October 2016.
-
A Fast Unified Model for Parsing and Sentence Understanding
Authors:
Samuel R. Bowman,
Jon Gauthier,
Abhinav Rastogi,
Raghav Gupta,
Christopher D. Manning,
Christopher Potts
Abstract:
Tree-structured neural networks exploit valuable syntactic parse information as they interpret the meanings of sentences. However, they suffer from two key technical problems that make them slow and unwieldy for large-scale NLP tasks: they usually operate on parsed sentences and they do not directly support batched computation. We address these issues by introducing the Stack-augmented Parser-Interpreter Neural Network (SPINN), which combines parsing and interpretation within a single tree-sequence hybrid model by integrating tree-structured sentence interpretation into the linear sequential structure of a shift-reduce parser. Our model supports batched computation for a speedup of up to 25 times over other tree-structured models, and its integrated parser can operate on unparsed data with little loss in accuracy. We evaluate it on the Stanford NLI entailment task and show that it significantly outperforms other sentence-encoding models.
Submitted 29 July, 2016; v1 submitted 18 March, 2016;
originally announced March 2016.
-
Fourier descriptors based on the structure of the human primary visual cortex with applications to object recognition
Authors:
Amine Bohi,
Dario Prandi,
Vincente Guis,
Frédéric Bouchara,
Jean-Paul Gauthier
Abstract:
In this paper we propose a supervised object recognition method using new global features, inspired by the model of the human primary visual cortex V1 as the semidiscrete roto-translation group $SE(2,N) = \mathbb Z_N\rtimes \mathbb R^2$. The proposed technique is based on generalized Fourier descriptors on the latter group, which are invariant to natural geometric transformations (rotations, translations). These descriptors are then used to feed an SVM classifier. We have tested our method against the COIL-100 image database and the ORL face database, and compared it with other techniques based on traditional descriptors, global and local. The obtained results have shown that our approach looks extremely efficient and stable to noise, in the presence of which it outperforms the other techniques analyzed in the paper.
Submitted 28 June, 2016; v1 submitted 23 July, 2015;
originally announced July 2015.
-
Highly corrupted image inpainting through hypoelliptic diffusion
Authors:
Ugo Boscain,
Roman Chertovskih,
Jean-Paul Gauthier,
Dario Prandi,
Alexey Remizov
Abstract:
We present a new image inpainting algorithm, the Averaging and Hypoelliptic Evolution (AHE) algorithm, inspired by the one presented in [SIAM J. Imaging Sci., vol. 7, no. 2, pp. 669--695, 2014] and based upon a semi-discrete variation of the Citti-Petitot-Sarti model of the primary visual cortex V1. The AHE algorithm is based on a suitable combination of sub-Riemannian hypoelliptic diffusion and ad-hoc local averaging techniques. In particular, we focus on reconstructing highly corrupted images (i.e. where more than 80% of the image is missing), for which we obtain reconstructions comparable with the state-of-the-art.
Submitted 5 April, 2018; v1 submitted 25 February, 2015;
originally announced February 2015.
-
Reservoir computing with a single time-delay autonomous Boolean node
Authors:
Nicholas D. Haynes,
Miguel C. Soriano,
David P. Rosin,
Ingo Fischer,
Daniel J. Gauthier
Abstract:
We demonstrate reservoir computing with a physical system using a single autonomous Boolean logic element with time-delay feedback. The system generates a chaotic transient with a window of consistency lasting between 30 and 300 ns, which we show is sufficient for reservoir computing. We then characterize the dependence of computational performance on system parameters to find the best operating point of the reservoir. When the best parameters are chosen, the reservoir is able to classify short input patterns with performance that decreases over time. In particular, we show that four distinct input patterns can be classified for 70 ns, even though the inputs are only provided to the reservoir for 7.5 ns.
Submitted 30 January, 2015; v1 submitted 4 November, 2014;
originally announced November 2014.
-
Optimization of Synthesis Oversampled Complex Filter Banks
Authors:
Jerome Gauthier,
Laurent Duval,
Jean-Christophe Pesquet
Abstract:
An important issue with oversampled FIR analysis filter banks (FBs) is to determine inverse synthesis FBs, when they exist. Given any complex oversampled FIR analysis FB, we first provide an algorithm to determine whether there exists an inverse FIR synthesis system. We also provide a method to ensure the Hermitian symmetry property on the synthesis side, which is serviceable to processing real-valued signals. As an invertible analysis scheme corresponds to a redundant decomposition, there is no unique inverse FB. Given a particular solution, we parameterize the whole family of inverses through a null space projection. The resulting reduced parameter set simplifies design procedures, since the perfect reconstruction constrained optimization problem is recast as an unconstrained optimization problem. The design of optimized synthesis FBs based on time or frequency localization criteria is then investigated, using a simple yet efficient gradient algorithm.
Submitted 21 July, 2009;
originally announced July 2009.