-
Can training neural language models on a curriculum with developmentally plausible data improve alignment with human reading behavior?
Authors:
Aryaman Chobey,
Oliver Smith,
Anzi Wang,
Grusha Prasad
Abstract:
The use of neural language models to model human behavior has met with mixed success. While some work has found that the surprisal estimates from these models can be used to predict a wide range of human neural and behavioral responses, other work studying more complex syntactic phenomena has found that these surprisal estimates generate incorrect behavioral predictions. This paper explores the extent to which the misalignment between empirical and model-predicted behavior can be minimized by training models on more developmentally plausible data, such as in the BabyLM Challenge. We trained teacher language models on the BabyLM "strict-small" dataset and used sentence-level surprisal estimates from these teacher models to create a curriculum. We found tentative evidence that our curriculum made it easier for models to acquire linguistic knowledge from the training data: on the subset of tasks in the BabyLM challenge suite evaluating models' grammatical knowledge of English, models first trained on the BabyLM data curriculum and then on a few randomly ordered training epochs performed slightly better than models trained on randomly ordered epochs alone. This improved linguistic knowledge acquisition did not result in better alignment with human reading behavior, however: models trained on the BabyLM dataset (with or without a curriculum) generated predictions that were as misaligned with human behavior as models trained on larger, less curated datasets. This suggests that training on developmentally plausible datasets alone is likely insufficient to generate language models capable of accurately predicting human language processing.
Submitted 30 November, 2023;
originally announced November 2023.
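The curriculum construction mentioned in the abstract above (ordering training sentences by a teacher model's sentence-level surprisal) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the toy log-probabilities, and the low-to-high ordering are hypothetical and are not taken from the paper.

```python
import math

def sentence_surprisal(token_log_probs):
    """Sentence-level surprisal in nats: the negative sum of token log-probabilities."""
    return -sum(token_log_probs)

def build_curriculum(sentences, teacher_log_probs):
    """Order sentences from lowest to highest teacher surprisal.

    The ascending ("easy first") direction is an assumption made for this sketch.
    """
    scored = [(sentence_surprisal(lp), s) for s, lp in zip(sentences, teacher_log_probs)]
    scored.sort(key=lambda pair: pair[0])
    return [s for _, s in scored]

# Toy data: hypothetical per-token probabilities from a teacher model.
sentences = ["the dog barked", "colorless green ideas sleep", "a cat sat"]
teacher_log_probs = [
    [math.log(0.5), math.log(0.25), math.log(0.5)],
    [math.log(0.01), math.log(0.05), math.log(0.02), math.log(0.1)],
    [math.log(0.5), math.log(0.5), math.log(0.5)],
]
curriculum = build_curriculum(sentences, teacher_log_probs)
print(curriculum)
```

In practice the log-probabilities would come from the trained teacher models rather than hand-written values.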
-
Co-Learning Empirical Games and World Models
Authors:
Max Olan Smith,
Michael P. Wellman
Abstract:
Game-based decision-making involves reasoning over both world dynamics and strategic interactions among the agents. Typically, empirical models capturing these respective aspects are learned and used separately. We investigate the potential gain from co-learning these elements: a world model for dynamics and an empirical game for strategic interactions. Empirical games drive world models toward a broader consideration of possible game dynamics induced by a diversity of strategy profiles. Conversely, world models guide empirical games to efficiently discover new strategies through planning. We demonstrate these benefits first independently, then in combination as realized by a new algorithm, Dyna-PSRO, that co-learns an empirical game and a world model. When compared to PSRO, a baseline empirical-game building algorithm, Dyna-PSRO is found to compute lower regret solutions on partially observable general-sum games. In our experiments, Dyna-PSRO also requires substantially fewer experiences than PSRO, a key algorithmic advantage for settings where collecting player-game interaction data is a cost-limiting factor.
Submitted 23 May, 2023;
originally announced May 2023.
-
Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning
Authors:
Marc Lanctot,
John Schultz,
Neil Burch,
Max Olan Smith,
Daniel Hennes,
Thomas Anthony,
Julien Perolat
Abstract:
Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to few interactions against experts, with the aim to reach some desired level of performance (e.g. beating a human professional player). We propose a benchmark for multiagent learning based on repeated play of the simple game Rock, Paper, Scissors along with a population of forty-three tournament entries, some of which are intentionally sub-optimal. We describe metrics to measure the quality of agents based both on average returns and exploitability. We then show that several RL, online learning, and language model approaches can learn good counter-strategies and generalize well, but ultimately lose to the top-performing bots, creating an opportunity for research in multiagent learning.
Submitted 31 October, 2023; v1 submitted 2 March, 2023;
originally announced March 2023.
-
Iterative Empirical Game Solving via Single Policy Best Response
Authors:
Max Olan Smith,
Thomas Anthony,
Michael P. Wellman
Abstract:
Policy-Space Response Oracles (PSRO) is a general algorithmic framework for learning policies in multiagent systems by interleaving empirical game analysis with deep reinforcement learning (Deep RL). At each iteration, Deep RL is invoked to train a best response to a mixture of opponent policies. The repeated application of Deep RL poses an expensive computational burden as we look to apply this algorithm to more complex domains. We introduce two variations of PSRO designed to reduce the amount of simulation required during Deep RL training. Both algorithms modify how PSRO adds new policies to the empirical game, based on learned responses to a single opponent policy. The first, Mixed-Oracles, transfers knowledge from previous iterations of Deep RL, requiring training only against the opponent's newest policy. The second, Mixed-Opponents, constructs a pure-strategy opponent by mixing existing strategies' action-value estimates instead of their policies. Learning against a single policy mitigates variance in state outcomes that is induced by an unobserved distribution of opponents. We empirically demonstrate that these algorithms substantially reduce the amount of simulation during training required by PSRO, while producing equivalent or better solutions to the game.
Submitted 3 June, 2021;
originally announced June 2021.
-
A Spectral Enabled GAN for Time Series Data Generation
Authors:
Kaleb E. Smith,
Anthony O. Smith
Abstract:
Time-dependent data is a main source of information in today's data-driven world. Generating this type of data, though, has proven challenging, making it an interesting research area in the field of generative machine learning. One such approach was that of Smith et al., who developed the Time Series Generative Adversarial Network (TSGAN), which showed promising performance in generating time-dependent data and the ability to perform few-shot generation, though it was flawed in certain aspects of training and learning. This paper looks to improve on the results from TSGAN and address those flaws by unifying the training of the independent networks in TSGAN and creating a dependency both in training and learning. This improvement, called unified TSGAN (uTSGAN), was tested and compared both quantitatively and qualitatively to its predecessor on 70 benchmark time series data sets used in the community. uTSGAN outperformed TSGAN on 80% of the data sets with the same number of training epochs, and on 60% of the data sets in three-quarters of the training time or less, while maintaining the few-shot generation ability with better FID scores across those data sets.
Submitted 2 March, 2021;
originally announced March 2021.
-
Learning to Play against Any Mixture of Opponents
Authors:
Max Olan Smith,
Thomas Anthony,
Yongzhao Wang,
Michael P. Wellman
Abstract:
Intuitively, experience playing against one mixture of opponents in a given domain should be relevant for a different mixture in the same domain. We propose a transfer learning method, Q-Mixing, that starts by learning Q-values against each pure-strategy opponent. Then a Q-value for any distribution of opponent strategies is approximated by appropriately averaging the separately learned Q-values. From these components, we construct policies against all opponent mixtures without any further training. We empirically validate Q-Mixing in two environments: a simple grid-world soccer environment, and a complicated cyber-security game. We find that Q-Mixing is able to successfully transfer knowledge across any mixture of opponents. We next consider the use of observations during play to update the believed distribution of opponents. We introduce an opponent classifier -- trained in parallel to Q-learning, using the same data -- and use the classifier results to refine the mixing of Q-values. We find that Q-Mixing augmented with the opponent classifier function performs comparably to, and with lower variance than, training directly against a mixed-strategy opponent.
Submitted 3 June, 2021; v1 submitted 29 September, 2020;
originally announced September 2020.
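The averaging step named in the Q-Mixing abstract above can be illustrated with a minimal sketch. It assumes the simplest possible weighting, in which each pure-strategy opponent's Q-values are weighted by the believed probability of facing that opponent; the function names and the toy numbers are hypothetical, and the paper's exact weighting scheme may differ.

```python
import numpy as np

def q_mixing(q_per_opponent, opponent_dist):
    """Approximate Q-values against a mixture as a belief-weighted average.

    q_per_opponent: array of shape (n_opponents, n_actions), one row of
        Q-values (for a fixed state) per pure-strategy opponent.
    opponent_dist: believed probability of each opponent (normalized here).
    """
    q = np.asarray(q_per_opponent, dtype=float)
    w = np.asarray(opponent_dist, dtype=float)
    w = w / w.sum()
    return w @ q

def greedy_action(q_per_opponent, opponent_dist):
    """Pick the action that maximizes the mixed Q-value."""
    return int(np.argmax(q_mixing(q_per_opponent, opponent_dist)))

# Toy example: two opponents, two actions.
q_values = [[1.0, 0.0],   # Q-values learned against opponent 0
            [0.0, 3.0]]   # Q-values learned against opponent 1
mixed_even = q_mixing(q_values, [0.5, 0.5])
action_even = greedy_action(q_values, [0.5, 0.5])
action_skewed = greedy_action(q_values, [0.9, 0.1])
print(mixed_even, action_even, action_skewed)
```

An opponent classifier, as described in the abstract, would supply an updated `opponent_dist` during play; here the belief is simply passed in by hand.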
-
Conditional GAN for timeseries generation
Authors:
Kaleb E Smith,
Anthony O Smith
Abstract:
It is abundantly clear that time-dependent data is a vital source of information in the world. The challenge has been for applications in machine learning to gain access to a considerable amount of quality data needed for algorithm development and analysis. Modeling synthetic data using a Generative Adversarial Network (GAN) has been at the heart of providing a viable solution. Our work focuses on one-dimensional time series and explores the few-shot approach, which is the ability of an algorithm to perform well with limited data. This work attempts to ease the frustration by proposing a new architecture, Time Series GAN (TSGAN), to model realistic time series data. We evaluate TSGAN on 70 data sets from a benchmark time series database. Our results demonstrate that TSGAN performs better than the competition both quantitatively, using the Fréchet Inception Distance (FID) metric, and qualitatively, when classification is used as the evaluation criterion.
Submitted 29 June, 2020;
originally announced June 2020.
-
A Spatial Sampling Approach to Wave Field Synthesis: PBAP and Huygens Arrays
Authors:
Julius O. Smith III
Abstract:
A simple approach to microphone- and speaker-arrays is described in which the microphone array is regarded as a sampling grid for the acoustic field, and the corresponding speaker-array is treated as a "spatial digital to analog converter" that reconstructs the acoustic field from its spatial samples. Advantages of this approach include ease of understanding and teaching, ease of deployment, effective practical guidelines for deployment, and significant computational savings in special cases. In particular, in the far-field case (acoustic sources many wavelengths away from a linear array of speakers) it is possible to quantize source angles slightly so that no processing per speaker is required beyond pure integer delay. Smoothly moving sources are obtained using well known delay-line interpolation techniques such as linear (cross-fading) and Lagrange (polynomial) interpolation between/among speakers. We call the far-field line-array case Planewave-Based Angle Panning (PBAP), in reference to the well-known Vector-Based Amplitude Panning (VBAP) family of techniques, some of which are derived here as special cases: When speakers undersample the acoustic field, the result may be considered a form of VBAP, and VBAP is also obtained as a limiting case of polygonal PBAP arrays truncated to the polygon perimeter. Spatial samples need not be on a linear array, leading to a simple spatial audio system we call Huygens Arrays (HA). HAs are quite general for sources located behind the speaker array, which no longer needs to be linear, and the sources are no longer restricted to the far field. Multiband and hybrid arrays employing VBAP (or stereo) and subwoofer(s) are discussed, using sampling theory to inform the choices of crossover frequencies.
Submitted 18 November, 2019;
originally announced November 2019.
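The far-field idea in the abstract above (quantizing the source angle slightly so that each speaker needs only a pure integer delay) can be sketched as follows. This is a rough illustration assuming a uniform linear array and a plane wave; the function name, the default speed of sound, and the example geometry are hypothetical and not taken from the paper.

```python
import math

def quantized_delays(angle_deg, n_speakers, spacing_m, sample_rate_hz,
                     speed_of_sound=343.0):
    """Integer per-speaker delays (in samples) for a far-field plane wave.

    The inter-speaker delay spacing * sin(angle) * fs / c is rounded to the
    nearest integer, which slightly shifts the rendered angle.
    Returns (delays, quantized_angle_deg).
    """
    samples_per_spacing = spacing_m * sample_rate_hz / speed_of_sound
    step = round(samples_per_spacing * math.sin(math.radians(angle_deg)))
    # Clamp so the quantized angle stays within [-90, 90] degrees.
    max_step = math.floor(samples_per_spacing)
    step = max(-max_step, min(max_step, step))
    quantized_angle = math.degrees(math.asin(step / samples_per_spacing))
    delays = [n * step for n in range(n_speakers)]
    offset = min(delays)  # shift so every delay is non-negative (causal)
    return [d - offset for d in delays], quantized_angle

delays, angle = quantized_delays(30.0, n_speakers=4, spacing_m=0.1,
                                 sample_rate_hz=48000)
print(delays, angle)
```

With these example values the requested 30 degree angle moves by only a small fraction of a degree, and each speaker's feed is just the source signal delayed by a whole number of samples.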
-
No Press Diplomacy: Modeling Multi-Agent Gameplay
Authors:
Philip Paquette,
Yuchen Lu,
Steven Bocco,
Max O. Smith,
Satya Ortiz-Gagne,
Jonathan K. Kummerfeld,
Satinder Singh,
Joelle Pineau,
Aaron Courville
Abstract:
Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal. Reliance on trust and coordination makes Diplomacy the first non-cooperative multi-agent benchmark for complex sequential social dilemmas in a rich environment. In this work, we focus on training an agent that learns to play the No Press version of Diplomacy where there is no dedicated communication channel between players. We present DipNet, a neural-network-based policy model for No Press Diplomacy. The model was trained on a new dataset of more than 150,000 human games. Our model is trained by supervised learning (SL) from expert trajectories, which is then used to initialize a reinforcement learning (RL) agent trained through self-play. Both the SL and RL agents demonstrate state-of-the-art No Press performance by beating popular rule-based bots.
Submitted 19 November, 2019; v1 submitted 4 September, 2019;
originally announced September 2019.
-
Looking At Situationally-Induced Impairments And Disabilities (SIIDs) With People With Cognitive Brain Injury
Authors:
Osian Smith,
Stephen Lindsay
Abstract:
In this document, we discuss our work on speaker recognition to support people with prosopagnosia, and the limitations of alerting users to whom they are speaking with. We discuss how current research into Situationally Induced Impairments and Disabilities (SIIDs) can assist people with disabilities and vice versa, and how our work can support people who find themselves in a situation where their facial recognition is impaired.
Submitted 12 April, 2019;
originally announced April 2019.
-
A Review on Recommendation Systems: Context-aware to Social-based
Authors:
S. M. Mahdi Seyednezhad,
Kailey Nobuko Cozart,
John Anthony Bowllan,
Anthony O. Smith
Abstract:
The number of Internet users has grown rapidly, enticing companies and corporations to make full use of recommendation infrastructures. Consequently, online advertisement companies emerged to aid us in the presence of numerous items and users. Even as a user, you may find yourself drowning in a set of items that you think you might need, but you are not sure if you should try them. Those items could be online services, products, places or even a person for a friendship. Therefore, we need recommender systems that pave the way and help us make good decisions. This paper provides a review of traditional recommendation systems, recommendation system evaluations and metrics, context-aware recommendation systems, and social-based recommendation systems. While it is hard to include all the information in a brief review paper, we try to give an introductory review of the essentials of recommendation systems. More detailed information on each chapter can be found in the corresponding references. For the purpose of explaining the concept in a different way, we provide slides available at https://www.slideshare.net/MahdiSeyednejad/recommender-systems-97094937.
Submitted 28 November, 2018;
originally announced November 2018.
-
Neural Style Transfer for Audio Spectograms
Authors:
Prateek Verma,
Julius O. Smith
Abstract:
There has been fascinating work on creating artistic transformations of images by Gatys. This was revolutionary in how we can in some sense alter the 'style' of an image while generally preserving its 'content'. In our work, we present a method for creating new sounds using a similar approach, treating it as a style-transfer problem, starting from a random-noise input signal and iteratively using back-propagation to optimize the sound to conform to filter-outputs from a pre-trained neural architecture of interest.
For demonstration, we investigate two different tasks, resulting in bandwidth expansion/compression, and timbral transfer from singing voice to musical instruments. A feature of our method is that a single architecture can generate these different audio-style-transfer types using the same set of parameters which otherwise require different complex hand-tuned diverse signal processing pipelines.
Submitted 4 January, 2018;
originally announced January 2018.
-
A Category Space Approach to Supervised Dimensionality Reduction
Authors:
Anthony O. Smith,
Anand Rangarajan
Abstract:
Supervised dimensionality reduction has emerged as an important theme in the last decade. Despite the plethora of models and formulations, there is a lack of a simple model which aims to project the set of patterns into a space defined by the classes (or categories). To this end, we set up a model in which each class is represented as a 1D subspace of the vector space formed by the features. Assuming the set of classes does not exceed the cardinality of the features, the model results in multi-class supervised learning in which the features of each class are projected into the class subspace. Class discrimination is automatically guaranteed via the imposition of orthogonality of the 1D class sub-spaces. The resulting optimization problem - formulated as the minimization of a sum of quadratic functions on a Stiefel manifold - while being non-convex (due to the constraints), nevertheless has a structure for which we can identify when we have reached a global minimum. After formulating a version with standard inner products, we extend the formulation to reproducing kernel Hilbert spaces in a straightforward manner. The optimization approach also extends in a similar fashion to the kernel version. Results and comparisons with the multi-class Fisher linear (and kernel) discriminants and principal component analysis (linear and kernel) showcase the relative merits of this approach to dimensionality reduction.
Submitted 27 October, 2016;
originally announced October 2016.
-
Closed Form Fractional Integration and Differentiation via Real Exponentially Spaced Pole-Zero Pairs
Authors:
Julius Orion Smith,
Harrison Freeman Smith
Abstract:
We derive closed-form expressions for the poles and zeros of approximate fractional integrator/differentiator filters, which correspond to spectral roll-off filters having any desired log-log slope to a controllable degree of accuracy over any bandwidth. The filters can be described as a uniform exponential distribution of poles along the negative-real axis of the s plane, with zeros interleaving them. Arbitrary spectral slopes are obtained by sliding the array of zeros relative to the array of poles, where each array maintains periodic spacing on a log scale. The nature of the slope approximation is close to Chebyshev optimal in the interior of the pole-zero array, approaching conjectured Chebyshev optimality over all frequencies in the limit as the order approaches infinity. Practical designs can arbitrarily approach the equal-ripple approximation by enlarging the pole-zero array band beyond the desired frequency band. The spectral roll-off slope can be robustly modulated in real time by varying only the zeros controlled by one slope parameter. Software implementations are provided in MATLAB and Faust.
Submitted 7 June, 2016;
originally announced June 2016.
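The construction described in the abstract above (poles spaced exponentially along the negative real axis, with an interleaved array of zeros that is slid relative to the poles to set the slope) can be checked numerically with a small sketch. The parameterization below is an assumption made for illustration, not the paper's closed-form design: poles sit at -w0 * ratio**k and zeros at -w0 * ratio**(k + alpha), so the zeros lie a fraction alpha of the way (on a log scale) from each pole to the next.

```python
import numpy as np

def pole_zero_arrays(alpha, n_pairs, w0=1.0, ratio=10 ** 0.5):
    """Real poles and zeros with uniform spacing on a log-frequency scale.

    alpha in (0, 1) slides the zero array relative to the pole array.
    (Hypothetical parameterization for illustration only.)
    """
    k = np.arange(n_pairs)
    poles = -w0 * ratio ** k
    zeros = -w0 * ratio ** (k + alpha)
    return poles, zeros

def log_magnitude(w, poles, zeros):
    """Natural log of |H(jw)| for H(s) = prod(s - zeros) / prod(s - poles)."""
    s = 1j * w
    return np.sum(np.log(np.abs(s - zeros))) - np.sum(np.log(np.abs(s - poles)))

alpha = 0.3
poles, zeros = pole_zero_arrays(alpha, n_pairs=20)

# Measure the average log-log slope over a whole number of pole spacings,
# well inside the band covered by the pole-zero array.
w_lo, w_hi = 1e4, 1e6
slope = (log_magnitude(w_hi, poles, zeros) - log_magnitude(w_lo, poles, zeros)) \
        / (np.log(w_hi) - np.log(w_lo))
print(slope)  # close to -alpha
```

Under this parameterization the average interior slope is approximately -alpha, that is, a roll-off of about 20 * alpha dB per decade; changing only `alpha` moves the zeros and hence the slope, in the spirit of the single slope parameter mentioned in the abstract.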
-
Efficient Synthesis of Room Acoustics via Scattering Delay Networks
Authors:
Enzo De Sena,
Huseyin Hacihabiboglu,
Zoran Cvetkovic,
Julius O. Smith III
Abstract:
An acoustic reverberator consisting of a network of delay lines connected via scattering junctions is proposed. All parameters of the reverberator are derived from physical properties of the enclosure it simulates. It allows for simulation of unequal and frequency-dependent wall absorption, as well as directional sources and microphones. The reverberator renders the first-order reflections exactly, while making progressively coarser approximations of higher-order reflections. The rate of energy decay is close to that obtained with the image method (IM) and consistent with the predictions of Sabine and Eyring equations. The time evolution of the normalized echo density, which was previously shown to be correlated with the perceived texture of reverberation, is also close to that of IM. However, its computational complexity is one to two orders of magnitude lower, comparable to the computational complexity of a feedback delay network (FDN), and its memory requirements are negligible.
Submitted 9 July, 2015; v1 submitted 19 February, 2015;
originally announced February 2015.