-
Distributed Speculative Inference of Large Language Models
Authors:
Nadav Timor,
Jonathan Mamou,
Daniel Korat,
Moshe Berchansky,
Oren Pereg,
Moshe Wasserblat,
Tomer Galanti,
Michal Gordon,
David Harel
Abstract:
Accelerating the inference of large language models (LLMs) is an important challenge in artificial intelligence. This paper introduces distributed speculative inference (DSI), a novel distributed inference algorithm that is provably faster than speculative inference (SI) [leviathan2023fast, chen2023accelerating, miao2023specinfer] and traditional autoregressive inference (non-SI). Like other SI al…
▽ More
Accelerating the inference of large language models (LLMs) is an important challenge in artificial intelligence. This paper introduces distributed speculative inference (DSI), a novel distributed inference algorithm that is provably faster than speculative inference (SI) [leviathan2023fast, chen2023accelerating, miao2023specinfer] and traditional autoregressive inference (non-SI). Like other SI algorithms, DSI works on frozen LLMs, requiring no training or architectural modifications, and it preserves the target distribution.
Prior studies on SI have demonstrated empirical speedups (compared to non-SI) but require a fast and accurate drafter LLM. In practice, off-the-shelf LLMs often do not have matching drafters that are sufficiently fast and accurate. We show a gap: SI gets slower than non-SI when using slower or less accurate drafters. We close this gap by proving that DSI is faster than both SI and non-SI given any drafters. By orchestrating multiple instances of the target and drafters, DSI is not only faster than SI but also supports LLMs that cannot be accelerated with SI.
Our simulations show speedups of off-the-shelf LLMs in realistic settings: DSI is 1.29-1.92x faster than SI.
△ Less
Submitted 28 June, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
Localizing Paragraph Memorization in Language Models
Authors:
Niklas Stoehr,
Mitchell Gordon,
Chiyuan Zhang,
Owen Lewis
Abstract:
Can we localize the weights and mechanisms used by a language model to memorize and recite entire paragraphs of its training data? In this paper, we show that while memorization is spread across multiple layers and model components, gradients of memorized paragraphs have a distinguishable spatial pattern, being larger in lower model layers than gradients of non-memorized examples. Moreover, the me…
▽ More
Can we localize the weights and mechanisms used by a language model to memorize and recite entire paragraphs of its training data? In this paper, we show that while memorization is spread across multiple layers and model components, gradients of memorized paragraphs have a distinguishable spatial pattern, being larger in lower model layers than gradients of non-memorized examples. Moreover, the memorized examples can be unlearned by fine-tuning only the high-gradient weights. We localize a low-layer attention head that appears to be especially involved in paragraph memorization. This head is predominantly focusing its attention on distinctive, rare tokens that are least frequent in a corpus-level unigram distribution. Next, we study how localized memorization is across the tokens in the prefix by perturbing tokens and measuring the caused change in the decoding. A few distinctive tokens early in a prefix can often corrupt the entire continuation. Overall, memorized continuations are not only harder to unlearn, but also to corrupt than non-memorized ones.
△ Less
Submitted 28 March, 2024;
originally announced March 2024.
-
A Roadmap to Pluralistic Alignment
Authors:
Taylor Sorensen,
Jared Moore,
Jillian Fisher,
Mitchell Gordon,
Niloofar Mireshghallah,
Christopher Michael Rytting,
Andre Ye,
Liwei Jiang,
Ximing Lu,
Nouha Dziri,
Tim Althoff,
Ye** Choi
Abstract:
With increased power and prevalence of AI systems, it is ever more critical that AI systems are designed to serve all, i.e., people with diverse values and perspectives. However, aligning models to serve pluralistic human values remains an open research question. In this piece, we propose a roadmap to pluralistic alignment, specifically using language models as a test bed. We identify and formaliz…
▽ More
With increased power and prevalence of AI systems, it is ever more critical that AI systems are designed to serve all, i.e., people with diverse values and perspectives. However, aligning models to serve pluralistic human values remains an open research question. In this piece, we propose a roadmap to pluralistic alignment, specifically using language models as a test bed. We identify and formalize three possible ways to define and operationalize pluralism in AI systems: 1) Overton pluralistic models that present a spectrum of reasonable responses; 2) Steerably pluralistic models that can steer to reflect certain perspectives; and 3) Distributionally pluralistic models that are well-calibrated to a given population in distribution. We also propose and formalize three possible classes of pluralistic benchmarks: 1) Multi-objective benchmarks, 2) Trade-off steerable benchmarks, which incentivize models to steer to arbitrary trade-offs, and 3) Jury-pluralistic benchmarks which explicitly model diverse human ratings. We use this framework to argue that current alignment techniques may be fundamentally limited for pluralistic AI; indeed, we highlight empirical evidence, both from our own experiments and from other work, that standard alignment procedures might reduce distributional pluralism in models, motivating the need for further research on pluralistic alignment.
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
Thermodynamic Computing System for AI Applications
Authors:
Denis Melanson,
Mohammad Abu Khater,
Maxwell Aifer,
Kaelan Donatella,
Max Hunter Gordon,
Thomas Ahle,
Gavin Crooks,
Antonio J. Martinez,
Faris Sbahi,
Patrick J. Coles
Abstract:
Recent breakthroughs in artificial intelligence (AI) algorithms have highlighted the need for novel computing hardware in order to truly unlock the potential for AI. Physics-based hardware, such as thermodynamic computing, has the potential to provide a fast, low-power means to accelerate AI primitives, especially generative AI and probabilistic AI. In this work, we present the first continuous-va…
▽ More
Recent breakthroughs in artificial intelligence (AI) algorithms have highlighted the need for novel computing hardware in order to truly unlock the potential for AI. Physics-based hardware, such as thermodynamic computing, has the potential to provide a fast, low-power means to accelerate AI primitives, especially generative AI and probabilistic AI. In this work, we present the first continuous-variable thermodynamic computer, which we call the stochastic processing unit (SPU). Our SPU is composed of RLC circuits, as unit cells, on a printed circuit board, with 8 unit cells that are all-to-all coupled via switched capacitances. It can be used for either sampling or linear algebra primitives, and we demonstrate Gaussian sampling and matrix inversion on our hardware. The latter represents the first thermodynamic linear algebra experiment. We also illustrate the applicability of the SPU to uncertainty quantification for neural network classification. We envision that this hardware, when scaled up in size, will have significant impact on accelerating various probabilistic AI applications.
△ Less
Submitted 8 December, 2023;
originally announced December 2023.
-
Cura: Curation at Social Media Scale
Authors:
Wanrong He,
Mitchell L. Gordon,
Lindsay Popowski,
Michael S. Bernstein
Abstract:
How can online communities execute a focused vision for their space? Curation offers one approach, where community leaders manually select content to share with the community. Curation enables leaders to shape a space that matches their taste, norms, and values, but the practice is often intractable at social media scale: curators cannot realistically sift through hundreds or thousands of submissi…
▽ More
How can online communities execute a focused vision for their space? Curation offers one approach, where community leaders manually select content to share with the community. Curation enables leaders to shape a space that matches their taste, norms, and values, but the practice is often intractable at social media scale: curators cannot realistically sift through hundreds or thousands of submissions daily. In this paper, we contribute algorithmic and interface foundations enabling curation at scale, and manifest these foundations in a system called Cura. Our approach draws on the observation that, while curators' attention is limited, other community members' upvotes are plentiful and informative of curators' likely opinions. We thus contribute a transformer-based curation model that predicts whether each curator will upvote a post based on previous community upvotes. Cura applies this curation model to create a feed of content that it predicts the curator would want in the community. Evaluations demonstrate that the curation model accurately estimates opinions of diverse curators, that changing curators for a community results in clearly recognizable shifts in the community's content, and that, consequently, curation can reduce anti-social behavior by half without extra moderation effort. By sampling different types of curators, Cura lowers the threshold to genres of curated social media ranging from editorial groups to stakeholder roundtables to democracies.
△ Less
Submitted 26 August, 2023;
originally announced August 2023.
-
Thermodynamic Linear Algebra
Authors:
Maxwell Aifer,
Kaelan Donatella,
Max Hunter Gordon,
Samuel Duffield,
Thomas Ahle,
Daniel Simpson,
Gavin E. Crooks,
Patrick J. Coles
Abstract:
Linear algebraic primitives are at the core of many modern algorithms in engineering, science, and machine learning. Hence, accelerating these primitives with novel computing hardware would have tremendous economic impact. Quantum computing has been proposed for this purpose, although the resource requirements are far beyond current technological capabilities, so this approach remains long-term in…
▽ More
Linear algebraic primitives are at the core of many modern algorithms in engineering, science, and machine learning. Hence, accelerating these primitives with novel computing hardware would have tremendous economic impact. Quantum computing has been proposed for this purpose, although the resource requirements are far beyond current technological capabilities, so this approach remains long-term in timescale. Here we consider an alternative physics-based computing paradigm based on classical thermodynamics, to provide a near-term approach to accelerating linear algebra.
At first sight, thermodynamics and linear algebra seem to be unrelated fields. In this work, we connect solving linear algebra problems to sampling from the thermodynamic equilibrium distribution of a system of coupled harmonic oscillators. We present simple thermodynamic algorithms for (1) solving linear systems of equations, (2) computing matrix inverses, (3) computing matrix determinants, and (4) solving Lyapunov equations. Under reasonable assumptions, we rigorously establish asymptotic speedups for our algorithms, relative to digital methods, that scale linearly in matrix dimension. Our algorithms exploit thermodynamic principles like ergodicity, entropy, and equilibration, highlighting the deep connection between these two seemingly distinct fields, and opening up algebraic applications for thermodynamic computing hardware.
△ Less
Submitted 10 June, 2024; v1 submitted 10 August, 2023;
originally announced August 2023.
-
Decoding Neural Activity to Assess Individual Latent State in Ecologically Valid Contexts
Authors:
Stephen M. Gordon,
Jonathan R. McDaniel,
Kevin W. King,
Vernon J. Lawhern,
Jonathan Touryan
Abstract:
There exist very few ways to isolate cognitive processes, historically defined via highly controlled laboratory studies, in more ecologically valid contexts. Specifically, it remains unclear as to what extent patterns of neural activity observed under such constraints actually manifest outside the laboratory in a manner that can be used to make an accurate inference about the latent state, associa…
▽ More
There exist very few ways to isolate cognitive processes, historically defined via highly controlled laboratory studies, in more ecologically valid contexts. Specifically, it remains unclear as to what extent patterns of neural activity observed under such constraints actually manifest outside the laboratory in a manner that can be used to make an accurate inference about the latent state, associated cognitive process, or proximal behavior of the individual. Improving our understanding of when and how specific patterns of neural activity manifest in ecologically valid scenarios would provide validation for laboratory-based approaches that study similar neural phenomena in isolation and meaningful insight into the latent states that occur during complex tasks. We argue that domain generalization methods from the brain-computer interface community have the potential to address this challenge. We previously used such an approach to decode phasic neural responses associated with visual target discrimination. Here, we extend that work to more tonic phenomena such as internal latent states. We use data from two highly controlled laboratory paradigms to train two separate domain-generalized models. We apply the trained models to an ecologically valid paradigm in which participants performed multiple, concurrent driving-related tasks. Using the pretrained models, we derive estimates of the underlying latent state and associated patterns of neural activity. Importantly, as the patterns of neural activity change along the axis defined by the original training data, we find changes in behavior and task performance consistent with the observations from the original, laboratory paradigms. We argue that these results lend ecological validity to those experimental designs and provide a methodology for understanding the relationship between observed neural activity and behavior during complex tasks.
△ Less
Submitted 18 April, 2023;
originally announced April 2023.
-
Deep Convolutional Neural Network for Plume Rise Measurements in Industrial Environments
Authors:
Mohammad Koushafar,
Gunho Sohn,
Mark Gordon
Abstract:
Estimating Plume Cloud (PC) height is essential for various applications, such as global climate models. Smokestack Plume Rise (PR) is the constant height at which the PC is carried downwind as its momentum dissipates and the PC and the ambient temperatures equalize. Although different parameterizations are used in most air-quality models to predict PR, they have yet to be verified thoroughly. Thi…
▽ More
Estimating Plume Cloud (PC) height is essential for various applications, such as global climate models. Smokestack Plume Rise (PR) is the constant height at which the PC is carried downwind as its momentum dissipates and the PC and the ambient temperatures equalize. Although different parameterizations are used in most air-quality models to predict PR, they have yet to be verified thoroughly. This paper proposes a low-cost measurement technology to monitor smokestack PCs and make long-term, real-time measurements of PR. For this purpose, a two-stage method is developed based on Deep Convolutional Neural Networks (DCNNs). In the first stage, an improved Mask R-CNN, called Deep Plume Rise Network (DPRNet), is applied to recognize the PC. Here, image processing analyses and least squares, respectively, are used to detect PC boundaries and fit an asymptotic model into the boundaries centerline. The y-component coordinate of this model's critical point is considered PR. In the second stage, a geometric transformation phase converts image measurements into real-life ones. A wide range of images with different atmospheric conditions, including day, night, and cloudy/foggy, have been selected for the DPRNet training algorithm. Obtained results show that the proposed method outperforms widely-used networks in smoke border detection and recognition.
△ Less
Submitted 12 March, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
Resource frugal optimizer for quantum machine learning
Authors:
Charles Moussa,
Max Hunter Gordon,
Michal Baczyk,
M. Cerezo,
Lukasz Cincio,
Patrick J. Coles
Abstract:
Quantum-enhanced data science, also known as quantum machine learning (QML), is of growing interest as an application of near-term quantum computers. Variational QML algorithms have the potential to solve practical problems on real hardware, particularly when involving quantum data. However, training these algorithms can be challenging and calls for tailored optimization procedures. Specifically,…
▽ More
Quantum-enhanced data science, also known as quantum machine learning (QML), is of growing interest as an application of near-term quantum computers. Variational QML algorithms have the potential to solve practical problems on real hardware, particularly when involving quantum data. However, training these algorithms can be challenging and calls for tailored optimization procedures. Specifically, QML applications can require a large shot-count overhead due to the large datasets involved. In this work, we advocate for simultaneous random sampling over both the dataset as well as the measurement operators that define the loss function. We consider a highly general loss function that encompasses many QML applications, and we show how to construct an unbiased estimator of its gradient. This allows us to propose a shot-frugal gradient descent optimizer called Refoqus (REsource Frugal Optimizer for QUantum Stochastic gradient descent). Our numerics indicate that Refoqus can save several orders of magnitude in shot cost, even relative to optimizers that sample over measurement operators alone.
△ Less
Submitted 28 July, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Can laypeople predict the replicability of social science studies without expert intervention: an exploratory study
Authors:
Juntao Wang,
Jonathan Lei,
Anna Dreber,
Michael Gordon,
Magnus Johannesson,
Thomas Pfeiffer,
Yiling Chen
Abstract:
The low replication rate of published studies has long concerned the social science community, making understanding the replicability a critical problem. Several studies have shown that relevant research communities can make predictions about the replicability of individual studies with above-chance accuracy. Follow-up work further indicates that laypeople can also achieve above-chance accuracy in…
▽ More
The low replication rate of published studies has long concerned the social science community, making understanding the replicability a critical problem. Several studies have shown that relevant research communities can make predictions about the replicability of individual studies with above-chance accuracy. Follow-up work further indicates that laypeople can also achieve above-chance accuracy in predicting replicability when experts interpret the studies into short descriptions that are more accessible for laypeople. The involvement of scarce expert resources may make these methods expensive from financial and time perspectives. In this work, we explored whether laypeople can predict the replicability of social science studies without expert intervention. We presented laypeople with raw materials truncated from published social science papers and elicited their answers to questions related to the paper. Our results suggested that laypeople were engaged in this technical task, providing reasonable and self-contained answers. The majority of them also demonstrated a good understanding of the material. However, the solicited information had limited predictive power on the actual replication outcomes. We further discuss several lessons we learned compared to the approach with expert intervention to inspire future works.
△ Less
Submitted 2 November, 2022;
originally announced November 2022.
-
Inference-Based Quantum Sensing
Authors:
C. Huerta Alderete,
Max Hunter Gordon,
Frederic Sauvage,
Akira Sone,
Andrew T. Sornborger,
Patrick J. Coles,
M. Cerezo
Abstract:
In a standard Quantum Sensing (QS) task one aims at estimating an unknown parameter $θ$, encoded into an $n$-qubit probe state, via measurements of the system. The success of this task hinges on the ability to correlate changes in the parameter to changes in the system response $\mathcal{R}(θ)$ (i.e., changes in the measurement outcomes). For simple cases the form of $\mathcal{R}(θ)$ is known, but…
▽ More
In a standard Quantum Sensing (QS) task one aims at estimating an unknown parameter $θ$, encoded into an $n$-qubit probe state, via measurements of the system. The success of this task hinges on the ability to correlate changes in the parameter to changes in the system response $\mathcal{R}(θ)$ (i.e., changes in the measurement outcomes). For simple cases the form of $\mathcal{R}(θ)$ is known, but the same cannot be said for realistic scenarios, as no general closed-form expression exists. In this work we present an inference-based scheme for QS. We show that, for a general class of unitary families of encoding, $\mathcal{R}(θ)$ can be fully characterized by only measuring the system response at $2n+1$ parameters. This allows us to infer the value of an unknown parameter given the measured response, as well as to determine the sensitivity of the scheme, which characterizes its overall performance. We show that inference error is, with high probability, smaller than $δ$, if one measures the system response with a number of shots that scales only as $Ω(\log^3(n)/δ^2)$. Furthermore, the framework presented can be broadly applied as it remains valid for arbitrary probe states and measurement schemes, and, even holds in the presence of quantum noise. We also discuss how to extend our results beyond unitary families. Finally, to showcase our method we implement it for a QS task on real quantum hardware, and in numerical simulations.
△ Less
Submitted 4 August, 2023; v1 submitted 20 June, 2022;
originally announced June 2022.
-
Covariance matrix preparation for quantum principal component analysis
Authors:
Max Hunter Gordon,
M. Cerezo,
Lukasz Cincio,
Patrick J. Coles
Abstract:
Principal component analysis (PCA) is a dimensionality reduction method in data analysis that involves diagonalizing the covariance matrix of the dataset. Recently, quantum algorithms have been formulated for PCA based on diagonalizing a density matrix. These algorithms assume that the covariance matrix can be encoded in a density matrix, but a concrete protocol for this encoding has been lacking.…
▽ More
Principal component analysis (PCA) is a dimensionality reduction method in data analysis that involves diagonalizing the covariance matrix of the dataset. Recently, quantum algorithms have been formulated for PCA based on diagonalizing a density matrix. These algorithms assume that the covariance matrix can be encoded in a density matrix, but a concrete protocol for this encoding has been lacking. Our work aims to address this gap. Assuming amplitude encoding of the data, with the data given by the ensemble $\{p_i,| ψ_i \rangle\}$, then one can easily prepare the ensemble average density matrix $\overlineρ = \sum_i p_i |ψ_i\rangle \langle ψ_i |$. We first show that $\overlineρ$ is precisely the covariance matrix whenever the dataset is centered. For quantum datasets, we exploit global phase symmetry to argue that there always exists a centered dataset consistent with $\overlineρ$, and hence $\overlineρ$ can always be interpreted as a covariance matrix. This provides a simple means for preparing the covariance matrix for arbitrary quantum datasets or centered classical datasets. For uncentered classical datasets, our method is so-called "PCA without centering", which we interpret as PCA on a symmetrized dataset. We argue that this closely corresponds to standard PCA, and we derive equations and inequalities that bound the deviation of the spectrum obtained with our method from that of standard PCA. We numerically illustrate our method for the MNIST handwritten digit dataset. We also argue that PCA on quantum datasets is natural and meaningful, and we numerically implement our method for molecular ground-state datasets.
△ Less
Submitted 24 October, 2022; v1 submitted 7 April, 2022;
originally announced April 2022.
-
2021 BEETL Competition: Advancing Transfer Learning for Subject Independence & Heterogenous EEG Data Sets
Authors:
Xiaoxi Wei,
A. Aldo Faisal,
Moritz Grosse-Wentrup,
Alexandre Gramfort,
Sylvain Chevallier,
Vinay Jayaram,
Camille Jeunet,
Stylianos Bakas,
Siegfried Ludwig,
Konstantinos Barmpas,
Mehdi Bahri,
Yannis Panagakis,
Nikolaos Laskaris,
Dimitrios A. Adamos,
Stefanos Zafeiriou,
William C. Duong,
Stephen M. Gordon,
Vernon J. Lawhern,
Maciej Śliwowski,
Vincent Rouanne,
Piotr Tempczyk
Abstract:
Transfer learning and meta-learning offer some of the most promising avenues to unlock the scalability of healthcare and consumer technologies driven by biosignal data. This is because current methods cannot generalise well across human subjects' data and handle learning from different heterogeneously collected data sets, thus limiting the scale of training data. On the other side, developments in…
▽ More
Transfer learning and meta-learning offer some of the most promising avenues to unlock the scalability of healthcare and consumer technologies driven by biosignal data. This is because current methods cannot generalise well across human subjects' data and handle learning from different heterogeneously collected data sets, thus limiting the scale of training data. On the other side, developments in transfer learning would benefit significantly from a real-world benchmark with immediate practical application. Therefore, we pick electroencephalography (EEG) as an exemplar for what makes biosignal machine learning hard. We design two transfer learning challenges around diagnostics and Brain-Computer-Interfacing (BCI), that have to be solved in the face of low signal-to-noise ratios, major variability among subjects, differences in the data recording sessions and techniques, and even between the specific BCI tasks recorded in the dataset. Task 1 is centred on the field of medical diagnostics, addressing automatic sleep stage annotation across subjects. Task 2 is centred on Brain-Computer Interfacing (BCI), addressing motor imagery decoding across both subjects and data sets. The BEETL competition with its over 30 competing teams and its 3 winning entries brought attention to the potential of deep transfer learning and combinations of set theory and conventional machine learning techniques to overcome the challenges. The results set a new state-of-the-art for the real-world BEETL benchmark.
△ Less
Submitted 14 February, 2022;
originally announced February 2022.
-
Jury Learning: Integrating Dissenting Voices into Machine Learning Models
Authors:
Mitchell L. Gordon,
Michelle S. Lam,
Joon Sung Park,
Kayur Patel,
Jeffrey T. Hancock,
Tatsunori Hashimoto,
Michael S. Bernstein
Abstract:
Whose labels should a machine learning (ML) algorithm learn to emulate? For ML tasks ranging from online comment toxicity to misinformation detection to medical diagnosis, different groups in society may have irreconcilable disagreements about ground truth labels. Supervised ML today resolves these label disagreements implicitly using majority vote, which overrides minority groups' labels. We intr…
▽ More
Whose labels should a machine learning (ML) algorithm learn to emulate? For ML tasks ranging from online comment toxicity to misinformation detection to medical diagnosis, different groups in society may have irreconcilable disagreements about ground truth labels. Supervised ML today resolves these label disagreements implicitly using majority vote, which overrides minority groups' labels. We introduce jury learning, a supervised ML approach that resolves these disagreements explicitly through the metaphor of a jury: defining which people or groups, in what proportion, determine the classifier's prediction. For example, a jury learning model for online toxicity might centrally feature women and Black jurors, who are commonly targets of online harassment. To enable jury learning, we contribute a deep learning architecture that models every annotator in a dataset, samples from annotators' models to populate the jury, then runs inference to classify. Our architecture enables juries that dynamically adapt their composition, explore counterfactuals, and visualize dissent.
△ Less
Submitted 7 February, 2022;
originally announced February 2022.
-
Visual Intelligence through Human Interaction
Authors:
Ranjay Krishna,
Mitchell Gordon,
Li Fei-Fei,
Michael Bernstein
Abstract:
Over the last decade, Computer Vision, the branch of Artificial Intelligence aimed at understanding the visual world, has evolved from simply recognizing objects in images to describing pictures, answering questions about images, aiding robots maneuver around physical spaces and even generating novel visual content. As these tasks and applications have modernized, so too has the reliance on more d…
▽ More
Over the last decade, Computer Vision, the branch of Artificial Intelligence aimed at understanding the visual world, has evolved from simply recognizing objects in images to describing pictures, answering questions about images, aiding robots maneuver around physical spaces and even generating novel visual content. As these tasks and applications have modernized, so too has the reliance on more data, either for model training or for evaluation. In this chapter, we demonstrate that novel interaction strategies can enable new forms of data collection and evaluation for Computer Vision. First, we present a crowdsourcing interface for speeding up paid data collection by an order of magnitude, feeding the data-hungry nature of modern vision models. Second, we explore a method to increase volunteer contributions using automated social interventions. Third, we develop a system to ensure human evaluation of generative vision models are reliable, affordable and grounded in psychophysics theory. We conclude with future opportunities for Human-Computer Interaction to aid Computer Vision.
△ Less
Submitted 12 November, 2021;
originally announced November 2021.
-
Exploring and Analyzing Machine Commonsense Benchmarks
Authors:
Henrique Santos,
Minor Gordon,
Zhicheng Liang,
Gretchen Forbush,
Deborah L. McGuinness
Abstract:
Commonsense question-answering (QA) tasks, in the form of benchmarks, are constantly being introduced for challenging and comparing commonsense QA systems. The benchmarks provide question sets that systems' developers can use to train and test new models before submitting their implementations to official leaderboards. Although these tasks are created to evaluate systems in identified dimensions (…
▽ More
Commonsense question-answering (QA) tasks, in the form of benchmarks, are constantly being introduced for challenging and comparing commonsense QA systems. The benchmarks provide question sets that systems' developers can use to train and test new models before submitting their implementations to official leaderboards. Although these tasks are created to evaluate systems in identified dimensions (e.g. topic, reasoning type), this metadata is limited and largely presented in an unstructured format or completely not present. Because machine common sense is a fast-paced field, the problem of fully assessing current benchmarks and systems with regards to these evaluation dimensions is aggravated. We argue that the lack of a common vocabulary for aligning these approaches' metadata limits researchers in their efforts to understand systems' deficiencies and in making effective choices for future tasks. In this paper, we first discuss this MCS ecosystem in terms of its elements and their metadata. Then, we present how we are supporting the assessment of approaches by initially focusing on commonsense benchmarks. We describe our initial MCS Benchmark Ontology, an extensible common vocabulary that formalizes benchmark metadata, and showcase how it is supporting the development of a Benchmark tool that enables benchmark exploration and analysis.
△ Less
Submitted 21 December, 2020;
originally announced December 2020.
-
Distributed Algorithms from Arboreal Ants for the Shortest Path Problem
Authors:
Shivam Garg,
Kirankumar Shiragur,
Deborah M. Gordon,
Moses Charikar
Abstract:
Colonies of the arboreal turtle ant create networks of trails that link nests and food sources on the graph formed by branches and vines in the canopy of the tropical forest. Ants put down a volatile pheromone on edges as they traverse them. At each vertex, the next edge to traverse is chosen using a decision rule based on the current pheromone level. There is a bidirectional flow of ants around t…
▽ More
Colonies of the arboreal turtle ant create networks of trails that link nests and food sources on the graph formed by branches and vines in the canopy of the tropical forest. Ants put down a volatile pheromone on edges as they traverse them. At each vertex, the next edge to traverse is chosen using a decision rule based on the current pheromone level. There is a bidirectional flow of ants around the network. In a field study, Chandrasekhar et al. (2021) observed that the trail networks approximately minimize the number of vertices, thus solving a variant of the popular shortest path problem without any central control and with minimal computational resources. We propose a biologically plausible model, based on a variant of the reinforced random walk on a graph, which explains this observation and suggests surprising algorithms for the shortest path problem and its variants. Through simulations and analysis, we show that when the rate of flow of ants does not change, the dynamics converges to the path with the minimum number of vertices, as observed in the field. The dynamics converges to the shortest path when the rate of flow increases with time, so the colony can solve the shortest path problem merely by increasing the flow rate. We also show that to guarantee convergence to the shortest path, bidirectional flow and a decision rule dividing the flow in proportion to the pheromone level are necessary, but convergence to approximately short paths is possible with other decision rules.
△ Less
Submitted 30 June, 2023; v1 submitted 30 November, 2020;
originally announced November 2020.
-
A Semantic Framework for Enabling Radio Spectrum Policy Management and Evaluation
Authors:
H. Santos,
A. Mulvehill,
J. S. Erickson,
J. P. McCusker,
M. Gordon,
O. Xie,
S. Stouffer,
G. Capraro,
A. Pidwerbetsky,
J. Burgess,
A. Berlinsky,
K. Turck,
J. Ashdown,
D. L. McGuinness
Abstract:
Because radio spectrum is a finite resource, its usage and sharing is regulated by government agencies. These agencies define policies to manage spectrum allocation and assignment across multiple organizations, systems, and devices. With more portions of the radio spectrum being licensed for commercial use, the importance of providing an increased level of automation when evaluating such policies…
▽ More
Because radio spectrum is a finite resource, its usage and sharing is regulated by government agencies. These agencies define policies to manage spectrum allocation and assignment across multiple organizations, systems, and devices. With more portions of the radio spectrum being licensed for commercial use, the importance of providing an increased level of automation when evaluating such policies becomes crucial for the efficiency and efficacy of spectrum management. We introduce our Dynamic Spectrum Access Policy Framework for supporting the United States government's mission to enable both federal and non-federal entities to compatibly utilize available spectrum. The DSA Policy Framework acts as a machine-readable policy repository providing policy management features and spectrum access request evaluation. The framework utilizes a novel policy representation using OWL and PROV-O along with a domain-specific reasoning implementation that mixes GeoSPARQL, OWL reasoning, and knowledge graph traversal to evaluate incoming spectrum access requests and explain how applicable policies were used. The framework is currently being used to support live, over-the-air field exercises involving a diverse set of federal and commercial radios, as a component of a prototype spectrum management system.
△ Less
Submitted 8 November, 2020;
originally announced November 2020.
-
Mitiq: A software package for error mitigation on noisy quantum computers
Authors:
Ryan LaRose,
Andrea Mari,
Sarah Kaiser,
Peter J. Karalekas,
Andre A. Alves,
Piotr Czarnik,
Mohamed El Mandouh,
Max H. Gordon,
Yousef Hindy,
Aaron Robertson,
Purva Thakre,
Misty Wahl,
Danny Samuel,
Rahul Mistri,
Maxime Tremblay,
Nick Gardner,
Nathaniel T. Stemen,
Nathan Shammah,
William J. Zeng
Abstract:
We introduce Mitiq, a Python package for error mitigation on noisy quantum computers. Error mitigation techniques can reduce the impact of noise on near-term quantum computers with minimal overhead in quantum resources by relying on a mixture of quantum sampling and classical post-processing techniques. Mitiq is an extensible toolkit of different error mitigation methods, including zero-noise extr…
▽ More
We introduce Mitiq, a Python package for error mitigation on noisy quantum computers. Error mitigation techniques can reduce the impact of noise on near-term quantum computers with minimal overhead in quantum resources by relying on a mixture of quantum sampling and classical post-processing techniques. Mitiq is an extensible toolkit of different error mitigation methods, including zero-noise extrapolation, probabilistic error cancellation, and Clifford data regression. The library is designed to be compatible with generic backends and interfaces with different quantum software frameworks. We describe Mitiq using code snippets to demonstrate usage and discuss features and contribution guidelines. We present several examples demonstrating error mitigation on IBM and Rigetti superconducting quantum processors as well as on noisy simulators.
△ Less
Submitted 1 August, 2022; v1 submitted 9 September, 2020;
originally announced September 2020.
-
Improving the Segmentation of Scanning Probe Microscope Images using Convolutional Neural Networks
Authors:
Steff Farley,
Jo E. A. Hodgkinson,
Oliver M. Gordon,
Joanna Turner,
Andrea Soltoggio,
Philip J. Moriarty,
Eugenie Hunsicker
Abstract:
A wide range of techniques can be considered for segmentation of images of nanostructured surfaces. Manually segmenting these images is time-consuming and results in a user-dependent segmentation bias, while there is currently no consensus on the best automated segmentation methods for particular techniques, image classes, and samples. Any image segmentation approach must minimise the noise in the…
▽ More
A wide range of techniques can be considered for segmentation of images of nanostructured surfaces. Manually segmenting these images is time-consuming and results in a user-dependent segmentation bias, while there is currently no consensus on the best automated segmentation methods for particular techniques, image classes, and samples. Any image segmentation approach must minimise the noise in the images to ensure accurate and meaningful statistical analysis can be carried out. Here we develop protocols for the segmentation of images of 2D assemblies of gold nanoparticles formed on silicon surfaces via deposition from an organic solvent. The evaporation of the solvent drives far-from-equilibrium self-organisation of the particles, producing a wide variety of nano- and micro-structured patterns. We show that a segmentation strategy using the U-Net convolutional neural network outperforms traditional automated approaches and has particular potential in the processing of images of nanostructured systems.
△ Less
Submitted 27 August, 2020;
originally announced August 2020.
-
RTEX: A novel methodology for Ranking, Tagging, and Explanatory diagnostic captioning of radiography exams
Authors:
Vasiliki Kougia,
John Pavlopoulos,
Panagiotis Papapetrou,
Max Gordon
Abstract:
This paper introduces RTEx, a novel methodology for a) ranking radiography exams based on their probability to contain an abnormality, b) generating abnormality tags for abnormal exams, and c) providing a diagnostic explanation in natural language for each abnormal exam. The task of ranking radiography exams is an important first step for practitioners who want to identify and prioritize those rad…
▽ More
This paper introduces RTEx, a novel methodology for a) ranking radiography exams based on their probability to contain an abnormality, b) generating abnormality tags for abnormal exams, and c) providing a diagnostic explanation in natural language for each abnormal exam. The task of ranking radiography exams is an important first step for practitioners who want to identify and prioritize those radiography exams that are more likely to contain abnormalities, for example, to avoid mistakes due to tiredness or to manage heavy workload (e.g., during a pandemic). We used two publicly available datasets to assess our methodology and demonstrate that for the task of ranking it outperforms its competitors in terms of NDCG@k. For each abnormal radiography exam RTEx generates a set of abnormality tags alongside an explanatory diagnostic text to explain the tags and guide the medical expert. Our tagging component outperforms two strong competitor methods in terms of F1. Moreover, the diagnostic captioning component of RTEx, which exploits the already extracted tags to constrain the captioning process, outperforms all competitors with respect to clinical precision and recall.
△ Less
Submitted 11 June, 2020;
originally announced June 2020.
-
Replication Markets: Results, Lessons, Challenges and Opportunities in AI Replication
Authors:
Yang Liu,
Michael Gordon,
Juntao Wang,
Michael Bishop,
Yiling Chen,
Thomas Pfeiffer,
Charles Twardy,
Domenico Viganola
Abstract:
The last decade saw the emergence of systematic large-scale replication projects in the social and behavioral sciences, (Camerer et al., 2016, 2018; Ebersole et al., 2016; Klein et al., 2014, 2018; Collaboration, 2015). These projects were driven by theoretical and conceptual concerns about a high fraction of "false positives" in the scientific publications (Ioannidis, 2005) (and a high prevalence…
▽ More
The last decade saw the emergence of systematic large-scale replication projects in the social and behavioral sciences, (Camerer et al., 2016, 2018; Ebersole et al., 2016; Klein et al., 2014, 2018; Collaboration, 2015). These projects were driven by theoretical and conceptual concerns about a high fraction of "false positives" in the scientific publications (Ioannidis, 2005) (and a high prevalence of "questionable research practices" (Simmons, Nelson, and Simonsohn, 2011). Concerns about the credibility of research findings are not unique to the behavioral and social sciences; within Computer Science, Artificial Intelligence (AI) and Machine Learning (ML) are areas of particular concern (Lucic et al., 2018; Freire, Bonnet, and Shasha, 2012; Gundersen and Kjensmo, 2018; Henderson et al., 2018). Given the pioneering role of the behavioral and social sciences in the promotion of novel methodologies to improve the credibility of research, it is a promising approach to analyze the lessons learned from this field and adjust strategies for Computer Science, AI and ML In this paper, we review approaches used in the behavioral and social sciences and in the DARPA SCORE project. We particularly focus on the role of human forecasting of replication outcomes, and how forecasting can leverage the information gained from relatively labor and resource-intensive replications. We will discuss opportunities and challenges of using these approaches to monitor and improve the credibility of research areas in Computer Science, AI, and ML.
△ Less
Submitted 9 May, 2020;
originally announced May 2020.
-
Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation
Authors:
Mitchell A. Gordon,
Kevin Duh
Abstract:
We explore best practices for training small, memory efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. While both domain adaptation and knowledge distillation are widely-used, their interaction remains little understood. Our large-scale empirical results in machine translation (on three language pairs with three domains each) suggest…
▽ More
We explore best practices for training small, memory efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. While both domain adaptation and knowledge distillation are widely-used, their interaction remains little understood. Our large-scale empirical results in machine translation (on three language pairs with three domains each) suggest distilling twice for best performance: once using general-domain data and again using in-domain data with an adapted teacher.
△ Less
Submitted 23 June, 2020; v1 submitted 5 March, 2020;
originally announced March 2020.
-
Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
Authors:
Mitchell A. Gordon,
Kevin Duh,
Nicholas Andrews
Abstract:
Pre-trained universal feature extractors, such as BERT for natural language processing and VGG for computer vision, have become effective methods for improving deep learning models without requiring more labeled data. While effective, feature extractors like BERT may be prohibitively large for some deployment scenarios. We explore weight pruning for BERT and ask: how does compression during pre-tr…
▽ More
Pre-trained universal feature extractors, such as BERT for natural language processing and VGG for computer vision, have become effective methods for improving deep learning models without requiring more labeled data. While effective, feature extractors like BERT may be prohibitively large for some deployment scenarios. We explore weight pruning for BERT and ask: how does compression during pre-training affect transfer learning? We find that pruning affects transfer learning in three broad regimes. Low levels of pruning (30-40%) do not affect pre-training loss or transfer to downstream tasks at all. Medium levels of pruning increase the pre-training loss and prevent useful pre-training information from being transferred to downstream tasks. High levels of pruning additionally prevent models from fitting downstream datasets, leading to further degradation. Finally, we observe that fine-tuning BERT on a specific task does not improve its prunability. We conclude that BERT can be pruned once during pre-training rather than separately for each task without affecting performance.
△ Less
Submitted 14 May, 2020; v1 submitted 19 February, 2020;
originally announced February 2020.
-
Approximating Human Judgment of Generated Image Quality
Authors:
Y. Alex Kolchinski,
Sharon Zhou,
Shengjia Zhao,
Mitchell Gordon,
Stefano Ermon
Abstract:
Generative models have made immense progress in recent years, particularly in their ability to generate high quality images. However, that quality has been difficult to evaluate rigorously, with evaluation dominated by heuristic approaches that do not correlate well with human judgment, such as the Inception Score and Fréchet Inception Distance. Real human labels have also been used in evaluation,…
▽ More
Generative models have made immense progress in recent years, particularly in their ability to generate high quality images. However, that quality has been difficult to evaluate rigorously, with evaluation dominated by heuristic approaches that do not correlate well with human judgment, such as the Inception Score and Fréchet Inception Distance. Real human labels have also been used in evaluation, but are inefficient and expensive to collect for each image. Here, we present a novel method to automatically evaluate images based on their quality as perceived by humans. By not only generating image embeddings from Inception network activations and comparing them to the activations for real images, of which other methods perform a variant, but also regressing the activation statistics to match gold standard human labels, we demonstrate 66% accuracy in predicting human scores of image realism, matching the human inter-rater agreement rate. Our approach also generalizes across generative models, suggesting the potential for capturing a model-agnostic measure of image quality. We open source our dataset of human labels for the advancement of research and techniques in this area.
△ Less
Submitted 30 November, 2019;
originally announced December 2019.
-
Explaining Sequence-Level Knowledge Distillation as Data-Augmentation for Neural Machine Translation
Authors:
Mitchell A. Gordon,
Kevin Duh
Abstract:
Sequence-level knowledge distillation (SLKD) is a model compression technique that leverages large, accurate teacher models to train smaller, under-parameterized student models. Why does pre-processing MT data with SLKD help us train smaller models? We test the common hypothesis that SLKD addresses a capacity deficiency in students by "simplifying" noisy data points and find it unlikely in our cas…
▽ More
Sequence-level knowledge distillation (SLKD) is a model compression technique that leverages large, accurate teacher models to train smaller, under-parameterized student models. Why does pre-processing MT data with SLKD help us train smaller models? We test the common hypothesis that SLKD addresses a capacity deficiency in students by "simplifying" noisy data points and find it unlikely in our case. Models trained on concatenations of original and "simplified" datasets generalize just as well as baseline SLKD. We then propose an alternative hypothesis under the lens of data augmentation and regularization. We try various augmentation strategies and observe that dropout regularization can become unnecessary. Our methods achieve BLEU gains of 0.7-1.2 on TED Talks.
△ Less
Submitted 6 December, 2019;
originally announced December 2019.
-
Goal-setting And Achievement In Activity Tracking Apps: A Case Study Of MyFitnessPal
Authors:
Mitchell L. Gordon,
Tim Althoff,
Jure Leskovec
Abstract:
Activity tracking apps often make use of goals as one of their core motivational tools. There are two critical components to this tool: setting a goal, and subsequently achieving that goal. Despite its crucial role in how a number of prominent self-tracking apps function, there has been relatively little investigation of the goal-setting and achievement aspects of self-tracking apps.
Here we exp…
▽ More
Activity tracking apps often make use of goals as one of their core motivational tools. There are two critical components to this tool: setting a goal, and subsequently achieving that goal. Despite its crucial role in how a number of prominent self-tracking apps function, there has been relatively little investigation of the goal-setting and achievement aspects of self-tracking apps.
Here we explore this issue, investigating a particular goal setting and achievement process that is extensive, recorded, and crucial for both the app and its users' success: weight loss goals in MyFitnessPal. We present a large-scale study of 1.4 million users and weight loss goals, allowing for an unprecedented detailed view of how people set and achieve their goals. We find that, even for difficult long-term goals, behavior within the first 7 days predicts those who ultimately achieve their goals, that is, those who lose at least as much weight as they set out to, and those who do not. For instance, high amounts of early weight loss, which some researchers have classified as unsustainable, leads to higher goal achievement rates. We also show that early food intake, self-monitoring motivation, and attitude towards the goal are important factors. We then show that we can use our findings to predict goal achievement with an accuracy of 79% ROC AUC just 7 days after a goal is set. Finally, we discuss how our findings could inform steps to improve goal achievement in self-tracking apps.
△ Less
Submitted 4 April, 2019;
originally announced April 2019.
-
HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models
Authors:
Sharon Zhou,
Mitchell L. Gordon,
Ranjay Krishna,
Austin Narcomey,
Li Fei-Fei,
Michael S. Bernstein
Abstract:
Generative models often use human evaluations to measure the perceived quality of their outputs. Automated metrics are noisy indirect proxies, because they rely on heuristics or pretrained embeddings. However, up until now, direct human evaluation strategies have been ad-hoc, neither standardized nor validated. Our work establishes a gold standard human benchmark for generative realism. We constru…
▽ More
Generative models often use human evaluations to measure the perceived quality of their outputs. Automated metrics are noisy indirect proxies, because they rely on heuristics or pretrained embeddings. However, up until now, direct human evaluation strategies have been ad-hoc, neither standardized nor validated. Our work establishes a gold standard human benchmark for generative realism. We construct Human eYe Perceptual Evaluation (HYPE) a human benchmark that is (1) grounded in psychophysics research in perception, (2) reliable across different sets of randomly sampled outputs from a model, (3) able to produce separable model performances, and (4) efficient in cost and time. We introduce two variants: one that measures visual perception under adaptive time constraints to determine the threshold at which a model's outputs appear real (e.g. 250ms), and the other a less expensive variant that measures human error rate on fake and real images sans time constraints. We test HYPE across six state-of-the-art generative adversarial networks and two sampling techniques on conditional and unconditional image generation using four datasets: CelebA, FFHQ, CIFAR-10, and ImageNet. We find that HYPE can track model improvements across training epochs, and we confirm via bootstrap sampling that HYPE rankings are consistent and replicable.
△ Less
Submitted 31 October, 2019; v1 submitted 1 April, 2019;
originally announced April 2019.
-
An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel Xeon Phi processor
Authors:
Vladimir Mironov,
Yuri Alexeev,
Kristopher Keipert,
Michael D'mello,
Alexander Moskovsky,
Mark S. Gordon
Abstract:
Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations that differ by the sharing or replication of key data structures among threads are considered, density and Fock matrices. All implementations are benchmarked on a super-computer of 3,000 Intel Xeon Phi processors. With 64 cores p…
▽ More
Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations that differ by the sharing or replication of key data structures among threads are considered, density and Fock matrices. All implementations are benchmarked on a super-computer of 3,000 Intel Xeon Phi processors. With 64 cores per processor, scaling numbers are reported on up to 192,000 cores. The hybrid MPI/OpenMP implementation reduces the memory footprint by approximately 200 times compared to the legacy code. The MPI/OpenMP code was shown to run up to six times faster than the original for a range of molecular system sizes.
△ Less
Submitted 14 August, 2017; v1 submitted 31 July, 2017;
originally announced August 2017.
-
Epidemiological modeling of the 2005 French riots: a spreading wave and the role of contagion
Authors:
Laurent Bonnasse-Gahot,
Henri Berestycki,
Marie-Aude Depuiset,
Mirta B. Gordon,
Sebastian Roché,
Nancy Rodriguez,
Jean-Pierre Nadal
Abstract:
As a large-scale instance of dramatic collective behaviour, the 2005 French riots started in a poor suburb of Paris, then spread in all of France, lasting about three weeks. Remarkably, although there were no displacements of rioters, the riot activity did travel. Access to daily national police data has allowed us to explore the dynamics of riot propagation. Here we show that an epidemic-like mod…
▽ More
As a large-scale instance of dramatic collective behaviour, the 2005 French riots started in a poor suburb of Paris, then spread in all of France, lasting about three weeks. Remarkably, although there were no displacements of rioters, the riot activity did travel. Access to daily national police data has allowed us to explore the dynamics of riot propagation. Here we show that an epidemic-like model, with just a few parameters and a single sociological variable characterizing neighbourhood deprivation, accounts quantitatively for the full spatio-temporal dynamics of the riots. This is the first time that such data-driven modelling involving contagion both within and between cities (through geographic proximity or media) at the scale of a country, and on a daily basis, is performed. Moreover, we give a precise mathematical characterization to the expression "wave of riots", and provide a visualization of the propagation around Paris, exhibiting the wave in a way not described before. The remarkable agreement between model and data demonstrates that geographic proximity played a major role in the propagation, even though information was readily available everywhere through media. Finally, we argue that our approach gives a general framework for the modelling of the dynamics of spontaneous collective uprisings.
△ Less
Submitted 8 January, 2018; v1 submitted 25 January, 2017;
originally announced January 2017.
-
EEGNet: A Compact Convolutional Network for EEG-based Brain-Computer Interfaces
Authors:
Vernon J. Lawhern,
Amelia J. Solon,
Nicholas R. Waytowich,
Stephen M. Gordon,
Chou P. Hung,
Brent J. Lance
Abstract:
Brain computer interfaces (BCI) enable direct communication with a computer, using neural activity as the control signal. This neural signal is generally chosen from a variety of well-studied electroencephalogram (EEG) signals. For a given BCI paradigm, feature extractors and classifiers are tailored to the distinct characteristics of its expected EEG control signal, limiting its application to th…
▽ More
Brain computer interfaces (BCI) enable direct communication with a computer, using neural activity as the control signal. This neural signal is generally chosen from a variety of well-studied electroencephalogram (EEG) signals. For a given BCI paradigm, feature extractors and classifiers are tailored to the distinct characteristics of its expected EEG control signal, limiting its application to that specific signal. Convolutional Neural Networks (CNNs), which have been used in computer vision and speech recognition, have successfully been applied to EEG-based BCIs; however, they have mainly been applied to single BCI paradigms and thus it remains unclear how these architectures generalize to other paradigms. Here, we ask if we can design a single CNN architecture to accurately classify EEG signals from different BCI paradigms, while simultaneously being as compact as possible. In this work we introduce EEGNet, a compact convolutional network for EEG-based BCIs. We introduce the use of depthwise and separable convolutions to construct an EEG-specific model which encapsulates well-known EEG feature extraction concepts for BCI. We compare EEGNet to current state-of-the-art approaches across four BCI paradigms: P300 visual-evoked potentials, error-related negativity responses (ERN), movement-related cortical potentials (MRCP), and sensory motor rhythms (SMR). We show that EEGNet generalizes across paradigms better than the reference algorithms when only limited training data is available. We demonstrate three different approaches to visualize the contents of a trained EEGNet model to enable interpretation of the learned features. Our results suggest that EEGNet is robust enough to learn a wide variety of interpretable features over a range of BCI tasks, suggesting that the observed performances were not due to artifact or noise sources in the data.
△ Less
Submitted 15 May, 2018; v1 submitted 23 November, 2016;
originally announced November 2016.
-
Idle Ants Have a Role
Authors:
Yehuda Afek,
Deborah M. Gordon,
Moshe Sulamy
Abstract:
Using elementary distributed computing techniques we suggest an explanation for two unexplained phenomena in regards to ant colonies, (a) a substantial amount of ants in an ant colony are idle, and (b) the observed low survivability of new ant colonies in nature. Ant colonies employ task allocation, in which ants progress from one task to the other, to meet changing demands introduced by the envir…
▽ More
Using elementary distributed computing techniques we suggest an explanation for two unexplained phenomena in regards to ant colonies, (a) a substantial amount of ants in an ant colony are idle, and (b) the observed low survivability of new ant colonies in nature. Ant colonies employ task allocation, in which ants progress from one task to the other, to meet changing demands introduced by the environment. Extending the biological task allocation model given in [Pacala, Gordon and Godfray 1996] we present a distributed algorithm which mimics the mechanism ants use to solve task allocation efficiently in nature. Analyzing the time complexity of the algorithm reveals an exponential gap on the time it takes an ant colony to satisfy a certain work demand with and without idle ants. We provide an $O(\ln n)$ upper bound when a constant fraction of the colony are idle ants, and a contrasting lower bound of $Ω(n)$ when there are no idle ants, where $n$ is the total number of ants in the colony.
△ Less
Submitted 20 May, 2016; v1 submitted 23 June, 2015;
originally announced June 2015.
-
An optimal linear separator for the Sonar Signals Classification task
Authors:
Juan-Manuel Torres-Moreno,
Mirta B. Gordon
Abstract:
The problem of classifying sonar signals from rocks and mines first studied by Gorman and Sejnowski has become a benchmark against which many learning algorithms have been tested. We show that both the training set and the test set of this benchmark are linearly separable, although with different hyperplanes. Moreover, the complete set of learning and test patterns together, is also linearly sep…
▽ More
The problem of classifying sonar signals from rocks and mines first studied by Gorman and Sejnowski has become a benchmark against which many learning algorithms have been tested. We show that both the training set and the test set of this benchmark are linearly separable, although with different hyperplanes. Moreover, the complete set of learning and test patterns together, is also linearly separable. We give the weights that separate these sets, which may be used to compare results found by other algorithms.
△ Less
Submitted 2 June, 2009;
originally announced June 2009.
-
Adaptive Learning with Binary Neurons
Authors:
Juan-Manuel Torres-Moreno,
Mirta B. Gordon
Abstract:
A efficient incremental learning algorithm for classification tasks, called NetLines, well adapted for both binary and real-valued input patterns is presented. It generates small compact feedforward neural networks with one hidden layer of binary units and binary output units. A convergence theorem ensures that solutions with a finite number of hidden units exist for both binary and real-valued…
▽ More
A efficient incremental learning algorithm for classification tasks, called NetLines, well adapted for both binary and real-valued input patterns is presented. It generates small compact feedforward neural networks with one hidden layer of binary units and binary output units. A convergence theorem ensures that solutions with a finite number of hidden units exist for both binary and real-valued input patterns. An implementation for problems with more than two classes, valid for any binary classifier, is proposed. The generalization error and the size of the resulting networks are compared to the best published results on well-known classification benchmarks. Early stop** is shown to decrease overfitting, without improving the generalization performance.
△ Less
Submitted 29 April, 2009;
originally announced April 2009.
-
Induction of High-level Behaviors from Problem-solving Traces using Machine Learning Tools
Authors:
Vivien Robinet,
Gilles Bisson,
Mirta B. Gordon,
Benoît Lemaire
Abstract:
This paper applies machine learning techniques to student modeling. It presents a method for discovering high-level student behaviors from a very large set of low-level traces corresponding to problem-solving actions in a learning environment. Basic actions are encoded into sets of domain-dependent attribute-value patterns called cases. Then a domain-independent hierarchical clustering identifie…
▽ More
This paper applies machine learning techniques to student modeling. It presents a method for discovering high-level student behaviors from a very large set of low-level traces corresponding to problem-solving actions in a learning environment. Basic actions are encoded into sets of domain-dependent attribute-value patterns called cases. Then a domain-independent hierarchical clustering identifies what we call general attitudes, yielding automatic diagnosis expressed in natural language, addressed in principle to teachers. The method can be applied to individual students or to entire groups, like a class. We exhibit examples of this system applied to thousands of students' actions in the domain of algebraic transformations.
△ Less
Submitted 5 April, 2009;
originally announced April 2009.
-
Optimal hash functions for approximate closest pairs on the n-cube
Authors:
Daniel M. Gordon,
Victor Miller,
Peter Ostapenko
Abstract:
One way to find closest pairs in large datasets is to use hash functions. In recent years locality-sensitive hash functions for various metrics have been given: projecting an n-cube onto k bits is simple hash function that performs well. In this paper we investigate alternatives to projection. For various parameters hash functions given by complete decoding algorithms for codes work better, and…
▽ More
One way to find closest pairs in large datasets is to use hash functions. In recent years locality-sensitive hash functions for various metrics have been given: projecting an n-cube onto k bits is simple hash function that performs well. In this paper we investigate alternatives to projection. For various parameters hash functions given by complete decoding algorithms for codes work better, and asymptotically random codes perform better than projection.
△ Less
Submitted 15 October, 2009; v1 submitted 20 June, 2008;
originally announced June 2008.
-
Finite size scaling of the bayesian perceptron
Authors:
A. Buhot,
J. -M. Torres Moreno,
M. B. Gordon
Abstract:
We study numerically the properties of the bayesian perceptron through a gradient descent on the optimal cost function. The theoretical distribution of stabilities is deduced. It predicts that the optimal generalizer lies close to the boundary of the space of (error-free) solutions. The numerical simulations are in good agreement with the theoretical distribution. The extrapolation of the genera…
▽ More
We study numerically the properties of the bayesian perceptron through a gradient descent on the optimal cost function. The theoretical distribution of stabilities is deduced. It predicts that the optimal generalizer lies close to the boundary of the space of (error-free) solutions. The numerical simulations are in good agreement with the theoretical distribution. The extrapolation of the generalization error to infinite input space size agrees with the theoretical results. Finite size corrections are negative and exhibit two different scaling regimes, depending on the training set size. The variance of the generalization error vanishes for $N \rightarrow \infty$ confirming the property of self-averaging.
△ Less
Submitted 20 March, 1997;
originally announced March 1997.