Search | arXiv e-print repository

AlabOS: A Python-based Reconfigurable Workflow Management Framework for Autonomous Laboratories

Authors: Yuxing Fei, Bernardus Rendy, Rishi Kumar, Olympia Dartsi, Hrushikesh P. Sahasrabuddhe, Matthew J. McDermott, Zheren Wang, Nathan J. Szymanski, Lauren N. Walters, David Milsted, Yan Zeng, Anubhav Jain, Gerbrand Ceder

Abstract: The recent advent of autonomous laboratories, coupled with algorithms for high-throughput screening and active learning, promises to accelerate materials discovery and innovation. As these autonomous systems grow in complexity, the demand for robust and efficient workflow management software becomes increasingly critical. In this paper, we introduce AlabOS, a general-purpose software framework for orchestrating experiments and managing resources, with an emphasis on automated laboratories for materials synthesis and characterization. We demonstrate the implementation of AlabOS in a prototype autonomous materials laboratory. AlabOS features a reconfigurable experiment workflow model, enabling the simultaneous execution of varied workflows composed of modular tasks. Therefore, AlabOS is well-suited to handle the rapidly changing experimental protocols defining the progress of self-driving laboratory development for materials research.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 30 pages, 5 figures

arXiv:2404.11611 [pdf, other]

Biosignatures from pre-oxygen photosynthesising life on TRAPPIST-1e

Authors: Jake K. Eager-Nash, Stuart J. Daines, James W. McDermott, Peter Andrews, Lucy A. Grain, James Bishop, Aaron A. Rogers, Jack W. G. Smith, Chadiga Khalek, Thomas J. Boxer, Mei Ting Mak, Robert J. Ridgway, Eric Hebrard, F. Hugo Lambert, Timothy M. Lenton, Nathan J. Mayne

Abstract: In order to assess observational evidence for potential atmospheric biosignatures on exoplanets, it will be essential to test whether spectral fingerprints from multiple gases can be explained by abiotic or biotic-only processes. Here, we develop and apply a coupled 1D atmosphere-ocean-ecosystem model to understand how primitive biospheres, which exploit abiotic sources of H2, CO and O2, could influence the atmospheric composition of rocky terrestrial exoplanets. We apply this to the Earth at 3.8 Ga and to TRAPPIST-1e. We focus on metabolisms that evolved before the evolution of oxygenic photosynthesis, which consume H2 and CO and produce potentially detectable levels of CH4. O2-consuming metabolisms are also considered for TRAPPIST-1e, as abiotic O2 production is predicted on M-dwarf orbiting planets. We show that these biospheres can lead to high levels of surface O2 (approximately 1-5 %) as a result of CO consumption, which could allow high O2 scenarios, by removing the main loss mechanisms of atomic oxygen. Increasing stratospheric temperatures, which increases atmospheric OH, can reduce the likelihood of such a state forming. O2-consuming metabolisms could also lower O2 levels to around 10 ppm and support a productive biosphere at low reductant inputs. Using predicted transmission spectral features from CH4, CO, O2/O3 and CO2 across the hypothesis space for tectonic reductant input, we show that biotically-produced CH4 may only be detectable at high reductant inputs. CO is also likely to be a dominant feature in transmission spectra for planets orbiting M-dwarfs, which could reduce the confidence in any potential biosignature observations linked to these biospheres.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 29 pages, 19 figures

arXiv:2310.19502 [pdf, other]

Mechanistically-guided materials chemistry: synthesis of new ternary nitrides, CaZrN$_2$ and CaHfN$_2$

Authors: Christopher L. Rom, Andrew Novick, Matthew J. McDermott, Andrey A. Yakovenko, Jessica R. Gallawa, Gia Thinh Tran, Dominic C. Asebiah, Emily N. Storck, Brennan C. McBride, Rebecca C. Miller, Amy L. Prieto, Kristin A. Persson, Eric Toberer, Vladan Stevanović, Andriy Zakutayev, James R. Neilson

Abstract: Recent computational studies have predicted many new ternary nitrides, revealing synthetic opportunities in this underexplored phase space. However, synthesizing new ternary nitrides is difficult, in part because intermediate and product phases often have high cohesive energies that inhibit diffusion. Here, we report the synthesis of two new phases, calcium zirconium nitride (CaZrN$_2$) and calcium hafnium nitride (CaHfN$_2$), by solid state metathesis reactions between Ca$_3$N$_2$ and $M$Cl$_4$ ($M$ = Zr, Hf). Although the reaction nominally proceeds to the target phases in a 1:1 ratio of the precursors via Ca$_3$N$_2$ + $M$Cl$_4$ $\rightarrow$ Ca$M$N$_2$ + 2 CaCl$_2$, reactions prepared this way result in Ca-poor materials (Ca$_xM_{2-x}$N$_2$, $x<1$). A small excess of Ca$_3$N$_2$ (ca. 20 mol%) is needed to yield stoichiometric Ca$M$N$_2$, as confirmed by high-resolution synchrotron powder X-ray diffraction. In situ synchrotron X-ray diffraction studies reveal that nominally stoichiometric reactions produce Zr$^{3+}$ intermediates early in the reaction pathway, and the excess Ca$_3$N$_2$ is needed to reoxidize Zr$^{3+}$ intermediates back to the Zr$^{4+}$ oxidation state of CaZrN$_2$. Analysis of computationally-derived chemical potential diagrams rationalizes this synthetic approach and its contrast from the synthesis of MgZrN$_2$. These findings additionally highlight the utility of in situ diffraction studies and computational thermochemistry to provide mechanistic guidance for synthesis.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2308.11816 [pdf, other]

doi 10.1021/acscentsci.3c01051

Assessing Thermodynamic Selectivity of Solid-State Reactions for the Predictive Synthesis of Inorganic Materials

Authors: Matthew J. McDermott, Brennan C. McBride, Corlyn Regier, Gia Thinh Tran, Yu Chen, Adam A. Corrao, Max C. Gallant, Gabrielle E. Kamm, Christopher J. Bartel, Karena W. Chapman, Peter G. Khalifah, Gerbrand Ceder, James R. Neilson, Kristin A. Persson

Abstract: Synthesis is a major challenge in the discovery of new inorganic materials. Currently, there is limited theoretical guidance for identifying optimal solid-state synthesis procedures. We introduce two selectivity metrics, primary and secondary competition, to assess the favorability of target/impurity phase formation in solid-state reactions. We used these metrics to analyze 3,520 solid-state reactions in the literature, ranking existing approaches to popular target materials. Additionally, we implemented these metrics in a data-driven synthesis planning workflow and demonstrated its application in the synthesis of barium titanate (BaTiO$_3$). Using an 18-element chemical reaction network with first-principles thermodynamic data from the Materials Project, we identified 82,985 possible BaTiO$_3$ synthesis reactions and selected nine for experimental testing. Characterization of reaction pathways via synchrotron powder X-ray diffraction reveals that our selectivity metrics correlate with observed target/impurity formation. We discovered two efficient reactions using unconventional precursors (BaS/BaCl$_2$ and Na$_2$TiO$_3$) that produce BaTiO$_3$ faster and with fewer impurities than conventional methods, highlighting the importance of considering complex chemistries with additional elements during precursor selection. Our framework provides a foundation for predictive inorganic synthesis, facilitating the optimization of existing recipes and the discovery of new materials, including those not easily attainable with conventional precursors.

Submitted 27 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: Minor revisions (updated abstract, added discussion, added precursor purities, formatting). The first two authors contributed equally to this work. 53 pages, 7 figures

arXiv:2303.11915 [pdf, other]

doi 10.1557/s43578-023-01037-2

Modernist Materials Synthesis: Finding Thermodynamic Shortcuts with Hyperdimensional Chemistry

Authors: James R Neilson, Matthew J McDermott, Kristin A Persson

Abstract: Synthesis remains a challenge for advancing materials science. A key focus of this challenge is how to enable selective synthesis, particularly as it pertains to metastable materials. This perspective addresses the question: how can "spectator" elements, such as those found in double ion exchange (metathesis) reactions, enable selective materials synthesis? By observing reaction pathways as they happen (in situ) and calculating their energetics using modern computational thermodynamics, we observe transient, crystalline intermediates that suggest that many reactions attain a local thermodynamic equilibrium dictated by local chemical potentials far before achieving a global equilibrium set by the average composition. Using this knowledge, one can thermodynamically "shortcut" unfavorable intermediates by including additional elements beyond those of the desired target, providing access to a greater number of intermediates with advantageous energetics and selective phase nucleation. Ultimately, data-driven modeling that unites first-principles approaches with experimental insights will refine the accuracy of emerging predictive retrosynthetic models for complex materials synthesis.

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2207.03483 [pdf, other]

Finding Fallen Objects Via Asynchronous Audio-Visual Integration

Authors: Chuang Gan, Yi Gu, Siyuan Zhou, Jeremy Schwartz, Seth Alter, James Traer, Dan Gutfreund, Joshua B. Tenenbaum, Josh McDermott, Antonio Torralba

Abstract: The way an object looks and sounds provide complementary reflections of its physical properties. In many settings cues from vision and audition arrive asynchronously but must be integrated, as when we hear an object dropped on the floor and then must find it. In this paper, we introduce a setting in which to study multi-modal object localization in 3D virtual environments. An object is dropped somewhere in a room. An embodied robot agent, equipped with a camera and microphone, must determine what object has been dropped -- and where -- by combining audio and visual signals with knowledge of the underlying physics. To study this problem, we have generated a large-scale dataset -- the Fallen Objects dataset -- that includes 8000 instances of 30 physical object categories in 64 rooms. The dataset uses the ThreeDWorld platform which can simulate physics-based impact sounds and complex physical interactions between objects in a photorealistic setting. As a first step toward addressing this challenge, we develop a set of embodied agent baselines, based on imitation learning, reinforcement learning, and modular planning, and perform an in-depth analysis of the challenge of this new task.

Submitted 7 July, 2022; originally announced July 2022.

Comments: CVPR 2022. Project page: http://fallen-object.csail.mit.edu

arXiv:2112.08984 [pdf, other]

Object-based synthesis of scraping and rolling sounds based on non-linear physical constraints

Authors: Vinayak Agarwal, Maddie Cusimano, James Traer, Josh McDermott

Abstract: Sustained contact interactions like scraping and rolling produce a wide variety of sounds. Previous studies have explored ways to synthesize these sounds efficiently and intuitively but could not fully mimic the rich structure of real instances of these sounds. We present a novel source-filter model for realistic synthesis of scraping and rolling sounds with physically and perceptually relevant controllable parameters constrained by principles of mechanics. Key features of our model include non-linearities to constrain the contact force, naturalistic normal force variation for different motions, and a method for morphing impulse responses within a material to achieve location-dependence. Perceptual experiments show that the presented model is able to synthesize realistic scraping and rolling sounds while conveying physical information similar to that in recorded sounds.

Submitted 16 December, 2021; originally announced December 2021.

Journal ref: Proceedings of the 24th International Conference on Digital Audio Effects (DAFx-20in21), 2021

arXiv:2111.06979 [pdf, other]

Neural Population Geometry Reveals the Role of Stochasticity in Robust Perception

Authors: Joel Dapello, Jenelle Feather, Hang Le, Tiago Marques, David D. Cox, Josh H. McDermott, James J. DiCarlo, SueYeon Chung

Abstract: Adversarial examples are often cited by neuroscientists and machine learning researchers as an example of how computational models diverge from biological sensory systems. Recent work has proposed adding biologically-inspired components to visual neural networks as a way to improve their adversarial robustness. One surprisingly effective component for reducing adversarial vulnerability is response stochasticity, like that exhibited by biological neurons. Here, using recently developed geometrical techniques from computational neuroscience, we investigate how adversarial perturbations influence the internal representations of standard, adversarially trained, and biologically-inspired stochastic networks. We find distinct geometric signatures for each type of network, revealing different mechanisms for achieving robust representations. Next, we generalize these results to the auditory domain, showing that neural stochasticity also makes auditory models more robust to adversarial perturbations. Geometric analysis of the stochastic networks reveals overlap between representations of clean and adversarially perturbed stimuli, and quantitatively demonstrates that competing geometric effects of stochasticity mediate a tradeoff between adversarial and clean performance. Our results shed light on the strategies of robust perception utilized by adversarially trained and stochastic networks, and help explain how stochasticity may be beneficial to machine and biological computation.

Submitted 12 November, 2021; originally announced November 2021.

Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2104.05986 [pdf, other]

Selectivity in yttrium manganese oxide synthesis via local chemical potentials in hyperdimensional phase space

Authors: Paul K. Todd, Matthew J. McDermott, Christopher L. Rom, Adam A. Corrao, Jonathan J. Denney, Shyam S. Dwaraknath, Peter G. Khalifah, Kristin A. Persson, James R. Neilson

Abstract: In sharp contrast to molecular synthesis, materials synthesis is generally presumed to lack selectivity. The few known methods of designing selectivity in solid-state reactions have limited scope, such as topotactic reactions or strain stabilization. This contribution describes a general approach for searching large chemical spaces to identify selective reactions. This novel approach explains the ability of a nominally "innocent" Na$_2$CO$_3$ precursor to enable the metathesis synthesis of single-phase Y$_2$Mn$_2$O$_7$ -- an outcome that was previously only accomplished at extreme pressures and which cannot be achieved with closely related precursors of Li$_2$CO$_3$ and K$_2$CO$_3$. By calculating the required change in chemical potential across all possible reactant-product interfaces in an expanded chemical space including Y, Mn, O, alkali metals, and halogens, using thermodynamic parameters obtained from density functional theory calculations, we identify reactions that minimize the thermodynamic competition from intermediates. In this manner, only the Na-based intermediates minimize the distance in the hyperdimensional chemical potential space to Y$_2$Mn$_2$O$_7$, thus providing selective access to a phase which was previously thought to be metastable. Experimental evidence validating this mechanism for pathway-dependent selectivity is provided by intermediates identified from in situ synchrotron-based crystallographic analysis. This approach of calculating chemical potential distances in hyperdimensional compositional spaces provides a general method for designing selective solid-state syntheses that will be useful for gaining access to metastable phases and for identifying reaction pathways that can reduce the synthesis temperature, and cost, of technological materials.

Submitted 15 August, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

Comments: 30 pages with 5 figures. The first two authors contributed equally to this work

arXiv:2103.14025 [pdf, other]

The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI

Authors: Chuang Gan, Siyuan Zhou, Jeremy Schwartz, Seth Alter, Abhishek Bhandwaldar, Dan Gutfreund, Daniel L. K. Yamins, James J DiCarlo, Josh McDermott, Antonio Torralba, Joshua B. Tenenbaum

Abstract: We introduce a visually-guided and physics-driven task-and-motion planning benchmark, which we call the ThreeDWorld Transport Challenge. In this challenge, an embodied agent equipped with two 9-DOF articulated arms is spawned randomly in a simulated physical home environment. The agent is required to find a small set of objects scattered around the house, pick them up, and transport them to a desired final location. We also position containers around the house that can be used as tools to assist with transporting objects efficiently. To complete the task, an embodied agent must plan a sequence of actions to change the state of a large number of objects in the face of realistic physical constraints. We build this benchmark challenge using the ThreeDWorld simulation: a virtual 3D environment where all objects respond to physics, and where can be controlled using fully physics-driven navigation and interaction API. We evaluate several existing agents on this benchmark. Experimental results suggest that: 1) a pure RL model struggles on this challenge; 2) hierarchical planning-based agents can transport some objects but still far from solving this task. We anticipate that this benchmark will empower researchers to develop more intelligent physics-driven robots for the physical world.

Submitted 25 March, 2021; originally announced March 2021.

Comments: Project page: http://tdw-transport.csail.mit.edu/

arXiv:2011.10706 [pdf, other]

Speech Denoising with Auditory Models

Authors: Mark R. Saddler, Andrew Francl, Jenelle Feather, Kaizhi Qian, Yang Zhang, Josh H. McDermott

Abstract: Contemporary speech enhancement predominantly relies on audio transforms that are trained to reconstruct a clean speech waveform. The development of high-performing neural network sound recognition systems has raised the possibility of using deep feature representations as 'perceptual' losses with which to train denoising systems. We explored their utility by first training deep neural networks to classify either spoken words or environmental sounds from audio. We then trained an audio transform to map noisy speech to an audio waveform that minimized the difference in the deep feature representations between the output audio and the corresponding clean audio. The resulting transforms removed noise substantially better than baseline methods trained to reconstruct clean waveforms, and also outperformed previous methods using deep feature losses. However, a similar benefit was obtained simply by using losses derived from the filter bank inputs to the deep networks. The results show that deep features can guide speech enhancement, but suggest that they do not yet outperform simple alternatives that do not involve learned features.

Submitted 12 August, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

Comments: First two authors contributed equally, 5 pages, 3 PDF figures

arXiv:2010.03068 [pdf, other]

Hypergraph Models of Biological Networks to Identify Genes Critical to Pathogenic Viral Response

Authors: Song Feng, Emily Heath, Brett Jefferson, Cliff Joslyn, Henry Kvinge, Hugh D. Mitchell, Brenda Praggastis, Amie J. Eisfeld, Amy C. Sims, Larissa B. Thackray, Shufang Fan, Kevin B. Walters, Peter J. Halfmann, Danielle Westhoff-Smith, Qing Tan, Vineet D. Menachery, Timothy P. Sheahan, Adam S. Cockrell, Jacob F. Kocher, Kelly G. Stratton, Natalie C. Heller, Lisa M. Bramer, Michael S. Diamond, Ralph S. Baric, Katrina M. Waters, et al. (3 additional authors not shown)

Abstract: Background: Representing biological networks as graphs is a powerful approach to reveal underlying patterns, signatures, and critical components from high-throughput biomolecular data. However, graphs do not natively capture the multi-way relationships present among genes and proteins in biological systems. Hypergraphs are generalizations of graphs that naturally model multi-way relationships and have shown promise in modeling systems such as protein complexes and metabolic reactions. In this paper we seek to understand how hypergraphs can more faithfully identify, and potentially predict, important genes based on complex relationships inferred from genomic expression data sets. Results: We compiled a novel data set of transcriptional host response to pathogenic viral infections and formulated relationships between genes as a hypergraph where hyperedges represent significantly perturbed genes, and vertices represent individual biological samples with specific experimental conditions. We find that hypergraph betweenness centrality is a superior method for identification of genes important to viral response when compared with graph centrality. Conclusions: Our results demonstrate the utility of using hypergraphs to represent complex biological systems and highlight central important responses in common to a variety of highly pathogenic viruses.

Submitted 6 October, 2020; originally announced October 2020.

MSC Class: 92C42; 92-08; 05C65

arXiv:2007.04954 [pdf, other]

ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation

Authors: Chuang Gan, Jeremy Schwartz, Seth Alter, Damian Mrowca, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Michael Lingelbach, Aidan Curtis, Kevin Feigelis, Daniel M. Bear, Dan Gutfreund, David Cox, Antonio Torralba, James J. DiCarlo, Joshua B. Tenenbaum, Josh H. McDermott, Daniel L. K. Yamins

Abstract: We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments. Unique properties include: real-time near-photo-realistic image rendering; a library of objects and environments, and routines for their customization; generative procedures for efficiently building classes of new environments; high-fidelity audio rendering; realistic physical interactions for a variety of material types, including cloths, liquid, and deformable objects; customizable agents that embody AI agents; and support for human interactions with VR devices. TDW's API enables multiple agents to interact within a simulation and returns a range of sensor and physics data representing the state of the world. We present initial experiments enabled by TDW in emerging research directions in computer vision, machine learning, and cognitive science, including multi-modal physical scene understanding, physical dynamics predictions, multi-agent interactions, models that learn like a child, and attention studies in humans and neural networks.

Submitted 28 December, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

Comments: Oral Presentation at NeurIPS 21 Datasets and Benchmarks Track. Project page: http://www.threedworld.org

arXiv:2003.01787 [pdf, other]

Untangling in Invariant Speech Recognition

Authors: Cory Stephenson, Jenelle Feather, Suchismita Padhy, Oguz Elibol, Hanlin Tang, Josh McDermott, SueYeon Chung

Abstract: Encouraged by the success of deep neural networks on a variety of visual tasks, much theoretical and experimental work has been aimed at understanding and interpreting how vision networks operate. Meanwhile, deep neural networks have also achieved impressive performance in audio processing applications, both as sub-components of larger systems and as complete end-to-end systems by themselves. Despite their empirical successes, comparatively little is understood about how these audio models accomplish these tasks. In this work, we employ a recently developed statistical mechanical theory that connects geometric properties of network representations and the separability of classes to probe how information is untangled within neural networks trained to recognize speech. We observe that speaker-specific nuisance variations are discarded by the network's hierarchy, whereas task-relevant properties such as words and phonemes are untangled in later layers. Higher level concepts such as parts-of-speech and context dependence also emerge in the later layers of the network. Finally, we find that the deep representations carry out significant temporal untangling by efficiently extracting task-relevant features at each time step of the computation. Taken together, these findings shed light on how deep auditory models process time dependent input signals to achieve invariant speech recognition, and show how different concepts emerge through the layers of the network.

Submitted 3 March, 2020; originally announced March 2020.

Comments: Advances in Neural Information Processing Systems. 2019

arXiv:1906.03280 [pdf, other]

doi 10.1007/s42257-019-00002-6

When and Why Metaheuristics Researchers Can Ignore "No Free Lunch" Theorems

Authors: James McDermott

Abstract: The No Free Lunch (NFL) theorem for search and optimisation states that averaged across all possible objective functions on a fixed search space, all search algorithms perform equally well. Several refined versions of the theorem find a similar outcome when averaging across smaller sets of functions. This paper argues that NFL results continue to be misunderstood by many researchers, and addresses this issue in several ways. Existing arguments against real-world implications of NFL results are collected and re-stated for accessibility, and new ones are added. Specific misunderstandings extant in the literature are identified, with speculation as to how they may have arisen. This paper presents an argument against a common paraphrase of NFL findings -- that algorithms must be specialised to problem domains in order to do well -- after problematising the usually undefined term "domain". It provides novel concrete counter-examples illustrating cases where NFL theorems do not apply. In conclusion it offers a novel view of the real meaning of NFL, incorporating the anthropic principle and justifying the position that in many common situations researchers can ignore NFL.

Submitted 7 June, 2019; originally announced June 2019.

Journal ref: Springer Metaheuristics 2019

arXiv:1904.09013 [pdf, other]

Self-Supervised Audio-Visual Co-Segmentation

Authors: Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh McDermott, Antonio Torralba

Abstract: Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data. In this paper we develop a neural network model for visual object segmentation and sound source separation that learns from natural videos through self-supervision. The model is an extension of recently proposed work that maps image pixels to sounds. Here, we introduce a learning approach to disentangle concepts in the neural networks, and assign semantic categories to network feature channels to enable independent image segmentation and sound source separation after audio-visual training on videos. Our evaluations show that the disentangled model outperforms several baselines in semantic segmentation and sound source separation.

Submitted 18 April, 2019; originally announced April 2019.

Comments: Accepted to ICASSP 2019

arXiv:1804.03160 [pdf, other]

The Sound of Pixels

Authors: Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh McDermott, Antonio Torralba

Abstract: We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel. Our approach capitalizes on the natural synchronization of the visual and audio modalities to learn models that jointly parse sounds and images, without requiring additional manual supervision. Experimental results on a newly collected MUSIC dataset show that our proposed Mix-and-Separate framework outperforms several baselines on source separation. Qualitative results suggest our model learns to ground sounds in vision, enabling applications such as independently adjusting the volume of sound sources.

Submitted 13 October, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

arXiv:1712.07271 [pdf, other]

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning

Authors: Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

Abstract: The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds. This paper extends an earlier conference paper, Owens et al. 2016, with additional experiments and discussion.

Submitted 19 December, 2017; originally announced December 2017.

Comments: Journal preprint of arXiv:1608.07017 (unpublished submission to IJCV)

arXiv:1704.03522 [pdf]

Improving Fitness Functions in Genetic Programming for Classification on Unbalanced Credit Card Datasets

Authors: Van Loi Cao, Nhien-An Le-Khac, Miguel Nicolau, Michael ONeill, James McDermott

Abstract: Credit card fraud detection based on machine learning has recently attracted considerable interest from the research community. One of the most important tasks in this area is the ability of classifiers to handle the imbalance in credit card data. In this scenario, classifiers tend to yield poor accuracy on the fraud class (minority class) despite realizing high overall accuracy. This is due to the influence of the majority class on traditional training criteria. In this paper, we aim to apply genetic programming to address this issue by adapting existing fitness functions. We examine two fitness functions from previous studies and develop two new fitness functions to evolve GP classifiers with superior accuracy on the minority class and overall. Two UCI credit card datasets are used to evaluate the effectiveness of the proposed fitness functions. The results demonstrate that the proposed fitness functions augment GP classifiers, encouraging fitter solutions on both the minority and the majority classes.

Submitted 11 April, 2017; originally announced April 2017.

arXiv:1703.09752 [pdf]

Collective Anomaly Detection based on Long Short Term Memory Recurrent Neural Network

Authors: Loic Bontemps, Van Loi Cao, James McDermott, Nhien-An Le-Khac

Abstract: Intrusion detection for computer network systems has become one of the most critical tasks for network administrators today. It has an important role for organizations, governments and our society due to the valuable resources on computer networks. Traditional misuse detection strategies are unable to detect new and unknown intrusions. Anomaly detection in network security, by contrast, aims to distinguish between illegal or malicious events and normal behavior of network systems. Anomaly detection can be considered as a classification problem where it builds models of normal network behavior, which it uses to detect new patterns that significantly deviate from the model. Most of the current research on anomaly detection is based on the learning of normal and anomalous behaviors. It does not take into account the previous, recent events to detect the new incoming one. In this paper, we propose a real time collective anomaly detection model based on neural network learning and feature operating. Normally a Long Short Term Memory Recurrent Neural Network (LSTM RNN) is trained only on normal data and it is capable of predicting several time steps ahead of an input. In our approach, a LSTM RNN is trained with normal time series data before performing a live prediction for each time step. Instead of considering each time step separately, the observation of prediction errors from a certain number of time steps is now proposed as a new idea for detecting collective anomalies. The prediction errors from a number of the latest time steps above a threshold will indicate a collective anomaly. The model is built on a time series version of the KDD 1999 dataset. The experiments demonstrate that it is possible to offer reliable and efficient collective anomaly detection.

Submitted 28 March, 2017; originally announced March 2017.

arXiv:1703.08535 [pdf, other]

doi 10.1145/3067695.3082469

PonyGE2: Grammatical Evolution in Python

Authors: Michael Fenton, James McDermott, David Fagan, Stefan Forstenlechner, Michael O'Neill, Erik Hemberg

Abstract: Grammatical Evolution (GE) is a population-based evolutionary algorithm, where a formal grammar is used in the genotype to phenotype mapping process. PonyGE2 is an open source implementation of GE in Python, developed at UCD's Natural Computing Research and Applications group. It is intended as an advertisement and a starting-point for those new to GE, a reference for students and researchers, a rapid-prototyping medium for our own experiments, and a Python workout. As well as providing the characteristic genotype to phenotype mapping of GE, a search algorithm engine is also provided. A number of sample problems and tutorials on how to use and adapt PonyGE2 have been developed.

Submitted 26 April, 2017; v1 submitted 24 March, 2017; originally announced March 2017.

Comments: 8 pages, 4 figures, submitted to the 2017 GECCO Workshop on Evolutionary Computation Software Systems (EvoSoft)

Journal ref: In Proceedings of GECCO '17 Companion, Berlin, Germany, July 15-19, 2017, 8 pages

arXiv:1701.07138 [pdf, other]

Learning Mid-Level Auditory Codes from Natural Sound Statistics

Authors: Wiktor Młynarski, Josh H. McDermott

Abstract: Interaction with the world requires an organism to transform sensory signals into representations in which behaviorally meaningful properties of the environment are made explicit. These representations are derived through cascades of neuronal processing stages in which neurons at each stage recode the output of preceding stages. Explanations of sensory coding may thus involve understanding how low-level patterns are combined into more complex structures. Although models exist in the visual domain to explain how mid-level features such as junctions and curves might be derived from oriented filters in early visual cortex, little is known about analogous grouping principles for mid-level auditory representations. We propose a hierarchical generative model of natural sounds that learns combinations of spectrotemporal features from natural stimulus statistics. In the first layer the model forms a sparse convolutional code of spectrograms using a dictionary of learned spectrotemporal kernels. To generalize from specific kernel activation patterns, the second layer encodes patterns of time-varying magnitude of multiple first layer coefficients. Because second-layer features are sensitive to combinations of spectrotemporal features, the representation they support encodes more complex acoustic patterns than the first layer. When trained on corpora of speech and environmental sounds, some second-layer units learned to group spectrotemporal features that occur together in natural sounds. Others instantiate opponency between dissimilar sets of spectrotemporal features. Such groupings might be instantiated by neurons in the auditory cortex, providing a hypothesis for mid-level neuronal computation.

Submitted 14 October, 2017; v1 submitted 24 January, 2017; originally announced January 2017.

Comments: 38 pages, 12 figures

arXiv:1610.05983 [pdf]

Tidal Frequencies in the Time Series Measurements of Atmospheric Muon Flux from Cosmic Rays

Authors: H. Takai, C. Feldman, M. Minelli, J. Sundermier, G. Winters, M. K. Russ, J. Dodaro, A. Varshney, C. J. McIlwaine, T. Tomaszewski, J. Tomaszewski, R. Warasila, J. McDermott, U. Khan, K. Chaves, O. Kassim, J. Ripka

Abstract: Tidal frequencies are detected in time series muon flux measurements performed over a period of eight years. Meson production and subsequent decay produce the muons that are observed at ground level. We interpret the periodic behavior as a consequence of high altitude density variations at the point of meson production. These variations are driven by solar thermal cycles. The detected frequencies are in good agreement with published tidal frequencies and suggest that muons can be a complementary probe to the study of atmospheric tides at altitudes between 20 and 60 km.

Submitted 28 October, 2016; v1 submitted 19 October, 2016; originally announced October 2016.

Comments: 16 pages, 6 figures, 2 tables

arXiv:1608.07017 [pdf, other]

Ambient Sound Provides Supervision for Visual Learning

Authors: Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

Abstract: The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds.

Submitted 5 December, 2016; v1 submitted 25 August, 2016; originally announced August 2016.

Comments: ECCV 2016

arXiv:1512.08512 [pdf, other]

Visually Indicated Sounds

Authors: Andrew Owens, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson, William T. Freeman

Abstract: Objects make distinctive sounds when they are hit or scratched. These sounds reveal aspects of an object's material properties, as well as the actions that produced them. In this paper, we propose the task of predicting what sound an object makes when struck as a way of studying physical interactions within a visual scene. We present an algorithm that synthesizes sound from silent videos of people hitting and scratching objects with a drumstick. This algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. We show that the sounds predicted by our model are realistic enough to fool participants in a "real or fake" psychophysical experiment, and that they convey significant information about material properties and physical interactions.

Submitted 29 April, 2016; v1 submitted 28 December, 2015; originally announced December 2015.

arXiv:1305.6821 [pdf, other]

Nonlinear dynamics of running: Speed, stability, symmetry and the effects of leg amputations

Authors: Nicole Look, Christopher J. Arellano, Alena M. Grabowski, William J. McDermott, Rodger Kram, Elizabeth Bradley

Abstract: In this paper, we study dynamic stability during running, focusing on the effects of speed and the use of a leg prosthesis. We compute and compare the maximal Lyapunov exponents of kinematic time-series data from subjects with and without unilateral transtibial amputations running at a wide range of speeds. We find that the dynamics of the affected leg with the running-specific prosthesis are less stable than the dynamics of the unaffected leg, and also less stable than the biological legs of the non-amputee runners. Surprisingly, we find that the center-of-mass dynamics of runners with two intact biological legs are slightly less stable than those of runners with amputations. Our results suggest that while leg asymmetries may be associated with instability, runners may compensate for this effect by increased control of their center-of-mass dynamics.

Submitted 29 May, 2013; originally announced May 2013.

arXiv:math/9911244 [pdf, ps, other]

On some two parameter Quantum and Jordanian deformations, and their Coloured extensions

Authors: Deepak Parashar, Roger J. McDermott

Abstract: This paper surveys some recent algebraic developments in two parameter Quantum deformations and their Nonstandard (or Jordanian) counterparts. In particular, we discuss the contraction procedure and the quantum group homomorphisms associated to these deformations. The scheme is then set in the wider context of the coloured extensions of these deformations, namely, the so-called Coloured Quantum Groups.

Submitted 30 November, 1999; originally announced November 1999.

Comments: 10 pages LaTeX, Contribution to Proceedings of "LMS - Durham Symposium on Quantum Groups", Durham, July 19 - 29, 1999

Journal ref: LMS Lecture Note Series 290 (CUP 2001), pp. 206-215

arXiv:math/9911194 [pdf, ps, other]

doi 10.1063/1.533248

Contraction of the G_r,s Quantum Group to its Nonstandard analogue and corresponding Coloured Quantum Groups

Authors: Deepak Parashar, Roger J. McDermott

Abstract: The quantum group G_r,s provides a realisation of the two parameter quantum GL_p,q(2) which is known to be related to the two parameter nonstandard GL_h,h'(2) group via a contraction method. We apply the contraction procedure to G_r,s and obtain a new Jordanian quantum group G_m,k. Furthermore, we provide a realisation of GL_h,h'(2) in terms of G_m,k. The contraction procedure is then extended to the coloured quantum group GL_r{λ,μ}(2) to yield a new Jordanian quantum group GL_m{λ,μ}(2). Both G_r,s and G_m,k are then generalised to their coloured versions which in turn provide similar realisations of GL_r{λ,μ}(2) and GL_m{λ,μ}(2).

Submitted 24 November, 1999; originally announced November 1999.

Comments: 22 pages LaTeX, to be published in J. Math. Phys

Journal ref: J. Math. Phys. 41 (2000) 2403-2416

arXiv:math/9909045 [pdf, ps, other]

doi 10.1023/A:1022845603408

Inhomogeneous Multiparameter Jordanian Quantum Groups by Contraction

Authors: Roger J. McDermott, Deepak Parashar

Abstract: It is known that the inhomogeneous quantum group IGL_{q,r}(2) can be constructed as a quotient of the multiparameter q-deformation of GL(3). We show that a similar result holds for the inhomogeneous Jordanian deformation and exhibit its Hopf structure.

Submitted 8 September, 1999; originally announced September 1999.

Comments: 6 pages LaTeX, Contribution to Proceedings of "8th International Colloquium on Quantum Groups and Integrable Systems", Prague, June 17-19, 1999

Journal ref: Czech. J. Phys. 50 (2000) 145-150

arXiv:math/9909001 [pdf, ps, other]

doi 10.1023/A:1022849704316

Realisations of Quantum GL_p,q(2) and Jordanian GL_h,h'(2)

Authors: Deepak Parashar, Roger J. McDermott

Abstract: The quantum group GL_p,q(2) is known to be related to the Jordanian GL_h,h'(2) via a contraction procedure. It can also be realised using the generators of the Hopf algebra G_r,s. We contract the G_r,s quantum group to obtain its Jordanian analogue G_m,k, which provides a realisation of GL_h,h'(2) in a manner similar to the q-deformed case.

Submitted 1 September, 1999; originally announced September 1999.

Comments: 6 pages LaTeX, Contribution to Proceedings of "8th International Colloquium on Quantum Groups and Integrable Systems", Prague, June 17-19, 1999

Journal ref: Czech. J. Phys. 50 (2000) 157-162

arXiv:math/9901132 [pdf, ps, other]

Duality for the G_r,s Quantum Group

Authors: Deepak Parashar, Roger J. McDermott

Abstract: The two parameter quantum group G_r,s is generated by five elements, four of which form a Hopf subalgebra isomorphic to GL_q(2), while the fifth generator relates G_r,s to GL_p,q(2). We construct explicitly the dual algebra of G_r,s and show that it is isomorphic to the single parameter deformation of gl(2) + gl(1), with the second parameter appearing in the costructure. We also formulate a differential calculus on G_r,s which provides a realisation of the calculus on GL_p,q(2).

Submitted 29 November, 1999; v1 submitted 28 January, 1999; originally announced January 1999.

Comments: 25 pages LaTeX, Includes the R-matrix approach and related differential calculus

Report number: Preprint RIMS - 1260 (1999)

Showing 1–31 of 31 results for author: McDermott, J