Search | arXiv e-print repository

Pretrained Hybrids with MAD Skills

Authors: Nicholas Roberts, Samuel Guo, Zhiqi Gao, Satya Sai Srinath Namburi GNVV, Sonia Cromp, Chengjun Wu, Chengyu Duan, Frederic Sala

Abstract: While Transformers underpin modern large language models (LMs), there is a growing list of alternative architectures with new capabilities, promises, and tradeoffs. This makes choosing the right LM architecture challenging. Recently-proposed hybrid architectures seek a best-of-all-worlds approach that reaps the benefits of all architectures. Hybrid design is difficult for two reasons: it requires manual expert-driven search, and new hybrids must be trained from scratch. We propose Manticore, a framework that addresses these challenges. Manticore automates the design of hybrid architectures while reusing pretrained models to create pretrained hybrids. Our approach augments ideas from differentiable Neural Architecture Search (NAS) by incorporating simple projectors that translate features between pretrained blocks from different architectures. We then fine-tune hybrids that combine pretrained models from different architecture families -- such as the GPT series and Mamba -- end-to-end. With Manticore, we enable LM selection without training multiple models, the construction of pretrained hybrids from existing pretrained models, and the ability to program pretrained hybrids to have certain capabilities. Manticore hybrids outperform existing manually-designed hybrids, achieve strong performance on Long Range Arena (LRA) tasks, and can improve on pretrained transformers and state space models.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00166 [pdf, other]

On complexity of colloid cellular automata

Authors: Andrew Adamatzky, Nic Roberts, Raphael Fortulan, Noushin Raeisi Kheirabadi, Panagiotis Mougkogiannis, Michail-Antisthenis Tsompanas, Genaro J. Martinez, Georgios Ch. Sirakoulis, Alessandro Chiolerio

Abstract: The colloid cellular automata do not imitate the physical structure of colloids but are governed by logical functions derived from the colloids. We analyse the space-time complexity of Boolean circuits derived from the electrical responses of colloids: ZnO (zinc oxide, an inorganic compound also known as calamine or zinc white, which naturally occurs as the mineral zincite), proteinoids (microspheres and crystals of thermal abiotic proteins), and combinations thereof to electrical stimulation. To extract Boolean circuits from colloids, we send all possible configurations of two-, four-, and eight-bit binary strings, encoded as electrical potential values, to the colloids, record their responses, and thereby infer the Boolean functions they implement. We map the discovered functions onto the cell-state transition rules of cellular automata (arrays of binary state machines that update their states synchronously according to the same rule) -- the colloid cellular automata. We then analyse the phenomenology of the space-time configurations of the automata and evaluate their complexity using measures such as compressibility, Shannon entropy, Simpson diversity, and expressivity. A hierarchy of phenomenological and measurable space-time complexity is constructed.
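The mapping from a Boolean function to a cellular automaton rule, and one of the complexity measures named in the abstract, can be sketched as follows. This is a minimal illustration only: the three-cell neighbourhood, the ring boundary, the XOR rule (a hypothetical stand-in for a function inferred from a colloid), and the block-based entropy estimate are all assumptions, not details taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def step(cells, rule):
    """Synchronously update a ring of binary cells with one shared rule.

    `rule` maps a (left, centre, right) neighbourhood to the next state.
    """
    n = len(cells)
    return [rule[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]


def shannon_entropy(rows, block=3):
    """Shannon entropy (bits) of the horizontal `block`-cell patterns."""
    counts = Counter(
        tuple(row[(i + k) % len(row)] for k in range(block))
        for row in rows
        for i in range(len(row))
    )
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Hypothetical three-input function standing in for one inferred from a colloid.
xor_rule = {bits: bits[0] ^ bits[1] ^ bits[2]
            for bits in product((0, 1), repeat=3)}

# Evolve a single seeded cell for a few steps and keep the space-time history.
cells = [0] * 15
cells[7] = 1
history = [cells]
for _ in range(7):
    history.append(step(history[-1], xor_rule))

entropy_bits = shannon_entropy(history)
```

Swapping `xor_rule` for a different truth table changes the space-time pattern, and `shannon_entropy(history)` gives one crude number with which to compare the resulting configurations.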

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2307.14430 [pdf, other]

Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models

Authors: Mayee F. Chen, Nicholas Roberts, Kush Bhatia, Jue Wang, Ce Zhang, Frederic Sala, Christopher Ré

Abstract: The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when learning a set of skills from their training data. If such an order exists, it can be utilized for improved understanding of LMs and for data-efficient training. Using this intuition, our framework formalizes the notion of a skill and of an ordered set of skills in terms of the associated data. First, using both synthetic and real data, we demonstrate that these ordered skill sets exist, and that their existence enables more advanced skills to be learned with less data when we train on their prerequisite skills. Second, using our proposed framework, we introduce an online data sampling algorithm, Skill-It, over mixtures of skills for both continual pre-training and fine-tuning regimes, where the objective is to efficiently learn multiple skills in the former and an individual skill in the latter. On the LEGO synthetic in the continual pre-training setting, Skill-It obtains 36.5 points higher accuracy than random sampling. On the Natural Instructions dataset in the fine-tuning setting, Skill-It reduces the validation loss on the target skill by 13.6% versus training on data associated with the target skill itself.
We apply our skills framework on the recent RedPajama dataset to continually pre-train a 3B-parameter LM, achieving higher accuracy on the LM Evaluation Harness with 1B tokens than the baseline approach of sampling uniformly over data sources with 3B tokens.

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2307.12226 [pdf, other]

Geometry-Aware Adaptation for Pretrained Models

Authors: Nicholas Roberts, Xintong Li, Dyah Adila, Sonia Cromp, Tzu-Heng Huang, Jitian Zhao, Frederic Sala

Abstract: Machine learning models -- including prominent zero-shot models -- are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes -- or, in the case of zero-shot prediction, to improve its performance -- without any additional training. Our technique is a drop-in replacement of the standard prediction rule, swapping argmax with the Fréchet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active learning-like next class selection procedure to obtain optimal training classes for when it is not possible to predict the entire range of unobserved classes. Empirically, using easily-available external metrics, our proposed approach, Loki, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, Loki can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP.
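The idea of replacing argmax with a Fréchet mean can be illustrated with a small sketch over a finite label space. The function name, the squared-distance weighting by predicted probabilities, and the toy line-graph metric are assumptions made for illustration; the paper's exact formulation may differ.

```python
def frechet_mean_predict(probs, observed, label_space, dist):
    """Return the label minimising the probability-weighted squared distance.

    probs       -- model probabilities over the observed (training) classes
    observed    -- the observed classes, aligned with `probs`
    label_space -- every candidate label, including unobserved ones
    dist        -- metric on the label space, dist(a, b) -> float
    """
    def cost(y):
        return sum(p * dist(y, c) ** 2 for p, c in zip(probs, observed))

    return min(label_space, key=cost)


# Toy label space: five labels on a line, with absolute difference as the metric.
labels = [0, 1, 2, 3, 4]
observed = [0, 4]  # the model was only ever trained on these two classes


def line_dist(a, b):
    return abs(a - b)


# An undecided model: argmax could only return 0 or 4, but the Fréchet mean
# lands on the unobserved middle label.
undecided = frechet_mean_predict([0.5, 0.5], observed, labels, line_dist)

# A confident model: the Fréchet mean agrees with argmax.
confident = frechet_mean_predict([0.9, 0.1], observed, labels, line_dist)
```

In this toy setting the rule needs no retraining: only the final prediction step changes, which matches the "drop-in replacement" framing in the abstract.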

Submitted 27 November, 2023; v1 submitted 23 July, 2023; originally announced July 2023.

Comments: NeurIPS 2023

arXiv:2307.02664 [pdf, other]

Logical circuits in colloids

Authors: Nic Roberts, Noushin Raeisi Kheirabadi, Michail-Antisthenis Tsompanas, Alessandro Chiolerio, Marco Crepaldi, Andrew Adamatzky

Abstract: Colloid-based computing devices offer remarkable fault tolerance and adaptability to varying environmental conditions due to their amorphous structure. An intriguing observation is that a colloidal suspension of ZnO nanoparticles in DMSO exhibits reconfiguration when exposed to electrical stimulation and produces spikes of electrical potential in response. This study presents a novel laboratory prototype of a ZnO colloidal computer, showcasing its capability to implement various Boolean functions featuring two, four, and eight inputs. During our experiments, we input binary strings into the colloid mixture, where a logical "True" state is represented by an impulse of an electrical potential. In contrast, the absence of the electrical impulse denotes a logical "False" state. The electrical responses of the colloid mixture are recorded, allowing us to extract truth tables from the recordings. Through this methodological approach, we demonstrate the successful implementation of a wide range of logical functions using colloidal mixtures. We provide detailed distributions of the logical functions discovered and offer speculation on the potential impacts of our findings on future and emerging unconventional computing technologies. This research highlights the exciting possibilities of colloid-based computing and paves the way for further advancements.
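The truth-table extraction described above can be sketched as an exhaustive sweep over input strings. Everything specific here is hypothetical: `fake_colloid` stands in for a recorded electrical response, and the fixed threshold used to decide "True" is an assumption, since the abstract does not say how spikes are detected.

```python
from itertools import product


def extract_truth_table(respond, n_inputs, threshold):
    """Apply every n-bit input string and threshold the recorded response."""
    table = {}
    for bits in product((0, 1), repeat=n_inputs):
        table[bits] = int(respond(bits) > threshold)
    return table


def fake_colloid(bits):
    """Hypothetical response: potential rises only when every input is pulsed."""
    return 0.8 if all(bits) else 0.1


table = extract_truth_table(fake_colloid, n_inputs=2, threshold=0.5)
```

With this stand-in response the recovered table is that of a two-input AND gate; a real recording would replace `fake_colloid` with measured potentials.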

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2304.10675 [pdf, other]

Propagation of electrical signals by fungi

Authors: Richard Mayne, Nic Roberts, Neil Phillips, Roshan Weerasekera, Andrew Adamatzky

Abstract: Living fungal mycelium networks are proven to have properties of memristors, capacitors and various sensors. To further progress our designs in fungal electronics we need to evaluate how electrical signals can be propagated through mycelium networks. We investigate the ability of mycelium-bound composites to convey electrical signals, thereby enabling the transmission of frequency-modulated information through mycelium networks. Mycelia were found to reliably transfer signals with a recoverable frequency comparable to the input, in the 100 to 10,000 Hz frequency range. Mycelial adaptive responses, such as tissue repair, may result in fragile connections, however. While the mean amplitude of output signals was not reproducible among replicate experiments exposed to the same input frequency, the variance across groups was highly consistent. Our work is supported by NARX modelling through which an approximate transfer function was derived. These findings advance the state of the art of using mycelium-bound composites in analogue electronics and unconventional computing.

Submitted 20 April, 2023; originally announced April 2023.

arXiv:2211.13375 [pdf, other]

Lifting Weak Supervision To Structured Prediction

Authors: Harit Vishwakarma, Nicholas Roberts, Frederic Sala

Abstract: Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from a variety of sources. WS is theoretically well understood for binary classification, where simple approaches enable consistent estimation of pseudolabel noise rates. Using this result, it has been shown that downstream models trained on the pseudolabels have generalization guarantees nearly identical to those trained on clean labels. While this is exciting, users often wish to use WS for structured prediction, where the output space consists of more than a binary or multi-class label set: e.g. rankings, graphs, manifolds, and more. Do the favorable theoretical properties of WS for binary classification lift to this setting? We answer this question in the affirmative for a wide range of scenarios. For labels taking values in a finite metric space, we introduce techniques new to weak supervision based on pseudo-Euclidean embeddings and tensor decompositions, providing a nearly-consistent noise rate estimator. For labels in constant-curvature Riemannian manifolds, we introduce new invariants that also yield consistent noise rate estimation. In both cases, when using the resulting pseudolabels in concert with a flexible downstream model, we obtain generalization guarantees nearly identical to those for models trained on clean data. Several of our results, which can be viewed as robustness guarantees in structured prediction with noisy labels, may be of independent interest.
Empirical evaluation validates our claims and shows the merits of the proposed method.

Submitted 23 November, 2022; originally announced November 2022.

arXiv:2210.03324 [pdf, other]

AutoML for Climate Change: A Call to Action

Authors: Renbo Tu, Nicholas Roberts, Vishak Prasad, Sibasis Nayak, Paarth Jain, Frederic Sala, Ganesh Ramakrishnan, Ameet Talwalkar, Willie Neiswanger, Colin White

Abstract: The challenge that climate change poses to humanity has spurred a rapidly developing field of artificial intelligence research focused on climate change applications. The climate change AI (CCAI) community works on a diverse, challenging set of problems which often involve physics-constrained ML or heterogeneous spatiotemporal data. It would be desirable to use automated machine learning (AutoML) techniques to automatically find high-performing architectures and hyperparameters for a given dataset. In this work, we benchmark popular AutoML libraries on three high-leverage CCAI applications: climate modeling, wind power forecasting, and catalyst discovery. We find that out-of-the-box AutoML libraries currently fail to meaningfully surpass the performance of human-designed CCAI models. However, we also identify a few key weaknesses, which stem from the fact that most AutoML techniques are tailored to computer vision and NLP applications. For example, while dozens of search spaces have been designed for image and language data, none have been designed for spatiotemporal data. Addressing these key weaknesses can lead to the discovery of novel architectures that yield substantial performance gains across numerous CCAI applications. Therefore, we present a call to action to the AutoML community, since there are a number of concrete, promising directions for future work in the space of AutoML for CCAI. We release our code and a list of resources at https://github.com/climate-change-automl/climate-change-automl.

Submitted 7 October, 2022; originally announced October 2022.

arXiv:2208.14362 [pdf, other]

AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels

Authors: Nicholas Roberts, Xintong Li, Tzu-Heng Huang, Dyah Adila, Spencer Schoenberg, Cheng-Yu Liu, Lauren Pick, Haotian Ma, Aws Albarghouthi, Frederic Sala

Abstract: Weak supervision (WS) is a powerful method to build labeled datasets for training supervised models in the face of little-to-no labeled data. It replaces hand-labeling data with aggregating multiple noisy-but-cheap label estimates expressed by labeling functions (LFs). While it has been used successfully in many domains, weak supervision's application scope is limited by the difficulty of constructing labeling functions for domains with complex or high-dimensional features. To address this, a handful of methods have proposed automating the LF design process using a small set of ground truth labels. In this work, we introduce AutoWS-Bench-101: a framework for evaluating automated WS (AutoWS) techniques in challenging WS settings -- a set of diverse application domains on which it has been previously difficult or impossible to apply traditional WS techniques. While AutoWS is a promising direction toward expanding the application-scope of WS, the emergence of powerful methods such as zero-shot foundation models reveals the need to understand how AutoWS techniques compare or cooperate with modern zero-shot or few-shot learners. This informs the central question of AutoWS-Bench-101: given an initial set of 100 labels for each task, we ask whether a practitioner should use an AutoWS method to generate additional labels or use some simpler baseline, such as zero-shot predictions from a foundation model or supervised learning.
We observe that in many settings, it is necessary for AutoWS methods to incorporate signal from foundation models if they are to outperform simple few-shot baselines, and AutoWS-Bench-101 promotes future research in this direction. We conclude with a thorough ablation study of AutoWS methods.
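The aggregation step that weak supervision relies on can be sketched with the simplest possible label model, a majority vote over labeling functions. This is a baseline illustration, not one of the AutoWS methods the benchmark evaluates; the keyword-based labeling functions, the `ABSTAIN` convention, and the tie-breaking rule are all assumptions.

```python
from collections import Counter

ABSTAIN = -1  # assumed convention: a labeling function may decline to vote


def majority_vote(votes):
    """Aggregate noisy votes into one pseudolabel, abstaining on ties."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    (top, n), *rest = counts.most_common()
    if rest and rest[0][1] == n:
        return ABSTAIN  # no unique winner
    return top


# Hypothetical labeling functions for a toy sentiment task (1 = positive).
def lf_good(text):
    return 1 if "good" in text else ABSTAIN


def lf_bad(text):
    return 0 if "bad" in text else ABSTAIN


def lf_exclaim(text):
    return 1 if text.endswith("!") else ABSTAIN


labeling_functions = [lf_good, lf_bad, lf_exclaim]


def pseudolabel(text):
    return majority_vote([lf(text) for lf in labeling_functions])
```

For example, `pseudolabel("a good film!")` collects two positive votes and one abstention, while a text that triggers `lf_good` and `lf_bad` equally yields `ABSTAIN`.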

Submitted 24 November, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

Comments: NeurIPS 2022 Datasets and Benchmarks Track

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2203.12023 [pdf, other]

Generative Modeling Helps Weak Supervision (and Vice Versa)

Authors: Benedikt Boecking, Nicholas Roberts, Willie Neiswanger, Stefano Ermon, Frederic Sala, Artur Dubrawski

Abstract: Many promising applications of supervised machine learning face hurdles in the acquisition of labeled data in sufficient quantity and quality, creating an expensive bottleneck. To overcome such limitations, techniques that do not depend on ground truth labels have been studied, including weak supervision and generative modeling. While these techniques would seem to be usable in concert, improving one another, how to build an interface between them is not well-understood. In this work, we propose a model fusing programmatic weak supervision and generative adversarial networks and provide theoretical justification motivating this fusion. The proposed approach captures discrete latent variables in the data alongside the weak supervision derived label estimate. Alignment of the two allows for better modeling of sample-dependent accuracies of the weak supervision sources, improving the estimate of unobserved labels. It is the first approach to enable data augmentation through weakly supervised synthetic images and pseudolabels. Additionally, its learned latent variables can be inspected qualitatively. The model outperforms baseline weak supervision label models on a number of multiclass image classification datasets, improves the quality of generated images, and further improves end-model performance through data augmentation with synthetic samples.

Submitted 11 March, 2023; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: Published as a conference paper at ICLR 2023

ACM Class: I.2.0; I.4.m

arXiv:2112.07236 [pdf, other]

Logics in fungal mycelium networks

Authors: Andrew Adamatzky, Phil Ayres, Alexander E. Beasley, Nic Roberts, Martin Tegelaar, Michail-Antisthenis Tsompanas, Han A. B. Wösten

Abstract: The living mycelium networks are capable of efficient sensorial fusion over very large areas and distributed decision making. The information processing in the mycelium networks is implemented via propagation of electrical and chemical signals en pair with morphological changes in the mycelium structure. These information processing mechanisms are manifested in experimental laboratory findings that show that the mycelium networks exhibit rich dynamics of neuron-like spiking behaviour and a wide range of non-linear electrical properties. On an example of a single real colony of Aspergillus niger, we demonstrate that the non-linear transformation of electrical signals and trains of extracellular voltage spikes can be used to implement logical gates and circuits. The approaches adopted include numerical modelling of excitation propagation on the mycelium network, representation of the mycelium network as a resistive and capacitive (RC) network and an experimental laboratory study on mining logical circuits in mycelium bound composites.

Submitted 14 December, 2021; originally announced December 2021.

Comments: To be published in special issue of Logica Universalis --- "Logic, Spatial Algorithms and Visual Reasoning", edited by Andrew Schumann and Jerzy Król, 2022

arXiv:2112.03865 [pdf, other]

Universalizing Weak Supervision

Authors: Changho Shin, Winfred Li, Harit Vishwakarma, Nicholas Roberts, Frederic Sala

Abstract: Weak supervision (WS) frameworks are a popular way to bypass hand-labeling large datasets for training data-hungry models. These approaches synthesize multiple noisy but cheaply-acquired estimates of labels into a set of high-quality pseudolabels for downstream training. However, the synthesis technique is specific to a particular kind of label, such as binary labels or sequences, and each new label type requires manually designing a new synthesis algorithm. Instead, we propose a universal technique that enables weak supervision over any label type while still offering desirable properties, including practical flexibility, computational efficiency, and theoretical guarantees. We apply this technique to important problems previously not tackled by WS frameworks including learning to rank, regression, and learning in hyperbolic space. Theoretically, our synthesis approach produces consistent estimators for learning some challenging but important generalizations of the exponential family model. Experimentally, we validate our framework and show improvement over baselines in diverse settings including real-world learning-to-rank and regression problems along with learning on hyperbolic manifolds.

Submitted 29 November, 2023; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: ICLR 2022

arXiv:2112.02721 [pdf, other]

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Authors: Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, et al. (101 additional authors not shown)

Abstract: Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).

Submitted 11 October, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

Comments: 39 pages, repository at https://github.com/GEM-benchmark/NL-Augmenter

arXiv:2111.11231 [pdf, other]

Fungal electronics

Authors: Andrew Adamatzky, Phil Ayres, Alexander E. Beasley, Alessandro Chiolerio, Mohammad M. Dehshibi, Antoni Gandia, Elena Albergati, Richard Mayne, Anna Nikolaidou, Nic Roberts, Martin Tegelaar, Michail-Antisthenis Tsompanas, Neil Phillips, Han A. B. Wösten

Abstract: Fungal electronics is a family of living electronic devices made of mycelium bound composites or pure mycelium. Fungal electronic devices are capable of changing their impedance and generating spikes of electrical potential in response to external control parameters. Fungal electronics can be embedded into fungal materials and wearables or used as stand alone sensing and computing devices.

Submitted 22 November, 2021; originally announced November 2021.

arXiv:2110.05668 [pdf, other]

NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks

Authors: Renbo Tu, Nicholas Roberts, Mikhail Khodak, Junhong Shen, Frederic Sala, Ameet Talwalkar

Abstract: Most existing neural architecture search (NAS) benchmarks and algorithms prioritize well-studied tasks, e.g. image classification on CIFAR or ImageNet. This makes the performance of NAS approaches in more diverse areas poorly understood. In this paper, we present NAS-Bench-360, a benchmark suite to evaluate methods on domains beyond those traditionally studied in architecture search, and use it to address the following question: do state-of-the-art NAS methods perform well on diverse tasks? To construct the benchmark, we curate ten tasks spanning a diverse array of application domains, dataset sizes, problem dimensionalities, and learning objectives. Each task is carefully chosen to interoperate with modern CNN-based search methods while possibly being far-afield from its original development domain. To speed up and reduce the cost of NAS research, for two of the tasks we release the precomputed performance of 15,625 architectures comprising a standard CNN search space. Experimentally, we show the need for more robust NAS evaluation of the kind NAS-Bench-360 enables by showing that several modern NAS procedures perform inconsistently across the ten tasks, with many catastrophically poor results. We also demonstrate how NAS-Bench-360 and its associated precomputed results will enable future scientific discoveries by testing whether several recent hypotheses promoted in the NAS literature hold on diverse tasks. NAS-Bench-360 is hosted at https://nb360.ml.cmu.edu.

Submitted 19 January, 2023; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: NeurIPS 2022 Datasets and Benchmarks Track

arXiv:2108.05336 [pdf, other]

Mining logical circuits in fungi

Authors: Nic Roberts, Andrew Adamatzky

Abstract: Living substrates are capable for nontrivial mappings of electrical signals due to the substrate nonlinear electrical characteristics. This property can be used to realise Boolean functions. Input logical values are represented by amplitude or frequency of electrical stimuli. Output logical values are decoded from electrical responses of living substrates. We demonstrate how logical circuits can be implemented in mycelium bound composites. The mycelium bound composites (fungal materials) are getting growing recognition as building, packaging, decoration and clothing materials. Presently the fungal materials are passive. To make the fungal materials adaptive, i.e. sensing and computing, we should embed logical circuits into them. We demonstrate experimental laboratory prototypes of many-input Boolean functions implemented in fungal materials from oyster fungi \emph{P. ostreatus}. We characterise complexity of the functions discovered via complexity of the space-time configurations of one-dimensional cellular automata governed by the functions. We show that the mycelium bound composites can implement representative functions from all classes of cellular automata complexity including the computationally universal. The results presented will make an impact in the field of unconventional computing, experimental demonstration of purposeful computing with fungi, and in the field of intelligent materials, as the prototypes of computing mycelium bound composites.

Submitted 11 August, 2021; originally announced August 2021.

arXiv:2103.15798 [pdf, other]

Rethinking Neural Operations for Diverse Tasks

Authors: Nicholas Roberts, Mikhail Khodak, Tri Dao, Liam Li, Christopher Ré, Ameet Talwalkar

Abstract: An important goal of AutoML is to automate-away the design of neural networks on new tasks in under-explored domains. Motivated by this goal, we study the problem of enabling users to discover the right neural operations given data from their specific domain. We introduce a search space of operations called XD-Operations that mimic the inductive bias of standard multi-channel convolutions while being much more expressive: we prove that it includes many named operations across multiple application areas. Starting with any standard backbone such as ResNet, we show how to transform it into a search space over XD-operations and how to traverse the space using a simple weight-sharing scheme. On a diverse set of tasks -- solving PDEs, distance prediction for protein folding, and music modeling -- our approach consistently yields models with lower error than baseline networks and often even lower error than expert-designed domain-specific approaches.

Submitted 4 November, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: NeurIPS 2021

arXiv:2011.13477 [pdf, other]

Decoding and Diversity in Machine Translation

Authors: Nicholas Roberts, Davis Liang, Graham Neubig, Zachary C. Lipton

Abstract: Neural Machine Translation (NMT) systems are typically evaluated using automated metrics that assess the agreement between generated translations and ground truth candidates. To improve systems with respect to these metrics, NLP researchers employ a variety of heuristic techniques, including searching for the conditional mode (vs. sampling) and incorporating various training heuristics (e.g., label smoothing). While search strategies significantly improve BLEU score, they yield deterministic outputs that lack the diversity of human translations. Moreover, search tends to bias the distribution of translated gender pronouns. This makes human-level BLEU a misleading benchmark in that modern MT systems cannot approach human-level BLEU while simultaneously maintaining human-level translation diversity. In this paper, we characterize distributional differences between generated and real translations, examining the cost in diversity paid for the BLEU scores enjoyed by NMT. Moreover, our study implicates search as a salient source of known bias when translating gender pronouns.

Submitted 26 November, 2020; originally announced November 2020.

Comments: Presented at the Resistance AI Workshop, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

arXiv:1912.08987 [pdf, other]

Model Weight Theft With Just Noise Inputs: The Curious Case of the Petulant Attacker

Authors: Nicholas Roberts, Vinay Uday Prabhu, Matthew McAteer

Abstract: This paper explores the scenarios under which an attacker can claim that 'Noise and access to the softmax layer of the model is all you need' to steal the weights of a convolutional neural network whose architecture is already known. We were able to achieve 96% test accuracy using the stolen MNIST model and 82% accuracy using the stolen KMNIST model learned using only i.i.d. Bernoulli noise inputs. We posit that this theft-susceptibility of the weights is indicative of the complexity of the dataset and propose a new metric that captures the same. The goal of this dissemination is to not just showcase how far knowing the architecture can take you in terms of model stealing, but to also draw attention to this rather idiosyncratic weight learnability aspects of CNNs spurred by i.i.d. noise input. We also disseminate some initial results obtained with using the Ising probability distribution in lieu of the i.i.d. Bernoulli distribution.

Submitted 18 December, 2019; originally announced December 2019.

Comments: Presented at the Security and Privacy of Machine Learning Workshop, 36th International Conference on Machine Learning (ICML 2019), Long Beach, California, USA

arXiv:1912.08986 [pdf, other]

Deep Connectomics Networks: Neural Network Architectures Inspired by Neuronal Networks

Authors: Nicholas Roberts, Dian Ang Yap, Vinay Uday Prabhu

Abstract: The interplay between inter-neuronal network topology and cognition has been studied deeply by connectomics researchers and network scientists, which is crucial towards understanding the remarkable efficacy of biological neural networks. Curiously, the deep learning revolution that revived neural networks has not paid much attention to topological aspects. The architectures of deep neural networks (DNNs) do not resemble their biological counterparts in the topological sense. We bridge this gap by presenting initial results of Deep Connectomics Networks (DCNs) as DNNs with topologies inspired by real-world neuronal networks. We show high classification accuracy obtained by DCNs whose architecture was inspired by the biological neuronal networks of C. Elegans and the mouse visual cortex.

Submitted 18 December, 2019; originally announced December 2019.

Comments: Presented at the Real Neurons & Hidden Units Workshop, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:1911.07418 [pdf, other]

Grassmannian Packings in Neural Networks: Learning with Maximal Subspace Packings for Diversity and Anti-Sparsity

Authors: Dian Ang Yap, Nicholas Roberts, Vinay Uday Prabhu

Abstract: Kernel sparsity ("dying ReLUs") and lack of diversity are commonly observed in CNN kernels, which decreases model capacity. Drawing inspiration from information theory and wireless communications, we demonstrate the intersection of coding theory and deep learning through the Grassmannian subspace packing problem in CNNs. We propose Grassmannian packings for initial kernel layers to be initialized maximally far apart based on chordal or Fubini-Study distance. Convolutional kernels initialized with Grassmannian packings exhibit diverse features and obtain diverse representations. We show that Grassmannian packings, especially in the initial layers, address kernel sparsity and encourage diversity, while improving classification accuracy across shallow and deep CNNs with better convergence rates.

Submitted 17 November, 2019; originally announced November 2019.

Comments: Presented at Bayesian Deep Learning and Workshop on Information Theory and Machine Learning, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:1711.04853 [pdf, other]

doi 10.1364/JOSAA.35.000690

Denoising Imaging Polarimetry by an Adapted BM3D Method

Authors: Alexander B. Tibbs, Ilse M. Daly, Nicholas W. Roberts, David R. Bull

Abstract: Imaging polarimetry allows more information to be extracted from a scene than conventional intensity or colour imaging. However, a major challenge of imaging polarimetry is image degradation due to noise. This paper investigates the mitigation of noise through denoising algorithms and compares existing denoising algorithms with a new method, based on BM3D. This algorithm, PBM3D, gives visual quality superior to the state of the art across all images and noise standard deviations tested. We show that denoising polarization images using PBM3D allows the degree of polarization to be more accurately calculated by comparing it to spectroscopy methods.

Submitted 16 November, 2017; v1 submitted 13 November, 2017; originally announced November 2017.

arXiv:1608.02567 [pdf, other]

A Geometric Multigrid Preconditioning Strategy for DPG System Matrices

Authors: Nathan V. Roberts, Jesse Chan

Abstract: The discontinuous Petrov-Galerkin (DPG) methodology of Demkowicz and Gopalakrishnan [15,17] guarantees the optimality of the solution in an energy norm, and provides several features facilitating adaptive schemes. A key question that has not yet been answered in general - though there are some results for Poisson, e.g. - is how best to precondition the DPG system matrix, so that iterative solvers may be used to allow solution of large-scale problems. In this paper, we detail a strategy for preconditioning the DPG system matrix using geometric multigrid which we have implemented as part of Camellia [26], and demonstrate through numerical experiments its effectiveness in the context of several variational formulations. We observe that in some of our experiments, the behavior of the preconditioner is closely tied to the discrete test space enrichment. We include experiments involving adaptive meshes with hanging nodes for lid-driven cavity flow, demonstrating that the preconditioners can be applied in the context of challenging problems. We also include a scalability study demonstrating that the approach - and our implementation - scales well to many MPI ranks.

Submitted 8 August, 2016; originally announced August 2016.

arXiv:1306.1740 [pdf]

doi 10.5121/ijnsa.2013.5306

HTTPI Based Web Service Security over SOAP

Authors: Pankaj Choudhary, Rajendra Aaseri, Nirmal Roberts

Abstract: Now a days, a new family of web applications open applications, are emerging (e.g., Social Networking, News and Blogging). Generally, these open applications are non-confidential. The security needs of these applications are only client/server authentication and data integrity. For securing these open applications, effectively and efficiently, HTTPI, a new transport protocol is proposed, which ensures the entire security requirements of open applications. Benefit of using the HTTPI is that it is economical in use, well-suited for cache proxies, like HTTP is, and provides security against many Internet attacks (Server Impersonation and Message Modification) like HTTPS does. In terms of performance HTTPI is very close to the HTTP, but much better than HTTPS. A Web service is a method of communication between two ends over the Internet. These web services are developed over XML and HTTP. Today, most of the open applications use web services for most of their operations. For securing these web services, security design based on HTTPI is proposed. Our work involves securing the web services over SOAP, based on the HTTPI. This secure web service might be applicable for open applications, where authentication and integrity is needed, but no confidentiality required. In our paper, we introduce a web service security model based on HTTPI protocol over SOAP and develop a preliminary implementation of this model. We also analyze the performance of our approach through an experiment and show that our proposed approach provides higher throughput, lower average response time and lower response size than HTTPS based web service security approach.

Submitted 7 June, 2013; originally announced June 2013.

Comments: International Journal of Network Security & Its Applications (IJNSA), Vol.5, No.3, May 2013

Journal ref: Choudhary, P., Aaseri, R., Roberts, N., (2013) "HTTPI Based Web Service Security over SOAP", IJNSA, Vol.5, No.3, on pp. 55-66

Showing 1–25 of 25 results for author: Roberts, N