
On Trojans in Refined Language Models

Authors: Jayaram Raghuram, George Kesidis, David J. Miller

Abstract: A Trojan in a language model can be inserted when the model is refined for a particular application such as determining the sentiment of product reviews. In this paper, we clarify and empirically explore variations of the data-poisoning threat model. We then empirically assess two simple defenses, each for a different defense scenario. Finally, we provide a brief survey of related attacks and defenses.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.07278

Human-interpretable clustering of short-text using large language models

Authors: Justin K. Miller, Tristram J. Alexander

Abstract: Large language models have seen extraordinary growth in popularity due to their human-like content generation capabilities. We show that these models can also be used to successfully cluster human-generated content, with success defined through the measures of distinctiveness and interpretability. This success is validated by both human reviewers and ChatGPT, providing an automated means to close the 'validation gap' that has challenged short-text clustering. Comparing the machine and human approaches, we identify the biases inherent in each, and question the reliance on human-coding as the 'gold standard'. We apply our methodology to Twitter bios and find characteristic ways humans describe themselves, agreeing well with prior specialist work, but with interesting differences characteristic of the medium used to express identity.

Submitted 12 May, 2024; originally announced May 2024.

Comments: Main text: 18 pages, 8 figures. Supplementary: 21 pages, 15 figures, 3 tables

ACM Class: I.2.7

arXiv:2403.14128

Gen-T: Table Reclamation in Data Lakes

Authors: Grace Fan, Roee Shraga, Renée J. Miller

Abstract: We introduce the problem of Table Reclamation. Given a Source Table and a large table repository, reclamation finds a set of tables that, when integrated, reproduce the source table as closely as possible. Unlike query discovery problems like Query-by-Example or by-Target, Table Reclamation focuses on reclaiming the data in the Source Table as fully as possible using real tables that may be incomplete or inconsistent. To do this, we define a new measure of table similarity, called error-aware instance similarity, to measure how close a reclaimed table is to a Source Table, a measure grounded in instance similarity used in data exchange. Our search covers not only SELECT-PROJECT-JOIN queries, but integration queries with unions, outerjoins, and the unary operators subsumption and complementation that have been shown to be important in data integration and fusion. Using reclamation, a data scientist can understand if any tables in a repository can be used to exactly reclaim a tuple in the Source. If not, one can understand if this is due to differences in values or to incompleteness in the data. Our solution, Gen-T, performs table discovery to retrieve a set of candidate tables from the table repository, filters these down to a set of originating tables, then integrates these tables to reclaim the Source as closely as possible. We show that our solution, while approximate, is accurate, efficient and scalable in the size of the table repository with experiments on real data lakes containing up to 15K tables, where the average number of tuples varies from small (web tables) to extremely large (open data tables) up to 1M tuples.

Submitted 22 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: to appear at ICDE 2024

arXiv:2403.03896

DART: Implicit Doppler Tomography for Radar Novel View Synthesis

Authors: Tianshu Huang, John Miller, Akarsh Prabhakara, Tao Jin, Tarana Laroia, Zico Kolter, Anthony Rowe

Abstract: Simulation is an invaluable tool for radio-frequency system designers that enables rapid prototyping of various algorithms for imaging, target detection, classification, and tracking. However, simulating realistic radar scans is a challenging task that requires an accurate model of the scene, radio frequency material properties, and a corresponding radar synthesis function. Rather than specifying these models explicitly, we propose DART - Doppler Aided Radar Tomography, a Neural Radiance Field-inspired method which uses radar-specific physics to create a reflectance and transmittance-based rendering pipeline for range-Doppler images. We then evaluate DART by constructing a custom data collection platform and collecting a novel radar dataset together with accurate position and instantaneous velocity measurements from lidar-based localization. In comparison to state-of-the-art baselines, DART synthesizes superior radar range-Doppler images from novel views across all datasets and additionally can be used to generate high quality tomographic images.

Submitted 6 March, 2024; originally announced March 2024.

Comments: To appear in CVPR 2024; see https://wiselabcmu.github.io/dart/ for our project site

arXiv:2403.03816

Targeted Variance Reduction: Robust Bayesian Optimization of Black-Box Simulators with Noise Parameters

Authors: John Joshua Miller, Simon Mak

Abstract: The optimization of a black-box simulator over control parameters $\mathbf{x}$ arises in a myriad of scientific applications. In such applications, the simulator often takes the form $f(\mathbf{x},\boldsymbol{\theta})$, where $\boldsymbol{\theta}$ are parameters that are uncertain in practice. Robust optimization aims to optimize the objective $\mathbb{E}[f(\mathbf{x},\boldsymbol{\Theta})]$, where $\boldsymbol{\Theta} \sim \mathcal{P}$ is a random variable that models uncertainty on $\boldsymbol{\theta}$. For this, existing black-box methods typically employ a two-stage approach for selecting the next point $(\mathbf{x},\boldsymbol{\theta})$, where $\mathbf{x}$ and $\boldsymbol{\theta}$ are optimized separately via different acquisition functions. As such, these approaches do not employ a joint acquisition over $(\mathbf{x},\boldsymbol{\theta})$, and thus may fail to fully exploit control-to-noise interactions for effective robust optimization. To address this, we propose a new Bayesian optimization method called Targeted Variance Reduction (TVR). The TVR leverages a novel joint acquisition function over $(\mathbf{x},\boldsymbol{\theta})$, which targets variance reduction on the objective within the desired region of improvement. Under a Gaussian process surrogate on $f$, the TVR acquisition can be evaluated in closed form, and reveals an insightful exploration-exploitation-precision trade-off for robust black-box optimization. The TVR can further accommodate a broad class of non-Gaussian distributions on $\mathcal{P}$ via a careful integration of normalizing flows. We demonstrate the improved performance of TVR over the state-of-the-art in a suite of numerical experiments and an application to the robust design of automobile brake discs under operational uncertainty.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.02327

Model Lakes

Authors: Koyena Pal, David Bau, Renée J. Miller

Abstract: Given a set of deep learning models, it can be hard to find models appropriate to a task, understand the models, and characterize how models differ from one another. Currently, practitioners rely on manually-written documentation to understand and choose models. However, not all models have complete and reliable documentation. As the number of machine learning models increases, this issue of finding, differentiating, and understanding models is becoming more crucial. Inspired by research on data lakes, we introduce and define the concept of model lakes. We discuss fundamental research challenges in the management of large models. We also discuss what principled data management techniques can be brought to bear on the study of large model management.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.09567

TAI-GAN: A Temporally and Anatomically Informed Generative Adversarial Network for early-to-late frame conversion in dynamic cardiac PET inter-frame motion correction

Authors: Xueqi Guo, Luyao Shi, Xiongchao Chen, Qiong Liu, Bo Zhou, Huidong Xie, Yi-Hwa Liu, Richard Palyo, Edward J. Miller, Albert J. Sinusas, Lawrence H. Staib, Bruce Spottiswoode, Chi Liu, Nicha C. Dvornek

Abstract: Inter-frame motion in dynamic cardiac positron emission tomography (PET) using rubidium-82 (82-Rb) myocardial perfusion imaging impacts myocardial blood flow (MBF) quantification and the diagnosis accuracy of coronary artery diseases. However, the high cross-frame distribution variation due to rapid tracer kinetics poses a considerable challenge for inter-frame motion correction, especially for early frames where intensity-based image registration techniques often fail. To address this issue, we propose a novel method called Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) that utilizes an all-to-one mapping to convert early frames into those with tracer distribution similar to the last reference frame. The TAI-GAN consists of a feature-wise linear modulation layer that encodes channel-wise parameters generated from temporal information and rough cardiac segmentation masks with local shifts that serve as anatomical information. Our proposed method was evaluated on a clinical 82-Rb PET dataset, and the results show that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, the motion estimation accuracy and subsequent myocardial blood flow (MBF) quantification with both conventional and deep learning-based motion correction methods were improved compared to using the original frames.

Submitted 14 February, 2024; originally announced February 2024.

Comments: Under revision at Medical Image Analysis

arXiv:2402.08946

Measuring Sharpness in Grokking

Authors: Jack Miller, Patrick Gleeson, Charles O'Neill, Thang Bui, Noam Levi

Abstract: Neural networks sometimes exhibit grokking, a phenomenon where perfect or near-perfect performance is achieved on a validation set well after the same performance has been obtained on the corresponding training set. In this workshop paper, we introduce a robust technique for measuring grokking, based on fitting an appropriate functional form. We then use this to investigate the sharpness of transitions in training and validation accuracy under two settings. The first setting is the theoretical framework developed by Levi et al. (2023) where closed form expressions are readily accessible. The second setting is a two-layer MLP trained to predict the parity of bits, with grokking induced by the concealment strategy of Miller et al. (2023). We find that trends between relative grokking gap and grokking sharpness are similar in both settings when using absolute and relative measures of sharpness. Reflecting on this, we make progress toward explaining some trends and identify the need for further study to untangle the various mechanisms which influence the sharpness of grokking.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.02034

Universal Post-Training Reverse-Engineering Defense Against Backdoors in Deep Neural Networks

Authors: Xi Li, Hang Wang, David J. Miller, George Kesidis

Abstract: A variety of defenses have been proposed against backdoor attacks on deep neural network (DNN) classifiers. Universal methods seek to reliably detect and/or mitigate backdoors irrespective of the incorporation mechanism used by the attacker, while reverse-engineering methods often explicitly assume one. In this paper, we describe a new detector that: relies on internal feature maps of the defended DNN to detect and reverse-engineer the backdoor and identify its target class; can operate post-training (without access to the training dataset); is highly effective for various incorporation mechanisms (i.e., is universal); and has low computational overhead and so is scalable. Our detection approach is evaluated for different attacks on benchmark CIFAR-10 and CIFAR-100 image classifiers.

Submitted 22 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

arXiv:2401.14973

Discovering group dynamics in synchronous time series via hierarchical recurrent switching-state models

Authors: Michael Wojnowicz, Preetish Rath, Eric Miller, Jeffrey Miller, Clifford Hancock, Meghan O'Donovan, Seth Elkin-Frankston, Thaddeus Brunye, Michael C. Hughes

Abstract: We seek to model a collection of time series arising from multiple entities interacting over the same time period. Recent work focused on modeling individual time series is inadequate for our intended applications, where collective system-level behavior influences the trajectories of individual entities. To address such problems, we present a new hierarchical switching-state model that can be trained in an unsupervised fashion to simultaneously explain both system-level and individual-level dynamics. We employ a latent system-level discrete state Markov chain that drives latent entity-level chains which in turn govern the dynamics of each observed time series. Feedback from the observations to the chains at both the entity and system levels improves flexibility via context-dependent state transitions. Our hierarchical switching recurrent dynamical models can be learned via closed-form variational coordinate ascent updates to all latent chains that scale linearly in the number of individual time series. This is asymptotically no more costly than fitting separate models for each entity. Experiments on synthetic and real datasets show that our model can produce better forecasts of future entity behavior than existing methods. Moreover, the availability of latent state chains at both the entity and system level enables interpretation of group dynamics.

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.13912

A Survey of Deep Learning and Foundation Models for Time Series Forecasting

Authors: John A. Miller, Mohammed Aldosari, Farah Saeed, Nasid Habib Barna, Subas Rana, I. Budak Arpinar, Ninghao Liu

Abstract: Deep Learning has been successfully applied to many application domains, yet its advantages have been slow to emerge for time series forecasting. For example, in the well-known Makridakis (M) Competitions, hybrids of traditional statistical or machine learning techniques have only recently become the top performers. With the recent architectural advances in deep learning being applied to time series forecasting (e.g., encoder-decoders with attention, transformers, and graph neural networks), deep learning has begun to show significant advantages. Still, in the area of pandemic prediction, there remain challenges for deep learning models: the time series is not long enough for effective training, unawareness of accumulated scientific knowledge, and interpretability of the model. To this end, the development of foundation models (large deep learning models with extensive pre-training) allows models to understand patterns and acquire knowledge that can be applied to new related problems before extensive training data becomes available. Furthermore, there is a vast amount of knowledge available that deep learning models can tap into, including Knowledge Graphs and Large Language Models fine-tuned with scientific domain knowledge. There is ongoing research examining how to utilize or inject such knowledge into deep learning models. In this survey, several state-of-the-art modeling techniques are reviewed, and suggestions for further work are provided.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2311.12676

Minimal covers in the Weihrauch degrees

Authors: Steffen Lempp, Joseph S. Miller, Arno Pauly, Mariya I. Soskova, Manlio Valenti

Abstract: In this paper, we study the existence of minimal covers and strong minimal covers in the Weihrauch degrees. We characterize when a problem $f$ is a minimal cover or strong minimal cover of a problem $h$. We show that strong minimal covers only exist in the cone below $\mathsf{id}$ and that the Weihrauch lattice above $\mathsf{id}$ is dense. From this, we conclude that the degree of $\mathsf{id}$ is first-order definable in the Weihrauch degrees and that the first-order theory of the Weihrauch degrees is computably isomorphic to third-order arithmetic.

Submitted 21 November, 2023; originally announced November 2023.

MSC Class: 03D30 03D78

arXiv:2311.02019

Reproducible Parameter Inference Using Bagged Posteriors

Authors: Jonathan H. Huggins, Jeffrey W. Miller

Abstract: Under model misspecification, it is known that Bayesian posteriors often do not properly quantify uncertainty about true or pseudo-true parameters. Even more fundamentally, misspecification leads to a lack of reproducibility in the sense that the same model will yield contradictory posteriors on independent data sets from the true distribution. To define a criterion for reproducible uncertainty quantification under misspecification, we consider the probability that two confidence sets constructed from independent data sets have nonempty overlap, and we establish a lower bound on this overlap probability that holds for any valid confidence sets. We prove that credible sets from the standard posterior can strongly violate this bound, particularly in high-dimensional settings (i.e., with dimension increasing with sample size), indicating that it is not internally coherent under misspecification. To improve reproducibility in an easy-to-use and widely applicable way, we propose to apply bagging to the Bayesian posterior ("BayesBag"); that is, to use the average of posterior distributions conditioned on bootstrapped datasets. We motivate BayesBag from first principles based on Jeffrey conditionalization and show that the bagged posterior typically satisfies the overlap lower bound. Further, we prove a Bernstein-von Mises theorem for the bagged posterior, establishing its asymptotic normal distribution. We demonstrate the benefits of BayesBag via simulation experiments and an application to crime rate prediction.

Submitted 3 November, 2023; originally announced November 2023.

Comments: arXiv admin note: text overlap with arXiv:1912.07104

arXiv:2310.20498

Generative Learning of Continuous Data by Tensor Networks

Authors: Alex Meiburg, Jing Chen, Jacob Miller, Raphaëlle Tihon, Guillaume Rabusseau, Alejandro Perdomo-Ortiz

Abstract: Beyond their origin in modeling many-body quantum systems, tensor networks have emerged as a promising class of models for solving machine learning problems, notably in unsupervised generative learning. While possessing many desirable features arising from their quantum-inspired nature, tensor network generative models have previously been largely restricted to binary or categorical data, limiting their utility in real-world modeling problems. We overcome this by introducing a new family of tensor network generative models for continuous data, which are capable of learning from distributions containing continuous random variables. We develop our method in the setting of matrix product states, first deriving a universal expressivity theorem proving the ability of this model family to approximate any reasonably smooth probability density function with arbitrary precision. We then benchmark the performance of this model on several synthetic and real-world datasets, finding that the model learns and generalizes well on distributions of continuous and discrete variables. We develop methods for modeling different data domains, and introduce a trainable compression layer which is found to increase model performance given limited memory or computational resources. Overall, our methods give important theoretical and empirical evidence of the efficacy of quantum-inspired methods for the rapidly growing field of generative learning.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 21 pages, 15 figures

arXiv:2310.17247

Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity

Authors: Jack Miller, Charles O'Neill, Thang Bui

Abstract: In some settings neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression, linear regression and Bayesian neural networks. We also uncover a mechanism by which to induce grokking on algorithmic datasets via the addition of dimensions containing spurious information. The presence of the phenomenon in non-neural architectures shows that grokking is not restricted to settings considered in current theoretical and empirical studies. Instead, grokking may be possible in any model where solution search is guided by complexity and error.

Submitted 31 March, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.02656

Blend: A Unified Data Discovery System

Authors: Mahdi Esmailoghli, Christoph Schnell, Renée J. Miller, Ziawasch Abedjan

Abstract: Data discovery is an iterative and incremental process that necessitates the execution of multiple data discovery queries to identify the desired tables from large and diverse data lakes. Current methodologies concentrate on single discovery tasks such as join, correlation, or union discovery. However, in practice, a series of these approaches and their corresponding index structures are necessary to enable the user to discover the desired tables. This paper presents BLEND, a comprehensive data discovery system that empowers users to develop ad-hoc discovery tasks without the need to develop new algorithms or build a new index structure. To achieve this goal, we introduce a general index structure capable of addressing multiple discovery queries. We develop a set of lower-level operators that serve as the fundamental building blocks for more complex and sophisticated user tasks. These operators are highly efficient and enable end-to-end efficiency. To enhance the execution of the discovery pipeline, we rewrite the search queries into optimized SQL statements to push the data operators down to the database. We demonstrate that our holistic system is able to achieve comparable effectiveness and runtime efficiency to the individual state-of-the-art approaches specifically designed for a single task.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2309.16827

Post-Training Overfitting Mitigation in DNN Classifiers

Authors: Hang Wang, David J. Miller, George Kesidis

Abstract: Well-known (non-malicious) sources of overfitting in deep neural net (DNN) classifiers include: i) large class imbalances; ii) insufficient training-set diversity; and iii) over-training. In recent work, it was shown that backdoor data-poisoning also induces overfitting, with unusually large classification margins to the attacker's target class, mediated particularly by (unbounded) ReLU activations that allow large signals to propagate in the DNN. Thus, an effective post-training (with no knowledge of the training set or training process) mitigation approach against backdoors was proposed, leveraging a small clean dataset, based on bounding neural activations. Improving upon that work, we threshold activations specifically to limit maximum margins (MMs), which yields performance gains in backdoor mitigation. We also provide some analytical support for this mitigation approach. Most importantly, we show that post-training MM-based regularization substantially mitigates non-malicious overfitting due to class imbalances and overtraining. Thus, unlike adversarial training, which provides some resilience against attacks but which harms clean (attack-free) generalization, we demonstrate an approach originating from adversarial learning that helps clean generalization accuracy. Experiments on CIFAR-10 and CIFAR-100, in comparison with peer methods, demonstrate strong performance of our methods.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.13050

Decoding the Alphabet Soup of Degrees in the United States Postsecondary Education System Through Hybrid Method: Database and Text Mining

Authors: Sahar Voghoei, James Byars, John A Miller, Khaled Rasheed, Hamid A Arabnia

Abstract: This paper proposes a model to predict the levels (e.g., Bachelor, Master, etc.) of postsecondary degree awards that have been ambiguously expressed in the student tracking reports of the National Student Clearinghouse (NSC). The model will be the hybrid of two modules. The first module interprets the relevant abbreviatory elements embedded in NSC reports by referring to a comprehensive database that we have made of nearly 950 abbreviations for degree titles used by American postsecondary educators. The second module is a combination of feature classification and text mining modeled with CNN-BiLSTM, which is preceded by several steps of heavy pre-processing. The model proposed in this paper was trained with four multi-label datasets of different grades of resolution and returned 97.83% accuracy with the most sophisticated dataset. Such a thorough classification of degree levels will provide insights into the modeling patterns of student success and mobility. To date, such a classification strategy has not been attempted except using manual methods and simple text parsing logic.

Submitted 6 September, 2023; originally announced September 2023.

Comments: 18 Pages, 8 figures

arXiv:2309.06126 [pdf, other]

AstroLLaMA: Towards Specialized Foundation Models in Astronomy

Authors: Tuan Dung Nguyen, Yuan-Sen Ting, Ioana Ciucă, Charlie O'Neill, Ze-Chang Sun, Maja Jabłońska, Sandor Kruk, Ernest Perkowski, Jack Miller, Jason Li, Josh Peek, Kartheik Iyer, Tomasz Różański, Pranav Khetarpal, Sharaf Zaman, David Brodrick, Sergio J. Rodríguez Méndez, Thang Bui, Alyssa Goodman, Alberto Accomazzi, Jill Naiman, Jesse Cranney, Kevin Schawinski, UniverseTBD

Abstract: Large language models excel in many human-language tasks but often falter in highly specialized domains like scholarly astronomy. To bridge this gap, we introduce AstroLLaMA, a 7-billion-parameter model fine-tuned from LLaMA-2 using over 300,000 astronomy abstracts from arXiv. Optimized for traditional causal language modeling, AstroLLaMA achieves a 30% lower perplexity than Llama-2, showing marked domain adaptation. Our model generates more insightful and scientifically relevant text completions and embedding extraction than state-of-the-art foundation models despite having significantly fewer parameters. AstroLLaMA serves as a robust, domain-specific model with broad fine-tuning potential. Its public release aims to spur astronomy-focused research, including automatic paper summarization and conversational agent development.

Submitted 12 September, 2023; originally announced September 2023.

Comments: 6 pages, 3 figures, submitted to IJCNLP-AACL 2023. Comments are welcome. The model can be found on Hugging Face - https://huggingface.co/universeTBD/astrollama
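
The AstroLLaMA abstract above quotes a 30% lower perplexity. As a minimal sketch of what that metric measures, perplexity is commonly computed as the exponential of the mean negative log-likelihood per token. The function names and the toy token probabilities below are illustrative assumptions, not values or code from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity: exp of the mean negative natural-log likelihood per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Toy example (hypothetical numbers): a base model assigns probability 0.25
# to every token, a fine-tuned model assigns 0.5 to every token.
base_ppl = perplexity([math.log(0.25)] * 4)
tuned_ppl = perplexity([math.log(0.5)] * 4)
reduction = relative_reduction(base_ppl, tuned_ppl)
```

In this toy case the fine-tuned perplexity is half the baseline, i.e. a 50% reduction; a 30% reduction would correspond to `relative_reduction` returning roughly 0.3.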

arXiv:2308.16403 [pdf, other]

Balancing between the Local and Global Structures (LGS) in Graph Embedding

Authors: Jacob Miller, Vahan Huroyan, Stephen Kobourov

Abstract: We present a method for balancing between the Local and Global Structures (LGS) in graph embedding, via a tunable parameter. Some embedding methods aim to capture global structures, while others attempt to preserve local neighborhoods. Few methods attempt to do both, and it is not always possible to capture both local and global information well in two dimensions, which is where most graph drawings live. The choice of using a local or a global embedding for visualization depends not only on the task but also on the structure of the underlying data, which may not be known in advance. For a given graph, LGS aims to find a good balance between the local and global structure to preserve. We evaluate the performance of LGS with synthetic and real-world datasets and our results indicate that it is competitive with the state-of-the-art methods, using established quality metrics such as stress and neighborhood preservation. We introduce a novel quality metric, cluster distance preservation, to assess intermediate structure capture. All source-code, datasets, experiments and analysis are available online.

Submitted 1 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Appears in the Proceedings of the 31st International Symposium on Graph Drawing and Network Visualization (GD 2023)

arXiv:2308.13768 [pdf, other]

Adversarial Fine-Tuning of Language Models: An Iterative Optimisation Approach for the Generation and Detection of Problematic Content

Authors: Charles O'Neill, Jack Miller, Ioana Ciuca, Yuan-Sen Ting, Thang Bui

Abstract: In this paper, we tackle the emerging challenge of unintended harmful content generation in Large Language Models (LLMs) with a novel dual-stage optimisation technique using adversarial fine-tuning. Our two-pronged approach employs an adversarial model, fine-tuned to generate potentially harmful prompts, and a judge model, iteratively optimised to discern these prompts. In this adversarial cycle, the two models seek to outperform each other in the prompting phase, generating a dataset of rich examples which are then used for fine-tuning. This iterative application of prompting and fine-tuning allows continuous refinement and improved performance. The performance of our approach is evaluated through classification accuracy on a dataset consisting of problematic prompts not detected by GPT-4, as well as a selection of contentious but unproblematic prompts. We show considerable increase in classification accuracy of the judge model on this challenging dataset as it undergoes the optimisation process. Furthermore, we show that a rudimentary model \texttt{ada} can achieve 13\% higher accuracy on the hold-out test set than GPT-4 after only a few rounds of this process, and that this fine-tuning improves performance in parallel tasks such as toxic comment identification.

Submitted 26 August, 2023; originally announced August 2023.

arXiv:2308.12443 [pdf, other]

TAI-GAN: Temporally and Anatomically Informed GAN for early-to-late frame conversion in dynamic cardiac PET motion correction

Authors: Xueqi Guo, Luyao Shi, Xiongchao Chen, Bo Zhou, Qiong Liu, Huidong Xie, Yi-Hwa Liu, Richard Palyo, Edward J. Miller, Albert J. Sinusas, Bruce Spottiswoode, Chi Liu, Nicha C. Dvornek

Abstract: The rapid tracer kinetics of rubidium-82 ($^{82}$Rb) and high variation of cross-frame distribution in dynamic cardiac positron emission tomography (PET) raise significant challenges for inter-frame motion correction, particularly for the early frames where conventional intensity-based image registration techniques are not applicable. Alternatively, a promising approach utilizes generative methods to handle the tracer distribution changes to assist existing registration methods. To improve frame-wise registration and parametric quantification, we propose a Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) to transform the early frames into the late reference frame using an all-to-one mapping. Specifically, a feature-wise linear modulation layer encodes channel-wise parameters generated from temporal tracer kinetics information, and rough cardiac segmentations with local shifts serve as the anatomical information. We validated our proposed method on a clinical $^{82}$Rb PET dataset and found that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, motion estimation accuracy and clinical myocardial blood flow (MBF) quantification were improved compared to using the original frames. Our code is published at https://github.com/gxq1998/TAI-GAN.

Submitted 23 August, 2023; originally announced August 2023.

Comments: Accepted by Simulation and Synthesis in Medical Imaging (SASHIMI 2023, MICCAI workshop), preprint version

arXiv:2308.09850 [pdf, other]

Backdoor Mitigation by Correcting the Distribution of Neural Activations

Authors: Xi Li, Zhen Xiang, David J. Miller, George Kesidis

Abstract: Backdoor (Trojan) attacks are an important type of adversarial exploit against deep neural networks (DNNs), wherein a test instance is (mis)classified to the attacker's target class whenever the attacker's backdoor trigger is present. In this paper, we reveal and analyze an important property of backdoor attacks: a successful attack causes an alteration in the distribution of internal layer activations for backdoor-trigger instances, compared to that for clean instances. Even more importantly, we find that instances with the backdoor trigger will be correctly classified to their original source classes if this distribution alteration is corrected. Based on our observations, we propose an efficient and effective method that achieves post-training backdoor mitigation by correcting the distribution alteration using reverse-engineered triggers. Notably, our method does not change any trainable parameters of the DNN, but achieves generally better mitigation performance than existing methods that do require intensive DNN parameter tuning. It also efficiently detects test instances with the trigger, which may help to catch adversarial entities in the act of exploiting the backdoor.

Submitted 18 August, 2023; originally announced August 2023.

arXiv:2308.07645 [pdf, other]

Steering Language Generation: Harnessing Contrastive Expert Guidance and Negative Prompting for Coherent and Diverse Synthetic Data Generation

Authors: Charles O'Neill, Yuan-Sen Ting, Ioana Ciuca, Jack Miller, Thang Bui

Abstract: Large Language Models (LLMs) hold immense potential to generate synthetic data of high quality and utility, which has numerous applications from downstream model training to practical data utilisation. However, contemporary models, despite their impressive capacities, consistently struggle to produce both coherent and diverse data. To address the coherency issue, we introduce contrastive expert guidance, where the difference between the logit distributions of fine-tuned and base language models is emphasised to ensure domain adherence. In order to ensure diversity, we utilise existing real and synthetic examples as negative prompts to the model. We deem this dual-pronged approach to logit reshaping as STEER: Semantic Text Enhancement via Embedding Repositioning. STEER operates at inference-time and systematically guides the LLMs to strike a balance between adherence to the data distribution (ensuring semantic fidelity) and deviation from prior synthetic examples or existing real datasets (ensuring diversity and authenticity). This delicate balancing act is achieved by dynamically moving towards or away from chosen representations in the latent space. STEER demonstrates improved performance over previous synthetic data generation techniques, exhibiting better balance between data diversity and coherency across three distinct tasks: hypothesis generation, toxic and non-toxic comment generation, and commonsense reasoning task generation. We demonstrate how STEER allows for fine-tuned control over the diversity-coherency trade-off via its hyperparameters, highlighting its versatility.

Submitted 17 August, 2023; v1 submitted 15 August, 2023; originally announced August 2023.
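
The STEER abstract above describes contrastive expert guidance as emphasising the difference between the logits of a fine-tuned model and a base model. The snippet below is a minimal sketch of one plausible reading of that idea, under stated assumptions: the update rule `finetuned + gamma * (finetuned - base)`, the parameter name `gamma`, and the toy logits are all illustrative and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_guidance(finetuned_logits, base_logits, gamma=1.0):
    """Push logits further in the direction where the fine-tuned model
    differs from the base model (assumed form, for illustration only)."""
    return [f + gamma * (f - b) for f, b in zip(finetuned_logits, base_logits)]


# Toy next-token logits over a 3-token vocabulary (hypothetical values).
finetuned = [2.0, 1.0, 0.0]
base = [1.0, 1.0, 1.0]

guided = contrastive_guidance(finetuned, base, gamma=1.0)
p_finetuned = softmax(finetuned)
p_guided = softmax(guided)
```

With these toy values the token favoured by the fine-tuned model gains probability mass after guidance, while the token it disfavours loses mass. Setting `gamma=0.0` recovers the fine-tuned logits unchanged.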

arXiv:2308.06378 [pdf, other]

DCNFIS: Deep Convolutional Neuro-Fuzzy Inference System

Authors: Mojtaba Yeganejou, Kimia Honari, Ryan Kluzinski, Scott Dick, Michael Lipsett, James Miller

Abstract: A key challenge in eXplainable Artificial Intelligence is the well-known tradeoff between the transparency of an algorithm (i.e., how easily a human can directly understand the algorithm, as opposed to receiving a post-hoc explanation), and its accuracy. We report on the design of a new deep network that achieves improved transparency without sacrificing accuracy. We design a deep convolutional neuro-fuzzy inference system (DCNFIS) by hybridizing fuzzy logic and deep learning models and show that DCNFIS performs as accurately as existing convolutional neural networks on four well-known datasets and 3 famous architectures. Our performance comparison with available fuzzy methods shows that DCNFIS is now a state-of-the-art fuzzy system and outperforms other shallow and deep fuzzy methods to the best of our knowledge. At the end, we exploit the transparency of fuzzy logic by deriving explanations, in the form of saliency maps, from the fuzzy rules encoded in the network to take benefit of fuzzy logic upon regular deep learning methods. We investigate the properties of these explanations in greater depth using the Fashion-MNIST dataset.

Submitted 17 March, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

arXiv:2308.04617 [pdf, other]

Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection

Authors: Hang Wang, Zhen Xiang, David J. Miller, George Kesidis

Abstract: Deep neural networks are vulnerable to backdoor attacks (Trojans), where an attacker poisons the training set with backdoor triggers so that the neural network learns to classify test-time triggers to the attacker's designated target class. Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model, which motivates a general, post-training clipping method for backdoor mitigation, i.e., with bounds on internal-layer activations learned using a small set of clean samples. We devise a new such approach, choosing the activation bounds to explicitly limit classification margins. This method gives superior performance against peer methods for CIFAR-10 image classification. We also show that this method has strong robustness against adaptive attacks, X2X attacks, and on different datasets. Finally, we demonstrate a method extension for test-time detection and correction based on the output differences between the original and activation-bounded networks. The code of our method is available online.

Submitted 8 August, 2023; originally announced August 2023.
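
To make the idea in the abstract above concrete, the following is a minimal sketch of how bounding (clipping) ReLU activations can shrink an abnormally large classification margin. It is not the authors' method: the paper learns the bounds from a small clean dataset so as to limit margins, whereas here the bound `upper`, the two-neuron layer, and all numbers are hand-picked, hypothetical values for illustration.

```python
def clipped_relu(x, upper=None):
    """ReLU with an optional upper bound on the activation."""
    y = max(x, 0.0)
    return y if upper is None else min(y, upper)


def forward(pre_activations, class_weights, upper=None):
    """One hidden layer followed by a linear read-out, one weight row per class."""
    acts = [clipped_relu(v, upper) for v in pre_activations]
    return [sum(w * a for w, a in zip(row, acts)) for row in class_weights]


def margin(logits, target):
    """Logit of `target` minus the largest logit among the other classes."""
    others = [v for i, v in enumerate(logits) if i != target]
    return logits[target] - max(others)


# Hypothetical input where the second neuron fires abnormally strongly,
# as a backdoor trigger might cause, and feeds the target class (index 1).
pre_acts = [0.5, 9.0]
weights = [[1.0, 0.0],   # class 0 reads neuron 0
           [0.0, 1.0]]   # class 1 (target) reads neuron 1

margin_unbounded = margin(forward(pre_acts, weights), target=1)
margin_bounded = margin(forward(pre_acts, weights, upper=1.0), target=1)
```

In this toy setup the unbounded margin to the target class is 8.5, and bounding activations at 1.0 reduces it to 0.5. Comparing the outputs of the unbounded and bounded networks is also the kind of signal the abstract mentions for test-time detection.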

arXiv:2308.03890 [pdf, other]

On the Perception of Small Sub-graphs

Authors: Jacob Miller, Mohammad Ghoniem, Hsiang-Yun Wu, Helen C. Purchase

Abstract: Interpreting a node-link graph is enhanced if similar subgraphs (or motifs) are depicted in a similar manner; that is, they have the same visual form. Small motifs within graphs may be perceived to be identical when they are structurally dissimilar, or may be perceived to be dissimilar when they are identical. This issue primarily relates to the Gestalt principle of similarity, but may also include an element of quick, low-level pattern-matching. We believe that if motifs are identical, they should be depicted identically; if they are nearly-identical, they should be depicted nearly-identically. This principle is particularly important in domains where motifs hold meaning and where their identification is important. We identified five small motifs: bi-cliques, cliques, cycles, double-cycles, and stars. For each, we defined visual variations on two dimensions: same or different structure, same or different shape. We conducted a crowd-sourced empirical study to test the perception of similarity of these varied motifs, and found that determining whether motifs are identical or similar is affected by both shape and structure.

Submitted 9 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: Appears in the Proceedings of the 31st International Symposium on Graph Drawing and Network Visualization (GD 2023)

arXiv:2308.03883 [pdf, other]

Generative Benchmark Creation for Table Union Search

Authors: Koyena Pal, Aamod Khatiwada, Roee Shraga, Renée J. Miller

Abstract: Data management has traditionally relied on synthetic data generators to generate structured benchmarks, like the TPC suite, where we can control important parameters like data size and its distribution precisely. These benchmarks were central to the success and adoption of database management systems. But more and more, data management problems are of a semantic nature. An important example is finding tables that can be unioned. While any two tables with the same cardinality can be unioned, table union search is the problem of finding tables whose union is semantically coherent. Semantic problems cannot be benchmarked using synthetic data. Our current methods for creating benchmarks involve the manual curation and labeling of real data. These methods are not robust or scalable and perhaps more importantly, it is not clear how robust the created benchmarks are. We propose to use generative AI models to create structured data benchmarks for table union search. We present a novel method for using generative models to create tables with specified properties. Using this method, we create a new benchmark containing pairs of tables that are both unionable and non-unionable but related. We thoroughly evaluate recent existing table union search methods over existing benchmarks and our new benchmark. We also present and evaluate a new table search method based on recent large language models over all benchmarks. We show that the new benchmark is more challenging for all methods than hand-curated benchmarks, specifically, the top-performing method achieves a Mean Average Precision of around 60%, over 30% less than its performance on existing manually created benchmarks. We examine why this is the case and show that the new benchmark permits more detailed analysis of methods, including a study of both false positives and false negatives that were not possible with existing benchmarks.

Submitted 7 August, 2023; originally announced August 2023.
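
The abstract above reports results as Mean Average Precision (MAP). As a minimal sketch of the standard definition of that metric, the code below averages precision at each rank where a relevant (here, unionable) table is retrieved, then averages over queries. The table identifiers and rankings are made-up examples, and the paper may use a truncated variant such as MAP@k, which this sketch does not model.

```python
def average_precision(ranked, relevant):
    """Mean of precision@rank over the ranks at which a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(queries):
    """`queries` is a list of (ranked_results, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical search results for two query tables.
queries = [
    (["t1", "t4", "t2"], {"t1", "t2"}),  # relevant tables found at ranks 1 and 3
    (["t9", "t3"], {"t3"}),              # relevant table found at rank 2
]
map_score = mean_average_precision(queries)
```

For the first query the average precision is (1/1 + 2/3) / 2, for the second it is 1/2, so the MAP over both queries is about 0.67.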

arXiv:2307.12158 [pdf, other]

DIP-RL: Demonstration-Inferred Preference Learning in Minecraft

Authors: Ellen Novoseller, Vinicius G. Goecks, David Watkins, Josh Miller, Nicholas Waytowich

Abstract: In machine learning for sequential decision-making, an algorithmic agent learns to interact with an environment while receiving feedback in the form of a reward signal. However, in many unstructured real-world settings, such a reward signal is unknown and humans cannot reliably craft a reward signal that correctly captures desired behavior. To solve tasks in such unstructured and open-ended environments, we present Demonstration-Inferred Preference Reinforcement Learning (DIP-RL), an algorithm that leverages human demonstrations in three distinct ways, including training an autoencoder, seeding reinforcement learning (RL) training batches with demonstration data, and inferring preferences over behaviors to learn a reward function to guide RL. We evaluate DIP-RL in a tree-chopping task in Minecraft. Results suggest that the method can guide an RL agent to learn a reward function that reflects human preferences and that DIP-RL performs competitively relative to baselines. DIP-RL is inspired by our previous work on combining demonstrations and pairwise preferences in Minecraft, which was awarded a research prize at the 2022 NeurIPS MineRL BASALT competition, Learning from Human Feedback in Minecraft. Example trajectory rollouts of DIP-RL and baselines are located at https://sites.google.com/view/dip-rl.

Submitted 22 July, 2023; originally announced July 2023.

Comments: Paper accepted at The Many Facets of Preference Learning Workshop at the International Conference on Machine Learning (ICML), Honolulu, Hawaii, USA, 2023

ACM Class: I.2.6; G.3

arXiv:2307.06871 [pdf, other]

Identifying Early Help Referrals For Local Authorities With Machine Learning And Bias Analysis

Authors: Eufrásio de A. Lima Neto, Jonathan Bailiss, Axel Finke, Jo Miller, Georgina Cosma

Abstract: Local authorities in England, such as Leicestershire County Council (LCC), provide Early Help services that can be offered at any point in a young person's life when they experience difficulties that cannot be supported by universal services alone, such as schools. This paper investigates the utilisation of machine learning (ML) to assist experts in identifying families that may need to be referred for Early Help assessment and support. LCC provided an anonymised dataset comprising 14360 records of young people under the age of 18. The dataset was pre-processed, machine learning models were built, and experiments were conducted to validate and test the performance of the models. Bias mitigation techniques were applied to improve the fairness of these models. During testing, while the models demonstrated the capability to identify young people requiring intervention or early help, they also produced a significant number of false positives, especially when constructed with imbalanced data, incorrectly identifying individuals who most likely did not need an Early Help referral. This paper empirically explores the suitability of data-driven ML models for identifying young people who may require Early Help services and discusses their appropriateness and limitations for this task.

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2304.08285 [pdf, other]

doi 10.1145/3555041.3589732

DIALITE: Discover, Align and Integrate Open Data Tables

Authors: Aamod Khatiwada, Roee Shraga, Renée J. Miller

Abstract: We demonstrate a novel table discovery pipeline called DIALITE that allows users to discover, integrate and analyze open data tables. DIALITE has three main stages. First, it allows users to discover tables from open data platforms using state-of-the-art table discovery techniques. Second, DIALITE integrates the discovered tables to produce an integrated table. Finally, it allows users to analyze the integration result by applying different downstream tasks over it. Our pipeline is flexible such that the user can easily add and compare additional discovery and integration algorithms.

Submitted 17 April, 2023; originally announced April 2023.

Comments: SIGMOD 2023

arXiv:2304.05345 [pdf, other]

FIR-based Future Trajectory Prediction in Nighttime Autonomous Driving

Authors: Alireza Rahimpour, Navid Fallahinia, Devesh Upadhyay, Justin Miller

Abstract: The performance of the current collision avoidance systems in Autonomous Vehicles (AV) and Advanced Driver Assistance Systems (ADAS) can be drastically affected by low light and adverse weather conditions. Collisions with large animals such as deer in low light cause significant cost and damage every year. In this paper, we propose the first AI-based method for future trajectory prediction of large animals and mitigating the risk of collision with them in low light. In order to minimize false collision warnings, in our multi-step framework, first, the large animal is accurately detected and a preliminary risk level is predicted for it and low-risk animals are discarded. In the next stage, a multi-stream CONV-LSTM-based encoder-decoder framework is designed to predict the future trajectory of the potentially high-risk animals. The proposed model uses camera motion prediction as well as the local and global context of the scene to generate accurate predictions. Furthermore, this paper introduces a new dataset of FIR videos for large animal detection and risk estimation in real nighttime driving scenarios. Our experiments show promising results of the proposed framework in adverse conditions. Our code is available online.

Submitted 31 March, 2023; originally announced April 2023.

Comments: Conference: IEEE Intelligent Vehicles 2023 (IEEE IV 2023)

arXiv:2303.13512 [pdf, other]

Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition

Authors: Stephanie Milani, Anssi Kanervisto, Karolis Ramanauskas, Sander Schulhoff, Brandon Houghton, Sharada Mohanty, Byron Galbraith, Ke Chen, Yan Song, Tianze Zhou, Bingquan Yu, He Liu, Kai Guan, Yujing Hu, Tangjie Lv, Federico Malato, Florian Leopold, Amogh Raut, Ville Hautamäki, Andrew Melnik, Shu Ishida, João F. Henriques, Robert Klassert, Walter Laurito, Ellen Novoseller, et al. (5 additional authors not shown)

Abstract: To facilitate research in the direction of fine-tuning foundation models from human feedback, we held the MineRL BASALT Competition on Fine-Tuning from Human Feedback at NeurIPS 2022. The BASALT challenge asks teams to compete to develop algorithms to solve tasks with hard-to-specify reward functions in Minecraft. Through this competition, we aimed to promote the development of algorithms that use human feedback as channels to learn the desired behavior. We describe the competition and provide an overview of the top solutions. We conclude by discussing the impact of the competition and future directions for improvement.

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2302.00189 [pdf, other]

Detecting Lexical Borrowings from Dominant Languages in Multilingual Wordlists

Authors: John E. Miller, Johann-Mattis List

Abstract: Language contact is a pervasive phenomenon reflected in the borrowing of words from donor to recipient languages. Most computational approaches to borrowing detection treat all languages under study as equally important, even though dominant languages have a stronger impact on heritage languages than vice versa. We test new methods for lexical borrowing detection in contact situations where dominant languages play an important role, applying two classical sequence comparison methods and one machine learning method to a sample of seven Latin American languages which have all borrowed extensively from Spanish. All methods perform well, with the supervised machine learning system outperforming the classical systems. A review of detection errors shows that borrowing detection could be substantially improved by taking into account donor words with divergent meanings from recipient words.

Submitted 21 February, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

Comments: To appear at The 17th Conference of the European Chapter of the Association for Computational Linguistics. See https://www.aclweb.org/portal/content/17th-conference-european-chapter-association-computational-linguistics

arXiv:2301.13095 [pdf, other]

Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V (Technical Report)

Authors: Roee Shraga, Renée J. Miller

Abstract: In multi-user environments in which data science and analysis is collaborative, multiple versions of the same datasets are generated. While managing and storing data versions has received some attention in the research literature, the semantic nature of such changes has remained under-explored. In this work, we introduce \texttt{Explain-Da-V}, a framework aiming to explain changes between two given dataset versions. \texttt{Explain-Da-V} generates \emph{explanations} that use \emph{data transformations} to explain changes. We further introduce a set of measures that evaluate the validity, generalizability, and explainability of these explanations. We empirically show, using an adapted existing benchmark and a newly created benchmark, that \texttt{Explain-Da-V} generates better explanations than existing data transformation synthesis methods.

Submitted 30 January, 2023; originally announced January 2023.

Comments: To appear in VLDB 2023

arXiv:2301.05969 [pdf, other]

doi 10.1609/aaai.v37i5.25741

The Role of Heuristics and Biases During Complex Choices with an AI Teammate

Authors: Nikolos Gurney, John H. Miller, David V. Pynadath

Abstract: Behavioral scientists have classically documented aversion to algorithmic decision aids, from simple linear models to AI. Sentiment, however, is changing and possibly accelerating AI helper usage. AI assistance is, arguably, most valuable when humans must make complex choices. We argue that classic experimental methods used to study heuristics and biases are insufficient for studying complex choices made with AI helpers. We adapted an experimental paradigm designed for studying complex choices in such contexts. We show that framing and anchoring effects impact how people work with an AI helper and are predictive of choice outcomes. The evidence suggests that some participants, particularly those in a loss frame, put too much faith in the AI helper and experienced worse choice outcomes by doing so. The paradigm also generates computational modeling-friendly data allowing future studies of human-AI decision making.

Submitted 14 January, 2023; originally announced January 2023.

Comments: AAAI 2023

arXiv:2212.12086 [pdf, other]

Eigenvalue initialisation and regularisation for Koopman autoencoders

Authors: Jack W. Miller, Charles O'Neill, Navid C. Constantinou, Omri Azencot

Abstract: Regularising the parameter matrices of neural networks is ubiquitous in training deep models. Typical regularisation approaches suggest initialising weights using small random values, and to penalise weights to promote sparsity. However, these widely used techniques may be less effective in certain scenarios. Here, we study the Koopman autoencoder model which includes an encoder, a Koopman operator layer, and a decoder. These models have been designed and dedicated to tackle physics-related problems with interpretable dynamics and an ability to incorporate physics-related constraints. However, the majority of existing work employs standard regularisation practices. In our work, we take a step toward augmenting Koopman autoencoders with initialisation and penalty schemes tailored for physics-related settings. Specifically, we propose the "eigeninit" initialisation scheme that samples initial Koopman operators from specific eigenvalue distributions. In addition, we suggest the "eigenloss" penalty scheme that penalises the eigenvalues of the Koopman operator during training. We demonstrate the utility of these schemes on two synthetic data sets: a driven pendulum and flow past a cylinder; and two real-world problems: ocean surface temperatures and cyclone wind fields. We find on these datasets that eigenloss and eigeninit improve the convergence rate by up to a factor of 5, and that they reduce the cumulative long-term prediction error by up to a factor of 3.
Such a finding points to the utility of incorporating similar schemes as an inductive bias in other physics-related deep learning approaches.

Submitted 25 December, 2022; v1 submitted 22 December, 2022; originally announced December 2022.

Comments: 18 pages

arXiv:2212.07495 [pdf, other]

SAIF: Sparse Adversarial and Imperceptible Attack Framework

Authors: Tooba Imtiaz, Morgan Kohler, Jared Miller, Zifeng Wang, Mario Sznaier, Octavia Camps, Jennifer Dy

Abstract: Adversarial attacks hamper the decision-making ability of neural networks by perturbing the input signal. The addition of calculated small distortion to images, for instance, can deceive a well-trained image classification network. In this work, we propose a novel attack technique called Sparse Adversarial and Interpretable Attack Framework (SAIF). Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels and leverage these sparse attacks to reveal the vulnerability of classifiers. We use the Frank-Wolfe (conditional gradient) algorithm to simultaneously optimize the attack perturbations for bounded magnitude and sparsity with $O(1/\sqrt{T})$ convergence. Empirical results show that SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet dataset.

Submitted 6 December, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

arXiv:2211.00241 [pdf, other]

Adversarial Policies Beat Superhuman Go AIs

Authors: Tony T. Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell

Abstract: We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it, achieving a >97% win rate against KataGo running at superhuman settings. Our adversaries do not win by playing Go well. Instead, they trick KataGo into making serious blunders. Our attack transfers zero-shot to other superhuman Go-playing AIs, and is comprehensible to the extent that human experts can implement it without algorithmic assistance to consistently beat superhuman AIs. The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against our attack. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available at https://goattack.far.ai/.

Submitted 13 July, 2023; v1 submitted 31 October, 2022; originally announced November 2022.

Comments: Accepted to ICML 2023, see paper for changelog

ACM Class: I.2.6

arXiv:2210.10272 [pdf, other]

Training set cleansing of backdoor poisoning by self-supervised representation learning

Authors: H. Wang, S. Karami, O. Dia, H. Ritter, E. Emamjomeh-Zadeh, J. Chen, Z. Xiang, D. J. Miller, G. Kesidis

Abstract: A backdoor or Trojan attack is an important type of data poisoning attack against deep neural network (DNN) classifiers, wherein the training dataset is poisoned with a small number of samples that each possess the backdoor pattern (usually a pattern that is either imperceptible or innocuous) and which are mislabeled to the attacker's target class. When trained on a backdoor-poisoned dataset, a DNN behaves normally on most benign test samples but makes incorrect predictions to the target class when the test sample has the backdoor pattern incorporated (i.e., contains a backdoor trigger). Here we focus on image classification tasks and show that supervised training may build stronger association between the backdoor pattern and the associated target class than that between normal features and the true class of origin. By contrast, self-supervised representation learning ignores the labels of samples and learns a feature embedding based on images' semantic content. %We thus propose to use unsupervised representation learning to avoid emphasising backdoor-poisoned training samples and learn a similar feature embedding for samples of the same class. Using a feature embedding found by self-supervised representation learning, a data cleansing method, which combines sample filtering and re-labeling, is developed. Experiments on CIFAR-10 benchmark datasets show that our method achieves state-of-the-art performance in mitigating backdoor attacks.

Submitted 14 March, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

arXiv:2209.13589 [pdf, other]

SANTOS: Relationship-based Semantic Table Union Search

Authors: Aamod Khatiwada, Grace Fan, Roee Shraga, Zixuan Chen, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald

Abstract: Existing techniques for unionable table search define unionability using metadata (tables must have the same or similar schemas) or column-based metrics (for example, the values in a table should be drawn from the same domain). In this work, we introduce the use of semantic relationships between pairs of columns in a table to improve the accuracy of union search. Consequently, we introduce a new notion of unionability that considers relationships between columns, together with the semantics of columns, in a principled way. To do so, we present two new methods to discover semantic relationships between pairs of columns. The first uses an existing knowledge base (KB), the second (which we call a "synthesized KB") uses knowledge from the data lake itself. We adopt an existing Table Union Search benchmark and present new (open) benchmarks that represent small and large real data lakes. We show that our new unionability search algorithm, called SANTOS, outperforms a state-of-the-art union search that uses a wide variety of column-based semantics, including word embeddings and regular expressions. We show empirically that our synthesized KB improves the accuracy of union search by representing relationship semantics that may not be contained in an available KB. This result hints at a promising future of creating synthesized KBs from data lakes with limited KB coverage and using them for union search.

Submitted 27 September, 2022; originally announced September 2022.

Comments: 15 pages, 10 figures, to appear at SIGMOD 2023

arXiv:2209.00191 [pdf, other]

Spherical Graph Drawing by Multi-dimensional Scaling

Authors: Jacob Miller, Vahan Huroyan, Stephen Kobourov

Abstract: We describe an efficient and scalable spherical graph embedding method. The method uses a generalization of the Euclidean stress function for Multi-Dimensional Scaling adapted to spherical space, where geodesic pairwise distances are employed instead of Euclidean distances. The resulting spherical stress function is optimized by means of stochastic gradient descent. Quantitative and qualitative evaluations demonstrate the scalability and effectiveness of the proposed method. We also show that some graph families can be embedded with lower distortion on the sphere than in Euclidean and hyperbolic spaces.

Submitted 31 August, 2022; originally announced September 2022.

Comments: Appears in the Proceedings of the 30th International Symposium on Graph Drawing and Network Visualization (GD 2022)

arXiv:2208.13284 [pdf, other]

doi 10.46298/dmtcs.10037

Distinct Angles and Angle Chains in Three Dimensions

Authors: Ruben Ascoli, Livia Betti, Jacob Lehmann Duke, Xuyan Liu, Wyatt Milgrim, Steven J. Miller, Eyvindur A. Palsson, Francisco Romero Acosta, Santiago Velazquez Iannuzzelli

Abstract: In 1946, Erdős posed the distinct distance problem, which seeks to find the minimum number of distinct distances between pairs of points selected from any configuration of $n$ points in the plane. The problem has since been explored along with many variants, including ones that extend it into higher dimensions. Less studied but no less intriguing is Erdős' distinct angle problem, which seeks to find point configurations in the plane that minimize the number of distinct angles. In their recent paper "Distinct Angles in General Position," Fleischmann, Konyagin, Miller, Palsson, Pesikoff, and Wolf use a logarithmic spiral to establish an upper bound of $O(n^2)$ on the minimum number of distinct angles in the plane in general position, which prohibits three points on any line or four on any circle. We consider the question of distinct angles in three dimensions and provide bounds on the minimum number of distinct angles in general position in this setting. We focus on pinned variants of the question, and we examine explicit constructions of point configurations in $\mathbb{R}^3$ which use self-similarity to minimize the number of distinct angles. Furthermore, we study a variant of the distinct angles question regarding distinct angle chains and provide bounds on the minimum number of distinct chains in $\mathbb{R}^2$ and $\mathbb{R}^3$.

Submitted 19 February, 2023; v1 submitted 28 August, 2022; originally announced August 2022.

Comments: 16 pages, 7 figures

Journal ref: Discrete Mathematics & Theoretical Computer Science, vol. 25:1, Combinatorics (February 27, 2023) dmtcs:10037

arXiv:2207.14624 [pdf, other]

Post-processing of coronary and myocardial spatial data

Authors: Jay Aodh Mackenzie, Megan Jeanne Miller, Nicholas Hill, Mette Olufsen

Abstract: Numerical simulations of real-world phenomena are implemented with at least two parts: the computational scheme and the computational domain. In the context of hemodynamics, the computational domain of a simulation represents the blood vessel network through which blood flows. Such blood vessel networks can contain millions of individual vessels that are joined together in series and parallel to form the network. It is computationally unfeasible to explicitly simulate blood flow in all blood vessels. Here, from imaged data of a single porcine left coronary arterial tree, we develop a data-pipeline to obtain computational domains for hemodynamic simulations from a graph representing the coronary vascular tree. Further, we develop a method to ascertain which subregions of the left ventricle are most likely to be perfused via a given artery using a comparison with the American Heart Association division of the left ventricle as a sense check.

Submitted 15 April, 2024; v1 submitted 29 July, 2022; originally announced July 2022.

Comments: 21 pages, 22 figures

arXiv:2207.11767 [pdf, other]

Snapshot Metrics Are Not Enough: Analyzing Software Repositories with Longitudinal Metrics

Authors: Nicholas Synovic, Matt Hyatt, Rohan Sethi, Sohini Thota, Shilpika, Allan J. Miller, Wenxin Jiang, Emmanuel S. Amobi, Austin Pinderski, Konstantin Läufer, Nicholas J. Hayward, Neil Klingensmith, James C. Davis, George K. Thiruvathukal

Abstract: Software metrics capture information about software development processes and products. These metrics support decision-making, e.g., in team management or dependency selection. However, existing metrics tools measure only a snapshot of a software project. Little attention has been given to enabling engineers to reason about metric trends over time -- longitudinal metrics that give insight about process, not just product. In this work, we present PRiME (PRocess MEtrics), a tool for computing and visualizing process metrics. The currently-supported metrics include productivity, issue density, issue spoilage, and bus factor. We illustrate the value of longitudinal data and conclude with a research agenda. The tool's demo video can be watched at https://youtu.be/YigEHy3_JCo. The source code can be found at https://github.com/SoftwareSystemsLaboratory/prime.

Submitted 24 July, 2022; originally announced July 2022.

Comments: Accepted at ASE 2022 Tool Demonstrations

arXiv:2206.13776 [pdf, other]

A Scalable Blockchain-based Smart Contract Model for Decentralized Voltage Stability Using Sharding Technique

Authors: Kimia Honari, Xiaotian Zhou, Sara Rouhani, Scott Dick, Hao Liang, James Miller Li, James Miller

Abstract: Blockchain technologies are one possible avenue for increasing the resilience of the Smart Grid, by decentralizing the monitoring and control of system-level objectives such as voltage stability protection. They furthermore offer benefits in data immutability and traceability, as blockchains are cryptographically secured. However, the performance of blockchain-based systems in real-time grid monitoring and control has never been empirically tested. This study proposes implementing a decentralized voltage stability algorithm using blockchain-based smart contracts, as a testbed for evaluating the performance of blockchains in real-time control. We furthermore investigate sharding mechanisms as a means of improving the system's scalability with fixed computing resources. We implement our models as a proof-of-concept prototype system using Hyperledger Fabric as our blockchain platform, the Matpower library in MATLAB as our power system simulator, and Hyperledger Caliper as our performance evaluation tool. We found that sharding does indeed lead to a substantial improvement in system scalability for this domain, measured by both transaction success rates and transaction latency.

Submitted 28 June, 2022; originally announced June 2022.

Comments: 8 pages

arXiv:2206.09305 [pdf, other]

doi 10.1145/3531146.3533228

Adversarial Scrutiny of Evidentiary Statistical Software

Authors: Rediet Abebe, Moritz Hardt, Angela Jin, John Miller, Ludwig Schmidt, Rebecca Wexler

Abstract: The U.S. criminal legal system increasingly relies on software output to convict and incarcerate people. In a large number of cases each year, the government makes these consequential decisions based on evidence from statistical software -- such as probabilistic genotyping, environmental audio detection, and toolmark analysis tools -- that defense counsel cannot fully cross-examine or scrutinize. This undermines the commitments of the adversarial criminal legal system, which relies on the defense's ability to probe and test the prosecution's case to safeguard individual rights. Responding to this need to adversarially scrutinize output from such software, we propose robust adversarial testing as an audit framework to examine the validity of evidentiary statistical software. We define and operationalize this notion of robust adversarial testing for defense use by drawing on a large body of recent work in robust machine learning and algorithmic fairness. We demonstrate how this framework both standardizes the process for scrutinizing such tools and empowers defense lawyers to examine their validity for instances most relevant to the case at hand. We further discuss existing structural and institutional challenges within the U.S. criminal legal system that may create barriers for implementing this and other such audit frameworks and close with a discussion on policy changes that could help address these concerns.

Submitted 30 September, 2022; v1 submitted 18 June, 2022; originally announced June 2022.

Comments: Typos corrected, appendix B removed

ACM Class: K.4.1; I.2.1; G.3; D.2.5

arXiv:2206.08957 [pdf, other]

Not-Quite Transcendental Functions and their Applications

Authors: Jonah M. Miller, Joshua C. Dolence, Daniel Holladay

Abstract: Transcendental functions, such as exponentials and logarithms, appear in a broad array of computational domains: from simulations in curvilinear coordinates, to interpolation, to machine learning. Unfortunately they are typically expensive to compute accurately. In this note, we argue that in many cases, the properties of the function matter more than the exact functional form. We present new functions, which are not transcendental, that can be used as drop-in replacements for the exponential and logarithm in many settings for a significant performance boost. We show that for certain applications using these functions results in no drop in the accuracy at all, as they are perfectly accurate representations of themselves, if not the original transcendental functions.

Submitted 17 June, 2022; originally announced June 2022.

Comments: Submitted as a short note to the journal of computational physics

Report number: LA-UR-22-25573

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2206.04367 [pdf, ps, other]

Distinct Angles in General Position

Authors: Henry L. Fleischmann, Sergei V. Konyagin, Steven J. Miller, Eyvindur A. Palsson, Ethan Pesikoff, Charles Wolf

Abstract: The Erdős distinct distance problem is a ubiquitous problem in discrete geometry. Somewhat less well known is Erdős' distinct angle problem, the problem of finding the minimum number of distinct angles between $n$ non-collinear points in the plane. Recent work has introduced bounds on a wide array of variants of this problem, inspired by similar variants in the distance setting. In this short note, we improve the best known upper bound for the minimum number of distinct angles formed by $n$ points in general position from $O(n^{\log_2(7)})$ to $O(n^2)$. Before this work, similar bounds relied on projections onto a generic plane from higher dimensional space. In this paper, we employ the geometric properties of a logarithmic spiral, sidestepping the need for a projection. We also apply this configuration to reduce the upper bound on the largest integer such that any set of $n$ points in general position has a subset of that size with all distinct angles. This bound is decreased from $O(n^{\log_2(7)/3})$ to $O(n^{1/2})$.

Submitted 13 June, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: Former Corollary 4.1 upgraded to Theorem 1.2 with improved bounds

MSC Class: 52C10

Showing 1–50 of 170 results for author: Miller, J