-
There are no good infinite families of toric codes
Authors:
Jason P. Bell,
Sean Monahan,
Matthew Satriano,
Karen Situ,
Zheng Xie
Abstract:
Soprunov and Soprunova introduced the notion of a good infinite family of toric codes. We prove that such good families do not exist by proving a more general Szemerédi-type result: for all $c\in(0,1]$ and all positive integers $N$, subsets of density at least $c$ in $\{0,1,\dots,N-1\}^n$ contain hypercubes of arbitrarily large dimension as $n$ grows.
Submitted 31 May, 2024;
originally announced June 2024.
-
Word-specific tonal realizations in Mandarin
Authors:
Yu-Ying Chuang,
Melanie J. Bell,
Yu-Hsiang Tseng,
R. Harald Baayen
Abstract:
The pitch contours of Mandarin two-character words are generally understood as being shaped by the underlying tones of the constituent single-character words, in interaction with articulatory constraints imposed by factors such as speech rate, co-articulation with adjacent tones, segmental make-up, and predictability. This study shows that tonal realization is also partially determined by words' meanings. We first show, on the basis of a Taiwan corpus of spontaneous conversations, using the generalized additive regression model, and focusing on the rise-fall tone pattern, that after controlling for effects of speaker and context, word type is a stronger predictor of pitch realization than all the previously established word-form related predictors combined. Importantly, the addition of information about meaning in context improves prediction accuracy even further. We then proceed to show, using computational modeling with context-specific word embeddings, that token-specific pitch contours predict word type with 50% accuracy on held-out data, and that context-sensitive, token-specific embeddings can predict the shape of pitch contours with 30% accuracy. These accuracies, which are an order of magnitude above chance level, suggest that the relation between words' pitch contours and their meanings is sufficiently strong to be functional for language users. The theoretical implications of these empirical findings are discussed.
Submitted 11 May, 2024;
originally announced May 2024.
-
Towards Geographic Inclusion in the Evaluation of Text-to-Image Models
Authors:
Melissa Hall,
Samuel J. Bell,
Candace Ross,
Adina Williams,
Michal Drozdzal,
Adriana Romero Soriano
Abstract:
Rapid progress in text-to-image generative models coupled with their deployment for visual content creation has magnified the importance of thoroughly evaluating their performance and identifying potential biases. In pursuit of models that generate images that are realistic, diverse, visually appealing, and consistent with the given prompt, researchers and practitioners often turn to automated metrics to facilitate scalable and cost-effective performance profiling. However, commonly-used metrics often fail to account for the full diversity of human preference; often even in-depth human evaluations face challenges with subjectivity, especially as interpretations of evaluation criteria vary across regions and cultures. In this work, we conduct a large, cross-cultural study to examine how much annotators in Africa, Europe, and Southeast Asia vary in their perception of geographic representation, visual appeal, and consistency in real and generated images from state-of-the-art public APIs. We collect over 65,000 image annotations and 20 survey responses. We contrast human annotations with common automated metrics, finding that human preferences vary notably across geographic location and that current metrics do not fully account for this diversity. For example, annotators in different locations often disagree on whether exaggerated, stereotypical depictions of a region are considered geographically representative. In addition, the utility of automatic evaluations is dependent on assumptions about their set-up, such as the alignment of feature extractors with human perception of object similarity or the definition of "appeal" captured in reference datasets used to ground evaluations. We recommend steps for improved automatic and human evaluations.
Submitted 7 May, 2024;
originally announced May 2024.
-
LLMorpheus: Mutation Testing using Large Language Models
Authors:
Frank Tip,
Jonathan Bell,
Max Schäfer
Abstract:
In mutation testing, the quality of a test suite is evaluated by introducing faults into a program and determining whether the program's tests detect them. Most existing approaches for mutation testing involve the application of a fixed set of mutation operators, e.g., replacing a "+" with a "-" or removing a function's body. However, certain types of real-world bugs cannot easily be simulated by such approaches, limiting their effectiveness. This paper presents a technique where a Large Language Model (LLM) is prompted to suggest mutations by asking it what placeholders that have been inserted in source code could be replaced with. The technique is implemented in LLMorpheus, a mutation testing tool for JavaScript, and evaluated on 13 subject packages, considering several variations on the prompting strategy, and using several LLMs. We find LLMorpheus to be capable of producing mutants that resemble existing bugs that cannot be produced by StrykerJS, a state-of-the-art mutation testing tool. Moreover, we report on the running time, cost, and number of mutants produced by LLMorpheus, demonstrating its practicality.
Submitted 15 April, 2024;
originally announced April 2024.
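The fixed mutation operators that the LLMorpheus abstract contrasts itself with can be illustrated with a minimal Python sketch. It is not LLMorpheus or StrykerJS (both target JavaScript); the subject function `total`, the two tests, and the helper names are all hypothetical. The sketch applies one classic operator, replacing "+" with "-", and checks whether a test detects the resulting mutant.

```python
import ast


class AddToSub(ast.NodeTransformer):
    """Fixed mutation operator: rewrite every `a + b` as `a - b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # mutate nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def load_function(source, name, mutate=False):
    """Compile `source` (optionally mutated) and return the function `name`."""
    tree = ast.parse(source)
    if mutate:
        tree = ast.fix_missing_locations(AddToSub().visit(tree))
    namespace = {}
    exec(compile(tree, "<subject>", "exec"), namespace)
    return namespace[name]


# Hypothetical program under test.
SOURCE = "def total(a, b):\n    return a + b\n"


def weak_test(fn):
    # 0 + 0 == 0 - 0, so this test cannot tell the mutant apart.
    return fn(0, 0) == 0


def strong_test(fn):
    return fn(2, 3) == 5


def mutant_killed(test):
    """A mutant is killed if the test passes on the original and fails on the mutant."""
    original = load_function(SOURCE, "total")
    mutant = load_function(SOURCE, "total", mutate=True)
    return test(original) and not test(mutant)
```

Here `mutant_killed(strong_test)` holds while `mutant_killed(weak_test)` does not, which is the signal mutation testing uses to judge test-suite quality.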
-
The opportunities and risks of large language models in mental health
Authors:
Hannah R. Lawrence,
Renee A. Schneider,
Susan B. Rubin,
Maja J. Mataric,
Daniel J. McDuff,
Megan Jones Bell
Abstract:
Global rates of mental health concerns are rising and there is increasing realization that existing models of mental healthcare will not adequately expand to meet the demand. With the emergence of large language models (LLMs) has come great optimism regarding their promise to create novel, large-scale solutions to support mental health. Despite their nascence, LLMs have already been applied to mental health-related tasks. In this review, we summarize the extant literature on efforts to use LLMs to provide mental health education, assessment, and intervention and highlight key opportunities for positive impact in each area. We then highlight risks associated with applying LLMs to mental health and encourage adoption of strategies to mitigate these risks. The urgent need for mental health support must be balanced with responsible development, testing, and deployment of mental health LLMs. Especially critical is ensuring that mental health LLMs are fine-tuned for mental health, enhance mental health equity, adhere to ethical standards, and that people, including those with lived experience with mental health concerns, are involved in all stages from development through deployment. Prioritizing these efforts will minimize potential harms to mental health and maximize the likelihood that LLMs will positively impact mental health globally.
Submitted 26 March, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
AMReX and pyAMReX: Looking Beyond ECP
Authors:
Andrew Myers,
Weiqun Zhang,
Ann Almgren,
Thierry Antoun,
John Bell,
Axel Huebl,
Alexander Sinn
Abstract:
AMReX is a software framework for the development of block-structured mesh applications with adaptive mesh refinement (AMR). AMReX was initially developed and supported by the AMReX Co-Design Center as part of the U.S. DOE Exascale Computing Project, and is continuing to grow post-ECP. In addition to adding new functionality and performance improvements to the core AMReX framework, we have also developed a Python binding, pyAMReX, that provides a bridge between AMReX-based application codes and the data science ecosystem. pyAMReX provides zero-copy application GPU data access for AI/ML, in situ analysis and application coupling, and enables rapid, massively parallel prototyping. In this paper we review the overall functionality of AMReX and pyAMReX, focusing on new developments, new functionality, and optimizations of key operations. We also summarize capabilities of ECP projects that used AMReX and provide an overview of new, non-ECP applications.
Submitted 18 March, 2024;
originally announced March 2024.
-
Consecutive Power Occurrences in Sturmian Words
Authors:
Jason Bell,
Chris Schulz,
Jeffrey Shallit
Abstract:
We show that every Sturmian word has the property that the distance between consecutive ending positions of cubes occurring in the word is always bounded by $10$ and this bound is optimal, extending a result of Rampersad, who proved that the bound $9$ holds for the Fibonacci word. We then give a general result showing that for every $e \in [1,(5+\sqrt{5})/2)$ there is a natural number $N$, depending only on $e$, such that every Sturmian word has the property that the distance between consecutive ending positions of $e$-powers occurring in the word is uniformly bounded by $N$.
Submitted 14 February, 2024;
originally announced February 2024.
-
230,439 Test Failures Later: An Empirical Evaluation of Flaky Failure Classifiers
Authors:
Abdulrahman Alshammari,
Paul Ammann,
Michael Hilton,
Jonathan Bell
Abstract:
Flaky tests are tests that can non-deterministically pass or fail, even in the absence of code changes. Despite being a source of false alarms, flaky tests often remain in test suites once they are detected, as they also may be relied upon to detect true failures. Hence, a key open problem in flaky test research is: How to quickly determine if a test failed due to flakiness, or if it detected a bug? The state-of-the-practice is for developers to re-run failing tests: if a test fails and then passes, it is flaky by definition; if the test persistently fails, it is likely a true failure. However, this approach can be both ineffective and inefficient. An alternate approach that developers may already use for triaging test failures is failure de-duplication, which matches newly discovered test failures to previously witnessed flaky and true failures. However, because flaky test failure symptoms might resemble those of true failures, there is a risk of misclassifying a true test failure as a flaky failure to be ignored. Using a dataset of 498 flaky tests from 22 open-source Java projects, we collect a large dataset of 230,439 failure messages (both flaky and not), allowing us to empirically investigate the efficacy of failure de-duplication. We find that for some projects, this approach is extremely effective (with 100% specificity), while for other projects, the approach is entirely ineffective. By analyzing the characteristics of these flaky and non-flaky failures, we provide useful guidance on how developers should rely on this approach.
Submitted 28 January, 2024;
originally announced January 2024.
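The re-run rule that the abstract above calls the state of the practice can be written down in a few lines. This is only a sketch: the function names, the `max_reruns` default, and the scripted test outcomes are assumptions made for illustration, not anything taken from the paper.

```python
def classify_by_rerun(run_test, max_reruns=3):
    """Classify a test that has just failed by re-running it.

    `run_test` is a zero-argument callable returning True on pass.
    A pass on any re-run means the test is flaky by definition;
    persistent failure suggests (but does not prove) a true failure.
    """
    for _ in range(max_reruns):
        if run_test():
            return "flaky"
    return "likely true failure"


def scripted_test(outcomes):
    """Hypothetical stand-in for a real test: replays a fixed list of outcomes."""
    remaining = iter(outcomes)
    return lambda: next(remaining)
```

For example, `classify_by_rerun(scripted_test([False, True]))` reports a flaky test after two re-runs, while a test that fails on all three re-runs is reported as a likely true failure. The cost of the extra runs is one reason the abstract describes this approach as potentially inefficient.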
-
Sparse regular subsets of the reals
Authors:
Jason Bell,
Alexi Block Gorman
Abstract:
This paper concerns the expansion of the real ordered additive group by a predicate for a subset of $[0,1]$ whose base-$r$ representations are recognized by a Büchi automaton. In the case that this predicate is closed, a dichotomy is established for when this expansion is interdefinable with the structure $(\mathbb{R},<,+,0,r^{-\mathbb{N}})$ for some $r \in \mathbb{N}_{>1}$. In the case that the closure of the predicate has Hausdorff dimension less than $1$, the dichotomy further characterizes these expansions of $(\mathbb{R},<,+,0,1)$ by when they have NIP and NTP$_2$, which is precisely when the closure of the predicate has Hausdorff dimension $0$.
Submitted 18 November, 2023;
originally announced November 2023.
-
The Effects of Computational Resources on Flaky Tests
Authors:
Denini Silva,
Martin Gruber,
Satyajit Gokhale,
Ellen Arteca,
Alexi Turcotte,
Marcelo d'Amorim,
Wing Lam,
Stefan Winter,
Jonathan Bell
Abstract:
Flaky tests are tests that nondeterministically pass and fail in unchanged code. These tests can be detrimental to developers' productivity. Particularly when tests run in continuous integration environments, the tests may be competing for access to limited computational resources (CPUs, memory, etc.), and we hypothesize that resource (in)availability may be a significant factor in the failure rate of flaky tests. We present the first assessment of the impact that computational resources have on flaky tests, including a total of 52 projects written in Java, JavaScript and Python, and 27 different resource configurations. Using a rigorous statistical methodology, we determine which tests are RAFT (Resource-Affected Flaky Tests). We find that 46.5% of the flaky tests in our dataset are RAFT, indicating that a substantial proportion of flaky-test failures can be avoided by adjusting the resources available when running tests. We reported RAFTs and configurations that avoid them to developers, and received interest in either fixing the RAFTs or improving the specifications of the projects so that tests would be run only in configurations that are unlikely to encounter RAFT failures. Our results also have implications for researchers attempting to detect flaky tests, e.g., reducing the resources available when running tests is a cost-effective approach to detect more flaky failures.
Submitted 18 October, 2023;
originally announced October 2023.
-
Prediction of MET Overexpression in Non-Small Cell Lung Adenocarcinomas from Hematoxylin and Eosin Images
Authors:
Kshitij Ingale,
Sun Hae Hong,
Josh S. K. Bell,
Abbas Rizvi,
Amy Welch,
Lingdao Sha,
Irvin Ho,
Kunal Nagpal,
Aicha BenTaieb,
Rohan P Joshi,
Martin C Stumpe
Abstract:
MET protein overexpression is a targetable event in non-small cell lung cancer (NSCLC) and is the subject of active drug development. Challenges in identifying patients for these therapies include lack of access to validated testing, such as standardized immunohistochemistry (IHC) assessment, and consumption of valuable tissue for a single gene/protein assay. Development of pre-screening algorithms using routinely available digitized hematoxylin and eosin (H&E)-stained slides to predict MET overexpression could promote testing for those who will benefit most. While assessment of MET expression using IHC is currently not routinely performed in NSCLC, next-generation sequencing is common and in some cases includes RNA expression panel testing. In this work, we leveraged a large database of matched H&E slides and RNA expression data to train a weakly supervised model to predict MET RNA overexpression directly from H&E images. This model was evaluated on an independent holdout test set of 300 over-expressed and 289 normal patients, demonstrating an ROC-AUC of 0.70 (95th percentile interval: 0.66 - 0.74) with stable performance characteristics across different patient clinical variables and robust to synthetic noise on the test set. These results suggest that H&E-based predictive models could be useful to prioritize patients for confirmatory testing of MET protein or MET gene expression status.
Submitted 12 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
"Always Nice and Confident, Sometimes wrong": Developer's Experiences Engaging Generative AI Chatbots Versus Human-Powered Q&A Platforms
Authors:
Jiachen Li,
Elizabeth Mynatt,
Varun Mishra,
Jonathan Bell
Abstract:
Software engineers have historically relied on human-powered Q&A platforms, like Stack Overflow (SO), as coding aids. With the rise of generative AI, developers have adopted AI chatbots, such as ChatGPT, in their software development process. Recognizing the potential parallels between human-powered Q&A platforms and AI-powered question-based chatbots, we investigate and compare how developers integrate this assistance into their real-world coding experiences by conducting thematic analysis of Reddit posts. Through a comparative study of SO and ChatGPT, we identified each platform's strengths, use cases, and barriers. Our findings suggest that ChatGPT offers fast, clear, comprehensive responses and fosters a more respectful environment than SO. However, concerns about ChatGPT's reliability stem from its overly confident tone and the absence of validation mechanisms like SO's voting system. Based on these findings, we recommend leveraging each platform's unique features to improve developer experiences in the future.
Submitted 24 September, 2023;
originally announced September 2023.
-
npm-follower: A Complete Dataset Tracking the NPM Ecosystem
Authors:
Donald Pinckney,
Federico Cassano,
Arjun Guha,
Jonathan Bell
Abstract:
Software developers typically rely upon a large network of dependencies to build their applications. For instance, the NPM package repository contains over 3 million packages and serves tens of billions of downloads weekly. Understanding the structure and nature of packages, dependencies, and published code requires datasets that provide researchers with easy access to metadata and code of packages. However, prior work on NPM dataset construction typically has two limitations: 1) only metadata is scraped, and 2) packages or versions that are deleted from NPM cannot be scraped. Over 330,000 versions of packages were deleted from NPM between July 2022 and May 2023. This data is critical for researchers as it often pertains to important questions of security and malware. We present npm-follower, a dataset and crawling architecture which archives metadata and code of all packages and versions as they are published, and is thus able to retain data which is later deleted. The dataset currently includes over 35 million versions of packages, and grows at a rate of about 1 million versions per month. The dataset is designed to be easily used by researchers answering questions involving either metadata or program analysis. Both the code and dataset are available at https://dependencies.science.
Submitted 24 August, 2023;
originally announced August 2023.
-
Duality of Lattices Associated to Left and Right Quotients
Authors:
Jason Bell,
Daniel Smertnig,
Hellis Tamm
Abstract:
We associate lattices to the sets of unions and intersections of left and right quotients of a regular language. For both unions and intersections, we show that the lattices we produce using left and right quotients are dual to each other. We also give necessary and sufficient conditions for these lattices to have maximal possible complexity.
Submitted 6 September, 2023; v1 submitted 4 June, 2023;
originally announced June 2023.
-
Amplification by Shuffling without Shuffling
Authors:
Borja Balle,
James Bell,
Adrià Gascón
Abstract:
Motivated by recent developments in the shuffle model of differential privacy, we propose a new approximate shuffling functionality called Alternating Shuffle, and provide a protocol implementing alternating shuffling in a single-server threat model where the adversary observes all communication. Unlike previous shuffling protocols in this threat model, the per-client communication of our protocol only grows sub-linearly in the number of clients. Moreover, we study the concrete efficiency of our protocol and show it can improve per-client communication by one or more orders of magnitude with respect to previous (approximate) shuffling protocols. We also show a differential privacy amplification result for alternating shuffling analogous to the one for uniform shuffling, and demonstrate that shuffling-based protocols for secure summation based on a construction of Ishai et al. (FOCS'06) remain secure under the Alternating Shuffle. In the process we also develop a protocol for exact shuffling in the single-server threat model with amortized logarithmic communication per client, which might be of independent interest.
Submitted 7 September, 2023; v1 submitted 18 May, 2023;
originally announced May 2023.
-
Quantitative estimates for the size of an intersection of sparse automatic sets
Authors:
Seda Albayrak,
Jason Bell
Abstract:
A theorem of Cobham says that if $k$ and $\ell$ are two multiplicatively independent natural numbers then a subset of the natural numbers that is both $k$- and $\ell$-automatic is eventually periodic. A multidimensional extension was later given by Semenov. In this paper, we give a quantitative version of the Cobham-Semenov theorem for sparse automatic sets, showing that the intersection of a sparse $k$-automatic subset of $\mathbb{N}^d$ and a sparse $\ell$-automatic subset of $\mathbb{N}^d$ is finite with size that can be explicitly bounded in terms of data from the automata that accept these sets.
Submitted 18 April, 2023;
originally announced April 2023.
-
A Large Scale Analysis of Semantic Versioning in NPM
Authors:
Donald Pinckney,
Federico Cassano,
Arjun Guha,
Jonathan Bell
Abstract:
The NPM package repository contains over two million packages and serves tens of billions of downloads per week. Nearly every single JavaScript application uses the NPM package manager to install packages from the NPM repository. NPM relies on a "semantic versioning" ('semver') scheme to maintain a healthy ecosystem, where bug-fixes are reliably delivered to downstream packages as quickly as possible, while breaking changes require manual intervention by downstream package maintainers. In order to understand how developers use semver, we build a dataset containing every version of every package on NPM and analyze the flow of updates throughout the ecosystem. We build a time-travelling dependency resolver for NPM, which allows us to determine precisely which versions of each dependency would have been resolved at different times. We segment our analysis to allow for a direct analysis of security-relevant updates (those that introduce or patch vulnerabilities) in comparison to the rest of the ecosystem. We find that when developers use semver correctly, critical updates such as security patches can flow quite rapidly to downstream dependencies in the majority of cases (90.09%), but this does not always occur, due to developers' imperfect use of both semver version constraints and semver version number increments. Our findings have implications for developers and researchers alike. We make our infrastructure and dataset publicly available under an open source license.
Submitted 1 April, 2023;
originally announced April 2023.
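The idea of a time-travelling resolver mentioned in the abstract above can be sketched in Python for a single dependency. This is not the authors' infrastructure: the publication dates are made up, only caret constraints on plain `MAJOR.MINOR.PATCH` versions are handled (no pre-release tags), and all function names are hypothetical. The caret rule itself follows npm's documented behaviour of allowing changes that do not modify the left-most non-zero component.

```python
from datetime import date


def parse(version):
    """Turn '1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def caret_allows(base, candidate):
    """Return True if `candidate` satisfies the caret constraint `^base`."""
    b, c = parse(base), parse(candidate)
    if c < b:
        return False
    if b[0] > 0:
        return c[0] == b[0]                 # ^1.2.3 means >=1.2.3 <2.0.0
    if b[1] > 0:
        return c[0] == 0 and c[1] == b[1]   # ^0.2.3 means >=0.2.3 <0.3.0
    return c == b                           # ^0.0.3 means >=0.0.3 <0.0.4


def resolve_at(base, published, when):
    """Highest version satisfying `^base` among those published on or before `when`."""
    candidates = [
        version
        for version, released in published.items()
        if released <= when and caret_allows(base, version)
    ]
    return max(candidates, key=parse, default=None)


# Hypothetical release history of one package.
PUBLISHED = {
    "1.2.3": date(2022, 1, 10),
    "1.2.4": date(2022, 3, 1),   # imagine this is a security patch
    "1.3.0": date(2022, 6, 15),
    "2.0.0": date(2022, 9, 1),   # breaking change, excluded by ^1.2.3
}
```

With this history, a downstream package declaring `^1.2.3` would have resolved to `1.2.4` on 1 April 2022 and to `1.3.0` at the end of 2022, never to `2.0.0`. Replaying resolution at different dates in this way is what lets one measure how quickly a patch reaches dependents.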
-
Counterexamples to a Conjecture of Dombi in Additive Number Theory
Authors:
Jason P. Bell,
Jeffrey Shallit
Abstract:
We disprove a 2002 conjecture of Dombi from additive number theory. More precisely, we find examples of sets $A \subset \mathbb{N}$ with the property that $\mathbb{N} \setminus A$ is infinite, but the sequence $n \rightarrow |\{ (a,b,c) \, : \, n=a+b+c \text{ and } a,b,c \in A \}|$, counting the number of $3$-compositions using elements of $A$ only, is strictly increasing.
Submitted 28 December, 2022; v1 submitted 23 December, 2022;
originally announced December 2022.
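The counting sequence in the abstract above, $n \mapsto |\{(a,b,c) : n=a+b+c,\ a,b,c \in A\}|$, is easy to compute by brute force for small $n$. The sketch below only illustrates the definition; the sample sets are chosen for the example and are not the counterexamples constructed in the paper, and whether $0$ belongs to $\mathbb{N}$ is left to the caller.

```python
def three_compositions(A, n):
    """Count ordered triples (a, b, c) of elements of A with a + b + c == n."""
    members = {a for a in A if 0 <= a <= n}
    # For each ordered pair (a, b), the third term is forced to be n - a - b.
    return sum(1 for a in members for b in members if (n - a - b) in members)


def is_strictly_increasing(A, upto):
    """Check that the counts for n = 0, 1, ..., upto strictly increase."""
    counts = [three_compositions(A, n) for n in range(upto + 1)]
    return all(x < y for x, y in zip(counts, counts[1:]))
```

Taking `A = range(0, 50)` (all small non-negative integers) gives strictly increasing counts up to `n = 20`, whereas the even numbers `range(0, 50, 2)` do not, since no odd `n` is a sum of three even numbers.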
-
Simplicity Bias Leads to Amplified Performance Disparities
Authors:
Samuel J. Bell,
Levent Sagun
Abstract:
Which parts of a dataset will a given model find difficult? Recent work has shown that SGD-trained models have a bias towards simplicity, leading them to prioritize learning a majority class, or to rely upon harmful spurious correlations. Here, we show that the preference for "easy" runs far deeper: A model may prioritize any class or group of the dataset that it finds simple, at the expense of what it finds complex, as measured by performance difference on the test set. When subsets with different levels of complexity align with demographic groups, we term this difficulty disparity, a phenomenon that occurs even with balanced datasets that lack group/label associations. We show how difficulty disparity is a model-dependent quantity, and is further amplified in commonly-used models as selected by typical average performance scores. We quantify an amplification factor across a range of settings in order to compare disparity of different models on a fixed dataset. Finally, we present two real-world examples of difficulty amplification in action, resulting in worse-than-expected performance disparities between groups even when using a balanced dataset. The existence of such disparities in balanced datasets demonstrates that merely balancing sample sizes of groups is not sufficient to ensure unbiased performance. We hope this work presents a step towards measurable understanding of the role of model bias as it interacts with the structure of data, and call for additional model-dependent mitigation methods to be deployed alongside dataset audits.
Submitted 8 June, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.
-
CliMedBERT: A Pre-trained Language Model for Climate and Health-related Text
Authors:
B. Jalalzadeh Fard,
S. A. Hasan,
J. E. Bell
Abstract:
Climate change is threatening human health in unprecedented orders and many ways. These threats are expected to grow unless effective and evidence-based policies are developed and acted upon to minimize or eliminate them. Attaining such a task requires the highest degree of the flow of knowledge from science into policy. The multidisciplinary and location-specific nature of published science, together with its vastness, makes it challenging to keep track of novel work in this area and renders traditional knowledge synthesis methods inefficient in infusing science into policy. To this end, we consider developing multiple domain-specific language models (LMs) with different variations from Climate- and Health-related information, which can serve as a foundational step toward capturing available knowledge to enable solving different tasks, such as detecting similarities between climate- and health-related concepts, fact-checking, relation extraction, evidence of health effects to policy text generation, and more. To our knowledge, this is the first work that proposes developing multiple domain-specific language models for the considered domains. We will make the developed models, resources, and codebase available to researchers.
Submitted 1 December, 2022;
originally announced December 2022.
-
Computing the linear hull: Deciding Deterministic? and Unambiguous? for weighted automata over fields
Authors:
Jason P. Bell,
Daniel Smertnig
Abstract:
The (left) linear hull of a weighted automaton over a field is a topological invariant. If the automaton is minimal, the linear hull can be used to determine whether or not the automaton is equivalent to a deterministic one. Furthermore, the linear hull can also be used to determine whether the minimal automaton is equivalent to an unambiguous one. We show how to compute the linear hull, and thus prove that it is decidable whether or not a given automaton over a number field is equivalent to a deterministic one. In this case we are also able to compute an equivalent deterministic automaton. We also show the analogous decidability and computability result for the unambiguous case. Our results resolve a problem posed in a 2006 survey by Lombardy and Sakarovitch.
Submitted 6 June, 2023; v1 submitted 6 September, 2022;
originally announced September 2022.
-
Modeling the Machine Learning Multiverse
Authors:
Samuel J. Bell,
Onno P. Kampman,
Jesse Dodge,
Neil D. Lawrence
Abstract:
Amid mounting concern about the reliability and credibility of machine learning research, we present a principled framework for making robust and generalizable claims: the multiverse analysis. Our framework builds upon the multiverse analysis (Steegen et al., 2016) introduced in response to psychology's own reproducibility crisis. To efficiently explore high-dimensional and often continuous ML search spaces, we model the multiverse with a Gaussian Process surrogate and apply Bayesian experimental design. Our framework is designed to facilitate drawing robust scientific conclusions about model performance, and thus our approach focuses on exploration rather than conventional optimization. In the first of two case studies, we investigate disputed claims about the relative merit of adaptive optimizers. Second, we synthesize conflicting research on the effect of learning rate on the large batch training generalization gap. For the machine learning community, the multiverse analysis is a simple and effective technique for identifying robust claims, for increasing transparency, and a step toward improved reproducibility.
Submitted 12 October, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
The Effect of Task Ordering in Continual Learning
Authors:
Samuel J. Bell,
Neil D. Lawrence
Abstract:
We investigate the effect of task ordering on continual learning performance. We conduct an extensive series of empirical experiments on synthetic and naturalistic datasets and show that reordering tasks significantly affects the amount of catastrophic forgetting. Connecting to the field of curriculum learning, we show that the effect of task ordering can be exploited to modify continual learning performance, and present a simple approach for doing so. Our method computes the distance between all pairs of tasks, where distance is defined as the source task curvature of a gradient step toward the target task. Using statistically rigorous methods and sound experimental design, we show that task ordering is an important aspect of continual learning that can be modified for improved performance.
Submitted 26 May, 2022;
originally announced May 2022.
-
No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL
Authors:
Han Wang,
Archit Sakhadeo,
Adam White,
James Bell,
Vincent Liu,
Xutong Zhao,
Puer Liu,
Tadashi Kozuno,
Alona Fyshe,
Martha White
Abstract:
The performance of reinforcement learning (RL) agents is sensitive to the choice of hyperparameters. In real-world settings like robotics or industrial control systems, however, testing different hyperparameter configurations directly on the environment can be financially prohibitive, dangerous, or time consuming. We propose a new approach to tune hyperparameters from offline logs of data, to fully specify the hyperparameters for an RL agent that learns online in the real world. The approach is conceptually simple: we first learn a model of the environment from the offline data, which we call a calibration model, and then simulate learning in the calibration model to identify promising hyperparameters. We identify several criteria to make this strategy effective, and develop an approach that satisfies these criteria. We empirically investigate the method in a variety of settings to identify when it is effective and when it fails.
Submitted 18 May, 2022;
originally announced May 2022.
-
Flexible and Optimal Dependency Management via Max-SMT
Authors:
Donald Pinckney,
Federico Cassano,
Arjun Guha,
Jon Bell,
Massimiliano Culpo,
Todd Gamblin
Abstract:
Package managers such as NPM have become essential for software development. The NPM repository hosts over 2 million packages and serves over 43 billion downloads every week. Unfortunately, the NPM dependency solver has several shortcomings. 1) NPM is greedy and often fails to install the newest versions of dependencies; 2) NPM's algorithm leads to duplicated dependencies and bloated code, which is particularly bad for web applications that need to minimize code size; 3) NPM's vulnerability fixing algorithm is also greedy, and can even introduce new vulnerabilities; and 4) NPM's ability to duplicate dependencies can break stateful frameworks and requires a lot of care to work around. Although existing tools try to address these problems, they are either brittle, rely on post hoc changes to the dependency tree, do not guarantee optimality, or are not composable.
We present PacSolve, a unifying framework and implementation for dependency solving which allows for customizable constraints and optimization goals. We use PacSolve to build MaxNPM, a complete, drop-in replacement for NPM, which empowers developers to combine multiple objectives when installing dependencies. We evaluate MaxNPM with a large sample of packages from the NPM ecosystem and show that it can: 1) reduce more vulnerabilities in dependencies than NPM's auditing tool in 33% of cases; 2) choose newer dependencies than NPM in 14% of cases; and 3) choose fewer dependencies than NPM in 21% of cases. All our code and data are open and available.
Submitted 24 August, 2023; v1 submitted 25 March, 2022;
originally announced March 2022.
-
Topological invariants for words of linear factor complexity
Authors:
Jason Bell
Abstract:
Given a finite alphabet $Σ$ and a right-infinite word $w$ over the alphabet $Σ$, we construct a topological space ${\rm Rec}(w)$ consisting of all right-infinite recurrent words whose factors are all factors of $w$, where we work up to an equivalence in which two words are equivalent if they have the exact same set of factors (finite contiguous subwords). We show that ${\rm Rec}(w)$ can be endowed with a natural topology and we show that if $w$ is a word of linear factor complexity then ${\rm Rec}(w)$ is a finite topological space. In addition, we note that there are examples which show that if $f:\mathbb{N}\to \mathbb{N}$ is a function that tends to infinity as $n\to \infty$ then there is a word whose factor complexity function is ${\rm O}(nf(n))$ such that ${\rm Rec}(w)$ is an infinite set. Finally, we pose a realization problem: which finite topological spaces can arise as ${\rm Rec}(w)$ for a word of linear factor complexity?
Submitted 10 May, 2022; v1 submitted 1 February, 2022;
originally announced February 2022.
-
Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network
Authors:
Sundaresh Ram,
Wenfei Tang,
Alexander J. Bell,
Cara Spencer,
Alexander Buschhaus,
Charles R. Hatt,
Marina Pasca diMagliano,
Jeffrey J. Rodriguez,
Stefanie Galban,
Craig J. Galban
Abstract:
Early detection of lung cancer is critical for improvement of patient survival. To address the clinical need for efficacious treatments, genetically engineered mouse models (GEMM) have become integral in identifying and evaluating the molecular underpinnings of this complex disease that may be exploited as therapeutic targets. Assessment of GEMM tumor burden on histopathological sections performed by manual inspection is both time consuming and prone to subjective bias. Therefore, an interplay of needs and challenges exists for computer-aided diagnostic tools, for accurate and efficient analysis of these histopathology images. In this paper, we propose a simple machine learning approach called the graph-based sparse principal component analysis (GS-PCA) network, for automated detection of cancerous lesions on histological lung slides stained by hematoxylin and eosin (H&E). Our method comprises four steps: 1) cascaded graph-based sparse PCA, 2) PCA binary hashing, 3) block-wise histograms, and 4) support vector machine (SVM) classification. In our proposed architecture, graph-based sparse PCA is employed to learn the filter banks of the multiple stages of a convolutional network. This is followed by PCA hashing and block histograms for indexing and pooling. The meaningful features extracted from this GS-PCA are then fed to an SVM classifier. We evaluate the performance of the proposed algorithm on H&E slides obtained from an inducible K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operator characteristic (ROC) and show that our algorithm is efficient and provides improved detection accuracy compared to existing algorithms.
Submitted 15 February, 2022; v1 submitted 27 October, 2021;
originally announced October 2021.
-
Behavioral Experiments for Understanding Catastrophic Forgetting
Authors:
Samuel J. Bell,
Neil D. Lawrence
Abstract:
In this paper we explore whether the fundamental tool of experimental psychology, the behavioral experiment, has the power to generate insight not only into humans and animals, but artificial systems too. We apply the techniques of experimental psychology to investigating catastrophic forgetting in neural networks. We present a series of controlled experiments with two-layer ReLU networks, and exploratory results revealing a new understanding of the behavior of catastrophic forgetting. Alongside our empirical findings, we demonstrate an alternative, behavior-first approach to investigating neural network phenomena.
Submitted 13 December, 2022; v1 submitted 20 October, 2021;
originally announced October 2021.
-
Monitoring the Mental State of Cooperativeness for Guiding an Elderly Person in Sit-to-Stand Assistance
Authors:
John Bell,
H. Harry Asada
Abstract:
In providing physical assistance to elderly people, ensuring cooperative behavior from the elderly persons is a critical requirement. In sit-to-stand assistance, for example, an older adult must lean forward, so that the body mass can shift towards the feet before a caregiver starts lifting the body. An experienced caregiver guides the older adult through verbal communications and physical interactions, so that the older adult may be cooperative throughout the process. This guidance is of paramount importance and is a major challenge in introducing a robotic aid to the eldercare environment. The wide-scope goal of the current work is to develop an intelligent eldercare robot that can a) monitor the mental state of an older adult, and b) guide the older adult through an assisting procedure so that he/she can be cooperative in being assisted. The current work presents a basic modeling framework for describing a human's physical behaviors reflecting an internal mental state, and an algorithm for estimating the mental state through interactive observations. The sit-to-stand assistance problem is considered for the initial study. A simple Kalman Filter is constructed for estimating the level of cooperativeness in response to applied cues, with a thresholding scheme being used to make judgments on the cooperativeness state.
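The estimation idea in the last sentence of this abstract can be illustrated with a one-dimensional Kalman filter followed by a threshold test. This is only a minimal sketch under assumptions that are not taken from the paper: the cooperativeness level is modeled as a scalar random walk, each observation is a noisy scalar reading of it, and the names `kalman_step`, `judge_cooperativeness`, `q`, `r`, and `threshold` are hypothetical.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : previous state estimate and its variance
    z    : new noisy observation of the state
    q, r : process and measurement noise variances (assumed values)
    """
    # Predict: a random-walk model keeps the mean and inflates the variance.
    p_pred = p + q
    # Update: blend prediction and observation using the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x + k * (z - x)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new


def judge_cooperativeness(observations, threshold, x0=0.0, p0=1.0, q=0.01, r=0.5):
    """Filter the observations and flag each step as cooperative or not.

    Returns a list of (estimate, is_cooperative) pairs, where
    is_cooperative is True when the estimate reaches the threshold.
    """
    x, p = x0, p0
    results = []
    for z in observations:
        x, p = kalman_step(x, p, z, q, r)
        results.append((x, x >= threshold))
    return results
```

For instance, `judge_cooperativeness([0.2, 0.6, 0.9, 1.0], threshold=0.5)` yields a running estimate that lags the raw readings, so a single high reading does not immediately flip the judgment.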
Submitted 13 October, 2021;
originally announced October 2021.
-
MPC-Friendly Commitments for Publicly Verifiable Covert Security
Authors:
Nitin Agrawal,
James Bell,
Adrià Gascón,
Matt J. Kusner
Abstract:
We address the problem of efficiently verifying a commitment in a two-party computation. This addresses the scenario where a party P1 commits to a value $x$ to be used in a subsequent secure computation with another party P2 that wants to receive assurance that P1 did not cheat, i.e. that $x$ was indeed the value inputted into the secure computation. Our constructions operate in the publicly verifiable covert (PVC) security model, which is a relaxation of the malicious model of MPC appropriate in settings where P1 faces a reputational harm if caught cheating.
We introduce the notions of a PVC commitment scheme and indexed hash functions to build commitment schemes tailored to the PVC framework, and propose constructions for both arithmetic and Boolean circuits that result in very efficient circuits. From a practical standpoint, our constructions for Boolean circuits are $60\times$ faster to evaluate securely, and use $36\times$ less communication than baseline methods based on hashing. Moreover, we show that our constructions are tight in terms of required non-linear operations, by proving lower bounds on the nonlinear gate count of commitment verification circuits. Finally, we present a technique to amplify the security properties of our constructions, which makes it possible to efficiently recover malicious guarantees with statistical security.
Submitted 27 January, 2022; v1 submitted 15 September, 2021;
originally announced September 2021.
-
Automatic Sequences of Rank Two
Authors:
Jason Bell,
Jeffrey Shallit
Abstract:
Given a right-infinite word $\bf x$ over a finite alphabet $A$, the rank of $\bf x$ is the size of the smallest set $S$ of words over $A$ such that $\bf x$ can be realized as an infinite concatenation of words in $S$. We show that the property of having rank two is decidable for the class of $k$-automatic words for each integer $k\ge 2$.
Submitted 11 August, 2021;
originally announced August 2021.
-
Swap-Free Fat-Water Separation in Dixon MRI using Conditional Generative Adversarial Networks
Authors:
Nicolas Basty,
Marjola Thanaj,
Madeleine Cule,
Elena P. Sorokin,
Yi Liu,
Jimmy D. Bell,
E. Louise Thomas,
Brandon Whitcher
Abstract:
Dixon MRI is widely used for body composition studies. Current processing methods associated with large whole-body volumes are time intensive and prone to artifacts during fat-water separation performed on the scanner, making the data difficult to analyse. The most common artifacts are fat-water swaps, where the labels are inverted at the voxel level. It is common for researchers to discard swapped data (generally around 10%), which can be wasteful and lead to unintended biases. The UK Biobank is acquiring Dixon MRI for over 100,000 participants, and thousands of swaps will occur. If those go undetected, errors will propagate into processes such as abdominal organ segmentation and dilute the results in population-based analyses. There is a clear need for a fast and robust method to accurately separate fat and water channels. In this work we propose such a method based on style transfer using a conditional generative adversarial network. We also introduce a new Dixon loss function for the generator model. Using data from the UK Biobank Dixon MRI, our model is able to predict highly accurate fat and water channels that are free from artifacts. We show that the model separates fat and water channels using either single input (in-phase) or dual input (in-phase and opposed-phase), with the latter producing improved results. Our proposed method enables faster and more accurate downstream analysis of body composition from Dixon MRI in population studies by eliminating the need for visual inspection or discarding data due to fat-water swaps.
Submitted 29 July, 2021;
originally announced July 2021.
-
Perspectives on Machine Learning from Psychology's Reproducibility Crisis
Authors:
Samuel J. Bell,
Onno P. Kampman
Abstract:
In the early 2010s, a crisis of reproducibility rocked the field of psychology. Following a period of reflection, the field has responded with radical reform of its scientific practices. More recently, similar questions about the reproducibility of machine learning research have also come to the fore. In this short paper, we present select ideas from psychology's reformation, translating them into relevance for a machine learning audience.
Submitted 23 April, 2021; v1 submitted 18 April, 2021;
originally announced April 2021.
-
Integrating Novelty Detection Capabilities with MSL Mastcam Operations to Enhance Data Analysis
Authors:
Paul Horton,
Hannah R. Kerner,
Samantha Jacob,
Ernest Cisneros,
Kiri L. Wagstaff,
James Bell
Abstract:
While innovations in scientific instrumentation have pushed the boundaries of Mars rover mission capabilities, the increase in data complexity has pressured Mars Science Laboratory (MSL) and future Mars rover operations staff to quickly analyze complex data sets to meet progressively shorter tactical and strategic planning timelines. MSLWEB is an internal data tracking tool used by operations staff to perform first pass analysis on MSL image sequences, a series of products taken by the Mast camera, Mastcam. Mastcam's multiband multispectral image sequences require more complex analysis compared to standard 3-band RGB images. Typically, these are analyzed using traditional methods to identify unique features within the sequence. Given the short time frame of tactical planning in which downlinked images might need to be analyzed (within 5-10 hours before the next uplink), there exists a need to triage analysis time to focus on the most important sequences and parts of a sequence. We address this need by creating products for MSLWEB that use novelty detection to help operations staff identify unusual data that might be diagnostic of new or atypical compositions or mineralogies detected within an imaging scene. This was achieved in two ways: 1) by creating products for each sequence to identify novel regions in the image, and 2) by assigning multispectral sequences a sortable novelty score. These new products provide colorized heat maps of inferred novelty that operations staff can use to rapidly review downlinked data and focus their efforts on analyzing potentially new kinds of diagnostic multispectral signatures. This approach has the potential to guide scientists to new discoveries by quickly drawing their attention to often subtle variations not detectable with simple color composites.
Submitted 23 March, 2021;
originally announced March 2021.
-
Lie complexity of words
Authors:
Jason P. Bell,
Jeffrey Shallit
Abstract:
Given a finite alphabet $Σ$ and a right-infinite word $\bf w$ over $Σ$, we define the Lie complexity function $L_{\bf w}:\mathbb{N}\to \mathbb{N}$, whose value at $n$ is the number of conjugacy classes (under cyclic shift) of length-$n$ factors $x$ of $\bf w$ with the property that every element of the conjugacy class appears in $\bf w$.
We show that the Lie complexity function is uniformly bounded for words with linear factor complexity, and as a result we show that words of linear factor complexity have at most finitely many primitive factors $y$ with the property that $y^n$ is again a factor for every $n$.
We then look at automatic sequences and show that the Lie complexity function of a $k$-automatic sequence is again $k$-automatic.
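The definition of the Lie complexity function above can be made concrete on a finite string. The sketch below is an illustration only, not the authors' method: it works on a finite prefix rather than a right-infinite word, so it counts only factors that occur inside that prefix, and the function names are hypothetical.

```python
def factors(word: str, n: int) -> set[str]:
    """All length-n factors (contiguous subwords) occurring in word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def rotations(x: str) -> set[str]:
    """The conjugacy class of x under cyclic shift."""
    return {x[i:] + x[:i] for i in range(len(x))}


def lie_complexity(word: str, n: int) -> int:
    """Count conjugacy classes of length-n factors of word such that
    every cyclic shift of the factor is itself a factor of word.

    word is a finite prefix standing in for the infinite word.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    facs = factors(word, n)
    classes = set()
    for x in facs:
        rots = rotations(x)
        if rots <= facs:
            # Use the lexicographically least rotation to name the class.
            classes.add(min(rots))
    return len(classes)
```

On the prefix `"abababab"`, the length-2 factors `ab` and `ba` form a single class whose members all occur, so `lie_complexity("abababab", 2)` is 1, while no length-3 factor has all of its rotations present, giving 0.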
Submitted 7 February, 2021;
originally announced February 2021.
-
Porting WarpX to GPU-accelerated platforms
Authors:
A. Myers,
A. Almgren,
L. D. Amorim,
J. Bell,
L. Fedeli,
L. Ge,
K. Gott,
D. P. Grote,
M. Hogan,
A. Huebl,
R. Jambunathan,
R. Lehe,
C. Ng,
M. Rowan,
O. Shapoval,
M. Thévenet,
J. -L. Vay,
H. Vincenti,
E. Yang,
N. Zaïm,
W. Zhang,
Y. Zhao,
E. Zoni
Abstract:
WarpX is a general purpose electromagnetic particle-in-cell code that was originally designed to run on many-core CPU architectures. We describe the strategy followed to allow WarpX to use the GPU-accelerated nodes on OLCF's Summit supercomputer, a strategy we believe will extend to the upcoming machines Frontier and Aurora. We summarize the challenges encountered, lessons learned, and give current performance results on a series of relevant benchmark problems.
Submitted 2 September, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
AMReX: Block-Structured Adaptive Mesh Refinement for Multiphysics Applications
Authors:
Weiqun Zhang,
Andrew Myers,
Kevin Gott,
Ann Almgren,
John Bell
Abstract:
Block-structured adaptive mesh refinement (AMR) provides the basis for the temporal and spatial discretization strategy for a number of ECP applications in the areas of accelerator design, additive manufacturing, astrophysics, combustion, cosmology, multiphase flow, and wind plant modelling. AMReX is a software framework that provides a unified infrastructure with the functionality needed for these and other AMR applications to be able to effectively and efficiently utilize machines from laptops to exascale architectures. AMR reduces the computational cost and memory footprint compared to a uniform mesh while preserving accurate descriptions of different physical processes in complex multi-physics algorithms. AMReX supports algorithms that solve systems of partial differential equations (PDEs) in simple or complex geometries, and those that use particles and/or particle-mesh operations to represent component physical processes. In this paper, we will discuss the core elements of the AMReX framework such as data containers and iterators as well as several specialized operations to meet the needs of the application projects. In addition we will highlight the strategy that the AMReX team is pursuing to achieve highly performant code across a range of accelerator-based architectures for a variety of different applications.
Submitted 24 September, 2020;
originally announced September 2020.
-
Large-Scale Analysis of Iliopsoas Muscle Volumes in the UK Biobank
Authors:
Julie Fitzpatrick,
Nicolas Basty,
Madeleine Cule,
Yi Liu,
Jimmy D. Bell,
E. Louise Thomas,
Brandon Whitcher
Abstract:
Psoas muscle measurements are frequently used as markers of sarcopenia and predictors of health. Manually measured cross-sectional areas are most commonly used, but there is a lack of consistency regarding the position of the measurement and manual annotations are not practical for large population studies. We have developed a fully automated method to measure iliopsoas muscle volume (comprised of the psoas and iliacus muscles) using a convolutional neural network. Magnetic resonance images were obtained from the UK Biobank for 5,000 male and female participants, balanced for age, gender and BMI. Ninety manual annotations were available for model training and validation. The model showed excellent performance against out-of-sample data (dice score coefficient of 0.912 +/- 0.018). Iliopsoas muscle volumes were successfully measured in all 5,000 participants. Iliopsoas volume was greater in male compared with female subjects. There was a small but significant asymmetry between left and right iliopsoas muscle volumes. We also found that iliopsoas volume was significantly related to height, BMI and age, and that there was an acceleration in muscle volume decrease in men with age. Our method provides a robust technique for measuring iliopsoas muscle volume that can be applied to large cohorts.
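The dice score coefficient quoted in this abstract is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted and a reference segmentation. A minimal sketch follows, assuming both masks are given as flattened sequences of 0/1 voxel labels; the function name and the convention of returning 1.0 when both masks are empty are choices made here, not details from the paper.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks of equal length.

    pred, truth : sequences of 0/1 (or False/True) voxel labels.
    Returns a value in [0, 1]; 1.0 means perfect overlap.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / total
```

A 3D volume can be passed in by flattening it first, since the measure depends only on which voxels are labeled, not on their spatial arrangement.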
Submitted 14 August, 2020; v1 submitted 12 August, 2020;
originally announced August 2020.
-
Image Processing and Quality Control for Abdominal Magnetic Resonance Imaging in the UK Biobank
Authors:
Nicolas Basty,
Yi Liu,
Madeleine Cule,
E. Louise Thomas,
Jimmy D. Bell,
Brandon Whitcher
Abstract:
An end-to-end image analysis pipeline is presented for the abdominal MRI protocol used in the UK Biobank on the first 38,971 participants. Emphasis is on the processing steps necessary to ensure a high level of data quality and consistency in order to prepare the datasets for downstream quantitative analysis, such as segmentation and parameter estimation. Quality control procedures have been incorporated to detect and, where possible, correct issues in the raw data. Detection of fat-water swaps in the Dixon series is performed by a deep learning model and corrected automatically. Bone joints are predicted using a hybrid atlas-based registration and deep learning model for the shoulders, hips and knees. Simultaneous estimation of proton density fat fraction and transverse relaxivity (R2*) is performed using both the magnitude and phase information for the single-slice multiecho series. Approximately 98.1% of the two-point Dixon acquisitions were successfully processed and passed quality control, with 99.98% of the high-resolution T1-weighted 3D volumes succeeding. Approximately 99.98% of the single-slice multiecho acquisitions covering the liver were successfully processed and passed quality control, with 97.6% of the single-slice multiecho acquisitions covering the pancreas succeeding. At least one fat-water swap was detected in 1.8% of participants. With respect to the bone joints, approximately 3.3% of participants were missing at least one knee joint and 0.8% were missing at least one shoulder joint. For the participants who received both single-slice multiecho acquisition protocols for the liver, a systematic difference between the two protocols was identified and modeled using multiple linear regression. The findings presented here will be invaluable for scientists who seek to use image-derived phenotypes from the abdominal MRI protocol.
Submitted 16 July, 2020; v1 submitted 2 July, 2020;
originally announced July 2020.
-
Differentially Private Health Tokens for Estimating COVID-19 Risk
Authors:
David Butler,
Chris Hicks,
James Bell,
Carsten Maple,
Jon Crowcroft
Abstract:
In the fight against Covid-19, many governments and businesses are in the process of evaluating, trialling and even implementing so-called immunity passports. Also known as antibody or health certificates, there is a clear demand for any technology that could allow people to return to work and other crowded places without placing others at risk. One of the major criticisms of such systems is that they could be misused to unfairly discriminate against those without immunity, allowing the formation of an `immuno-privileged' class of people. In this work we are motivated to explore an alternative technical solution that is non-discriminatory by design. In particular we propose health tokens -- randomised health certificates which, using methods from differential privacy, allow individual test results to be randomised whilst still allowing useful aggregate risk estimates to be calculated. We show that health tokens could mitigate immunity-based discrimination whilst still presenting a viable mechanism for estimating the collective transmission risk posed by small groups of users. We evaluate the viability of our approach in the context of identity-free and identity-binding use cases and then consider a number of possible attacks. Our experimental results show that for groups of size 500 or more, the error associated with our method can be as low as 0.03 on average and thus the aggregated results can be useful in a number of identity-free contexts. Finally, we present the results of our open-source prototype which demonstrates the practicality of our solution.
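The idea of randomising individual results while keeping aggregate estimates useful can be illustrated with classical binary randomised response. This is a sketch under stated assumptions only: the abstract does not specify the mechanism, and the function names and the `p_truth` parameter are hypothetical. Reporting the true bit with probability `p_truth = e^ε / (1 + e^ε)` is the textbook ε-locally-differentially-private choice.

```python
import random


def randomise(result: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true result with probability p_truth, otherwise its flip."""
    return result if rng.random() < p_truth else not result


def estimate_positive_rate(tokens, p_truth: float) -> float:
    """Debiased estimate of the true positive fraction from randomised tokens."""
    if p_truth == 0.5:
        raise ValueError("p_truth = 0.5 destroys all information")
    observed = sum(tokens) / len(tokens)
    # E[observed] = p*pi + (1 - p)*(1 - pi), solved here for the true rate pi.
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo reproducible
    truth = [True] * 50 + [False] * 450  # hypothetical group of 500 users
    tokens = [randomise(t, 0.75, rng) for t in truth]
    print(estimate_positive_rate(tokens, 0.75))
```

No single token reveals its holder's result with certainty, yet the debiased average concentrates around the true rate as the group grows, which matches the abstract's observation that accuracy improves for larger groups.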
Submitted 8 July, 2020; v1 submitted 25 June, 2020;
originally announced June 2020.
-
Kiwifruit detection in challenging conditions
Authors:
Mahla Nejati,
Nicky Penhall,
Henry Williams,
Jamie Bell,
JongYoon Lim,
Ho Seok Ahn,
Bruce MacDonald
Abstract:
Accurate and reliable kiwifruit detection is one of the biggest challenges in developing a selective fruit harvesting robot. The vision system of an orchard robot faces difficulties such as dynamic lighting conditions and fruit occlusions. This paper presents a semantic segmentation approach with two novel image preprocessing techniques designed to detect kiwifruit under the harsh lighting conditions found in the canopy. The performance of the presented system is evaluated on a 3D real-world image set of kiwifruit under different lighting conditions (typical, glare, and overexposed). Alone, the semantic segmentation approach achieves an F1_score of 0.82 on the typical lighting image set, but struggles with harsh lighting, with an F1_score of 0.13. Utilising the preprocessing techniques, the vision system under harsh lighting improves to an F1_score of 0.42. To address the fruit occlusion challenge, the overall approach was found to be capable of detecting 87.0% of non-occluded and 30.0% of occluded kiwifruit across all lighting conditions.
Submitted 21 June, 2020;
originally announced June 2020.
-
Deep Neural Network Based Real-time Kiwi Fruit Flower Detection in an Orchard Environment
Authors:
JongYoon Lim,
Ho Seok Ahn,
Mahla Nejati,
Jamie Bell,
Henry Williams,
Bruce A. MacDonald
Abstract:
In this paper, we present a novel approach to kiwi fruit flower detection using Deep Neural Networks (DNNs) to build an accurate, fast, and robust autonomous pollination robot system. Recent work in deep neural networks has shown outstanding performance on object detection tasks in many areas. Inspired by this, we aim to exploit DNNs for kiwi fruit flower detection and present intensive experiments and their analysis on two state-of-the-art object detectors, Faster R-CNN and Single Shot Detector (SSD) Net, and two feature extractors, Inception Net V2 and NAS Net, with real-world orchard datasets. We also compare those approaches to find an optimal model which is suitable for a real-time agricultural pollination robot system in terms of accuracy and processing speed. We perform experiments with datasets collected from different seasons and locations (spatio-temporal consistency) in order to demonstrate the performance of the generalized model. The proposed system demonstrates promising results of 0.919, 0.874, and 0.889 for precision, recall, and F1-score respectively on our real-world dataset, and the performance satisfies the requirement for deploying the system onto an autonomous pollination robotics system.
Submitted 7 June, 2020;
originally announced June 2020.
-
TraceSecure: Towards Privacy Preserving Contact Tracing
Authors:
James Bell,
David Butler,
Chris Hicks,
Jon Crowcroft
Abstract:
Contact tracing is being widely employed to combat the spread of COVID-19. Many apps have been developed that allow for tracing to be done automatically based on location and interaction data generated by users. There are concerns, however, regarding the privacy and security of users' data when using these apps. These concerns are paramount for users who contract the virus, as they are generally required to release all their data. Motivated by the need to protect users' privacy, we propose two solutions to this problem. Our first solution builds on current "message based" methods and our second leverages ideas from secret sharing and additively homomorphic encryption.
Submitted 8 April, 2020;
originally announced April 2020.
-
The upper density of an automatic set is rational
Authors:
Jason P. Bell
Abstract:
Given a natural number $k\ge 2$ and a $k$-automatic set $S$ of natural numbers, we show that the lower density and upper density of $S$ are recursively computable rational numbers and we provide an algorithm for computing these quantities. In addition, we show that for every natural number $k\ge 2$ and every pair of rational numbers $(α,β)$ with $0<α<β<1$ or with $(α,β)\in \{(0,0),(1,1)\}$ there is a $k$-automatic subset of the natural numbers whose lower density and upper density are $α$ and $β$ respectively, and we show that these are precisely the values that can occur as the lower and upper densities of an automatic set.
Submitted 12 April, 2021; v1 submitted 17 February, 2020;
originally announced February 2020.
-
Private Summation in the Multi-Message Shuffle Model
Authors:
Borja Balle,
James Bell,
Adria Gascon,
Kobbi Nissim
Abstract:
The shuffle model of differential privacy (Erlingsson et al. SODA 2019; Cheu et al. EUROCRYPT 2019) and its close relative encode-shuffle-analyze (Bittau et al. SOSP 2017) provide a fertile middle ground between the well-known local and central models. Similarly to the local model, the shuffle model assumes an untrusted data collector who receives privatized messages from users, but in this case a secure shuffler is used to transmit messages from users to the collector in a way that hides which messages came from which user. An interesting feature of the shuffle model is that increasing the amount of messages sent by each user can lead to protocols with accuracies comparable to the ones achievable in the central model. In particular, for the problem of privately computing the sum of $n$ bounded real values held by $n$ different users, Cheu et al. showed that $O(\sqrt{n})$ messages per user suffice to achieve $O(1)$ error (the optimal rate in the central model), while Balle et al. (CRYPTO 2019) recently showed that a single message per user leads to $Θ(n^{1/3})$ MSE (mean squared error), a rate strictly in-between what is achievable in the local and central models.
This paper introduces two new protocols for summation in the shuffle model with improved accuracy and communication trade-offs. Our first contribution is a recursive construction based on the protocol from Balle et al. mentioned above, providing $\mathrm{poly}(\log \log n)$ error with $O(\log \log n)$ messages per user. The second contribution is a protocol with $O(1)$ error and $O(1)$ messages per user based on a novel analysis of the reduction from secure summation to shuffling introduced by Ishai et al. (FOCS 2006) (the original reduction required $O(\log n)$ messages per user).
Submitted 19 December, 2022; v1 submitted 3 February, 2020;
originally announced February 2020.
-
Iterative Construction of Gaussian Process Surrogate Models for Bayesian Inference
Authors:
Leen Alawieh,
Jonathan Goodman,
John B. Bell
Abstract:
A new algorithm is developed to tackle the issue of sampling non-Gaussian model parameter posterior probability distributions that arise from solutions to Bayesian inverse problems. The algorithm aims to mitigate some of the hurdles faced by traditional Markov Chain Monte Carlo (MCMC) samplers by constructing proposal probability densities that are both easy to sample and a better approximation to the target density than a simple Gaussian proposal distribution would be. To achieve that, a Gaussian proposal distribution is augmented with a Gaussian Process (GP) surface that helps capture non-linearities in the log-likelihood function. In order to train the GP surface, an iterative approach is adopted for the optimal selection of points in parameter space. Optimality is sought by maximizing the information gain of the GP surface using a minimum number of forward model simulation runs. The accuracy of the GP-augmented surface approximation is assessed in two ways. The first consists of comparing predictions obtained from the approximate surface with those obtained through running the actual simulation model at hold-out points in parameter space. The second consists of a measure based on the relative variance of sample weights obtained from sampling the approximate posterior probability distribution of the model parameters. The efficacy of this new algorithm is tested on inferring reaction rate parameters in 3-node and 6-node network toy problems, which imitate idealized reaction networks in combustion applications.
Submitted 17 November, 2019;
originally announced November 2019.
-
Private Protocols for U-Statistics in the Local Model and Beyond
Authors:
James Bell,
Aurélien Bellet,
Adrià Gascón,
Tejas Kulkarni
Abstract:
In this paper, we study the problem of computing $U$-statistics of degree $2$, i.e., quantities that come in the form of averages over pairs of data points, in the local model of differential privacy (LDP). The class of $U$-statistics covers many statistical estimates of interest, including Gini mean difference, Kendall's tau coefficient and Area under the ROC Curve (AUC), as well as empirical risk measures for machine learning problems such as ranking, clustering and metric learning. We first introduce an LDP protocol based on quantizing the data into bins and applying randomized response, which guarantees an $ε$-LDP estimate with a Mean Squared Error (MSE) of $O(1/\sqrt{n}ε)$ under regularity assumptions on the $U$-statistic or the data distribution. We then propose a specialized protocol for AUC based on a novel use of hierarchical histograms that achieves MSE of $O(α^3/nε^2)$ for arbitrary data distribution. We also show that 2-party secure computation allows the design of a protocol with MSE of $O(1/nε^2)$, without any assumption on the kernel function or data distribution and with total communication linear in the number of users $n$. Finally, we evaluate the performance of our protocols through experiments on synthetic and real datasets.
Submitted 2 March, 2020; v1 submitted 9 October, 2019;
originally announced October 2019.
-
Improved Summation from Shuffling
Authors:
Borja Balle,
James Bell,
Adria Gascon,
Kobbi Nissim
Abstract:
A protocol by Ishai et al. (FOCS 2006) showing how to implement distributed $n$-party summation from secure shuffling has regained relevance in the context of the recently proposed shuffle model of differential privacy, as it makes it possible to attain the accuracy levels of the curator model at a moderate communication cost. To achieve statistical security $2^{-σ}$, the protocol by Ishai et al. requires the number of messages sent by each party to grow logarithmically with $n$ as $O(\log n + σ)$. In this note we give an improved analysis achieving a dependency of the form $O(1+σ/\log n)$. Conceptually, this addresses the intuitive question left open by Ishai et al. of whether the shuffling step in their protocol provides a "hiding in the crowd" amplification effect as $n$ increases. From a practical perspective, our analysis provides explicit constants and shows, for example, that the method of Ishai et al. applied to summation of $32$-bit numbers from $n=10^4$ parties sending $12$ messages each provides statistical security $2^{-40}$.
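The summation-from-shuffling idea can be sketched as follows: each party splits its input into several additive shares modulo a fixed integer, all shares are shuffled together, and the receiver simply adds up what it sees. This is an illustrative sketch, not the authors' code. The function names are hypothetical, and Python's `random.Random` stands in for both the parties' randomness and the trusted shuffler, so it is not cryptographically secure.

```python
import random


def split_into_shares(value, num_shares, modulus, rng):
    """Split value into num_shares additive shares summing to value mod modulus."""
    if num_shares < 1:
        raise ValueError("need at least one share")
    # The first num_shares - 1 shares are uniform; the last one fixes the sum.
    shares = [rng.randrange(modulus) for _ in range(num_shares - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def shuffled_sum(inputs, num_shares, modulus, rng):
    """Pool every party's shares, shuffle them, and return their sum mod modulus."""
    pool = []
    for x in inputs:
        pool.extend(split_into_shares(x, num_shares, modulus, rng))
    rng.shuffle(pool)  # stand-in for the secure shuffler (assumption)
    return sum(pool) % modulus


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility only
    # Three parties, 12 messages each, 32-bit arithmetic: prints 18.
    print(shuffled_sum([3, 5, 10], num_shares=12, modulus=2**32, rng=rng))
```

The receiver always recovers the exact sum; the question analysed in the note is how many shares per party are needed before the shuffled pool reveals nothing beyond that sum.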
Submitted 24 September, 2019;
originally announced September 2019.
-
Differentially Private Summation with Multi-Message Shuffling
Authors:
Borja Balle,
James Bell,
Adria Gascon,
Kobbi Nissim
Abstract:
In recent work, Cheu et al. (Eurocrypt 2019) proposed a protocol for $n$-party real summation in the shuffle model of differential privacy with $O_{ε, δ}(1)$ error and $Θ(ε\sqrt{n})$ one-bit messages per party. In contrast, every local model protocol for real summation must incur error $Ω(1/\sqrt{n})$, and there exist protocols matching this lower bound which require just one bit of communication per party. Whether this gap in number of messages is necessary was left open by Cheu et al.
In this note we show a protocol with $O(1/ε)$ error and $O(\log(n/δ))$ messages of size $O(\log(n))$ per party. This protocol is based on the work of Ishai et al. (FOCS 2006) showing how to implement distributed summation from secure shuffling, and the observation that this allows simulating the Laplace mechanism in the shuffle model.
Submitted 21 August, 2019; v1 submitted 20 June, 2019;
originally announced June 2019.
-
Leaf segmentation through the classification of edges
Authors:
Jonathan Bell,
Hannah M. Dee
Abstract:
We present an approach to leaf level segmentation of images of Arabidopsis thaliana plants based upon detected edges. We introduce a novel approach to edge classification, which forms an important part of a method to both count the leaves and establish the leaf area of a growing plant from images obtained in a high-throughput phenotyping system. Our technique uses a relatively shallow convolutional neural network to classify image edges as background, plant edge, leaf-on-leaf edge or internal leaf noise. The edges themselves were found using the Canny edge detector and the classified edges can be used with simple image processing techniques to generate a region-based segmentation in which the leaves are distinct. This approach is strong at distinguishing occluding pairs of leaves where one leaf is largely hidden, a situation which has proved troublesome for plant image analysis systems in the past. In addition, we introduce the publicly available plant image dataset that was used for this work.
Submitted 5 April, 2019;
originally announced April 2019.