-
LiveBench: A Challenging, Contamination-Free LLM Benchmark
Authors:
Colin White,
Samuel Dooley,
Manley Roberts,
Arka Pal,
Ben Feuer,
Siddhartha Jain,
Ravid Shwartz-Ziv,
Neel Jain,
Khalid Saifullah,
Siddartha Naidu,
Chinmay Hegde,
Yann LeCun,
Tom Goldstein,
Willie Neiswanger,
Micah Goldblum
Abstract:
Test set contamination, wherein test data from a benchmark ends up in a newer model's training set, is a well-documented obstacle for fair LLM evaluation and can quickly render benchmarks obsolete. To mitigate this, many recent benchmarks crowdsource new prompts and evaluations from human or LLM judges; however, these can introduce significant biases, and break down when scoring hard questions. In this work, we introduce a new benchmark for LLMs designed to be immune to both test set contamination and the pitfalls of LLM judging and human crowdsourcing. We release LiveBench, the first benchmark that (1) contains frequently-updated questions from recent information sources, (2) scores answers automatically according to objective ground-truth values, and (3) contains a wide variety of challenging tasks, spanning math, coding, reasoning, language, instruction following, and data analysis. To achieve this, LiveBench contains questions that are based on recently-released math competitions, arXiv papers, news articles, and datasets, and it contains harder, contamination-free versions of tasks from previous benchmarks such as Big-Bench Hard, AMPS, and IFEval. We evaluate many prominent closed-source models, as well as dozens of open-source models ranging from 0.5B to 110B in size. LiveBench is difficult, with top models achieving below 65% accuracy. We release all questions, code, and model answers. Questions will be added and updated on a monthly basis, and we will release new tasks and harder versions of tasks over time so that LiveBench can distinguish between the capabilities of LLMs as they improve in the future. We welcome community engagement and collaboration for expanding the benchmark tasks and models.
Submitted 27 June, 2024;
originally announced June 2024.
-
Discovering influential text using convolutional neural networks
Authors:
Megan Ayers,
Luke Sanford,
Margaret Roberts,
Eddie Yang
Abstract:
Experimental methods for estimating the impacts of text on human evaluation have been widely used in the social sciences. However, researchers in experimental settings are usually limited to testing a small number of pre-specified text treatments. While efforts to mine unstructured texts for features that causally affect outcomes have been ongoing in recent years, these models have primarily focused on the topics or specific words of text, which may not always be the mechanism of the effect. We connect these efforts with NLP interpretability techniques and present a method for flexibly discovering clusters of similar text phrases that are predictive of human reactions to texts using convolutional neural networks. When used in an experimental setting, this method can identify text treatments and their effects under certain assumptions. We apply the method to two datasets. The first enables direct validation of the model's ability to detect phrases known to cause the outcome. The second demonstrates its ability to flexibly discover text treatments with varying textual structures. In both cases, the model learns a greater variety of text treatments compared to benchmark methods, and these text features quantitatively meet or exceed the ability of benchmark methods to predict the outcome.
Submitted 21 June, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
Large Language Models Must Be Taught to Know What They Don't Know
Authors:
Sanyam Kapoor,
Nate Gruver,
Manley Roberts,
Katherine Collins,
Arka Pal,
Umang Bhatt,
Adrian Weller,
Samuel Dooley,
Micah Goldblum,
Andrew Gordon Wilson
Abstract:
When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibration and then show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We show that a thousand graded examples are sufficient to outperform baseline methods and that training through the features of a model is necessary for good performance and tractable for large open-source models when using LoRA. We also investigate the mechanisms that enable reliable LLM uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators, applicable not just to their own uncertainties but also the uncertainty of other models. Lastly, we show that uncertainty estimates inform human use of LLMs in human-AI collaborative settings through a user study.
Submitted 12 June, 2024;
originally announced June 2024.
-
A study on the adequacy of common IQA measures for medical images
Authors:
Anna Breger,
Clemens Karner,
Ian Selby,
Janek Gröhl,
Sören Dittmer,
Edward Lilley,
Judith Babar,
Jake Beckford,
Timothy J Sadler,
Shahab Shahipasand,
Arthikkaa Thavakumar,
Michael Roberts,
Carola-Bibiane Schönlieb
Abstract:
Image quality assessment (IQA) is standard practice in the development stage of novel machine learning algorithms that operate on images. The most commonly used IQA measures have been developed and tested for natural images, but not in the medical setting. Reported inconsistencies arising in medical images are not surprising, as they have different properties than natural images. In this study, we test the applicability of common IQA measures for medical image data by comparing their assessment to manually rated chest X-ray (5 experts) and photoacoustic image data (1 expert). Moreover, we include supplementary studies on grayscale natural images and accelerated brain MRI data. The results of all experiments show a similar outcome in line with previous findings for medical imaging: PSNR and SSIM in the default setting are in the lower range of the result list and HaarPSI outperforms the other tested measures in the overall performance. Also among the top performers in our medical experiments are the full reference measures DISTS, FSIM, LPIPS and MS-SSIM. Generally, the results on natural images yield considerably higher correlations, suggesting that the additional employment of tailored IQA measures for medical imaging algorithms is needed.
Submitted 29 May, 2024;
originally announced May 2024.
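As a concrete reference point for the PSNR measure named in the abstract above, here is a minimal sketch of the standard textbook definition, PSNR = 10 log10(MAX^2 / MSE). It is not the authors' evaluation code: the function name `psnr`, the nested-list image representation, and the default `max_value` of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two grayscale images.

    Images are nested lists of pixel intensities (an illustrative choice);
    max_value is the assumed maximum possible intensity.
    """
    ref_flat = [p for row in reference for p in row]
    test_flat = [p for row in test for p in row]
    if len(ref_flat) != len(test_flat) or not ref_flat:
        raise ValueError("images must be non-empty and have the same size")

    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(ref_flat, test_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR depends only on the pixel-wise mean squared error, it ignores image structure entirely, which is one plausible reason a measure like this can disagree with expert ratings.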
-
A study of why we need to reassess full reference image quality assessment with medical images
Authors:
Anna Breger,
Ander Biguri,
Malena Sabaté Landman,
Ian Selby,
Nicole Amberg,
Elisabeth Brunner,
Janek Gröhl,
Sepideh Hatamikia,
Clemens Karner,
Lipeng Ning,
Sören Dittmer,
Michael Roberts,
AIX-COVNET Collaboration,
Carola-Bibiane Schönlieb
Abstract:
Image quality assessment (IQA) is not just indispensable in clinical practice to ensure high standards, but also in the development stage of novel algorithms that operate on medical images with reference data. This paper provides a structured and comprehensive collection of examples where the two most common full reference (FR) image quality measures prove to be unsuitable for the assessment of novel algorithms using different kinds of medical images, including real-world MRI, CT, OCT, X-Ray, digital pathology and photoacoustic imaging data. In particular, the FR-IQA measures PSNR and SSIM are known and tested for working successfully in many natural imaging tasks, but discrepancies in medical scenarios have been noted in the literature. Inconsistencies arising in medical images are not surprising, as they have very different properties than natural images which have not been targeted nor tested in the development of the mentioned measures, and therefore might imply wrong judgement of novel methods for medical images. Therefore, improvement is urgently needed in particular in this era of AI to increase explainability, reproducibility and generalizability in machine learning for medical imaging and beyond. On top of the pitfalls we will provide ideas for future research as well as suggesting guidelines for the usage of FR-IQA measures applied to medical images.
Submitted 29 May, 2024;
originally announced May 2024.
-
FedMAP: Unlocking Potential in Personalized Federated Learning through Bi-Level MAP Optimization
Authors:
Fan Zhang,
Carlos Esteve-Yagüe,
Sören Dittmer,
Carola-Bibiane Schönlieb,
Michael Roberts
Abstract:
Federated Learning (FL) enables collaborative training of machine learning models on decentralized data while preserving data privacy. However, data across clients often differs significantly due to class imbalance, feature distribution skew, sample size imbalance, and other phenomena. Leveraging information from these not identically distributed (non-IID) datasets poses substantial challenges. FL methods based on a single global model cannot effectively capture the variations in client data and underperform in non-IID settings. Consequently, Personalized FL (PFL) approaches that adapt to each client's data distribution but leverage other clients' data are essential but currently underexplored. We propose a novel Bayesian PFL framework using bi-level optimization to tackle the data heterogeneity challenges. Our proposed framework utilizes the global model as a prior distribution within a Maximum A Posteriori (MAP) estimation of personalized client models. This approach facilitates PFL by integrating shared knowledge from the prior, thereby enhancing local model performance, generalization ability, and communication efficiency. We extensively evaluated our bi-level optimization approach on real-world and synthetic datasets, demonstrating significant improvements in model accuracy compared to existing methods while reducing communication overhead. This study contributes to PFL by establishing a solid theoretical foundation for the proposed method and offering a robust, ready-to-use framework that effectively addresses the challenges posed by non-IID data in FL.
Submitted 29 May, 2024;
originally announced May 2024.
-
When AI Eats Itself: On the Caveats of Data Pollution in the Era of Generative AI
Authors:
Xiaodan Xing,
Fadong Shi,
Jiahao Huang,
Yinzhe Wu,
Yang Nan,
Sheng Zhang,
Yingying Fang,
Mike Roberts,
Carola-Bibiane Schönlieb,
Javier Del Ser,
Guang Yang
Abstract:
Generative artificial intelligence (AI) technologies and large models are producing realistic outputs across various domains, such as images, text, speech, and music. Creating these advanced generative models requires significant resources, particularly large and high-quality datasets. To minimize training expenses, many algorithm developers use data created by the models themselves as a cost-effective training solution. However, not all synthetic data effectively improve model performance, necessitating a strategic balance in the use of real versus synthetic data to optimize outcomes.
Currently, the previously well-controlled integration of real and synthetic data is becoming uncontrollable. The widespread and unregulated dissemination of synthetic data online leads to the contamination of datasets traditionally compiled through web scraping, now mixed with unlabeled synthetic data. This trend portends a future where generative AI systems may increasingly rely blindly on consuming self-generated data, raising concerns about model performance and ethical issues. What will happen if generative AI continuously consumes itself without discernment? What measures can we take to mitigate the potential adverse effects?
There is a significant gap in the scientific literature regarding the impact of synthetic data use in generative AI, particularly in terms of the fusion of multimodal information. To address this research gap, this review investigates the consequences of integrating synthetic data blindly on training generative AI on both image and text modalities and explores strategies to mitigate these effects. The goal is to offer a comprehensive view of synthetic data's role, advocating for a balanced approach to its use and exploring practices that promote the sustainable development of generative AI technologies in the era of large models.
Submitted 15 May, 2024;
originally announced May 2024.
-
Superconformal Monodromy Defects in $\mathcal{N}$=4 SYM and LS theory
Authors:
Igal Arav,
Jerome P. Gauntlett,
Yusheng Jiao,
Matthew M. Roberts,
Christopher Rosen
Abstract:
We study type IIB supergravity solutions that are dual to two-dimensional superconformal defects in $d=4$ SCFTs which preserve $\mathcal{N}=(0,2)$ supersymmetry. We consider solutions dual to defects in $\mathcal{N}=4$ SYM theory that have non-trivial monodromy for $U(1)^3\subset SO(6)$ global symmetry and we also allow for the possibility of conical singularities. In addition, we consider the addition of fermionic and bosonic mass terms that have non trivial dependence on the spatial directions transverse to the defect, while preserving the superconformal symmetry of the defect. We compute various physical quantities including the central charges of the defect expressed as a function of the monodromy, the on-shell action as well as associated supersymmetric Renyi entropies. Analogous computations are carried out for superconformal defects in the $\mathcal{N}=1$, $d=4$ Leigh-Strassler SCFT. We also show that the defects of the two SCFTs are connected by a line of bulk marginal mass deformations and argue that they are also related by bulk RG flow.
Submitted 9 May, 2024;
originally announced May 2024.
-
On the Impact of Dark Matter Scattering on the Trajectory of High-Energy Cosmic Rays
Authors:
Stefano Profumo,
M. Grant Roberts,
Shashank Dharanibalan
Abstract:
We study the impact on the trajectory of high-energy cosmic-ray protons of scattering off the cosmic dark matter. We compute the scattering angle as a function of the cosmic-ray energy, of the dark matter mass, and of the interaction strength for a few representative choices for the relevant interaction cross section. We find that the typical deflection angle over the cosmic ray path is largely independent of the dark matter mass. Given existing limits on the interaction strength, we compute the average deflection angle. We find that for large interaction cross sections and low cosmic ray energies, the predicted deflection angle is much larger than the angular resolution of very high-energy cosmic-ray observatories such as Pierre Auger.
Submitted 4 May, 2024;
originally announced May 2024.
-
Automatically Learning HTN Methods from Landmarks
Authors:
Ruoxi Li,
Dana Nau,
Mark Roberts,
Morgan Fine-Morris
Abstract:
Hierarchical Task Network (HTN) planning usually requires a domain engineer to provide manual input about how to decompose a planning problem. Even HTN-MAKER, a well-known method-learning algorithm, requires a domain engineer to annotate the tasks with information about what to learn. We introduce CURRICULAMA, an HTN method learning algorithm that completely automates the learning process. It uses landmark analysis to compose annotated tasks and leverages curriculum learning to order the learning of methods from simpler to more complex. This eliminates the need for manual input, resolving a core issue with HTN-MAKER. We prove CURRICULAMA's soundness, and show experimentally that it has a substantially similar convergence rate in learning a complete set of methods to HTN-MAKER.
Submitted 9 April, 2024;
originally announced April 2024.
-
Optimized Model Selection for Estimating Treatment Effects from Costly Simulations of the US Opioid Epidemic
Authors:
Abdulrahman A. Ahmed,
M. Amin Rahimian,
Mark S. Roberts
Abstract:
Agent-based simulation with a synthetic population can help us compare different treatment conditions while keeping everything else constant within the same population (i.e., as digital twins). Such population-scale simulations require large computational power (i.e., CPU resources) to get accurate estimates for treatment effects. We can use meta models of the simulation results to circumvent the need to simulate every treatment condition. Selecting the best estimating model at a given sample size (number of simulation runs) is a crucial problem. Depending on the sample size, the ability of the method to estimate accurately can change significantly. In this paper, we discuss different methods to explore what model works best at a specific sample size. In addition to the empirical results, we provide a mathematical analysis of the MSE equation and how its components decide which model to select and why a specific method behaves that way in a range of sample sizes. The analysis showed why the direction estimation method is better than model-based methods in larger sample sizes and how the between-group variation and the within-group variation affect the MSE equation.
Submitted 23 March, 2024;
originally announced March 2024.
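For readers unfamiliar with the "components" of the MSE mentioned in the abstract above, the generic bias-variance decomposition is a useful starting point. The symbols $\hat{\tau}$ (an estimator of the treatment effect) and $\tau$ (its true value) are notation assumed here, not taken from the paper, and the paper's own breakdown into between-group and within-group variation is not reproduced.

```latex
\mathrm{MSE}(\hat{\tau})
  = \mathbb{E}\big[(\hat{\tau} - \tau)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{\tau}] - \tau\big)^2}_{\text{squared bias}}
  + \underbrace{\mathbb{E}\big[(\hat{\tau} - \mathbb{E}[\hat{\tau}])^2\big]}_{\text{variance}}
```

Under this generic view, an unbiased estimator's variance typically shrinks as the number of simulation runs grows, while a misspecified model's squared bias does not, so the preferred estimator can plausibly change with sample size.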
-
Goal-Oriented End-User Programming of Robots
Authors:
David Porfirio,
Mark Roberts,
Laura M. Hiatt
Abstract:
End-user programming (EUP) tools must balance user control with the robot's ability to plan and act autonomously. Many existing task-oriented EUP tools enforce a specific level of control, e.g., by requiring that users hand-craft detailed sequences of actions, rather than offering users the flexibility to choose the level of task detail they wish to express. We thereby created a novel EUP system, Polaris, that in contrast to most existing EUP tools, uses goal predicates as the fundamental building block of programs. Users can thereby express high-level robot objectives or lower-level checkpoints at their choosing, while an off-the-shelf task planner fills in any remaining program detail. To ensure that goal-specified programs adhere to user expectations of robot behavior, Polaris is equipped with a Plan Visualizer that exposes the planner's output to the user before runtime. In what follows, we describe our design of Polaris and its evaluation with 32 human participants. Our results support the Plan Visualizer's ability to help users craft higher-quality programs. Furthermore, there are strong associations between user perception of the robot and Plan Visualizer usage, and evidence that robot familiarity has a key role in shaping user experience.
Submitted 20 March, 2024;
originally announced March 2024.
-
Considerations for End-User Development in the Caregiving Domain
Authors:
Laura Stegner,
David Porfirio,
Mark Roberts,
Laura M. Hiatt
Abstract:
As service robots become more capable of autonomous behaviors, it becomes increasingly important to consider how people communicate with a robot what task it should perform and how to do the task. Accordingly, there has been a rise in attention to end-user development (EUD) interfaces, which enable non-roboticist end users to specify tasks for autonomous robots to perform. However, state-of-the-art EUD interfaces are often constrained through simplified domains or restrictive end-user interaction. Motivated by prior qualitative design work that explores how to integrate a care robot in an assisted living community, we discuss the challenges of EUD in this complex domain. One set of challenges stems from different user-facing representations, e.g., certain tasks may lend themselves better to rule-based trigger-action representations, whereas other tasks may be easier to specify via sequences of actions. The other stems from considering the needs of multiple stakeholders, e.g., caregivers and residents of the facility may all create tasks for the robot, but the robot may not be able to share information about all tasks with all residents due to privacy concerns. We present scenarios that illustrate these challenges and also discuss possible solutions.
Submitted 27 February, 2024;
originally announced February 2024.
-
Optimal transmission expansion minimally reduces decarbonization costs of U.S. electricity
Authors:
Rangrang Zheng,
Greg Schivley,
Patricia Hidalgo-Gonzalez,
Matthias Fripp,
Michael J. Roberts
Abstract:
Solar and wind power are cost-competitive with fossil fuels, yet their intermittent nature presents challenges. Significant temporal and geographic differences in land, wind, and solar resources suggest that long-distance transmission could be particularly beneficial. Using a detailed, open-source model, we analyze optimal transmission expansion jointly with storage, generation, and hourly operations across the three primary interconnects in the United States. Transmission expansion offers far more benefits in a high-renewable system than in a system with mostly conventional generation. Yet while an optimal nationwide plan would have more than triple current interregional transmission, transmission decreases the cost of a 100% clean system by only 4% compared to a plan that relies solely on current transmission. Expanding capacity only within existing interconnects can achieve most of these savings. Adjustments to energy storage and generation mix can leverage the current interregional transmission infrastructure to build a clean power system at a reasonable cost.
Submitted 21 February, 2024;
originally announced February 2024.
-
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
Authors:
Arka Pal,
Deep Karkhanis,
Samuel Dooley,
Manley Roberts,
Siddartha Naidu,
Colin White
Abstract:
Direct Preference Optimisation (DPO) is effective at significantly improving the performance of large language models (LLMs) on downstream tasks such as reasoning, summarisation, and alignment. Using pairs of preferred and dispreferred data, DPO models the relative probability of picking one response over another. In this work, first we show theoretically that the standard DPO loss can lead to a reduction of the model's likelihood of the preferred examples, as long as the relative probability between the preferred and dispreferred classes increases. We then show empirically that this phenomenon occurs when fine-tuning LLMs on common datasets, especially datasets in which the edit distance between pairs of completions is low. Using these insights, we design DPO-Positive (DPOP), a new loss function and training procedure which avoids this failure mode. Surprisingly, we find that DPOP outperforms DPO and other fine-tuning procedures across a wide variety of datasets and downstream tasks, including datasets with high edit distances between completions. Furthermore, we find that the DPOP-tuned model outperforms the DPO-tuned model (all else equal) on benchmarks independent of the fine-tuning data, such as MT-Bench. Finally, using DPOP, we create and open-source Smaug-34B and Smaug-72B, with the latter becoming the first open-source LLM to surpass an average accuracy of 80% on the HuggingFace Open LLM Leaderboard.
Submitted 3 July, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
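To make the failure mode described in the abstract above concrete, the standard DPO objective can be written out as follows. The notation is assumed here rather than taken from the listing: $\pi_\theta$ is the model being tuned, $\pi_{\mathrm{ref}}$ the frozen reference model, $y_w$ and $y_l$ the preferred and dispreferred completions for prompt $x$, $\sigma$ the logistic function, and $\beta$ a temperature hyperparameter. The DPOP loss itself is not reproduced.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

The loss depends only on the difference between the two log-ratios. It can therefore decrease even when $\pi_\theta(y_w \mid x)$ itself falls, provided $\pi_\theta(y_l \mid x)$ falls faster, which is the behaviour the abstract describes.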
-
Human-Centric Goal Reasoning with Ripple-Down Rules
Authors:
Kenji Brameld,
Germán Castro,
Claude Sammut,
Mark Roberts,
David W. Aha
Abstract:
ActorSim is a goal reasoning framework developed at the Naval Research Laboratory. Originally, all goal reasoning rules were hand-crafted. This work extends ActorSim with the capability of learning by demonstration, that is, when a human trainer disagrees with a decision made by the system, the trainer can take over and show the system the correct decision. The learning component uses Ripple-Down Rules (RDR) to build new decision rules to correctly handle similar cases in the future. The system is demonstrated using the RoboCup Rescue Agent Simulation, which simulates a city-wide disaster, requiring emergency services, including fire, ambulance and police, to be dispatched to different sites to evacuate civilians from dangerous situations. The RDRs are implemented in a scripting language, FrameScript, which is used to mediate between ActorSim and the agent simulator. Using Ripple-Down Rules, ActorSim can scale to an order of magnitude more goals than the previous version.
Submitted 30 January, 2024;
originally announced February 2024.
-
A 350-MHz Green Bank Telescope Survey of Unassociated Fermi LAT Sources: Discovery and Timing of Ten Millisecond Pulsars
Authors:
P. Bangale,
B. Bhattacharyya,
F. Camilo,
C. J. Clark,
I. Cognard,
M. E. DeCesar,
E. C. Ferrara,
P. Gentile,
L. Guillemot,
J. W. T. Hessels,
T. J. Johnson,
M. Kerr,
M. A. McLaughlin,
L. Nieder,
S. M. Ransom,
P. S. Ray,
M. S. E. Roberts,
J. Roy,
S. Sanpa-Arsa,
G. Theureau,
M. T. Wolff
Abstract:
We have searched for radio pulsations towards 49 Fermi Large Area Telescope (LAT) 1FGL Catalog $γ$-ray sources using the Green Bank Telescope at 350 MHz. We detected 18 millisecond pulsars (MSPs) in blind searches of the data; 10 of these were discoveries unique to our survey. Sixteen are binaries, with eight having short orbital periods $P_B < 1$ day. No radio pulsations from young pulsars were detected, although three targets are coincident with apparently radio-quiet $γ$-ray pulsars discovered in LAT data. Here, we give an overview of the survey and present radio and $γ$-ray timing results for the 10 MSPs discovered. These include the only isolated MSP discovered in our survey and six short-$P_B$ binary MSPs. Of these, three have very low-mass companions ($M_c$ $\ll$ 0.1M$_{\odot}$) and hence belong to the class of black widow pulsars. Two have more massive, non-degenerate companions with extensive radio eclipses and orbitally modulated X-ray emission consistent with the redback class. Significant $γ$-ray pulsations have been detected from nine of the discoveries. This survey and similar efforts suggest that the majority of Galactic $γ$-ray sources at high Galactic latitudes are either MSPs or relatively nearby non-recycled pulsars, with the latter having on average a much smaller radio/$γ$-ray beaming ratio as compared to MSPs. It also confirms that past surveys suffered from an observational bias against finding short-$P_B$ MSP systems.
Submitted 14 February, 2024;
originally announced February 2024.
-
Asymptotics for the growth of the infinite-parent Spatial Lambda-Fleming-Viot model
Authors:
Apolline Louvet,
Matthew I. Roberts
Abstract:
The infinite-parent spatial Lambda-Fleming-Viot (SLFV) process is a model of random growth, in which a set evolves by the addition of balls according to points of an underlying Poisson point process, and which was recently introduced to study genetic diversity in spatially expanding populations. In this article, we give asymptotics for the location and depth of the moving interface, and identify the exact asymptotic scale of the transverse fluctuations of geodesics. Our proofs are based on a new representation of the infinite-parent SLFV in terms of chains of reproduction events, and on the study of the properties of a typical geodesic. Moreover, we show that our representation coincides with the alternative definitions of the process considered in the literature, subject to a simple condition on the initial state. Our results represent a novel development in the study of stochastic growth models, and also have consequences for the study of genetic diversity in expanding populations.
Submitted 1 February, 2024;
originally announced February 2024.
-
The curious case of the test set AUROC
Authors:
Michael Roberts,
Alon Hazan,
Sören Dittmer,
James H. F. Rudd,
Carola-Bibiane Schönlieb
Abstract:
Whilst the size and complexity of ML models have rapidly and significantly increased over the past decade, the methods for assessing their performance have not kept pace. In particular, among the many potential performance metrics, the ML community stubbornly continues to use (a) the area under the receiver operating characteristic curve (AUROC) for a validation and test cohort (distinct from training data) or (b) the sensitivity and specificity for the test data at an optimal threshold determined from the validation ROC. However, we argue that considering scores derived from the test ROC curve alone gives only a narrow insight into how a model performs and its ability to generalise.
Submitted 19 December, 2023;
originally announced December 2023.
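The two reporting practices named in the abstract above can be sketched in a few lines. This is only an illustration, not the authors' code: the toy scores and labels are made up, and taking "optimal threshold" to mean the one that maximises Youden's J on the validation set is an assumption.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(scores, labels):
    """Assumed notion of 'optimal': maximise sensitivity + specificity over observed scores."""
    return max(sorted(set(scores)), key=lambda t: sum(sens_spec(scores, labels, t)))


# Hypothetical toy cohorts, for illustration only.
val_scores, val_labels = [0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]
test_scores, test_labels = [0.2, 0.3, 0.6, 0.9], [0, 1, 0, 1]

# Practice (a): a single AUROC number on the test cohort.
test_auroc = auroc(test_scores, test_labels)

# Practice (b): fix the threshold on validation data, then report test sensitivity/specificity.
threshold = youden_threshold(val_scores, val_labels)
test_sens, test_spec = sens_spec(test_scores, test_labels, threshold)
```

Both practices reduce the test ROC curve to one or two numbers, which is the narrowness the abstract argues against.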
-
Ultralight Dark Matter Search with Space-Time Separated Atomic Clocks and Cavities
Authors:
Melina Filzinger,
Ashlee R. Caddell,
Dhruv Jani,
Martin Steinel,
Leonardo Giani,
Nils Huntemann,
Benjamin M. Roberts
Abstract:
We devise and demonstrate a method to search for non-gravitational couplings of ultralight dark matter to standard model particles using space-time separated atomic clocks and cavity-stabilized lasers. By making use of space-time separated sensors, which probe different values of an oscillating dark matter field, we can search for couplings that cancel in typical local experiments. We demonstrate this method using existing data from a frequency comparison of lasers stabilized to two optical cavities connected via a 2220 km fiber link [Nat. Commun. 13, 212 (2022)]. The absence of significant oscillations in the data results in constraints on the coupling of scalar dark matter to electrons, d_me, for masses between 1e-19 eV and 2e-15 eV. These are the first constraints on d_me alone in this mass range, and improve the dark matter constraints on any scalar-Fermion coupling by up to two orders of magnitude.
Submitted 21 December, 2023;
originally announced December 2023.
-
New Horizons: Pioneering Pharmaceutical R&D with Generative AI from lab to the clinic -- an industry perspective
Authors:
Guy Doron,
Sam Genway,
Mark Roberts,
Sai Jasti
Abstract:
The rapid advance of generative AI is reshaping the strategic vision for R&D across industries. The unique challenges of pharmaceutical R&D will see applications of generative AI deliver value along the entire value chain from early discovery to regulatory approval. This perspective reviews these challenges and takes a three-horizon approach to explore the generative AI applications already delivering impact, the disruptive opportunities which are just around the corner, and the longer-term transformation which will shape the future of the industry. Selected applications are reviewed for their potential to drive increased productivity, accelerate timelines, improve the quality of research, data and decision making, and support a sustainable future for the industry. Recommendations are given for Pharma R&D leaders developing a generative AI strategy today which will lay the groundwork for getting real value from the technology and safeguarding future growth. Generative AI is today providing new, efficient routes to accessing and combining organisational data to drive productivity. Next, this impact will reach clinical development, enhancing the patient experience, driving operational efficiency, and unlocking digital innovation to better tackle the future burden of disease. Looking to the furthest horizon, rapid acquisition of rich multi-omics data, which capture the 'language of life', in combination with next generation AI technologies will allow organisations to close the loop around phases of the pipeline through rapid, automated generation and testing of hypotheses from bench to bedside. This provides a vision for the future of R&D with sustainability at the core, with reduced timescales and reduced dependency on resources, while offering new hope to patients to treat the untreatable and ultimately cure diseases.
Submitted 19 December, 2023;
originally announced December 2023.
-
Classifying bi-invariant 2-forms on infinite-dimensional Lie groups
Authors:
David Michael Roberts
Abstract:
A bi-invariant differential 2-form on a Lie group G is a highly constrained object, being determined by purely linear data: an Ad-invariant alternating bilinear form on the Lie algebra of G. On a compact connected Lie group these have a known classification, in terms of de Rham cohomology, which is here generalised to arbitrary finite-dimensional Lie groups, at the cost of losing the connection to cohomology. This expanded classification extends further to all Milnor regular infinite-dimensional Lie groups. I give some examples of (structured) diffeomorphism groups to which the result on bi-invariant forms applies. For symplectomorphism and volume-preserving diffeomorphism groups the spaces of bi-invariant 2-forms are finite-dimensional, and related to the de Rham cohomology of the original compact manifold. In the particular case of the infinite-dimensional projective unitary group PU(H) the classification invalidates an assumption made by Mathai and the author about a certain 2-form on this Banach Lie group.
Submitted 7 November, 2023;
originally announced November 2023.
-
Data Contamination Through the Lens of Time
Authors:
Manley Roberts,
Himanshu Thakur,
Christine Herlihy,
Colin White,
Samuel Dooley
Abstract:
Recent claims about the impressive abilities of large language models (LLMs) are often supported by evaluating publicly available benchmarks. Since LLMs train on wide swaths of the internet, this practice raises concerns of data contamination, i.e., evaluating on examples that are explicitly or implicitly included in the training data. Data contamination remains notoriously challenging to measure and mitigate, even with partial attempts like controlled experimentation of training data, canary strings, or embedding similarities. In this work, we conduct the first thorough longitudinal analysis of data contamination in LLMs by using the natural experiment of training cutoffs in GPT models to look at benchmarks released over time. Specifically, we consider two code/mathematical problem-solving datasets, Codeforces and Project Euler, and find statistically significant trends among LLM pass rate vs. GitHub popularity and release date that provide strong evidence of contamination. By open-sourcing our dataset, raw results, and evaluation framework, our work paves the way for rigorous analyses of data contamination in modern models. We conclude with a discussion of best practices and future steps for publicly releasing benchmarks in the age of LLMs that train on webscale data.
Submitted 16 October, 2023;
originally announced October 2023.
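The natural experiment described in the abstract above, comparing performance on problems released before and after a model's training cutoff, can be sketched as follows. This is a simplified illustration and not the paper's analysis, which reports statistically significant trends rather than a two-bucket split; the function name, the record layout, and every value in `toy_records` are hypothetical.

```python
from datetime import date


def pass_rate_by_cutoff(records, cutoff):
    """Split (release_date, passed) records at the training cutoff and return both pass rates."""
    before = [passed for released, passed in records if released <= cutoff]
    after = [passed for released, passed in records if released > cutoff]

    def rate(outcomes):
        return sum(outcomes) / len(outcomes) if outcomes else float("nan")

    return rate(before), rate(after)


# Made-up toy records: (problem release date, did the model solve it?).
toy_records = [
    (date(2021, 3, 1), True),
    (date(2021, 6, 1), True),
    (date(2021, 8, 1), False),
    (date(2021, 9, 1), True),
    (date(2021, 10, 1), False),
    (date(2022, 1, 1), True),
    (date(2022, 4, 1), False),
    (date(2022, 7, 1), False),
]
assumed_cutoff = date(2021, 9, 30)  # hypothetical cutoff, not a real model's

rate_before, rate_after = pass_rate_by_cutoff(toy_records, assumed_cutoff)
```

A sharp drop from `rate_before` to `rate_after` would be consistent with contamination, though difficulty drift over time is a confound that a real analysis has to control for.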
-
Recent Methodological Advances in Federated Learning for Healthcare
Authors:
Fan Zhang,
Daniel Kreuter,
Yichen Chen,
Sören Dittmer,
Samuel Tull,
Tolou Shadbahr,
BloodCounts! Collaboration,
Jacobus Preller,
James H. F. Rudd,
John A. D. Aston,
Carola-Bibiane Schönlieb,
Nicholas Gleadall,
Michael Roberts
Abstract:
For healthcare datasets, it is often not possible to combine data samples from multiple sites due to ethical, privacy or logistical concerns. Federated learning allows for the utilisation of powerful machine learning algorithms without requiring the pooling of data. Healthcare data has many simultaneous challenges which require new methodologies to address, such as highly-siloed data, class imbalance, missing data, distribution shifts and non-standardised variables. Federated learning adds significant methodological complexity to conventional centralised machine learning, requiring distributed optimisation, communication between nodes, aggregation of models and redistribution of models. In this systematic review, we consider all papers on Scopus that were published between January 2015 and February 2023 and which describe new federated learning methodologies for addressing challenges with healthcare data. We performed a detailed review of the 89 papers which fulfilled these criteria. Significant systemic issues were identified throughout the literature which compromise the methodologies in many of the papers reviewed. We give detailed recommendations to help improve the quality of the methodology development for federated learning in healthcare.
Submitted 4 October, 2023;
originally announced October 2023.
-
The development of HISPEC for Keck and MODHIS for TMT: science cases and predicted sensitivities
Authors:
Quinn M. Konopacky,
Ashley D. Baker,
Dimitri Mawet,
Michael P. Fitzgerald,
Nemanja Jovanovic,
Charles Beichman,
Garreth Ruane,
Rob Bertz,
Hiroshi Terada,
Richard Dekany,
Larry Lingvay,
Marc Kassis,
David Anderson,
Motohide Tamura,
Bjorn Benneke,
Thomas Beatty,
Tuan Do,
Shogo Nishiyama,
Peter Plavchan,
Jason Wang,
Ji Wang,
Adam Burgasser,
Jean-Baptiste Ruffio,
Huihao Zhang,
Aaron Brown
, et al. (50 additional authors not shown)
Abstract:
HISPEC is a new, high-resolution near-infrared spectrograph being designed for the W.M. Keck II telescope. By offering single-shot, R=100,000 between 0.98 - 2.5 um, HISPEC will enable spectroscopy of transiting and non-transiting exoplanets in close orbits, direct high-contrast detection and spectroscopy of spatially separated substellar companions, and exoplanet dynamical mass and orbit measurements using precision radial velocity monitoring calibrated with a suite of state-of-the-art absolute and relative wavelength references. MODHIS is the counterpart to HISPEC for the Thirty Meter Telescope and is being developed in parallel with similar scientific goals. In this proceeding, we provide a brief overview of the current design of both instruments, and the requirements for the two spectrographs as guided by the scientific goals for each. We then outline the current science case for HISPEC and MODHIS, with focuses on the science enabled for exoplanet discovery and characterization. We also provide updated sensitivity curves for both instruments, in terms of both signal-to-noise ratio and predicted radial velocity precision.
Submitted 19 September, 2023;
originally announced September 2023.
-
Estimating Treatment Effects Using Costly Simulation Samples from a Population-Scale Model of Opioid Use Disorder
Authors:
Abdulrahman A. Ahmed,
M. Amin Rahimian,
Mark S. Roberts
Abstract:
Large-scale models require substantial computational resources for analysis and studying treatment conditions. Specifically, estimating treatment effects using simulations may require an infeasible amount of resources at every treatment condition. Therefore, it is essential to develop efficient methods to allocate computational resources for estimating treatment effects. Agent-based simulation allows us to generate highly realistic simulation samples. FRED (A Framework for Reconstructing Epidemiological Dynamics) is an agent-based modeling system with a geospatial perspective using a synthetic population constructed based on the U.S. census data. Given its synthetic population, FRED simulations present a baseline for comparable results from different treatment conditions. In this paper, we show three other methods for estimating treatment effects. In the first method, we resort to brute-force allocation, where all treatment conditions have an equal number of samples with a relatively large number of simulation runs. In the second method, we try to reduce the number of simulation runs by customizing individual samples required for each treatment effect based on the width of confidence intervals around the mean estimates. In the third method, we use a regression model, which allows us to learn across the treatment conditions such that simulation samples allocated for a treatment condition will help better estimate treatment effects in other conditions. We show that the regression-based methods result in a comparable estimate of treatment effects with fewer computational resources. The reduced variability and faster convergence of model-based estimates come at the cost of increased bias, and the bias-variance trade-off can be controlled by adjusting the number of model parameters (e.g., including higher-order interaction terms in the regression model).
Submitted 24 August, 2023;
originally announced August 2023.
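The second method in the abstract above, choosing the number of simulation runs per treatment condition from the width of the confidence interval, might look roughly like the sketch below. It is not the authors' procedure: the stopping rule (a normal-approximation 95% interval with a fixed target half-width), the function `run_until_precise`, and the Gaussian stand-ins for a costly FRED run are all assumptions made for illustration.

```python
import random
import statistics


def run_until_precise(simulate, half_width=0.5, min_runs=10, max_runs=10_000, z=1.96):
    """Draw simulation samples until the approximate CI half-width drops below the target."""
    samples = [simulate() for _ in range(min_runs)]
    while len(samples) < max_runs:
        std_err = statistics.stdev(samples) / len(samples) ** 0.5
        if z * std_err <= half_width:
            break  # the mean estimate is precise enough for this condition
        samples.append(simulate())
    return statistics.mean(samples), len(samples)


# Hypothetical stand-ins for an expensive simulator: same mean outcome, different noise.
rng = random.Random(0)


def quiet_condition():
    return rng.gauss(2.0, 1.0)


def noisy_condition():
    return rng.gauss(2.0, 5.0)


mean_quiet, runs_quiet = run_until_precise(quiet_condition)
mean_noisy, runs_noisy = run_until_precise(noisy_condition)
```

Under these assumptions the noisier condition consumes far more runs than the quiet one, whereas brute-force allocation would give both the same large budget.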
-
Giraffe: Adventures in Expanding Context Lengths in LLMs
Authors:
Arka Pal,
Deep Karkhanis,
Manley Roberts,
Samuel Dooley,
Arvind Sundararajan,
Siddartha Naidu
Abstract:
Modern large language models (LLMs) that rely on attention mechanisms are typically trained with fixed context lengths which enforce upper limits on the length of input sequences that they can handle at evaluation time. To use these models on sequences longer than the train-time context length, one might employ techniques from the growing family of context length extrapolation methods -- most of which focus on modifying the system of positional encodings used in the attention mechanism to indicate where tokens or activations are located in the input sequence. We conduct a wide survey of existing methods of context length extrapolation on a base LLaMA or LLaMA 2 model, and introduce some of our own design as well -- in particular, a new truncation strategy for modifying the basis for the position encoding.
We test these methods using three new evaluation tasks (FreeFormQA, AlteredNumericQA, and LongChat-Lines) as well as perplexity, which we find to be less fine-grained as a measure of long context performance of LLMs. We release the three tasks publicly as datasets on HuggingFace. We discover that linear scaling is the best method for extending context length, and show that further gains can be achieved by using longer scales at evaluation time. We also discover promising extrapolation capabilities in the truncated basis. To support further research in this area, we release three new 13B parameter long-context models which we call Giraffe: 4k and 16k context models trained from base LLaMA-13B, and a 32k context model trained from base LLaMA2-13B. We also release the code to replicate our results.
Submitted 21 August, 2023;
originally announced August 2023.
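One common reading of the "linear scaling" mentioned in the abstract above is position interpolation: position indices are divided by a constant factor before the rotary angles are computed, so that longer sequences map back into the position range seen during training. The sketch below illustrates that reading only. The function name, the tiny head dimension, and the default base of 10000 are assumptions, and this is not the released Giraffe code.

```python
def rotary_angles(position, dim=8, base=10000.0, scale=1.0):
    """Rotation angles for one token position under rotary position embeddings.

    scale > 1 applies linear scaling: the position is compressed by that factor
    before being multiplied by each frequency.
    """
    scaled_position = position / scale
    return [scaled_position * base ** (-2 * i / dim) for i in range(dim // 2)]


# With scale=2, position 8192 receives the angles that position 4096 gets unscaled,
# so an 8k-token input stays inside an (assumed) 4k training range.
angles_scaled = rotary_angles(8192, scale=2.0)
angles_trained = rotary_angles(4096)
```

The abstract also reports further gains from using a longer scale at evaluation time than at training time; in this sketch that would correspond to passing a larger `scale` at inference.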
-
REFORMS: Reporting Standards for Machine Learning Based Science
Authors:
Sayash Kapoor,
Emily Cantrell,
Kenny Peng,
Thanh Hien Pham,
Christopher A. Bail,
Odd Erik Gundersen,
Jake M. Hofman,
Jessica Hullman,
Michael A. Lones,
Momin M. Malik,
Priyanka Nanayakkara,
Russell A. Poldrack,
Inioluwa Deborah Raji,
Michael Roberts,
Matthew J. Salganik,
Marta Serra-Garcia,
Brandon M. Stewart,
Gilles Vandewiele,
Arvind Narayanan
Abstract:
Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear reporting standards for ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist ($\textbf{Re}$porting Standards $\textbf{For}$ $\textbf{M}$achine Learning Based $\textbf{S}$cience). It consists of 32 questions and a paired set of guidelines. REFORMS was developed based on a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.
Submitted 19 September, 2023; v1 submitted 15 August, 2023;
originally announced August 2023.
-
Reinterpreting survival analysis in the universal approximator age
Authors:
Sören Dittmer,
Michael Roberts,
Jacobus Preller,
AIX COVNET,
James H. F. Rudd,
John A. D. Aston,
Carola-Bibiane Schönlieb
Abstract:
Survival analysis is an integral part of the statistical toolbox. However, while most domains of classical statistics have embraced deep learning, survival analysis only recently gained some minor attention from the deep learning community. This recent development is likely in part motivated by the COVID-19 pandemic. We aim to provide the tools needed to fully harness the potential of survival analysis in deep learning. On the one hand, we discuss how survival analysis connects to classification and regression. On the other hand, we provide technical tools. We provide a new loss function, evaluation metrics, and the first universal approximating network that provably produces survival curves without numeric integration. We show that the loss function and model outperform other approaches using a large numerical study.
Submitted 25 July, 2023;
originally announced July 2023.
-
Inferring epidemic dynamics using Gaussian process emulation of agent-based simulations
Authors:
Abdulrahman A. Ahmed,
M. Amin Rahimian,
Mark S. Roberts
Abstract:
Computational models help decision makers understand epidemic dynamics to optimize public health interventions. Agent-based simulation of disease spread in synthetic populations allows us to compare and contrast different effects across identical populations or to investigate the effect of interventions keeping every other factor constant between "digital twins". FRED (A Framework for Reconstructing Epidemiological Dynamics) is an agent-based modeling system with a geo-spatial perspective using a synthetic population that is constructed based on the U.S. census data. In this paper, we show how Gaussian process regression can be used on FRED-synthesized data to infer the differing spatial dispersion of the epidemic dynamics for two disease conditions that start from the same initial conditions and spread among identical populations. Our results showcase the utility of agent-based simulation frameworks such as FRED for inferring differences between conditions where controlling for all confounding factors for such comparisons is next to impossible without synthetic data.
Submitted 11 September, 2023; v1 submitted 22 July, 2023;
originally announced July 2023.
-
Dis-AE: Multi-domain & Multi-task Generalisation on Real-World Clinical Data
Authors:
Daniel Kreuter,
Samuel Tull,
Julian Gilbey,
Jacobus Preller,
BloodCounts! Consortium,
John A. D. Aston,
James H. F. Rudd,
Suthesh Sivapalaratnam,
Carola-Bibiane Schönlieb,
Nicholas Gleadall,
Michael Roberts
Abstract:
Clinical data is often affected by clinically irrelevant factors such as discrepancies between measurement devices or differing processing methods between sites. In the field of machine learning (ML), these factors are known as domains and the distribution differences they cause in the data are known as domain shifts. ML models trained using data from one domain often perform poorly when applied to data from another domain, potentially leading to wrong predictions. As such, developing machine learning models that can generalise well across multiple domains is a challenging yet essential task in the successful application of ML in clinical practice. In this paper, we propose a novel disentangled autoencoder (Dis-AE) neural network architecture that can learn domain-invariant data representations for multi-label classification of medical measurements even when the data is influenced by multiple interacting domain shifts at once. The model utilises adversarial training to produce data representations from which the domain can no longer be determined. We evaluate the model's domain generalisation capabilities on synthetic datasets and full blood count (FBC) data from blood donors as well as primary and secondary care patients, showing that Dis-AE improves model generalisation on multiple domains simultaneously while preserving clinically relevant information.
Submitted 15 June, 2023;
originally announced June 2023.
-
Algorithmic Censoring in Dynamic Learning Systems
Authors:
Jennifer Chien,
Margaret Roberts,
Berk Ustun
Abstract:
Dynamic learning systems subject to selective labeling exhibit censoring, i.e. persistent negative predictions assigned to one or more subgroups of points. In applications like consumer finance, this results in groups of applicants that are persistently denied and thus never enter into the training data. In this work, we formalize censoring, demonstrate how it can arise, and highlight difficulties in detection. We consider safeguards against censoring - recourse and randomized-exploration - both of which ensure we collect labels for points that would otherwise go unobserved. The resulting techniques allow examples from censored groups to enter into the training data and correct the model. Our results highlight the otherwise unmeasured harms of censoring and demonstrate the effectiveness of mitigation strategies across a range of data generating processes.
Submitted 29 June, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
-
Analog gravity and the continuum effective theory of the graphene tight binding lattice model
Authors:
Matthew M. Roberts,
Toby Wiseman
Abstract:
We consider the tight-binding model of graphene with slowly spatially varying hopping functions. We develop a low energy approximation as a derivative expansion in a Dirac spinor that is perturbative in the hopping function deformation. The leading description is the Dirac equation in flat 2+1-d spacetime with (strain-)gauge field. Prior work considered subleading corrections written as non-trivial frame and spin connection terms. We previously argued that such corrections cannot be considered consistently without taking all the terms at the same order of approximation, which due to the unconventional power counting originating from the large gauge field, involve also higher covariant derivative terms. Here we confirm this, explicitly computing subleading terms. To the order we explore, the theory is elegantly determined by the gauge field and frame, both given by the hopping functions, the torsion free spin connection of the frame, together with coefficients for the higher derivative terms derived from lattice invariants. For the first time we compute the metric that the Dirac field sees - the 'electrometric' - to quadratic order in the deformation allowing us to describe the subleading corrections to the dispersion relation for inhomogeneous deformations originating from corrections to the frame. Focussing on in-plane inhomogeneous strain, we use a simple model to relate the hopping functions to the strain field, finding the electrometric becomes curved at this quadratic order. Thus this lattice model yields an effective analog gravity description as a curved space Dirac theory, with large magnetic field, and Lorentz violating higher covariant derivative terms. We check this by comparison to numerical diagonalization. From this we conjecture a form for the effective theory for monolayer graphene in terms of the strain tensor, consistent up to quadratic order in the deformation.
Submitted 15 August, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
-
RAAD: LIGHT-1 CubeSat's Payload for the Detection of Terrestrial Gamma-Ray Flashes
Authors:
A. Di Giovanni,
F. Arneodo,
A. Al Qasim,
H. Alblooshi,
F. AlKhouri,
L. Alkindi,
A. AlMannei,
M. L. Benabderrahmane,
G. Bruno,
V. Conicella,
O. Fawwaz,
G. Franchi,
S. Kalos,
P. Oikonomou,
L. Perillo,
C. Pittori,
M. S. Roberts,
R. Torres
Abstract:
The Rapid Acquisition Atmospheric Detector (RAAD), onboard the LIGHT-1 3U CubeSat, detects photons between hard X-rays and soft gamma-rays, in order to identify and characterize Terrestrial Gamma Ray Flashes (TGFs). Three detector configurations are tested, making use of Cerium Bromide and Lanthanum BromoChloride scintillating crystals coupled to photomultiplier tubes or Multi-Pixel Photon Counters, in order to identify the optimal combination for TGF detection. High timing resolution, a short trigger window, and the short decay time of its electronics allow RAAD to perform accurate measurements of prompt, transient events. Here we describe the overview of the detection concept, the development of the front-end acquisition electronics, as well as the ground testing and simulation the payload underwent prior to its launch on December 21st, 2021. We further present an analysis of the detector's in-orbit system behavior and some preliminary results.
Submitted 16 August, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Accurate electron-recoil ionization factors for dark matter direct detection in xenon, krypton and argon
Authors:
A. R. Caddell,
V. V. Flambaum,
B. M. Roberts
Abstract:
While most scintillation-based dark matter experiments search for Weakly Interacting Massive Particles (WIMPs), a sub-GeV WIMP-like particle may also be detectable in these experiments. While dark matter of this type and scale would not leave appreciable nuclear recoil signals, it may instead induce ionization of atomic electrons. Accurate modelling of the atomic wavefunctions is key to investigating this possibility, with incorrect treatment leading to a large suppression in the atomic excitation factors. We have calculated these atomic factors for argon, krypton and xenon and present the tabulated results for use with a range of dark matter models. This is made possible by the separability of the atomic and dark matter form factor, allowing the atomic factors to be calculated for general couplings; we include tables for vector, scalar, pseudovector, and pseudoscalar electron couplings. Additionally, we calculate electron impact total ionization cross sections for xenon using the tabulated results as a test of accuracy. Lastly, we provide an example calculation of the event rate for dark matter scattering on electrons in XENON1T and show that these calculations depend heavily on how the low-energy response of the detector is modelled.
Submitted 8 May, 2023;
originally announced May 2023.
-
open-UST: An Open-Source Ultrasound Tomography Transducer Array System
Authors:
Morgan Roberts,
Eleanor Martin,
Michael D. Brown,
Ben T. Cox,
Bradley E. Treeby
Abstract:
Fast imaging methods are needed to promote widespread clinical adoption of Ultrasound Tomography (UST), and more widely available UST hardware could support the experimental validation of new measurement configurations. In this work, an open-source 256-element transducer ring array was developed (morganjroberts.github.io/open-UST) and manufactured using rapid prototyping, for only £2k. Novel manufacturing techniques were used, resulting in a 1.17$^{\circ}$ mean beam axis skew angle, a 104 $μ$m mean element position error, and a $\pm$13.6 $μ$m deviation in matching layer thickness. The nominal acoustic performance was measured using hydrophone scans and watershot data, and the 61.2 dB SNR, 55.4$^{\circ}$ opening angle, 16.3 mm beamwidth and 54% transmit-receive bandwidth (-12 dB), were found to be similar to existing systems, and compatible with full waveform inversion reconstruction methods. The inter-element variation in acoustic performance was typically <10% without using normalisation, meaning that the elements can be modelled identically during image reconstruction, removing the need for individual source definitions based on hydrophone measurements. Finally, data from a phantom experiment was successfully reconstructed. These results demonstrate that the open-UST system is accessible for users, and suitable for UST imaging research.
Submitted 20 February, 2023;
originally announced February 2023.
-
Neutron star mass estimates from gamma-ray eclipses in spider millisecond pulsar binaries
Authors:
C. J. Clark,
M. Kerr,
E. D. Barr,
B. Bhattacharyya,
R. P. Breton,
P. Bruel,
F. Camilo,
W. Chen,
I. Cognard,
H. T. Cromartie,
J. Deneva,
V. S. Dhillon,
L. Guillemot,
M. R. Kennedy,
M. Kramer,
A. G. Lyne,
D. Mata Sánchez,
L. Nieder,
C. Phillips,
S. M. Ransom,
P. S. Ray,
M. S. E. Roberts,
J. Roy,
D. A. Smith,
R. Spiewak
, et al. (4 additional authors not shown)
Abstract:
Reliable neutron star mass measurements are key to determining the equation-of-state of cold nuclear matter, but these are rare. "Black Widows" and "Redbacks" are compact binaries consisting of millisecond pulsars and semi-degenerate companion stars. Spectroscopy of the optically bright companions can determine their radial velocities, providing inclination-dependent pulsar mass estimates. While inclinations can be inferred from subtle features in optical light curves, such estimates may be systematically biased due to incomplete heating models and poorly-understood variability. Using data from the Fermi Large Area Telescope, we have searched for gamma-ray eclipses from 49 spider systems, discovering significant eclipses in 7 systems, including the prototypical black widow PSR B1957$+$20. Gamma-ray eclipses require direct occultation of the pulsar by the companion, and so the detection, or significant exclusion, of a gamma-ray eclipse strictly limits the binary inclination angle, providing new robust, model-independent pulsar mass constraints. For PSR B1957$+$20, the eclipse implies a much lighter pulsar ($M_{\rm psr} = 1.81 \pm 0.07\,M_{\odot}$) than inferred from optical light curve modelling.
Submitted 26 January, 2023;
originally announced January 2023.
-
QED radiative corrections to electric dipole amplitudes in heavy atoms
Authors:
C. J. Fairhall,
B. M. Roberts,
J. S. M. Ginges
Abstract:
We use the radiative potential method to perform a detailed study of quantum electrodynamics (QED) radiative corrections to electric dipole (E1) transition amplitudes in heavy alkali-metal atoms Rb, Cs, Fr, and alkali-metal-like ions Sr+, Ba+, and Ra+. The validity of the method is checked by comparing with the results of rigorous QED in simple atomic potentials. We study the effects of core relaxation, polarization of the core by the E1 field, and valence-core correlations on QED, which are shown to be important in some cases. We identify several transitions for which the QED contribution exceeds the deviation between atomic theory and experiment.
Submitted 22 December, 2022;
originally announced December 2022.
-
Experimental and theoretical study of dynamic polarizabilities in the $5S_{1/2}$-$5D_{5/2}$ clock transition in rubidium-87 and determination of E1 matrix elements
Authors:
Rhona Hamilton,
Benjamin M. Roberts,
Sarah K. Scholten,
Clayton Locke,
Andre N. Luiten,
Jacinda S. M. Ginges,
Christopher Perrella
Abstract:
The interaction between light and an atom causes perturbations in the atom's energy levels, known as the light-shift. These light-shifts are a key source of inaccuracy in atomic clocks, and can also deteriorate their precision. We present a study of light-shifts and associated dynamic polarizabilities for a two-photon atomic clock based on the $5S_{1/2}$-$5D_{5/2}$ transition in rubidium-87 over the range 770 nm to 800 nm. We determine experimental and theoretical values for a magic wavelength in this range and the electric dipole (E1) matrix element for the $5P_{3/2}$-$5D_{5/2}$ transition. We find a magic wavelength of 776.179(5) nm (experimental) and 776.21 nm (theoretical) in the vicinity of the $5P_{3/2}$-$5D_{5/2}$ resonance, and the corresponding reduced E1 matrix element 1.80(6) $ea_0$ (experimental) and 1.96(15) $ea_0$ (theoretical). These values resolve a previous discrepancy between theory and experiment.
Submitted 20 December, 2022;
originally announced December 2022.
-
Electric dipole transition amplitudes for atoms and ions with one valence electron
Authors:
B. M. Roberts,
C. J. Fairhall,
J. S. M. Ginges
Abstract:
Motivated by recent measurements for several alkali-metal atoms and alkali-metal-like ions, we perform a detailed study of electric dipole (E1) transition amplitudes in K, Ca+, Rb, Sr+, Cs, Ba+, Fr, and Ra+, which are of interest for studies of atomic parity violation, electric dipole moments, and polarizabilities. Using the all-orders correlation potential method, we perform high-precision calculations of E1 transition amplitudes between low-lying s, p, and d states. We perform a robust error analysis, and compare our calculations to many amplitudes for which there are high-precision experimental determinations. We find excellent agreement, with deviations at the level of ~0.1%. We also compare our results to other theoretical evaluations, and discuss the implications for uncertainty analyses. Further, combining calculations of branching ratios with recent measurements, we extract high-precision values for several E1 amplitudes of Ca+, Sr+, Cs, Fr, and Ra+.
Submitted 6 March, 2023; v1 submitted 20 November, 2022;
originally announced November 2022.
-
High-Resolution Radio Study of the Dragonfly Nebula
Authors:
Ruolan Jin,
C. -Y. Ng,
Mallory S. E. Roberts,
Kwan-Lok Li
Abstract:
The Dragonfly Nebula (G75.2$+$0.1) powered by the young pulsar J2021$+$3651 is a rare pulsar wind nebula (PWN) that shows double tori and polar jets enclosed by a bow-shock structure in X-rays. We present new radio observations of this source taken with the Very Large Array (VLA) at 6 GHz. The radio PWN has an overall size about two times as large as the X-ray counterpart, consisting of a bright main body region in the southwest, a narrow and fainter bridge region in the northeast, and a dark gap in between. The nebula shows a radio spectrum much softer than that of a typical PWN. This could be resulting from compression by the ram pressure as the system travels mildly supersonically in the interstellar medium (ISM). Our polarization maps reveal a highly ordered and complex $B$-field structure. This can be explained by a toroidal field distorted by the pulsar motion.
Submitted 18 November, 2022;
originally announced November 2022.
-
Estimating defection in subscription-type markets: empirical analysis from the scholarly publishing industry
Authors:
Michael Roberts,
J. Ignacio Deza,
Hisham Ihshaish,
Yanhui Zhu
Abstract:
We present the first empirical study on customer churn prediction in the scholarly publishing industry. The study examines our proposed prediction method on customer subscription data covering a period of 6.5 years, provided by a major academic publisher. We explore the subscription-type market within the context of customer defection and modelling, and analyse the business model of such markets and how it characterises the academic publishing business. The proposed method infers a customer's likelihood of defection on the basis of their re-sampled use of provider resources (in this context, the volume and frequency of content downloads). We show that this approach can be both accurate and uniquely useful in the business-to-business context, with which the scholarly publishing business model shares similarities. The main findings of this work suggest that whilst all predictive models examined, especially ensemble methods of machine learning, achieve substantially accurate prediction of churn nearly a year ahead, this can furthermore be achieved even when the specific behavioural attributes that can be associated with each customer's probability to churn are overlooked, thereby allowing highly accurate inference of churn from minimal data. We show that modelling churn on the basis of re-sampling customers' use of resources over subscription time is a better (simplified) approach than considering the high granularity that can often characterise consumption behaviour.
Submitted 17 November, 2022;
originally announced November 2022.
-
Phase II of the Keck Planet Imager and Characterizer: system-level laboratory characterization and preliminary on-sky commissioning
Authors:
Daniel Echeverri,
Nemanja Jovanovic,
Jacques-Robert Delorme,
Yinzi Xin,
Tobias Schofield,
Luke Finnerty,
Jason J. Wang,
Jerry Xuan,
Dimitri Mawet,
Ashley Baker,
Randall Bartos,
Charlotte Z. Bond,
Marta L. Bryan,
Benjamin Calvin,
Sylvain Cetre,
Greg Doppmann,
Michael P. Fitzgerald,
Jason Fucik,
Katelyn Horstman,
Ronald Lopez,
Emily C. Martin,
Stefan Martin,
Bertrand Mennesson,
Evan Morris,
Reston Nash
, et al. (13 additional authors not shown)
Abstract:
The Keck Planet Imager and Characterizer (KPIC) is a series of upgrades for the Keck II Adaptive Optics (AO) system and the NIRSPEC spectrograph to enable diffraction-limited, high-resolution ($R>30,000$) spectroscopy of exoplanets and low-mass companions in the K and L bands. Phase I consisted of single-mode fiber injection/extraction units (FIU/FEU) used in conjunction with an H-band pyramid wavefront sensor. Phase II, deployed and commissioned in 2022, adds a 1000-actuator deformable mirror, beam-shaping optics, a vortex coronagraph, and other upgrades to the FIU/FEU. The use of single-mode fibers provides a gain in stellar rejection, a substantial reduction in sky background, and an extremely stable line-spread function on the spectrograph.
In this paper we present the results of extensive system-level laboratory testing and characterization showing the instrument's Phase II throughput, stability, repeatability, and other key performance metrics prior to delivery and during installation at Keck. We also demonstrate the capabilities of the various observing modes enabled by the new system modules using internal test light sources. Finally, we show preliminary results of on-sky tests performed in the first few months of Phase II commissioning along with the next steps for the instrument.
Once commissioning of Phase II is complete, KPIC will continue to characterize exoplanets at an unprecedented spectral resolution, thereby growing its already successful track record of 23 detected exoplanets and brown dwarfs from Phase I. Using the new vortex fiber nulling (VFN) mode, Phase II will also be able to search for exoplanets at small angular separations less than 45 milliarcseconds which conventional coronagraphs cannot reach.
Submitted 28 October, 2022;
originally announced October 2022.
-
Navigating the challenges in creating complex data systems: a development philosophy
Authors:
Sören Dittmer,
Michael Roberts,
Julian Gilbey,
Ander Biguri,
AIX-COVNET Collaboration,
Jacobus Preller,
James H. F. Rudd,
John A. D. Aston,
Carola-Bibiane Schönlieb
Abstract:
In this perspective, we argue that despite the democratization of powerful tools for data science and machine learning over the last decade, developing the code for a trustworthy and effective data science system (DSS) is getting harder. Perverse incentives and a lack of widespread software engineering (SE) skills are among many root causes we identify that naturally give rise to the current systemic crisis in reproducibility of DSSs. We analyze why SE and building large complex systems is, in general, hard. Based on these insights, we identify how SE addresses those difficulties and how we can apply and generalize SE methods to construct DSSs that are fit for purpose. We advocate two key development philosophies, namely that one should incrementally grow -- not biphasically plan and build -- DSSs, and one should always employ two types of feedback loops during development: one which tests the code's correctness and another that evaluates the code's efficacy.
Submitted 21 October, 2022;
originally announced October 2022.
-
Understanding CNN Fragility When Learning With Imbalanced Data
Authors:
Damien Dablain,
Kristen N. Jacobson,
Colin Bellinger,
Mark Roberts,
Nitesh Chawla
Abstract:
Convolutional neural networks (CNNs) have achieved impressive results on imbalanced image data, but they still have difficulty generalizing to minority classes and their decisions are difficult to interpret. These problems are related because the method by which CNNs generalize to minority classes, which requires improvement, is wrapped in a black box. To demystify CNN decisions on imbalanced data, we focus on their latent features. Although CNNs embed the pattern knowledge learned from a training set in model parameters, the effect of this knowledge is contained in feature and classification embeddings (FE and CE). These embeddings can be extracted from a trained model and their global, class properties (e.g., frequency, magnitude and identity) can be analyzed. We find that important information regarding the ability of a neural network to generalize to minority classes resides in the class top-K CE and FE. We show that a CNN learns a limited number of class top-K CE per category, and that their number and magnitudes vary based on whether the same class is balanced or imbalanced. This calls into question whether a CNN has learned intrinsic class features, or merely frequently occurring ones that happen to exist in the sampled class distribution. We also hypothesize that latent class diversity is as important as the number of class examples, which has important implications for re-sampling and cost-sensitive methods. These methods generally focus on rebalancing model weights, class numbers and margins, instead of diversifying class latent features through augmentation. We also demonstrate that a CNN has difficulty generalizing to test data if the magnitude of its top-K latent features does not match the training set. We use three popular image datasets and two cost-sensitive algorithms commonly employed in imbalanced learning for our experiments.
Submitted 17 October, 2022;
originally announced October 2022.
-
Retrospectives on the Embodied AI Workshop
Authors:
Matt Deitke,
Dhruv Batra,
Yonatan Bisk,
Tommaso Campari,
Angel X. Chang,
Devendra Singh Chaplot,
Changan Chen,
Claudia Pérez D'Arpino,
Kiana Ehsani,
Ali Farhadi,
Li Fei-Fei,
Anthony Francis,
Chuang Gan,
Kristen Grauman,
David Hall,
Winson Han,
Unnat Jain,
Aniruddha Kembhavi,
Jacob Krantz,
Stefan Lee,
Chengshu Li,
Sagnik Majumder,
Oleksandr Maksymets,
Roberto Martín-Martín,
Roozbeh Mottaghi
, et al. (14 additional authors not shown)
Abstract:
We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of-the-art models. We highlight commonalities between top approaches to the challenges and identify potential future directions for Embodied AI research.
Submitted 4 December, 2022; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Rigid models for 2-gerbes I: Chern-Simons geometry
Authors:
David Michael Roberts,
Raymond F. Vozzo
Abstract:
Motivated by the problem of constructing explicit geometric string structures, we give a rigid model for bundle 2-gerbes, and define connective structures thereon. This model is designed to make explicit calculations easier, for instance in applications to physics. To compare to the existing definition, we give a functorial construction of a bundle 2-gerbe as in the literature from our rigid model, including with connections. As an example we prove that the Chern--Simons bundle 2-gerbe from the literature, with its connective structure, can be rigidified -- it arises, up to isomorphism in the strongest possible sense, from a rigid bundle 2-gerbe with connective structure via this construction. Further, our rigid version of 2-gerbe trivialisation (with connections) gives rise to trivialisations (with connections) of bundle 2-gerbes in the usual sense, and as such can be used to describe geometric string structures.
Submitted 12 April, 2023; v1 submitted 12 September, 2022;
originally announced September 2022.
-
Empirical determination of the Bohr-Weisskopf effect in cesium and improved tests of precision atomic theory in searches for new physics
Authors:
G. Sanamyan,
B. M. Roberts,
J. S. M. Ginges
Abstract:
The finite distribution of the nuclear magnetic moment across the nucleus gives a contribution to the hyperfine structure known as the Bohr-Weisskopf (BW) effect. We have obtained an empirical value of -0.24(18)% for this effect in the ground and excited s states of atomic Cs-133. This value is found from historical muonic-atom measurements in combination with our muonic-atom and atomic many-body calculations. The effect differs by 0.5% in the hyperfine structure from the value found using the uniform magnetization distribution, which has been commonly employed in the precision heavy-atom community over the last several decades. We also deduce accurate values for the BW effect in other isotopes and states of cesium. These results enable cesium atomic wave functions to be tested in the nuclear region at an unprecedented 0.2% level, and are needed for the development of precision atomic many-body methods. This is important for increasing the discovery potential of precision atomic searches for new physics, in particular for atomic parity violation in cesium.
Submitted 13 September, 2022; v1 submitted 12 September, 2022;
originally announced September 2022.
-
Designing low-cost TaC virtual substrates for $Al_xGa_{1-x}N$ epitaxy
Authors:
Dennice M. Roberts,
Andrew Norman,
Moira K. Miller,
M. Brooks Tellekamp
Abstract:
$Al_xGa_{1-x}N$ is a critical ultra-wide bandgap material for optoelectronics, but the deposition of thick, high quality epitaxial layers has been hindered by a lack of lattice-matched substrates. Here we identify the (111) face of transition metal carbides as a suitable class of materials for substrates lattice matched to (0001) $Al_xGa_{1-x}N$ and demonstrate the growth of thin film TaC which has an effective hexagonal lattice constant matched to $Al_{0.45}Ga_{0.55}N$. We explore growth conditions for sputtered TaC on sapphire substrates and investigate the effects of sputter power, layer thickness and incident plasma angle on film structure and in- and out-of-plane strain. We then show critical improvements to film quality by annealing films in a face-to-face configuration at 1600 $^\circ$C, which significantly reduces full width at half max (FWHM) of in- and out-of-plane diffraction peaks and results in a step-and-terrace surface morphology. This work presents a path toward electrically conductive, lattice matched, thermally compatible substrates for $Al_xGa_{1-x}N$ heteroepitaxy, a critical step for vertical devices and other power electronics applications.
Submitted 24 August, 2022;
originally announced August 2022.
-
Unsupervised Learning under Latent Label Shift
Authors:
Manley Roberts,
Pranav Mani,
Saurabh Garg,
Zachary C. Lipton
Abstract:
What sorts of structure might enable a learner to discover classes from unlabeled data? Traditional approaches rely on feature-space similarity and heroic assumptions on the data. In this paper, we introduce unsupervised learning under Latent Label Shift (LLS), where we have access to unlabeled data from multiple domains such that the label marginals $p_d(y)$ can shift across domains but the class conditionals $p(\mathbf{x}|y)$ do not. This work instantiates a new principle for identifying classes: elements that shift together group together. For finite input spaces, we establish an isomorphism between LLS and topic modeling: inputs correspond to words, domains to documents, and labels to topics. Addressing continuous data, we prove that when each label's support contains a separable region, analogous to an anchor word, oracle access to $p(d|\mathbf{x})$ suffices to identify $p_d(y)$ and $p_d(y|\mathbf{x})$ up to permutation. Thus motivated, we introduce a practical algorithm that leverages domain-discriminative models as follows: (i) push examples through domain discriminator $p(d|\mathbf{x})$; (ii) discretize the data by clustering examples in $p(d|\mathbf{x})$ space; (iii) perform non-negative matrix factorization on the discrete data; (iv) combine the recovered $p(y|d)$ with the discriminator outputs $p(d|\mathbf{x})$ to compute $p_d(y|x) \; \forall d$. With semi-synthetic experiments, we show that our algorithm can leverage domain information to improve upon competitive unsupervised classification methods. We reveal a failure mode of standard unsupervised classification methods when feature-space similarity does not indicate true groupings, and show empirically that our method better handles this case. Our results establish a deep connection between distribution shift and topic modeling, opening promising lines for future work.
Submitted 1 December, 2022; v1 submitted 26 July, 2022;
originally announced July 2022.