Search | arXiv e-print repository

arXiv:2406.19470 [pdf, other]

Changing Answer Order Can Decrease MMLU Accuracy

Authors: Vipul Gupta, David Pantoja, Candace Ross, Adina Williams, Megan Ung

Abstract: As large language models (LLMs) have grown in prevalence, particular benchmarks have become essential for the evaluation of these models and for understanding model capabilities. Most commonly, we use test accuracy averaged across multiple subtasks in order to rank models on leaderboards, to determine which model is best for our purposes. In this paper, we investigate the robustness of the accuracy measurement on a widely used multiple choice question answering dataset, MMLU. When shuffling the answer label contents, we find that all explored models decrease in accuracy on MMLU, but not every model is equally sensitive. These findings suggest a possible adjustment to the standard practice of leaderboard testing, where we additionally consider the percentage of examples each model answers correctly by random chance.

Submitted 27 June, 2024; originally announced June 2024.

Comments: Short paper, 9 pages

arXiv:2406.19090 [pdf, ps, other]

doi 10.1016/j.proci.2018.05.044

Influences of stoichiometry on steadily propagating triple flames in counterflows

Authors: Prabakaran Rajamanickam, Wilfried Coenen, Antonio L. Sánchez, Forman A. Williams

Abstract: Most studies of triple flames in counterflowing streams of fuel and oxidizer have been focused on the symmetric problem in which the stoichiometric mixture fraction is $1/2$. There then exist lean and rich premixed flames of roughly equal strengths, with a diffusion flame trailing behind from the stoichiometric point at which they meet. In the majority of realistic situations, however, the stoichiometric mixture fraction departs appreciably from unity, typically being quite small. With the objective of clarifying the influences of stoichiometry, attention is focused on one of the simplest possible models, addressed here mainly by numerical integration. When the stoichiometric mixture fraction departs appreciably from $1/2$, one of the premixed wings is found to be dominant to such an extent that the diffusion flame and the other premixed flame are very weak by comparison. These curved, partially premixed flames are expected to be relevant in realistic configurations. In addition, a simple kinematic balance is shown to predict the shape of the front and the propagation velocity reasonably well in the limit of low stretch and low curvature.

Submitted 27 June, 2024; originally announced June 2024.

Journal ref: Proc. Combust. Inst. (2019) 37(2), 1971-1977

arXiv:2406.18232 [pdf, other]

doi 10.1016/j.combustflame.2020.03.002

Near-limit H$_2$-O$_2$-N$_2$ combustion in nonpremixed counterflow mixing layers

Authors: Jaime Carpio, Prabakaran Rajamanickam, Antonio L. Sánchez, Forman A. Williams

Abstract: Numerical computations employing detailed chemistry are used to characterize the different combustion modes emerging in mixing layers separating nitrogen-diluted counterflowing planar streams of hydrogen and oxygen. Attention is focused on high degrees of dilution, resulting in near-limit flames, with peak temperatures close to the crossover temperature. A bifurcation diagram is presented in a plane, having the stoichiometric mixture fraction and normalized strain rate as coordinates, that identifies six different combustion regimes involving four different flame types, namely, diffusion-flame sheets, advancing and retreating edge flames, multiple flame tubes, and single isolated flame tubes. Multiple-tube flame configurations vary from small, round, widely separated flame strings at high strain rates to wide, flat, densely packed flame strips, with narrow flame-free gaps between them, at lower strain rates, and they are steady and stable in various arrays over a continuum of tube-separation distances. The observed flame behavior exhibits hysteresis in a certain range of parameters, with the structure that is established depending on the ignition mechanism, as it also does at high strain rates, and a continuum of different stable steady-state flame configurations exists, each accessed from a different initial condition.

Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Journal ref: Combust. Flame (2020) 216, 426-438

arXiv:2406.11988 [pdf, other]

Decomposed evaluations of geographic disparities in text-to-image models

Authors: Abhishek Sureddy, Dishant Padalia, Nandhinee Periyakaruppa, Oindrila Saha, Adina Williams, Adriana Romero-Soriano, Megan Richards, Polina Kirichenko, Melissa Hall

Abstract: Recent work has identified substantial disparities in generated images of different geographic regions, including stereotypical depictions of everyday objects like houses and cars. However, existing measures for these disparities have been limited to either human evaluations, which are time-consuming and costly, or automatic metrics evaluating full images, which are unable to attribute these disparities to specific parts of the generated images. In this work, we introduce a new set of metrics, Decomposed Indicators of Disparities in Image Generation (Decomposed-DIG), that allows us to separately measure geographic disparities in the depiction of objects and backgrounds in generated images. Using Decomposed-DIG, we audit a widely used latent diffusion model and find that generated images depict objects with better realism than backgrounds and that backgrounds in generated images tend to contain larger regional disparities than objects. We use Decomposed-DIG to pinpoint specific examples of disparities, such as stereotypical background generation in Africa, struggling to generate modern vehicles in Africa, and unrealistically placing some objects in outdoor settings. Informed by our metric, we use a new prompting structure that enables a 52% worst-region improvement and a 20% average improvement in generated background diversity.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.08507 [pdf, other]

Experimental and Computational Investigation of the Influence of Ethanol on Auto-ignition of n-Heptane in Non-Premixed Flows

Authors: Liang Ji, Kalyanasundaram Seshadri, Forman A. Williams

Abstract: Experimental and computational investigations are carried out to elucidate the influence of ethanol addition on n-heptane auto-ignition in counterflows. An axisymmetric stream of air, whose temperature is gradually increased, is directed onto the surface of an evaporating pool of a liquid fuel. The air-stream temperature at auto-ignition is measured at various strain rates for n-heptane, ethanol, and various n-heptane/ethanol mixtures. Critical conditions for auto-ignition are predicted employing the San Diego Mechanism for both fuels and fuel mixtures, and the results are compared with measurements. Measurements and predictions show that low-temperature chemistry (LTC) plays a significant role in promoting auto-ignition of n-heptane at low strain rates, but there is insufficient residence time at high strain rates for LTC to take place, so auto-ignition is promoted by high-temperature chemistry. Experimental and computational results show that addition of ethanol inhibits LTC of n-heptane. To identify the responsible elementary steps, computations are performed to identify those that dominate O2 consumption and contribute to the temperature rise in the reaction zone for n-heptane and n-heptane/ethanol mixtures at low strain rates. For n-heptane, O2 is consumed primarily by the low-temperature steps that result in ketohydroperoxide; the temperature rise is produced by subsequent LTC steps. For the mixtures, a key step consuming O2 is O2 + CH3CHOH = HO2 + CH3CHO, and the heat release occurs through the classical high-temperature reaction mechanism. Thus, the inhibition of auto-ignition that is observed to occur when ethanol is added to n-heptane arises from the competition for O2 between this step and the LTC addition of O2 to the heptyl radical and to the radical arising from the subsequent isomerization, for n-heptane.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2406.05183 [pdf, other]

The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Authors: Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, Mike Rabbat, Mark Ibrahim

Abstract: Today's best language models still struggle with hallucinations: factually incorrect generations, which impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than was encountered during training, exemplifies this in information retrieval. We reframe the reversal curse as a factorization curse - a failure of models to learn the same joint distribution under different factorizations. Through a series of controlled experiments with increasing levels of realism including WikiReversal, a setting we introduce to closely simulate a knowledge intensive finetuning task, we find that the factorization curse is an inherent failure of the next-token prediction objective used in popular large language models. Moreover, we demonstrate reliable information retrieval cannot be solved with scale, reversed tokens, or even naive bidirectional-attention training. Consequently, various approaches to finetuning on specialized data would necessarily provide mixed results on downstream tasks, unless the model has already seen the right sequence of tokens. Across five tasks of varying levels of complexity, our results uncover a promising path forward: factorization-agnostic objectives can significantly mitigate the reversal curse and hint at improved knowledge storage and planning capabilities.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 18 pages, 7 figures

arXiv:2406.01535 [pdf, other]

Towards latent space evolution of spatiotemporal dynamics of six-dimensional phase space of charged particle beams

Authors: Mahindra Rautela, Alan Williams, Alexander Scheinker

Abstract: Addressing the charged particle beam diagnostics in accelerators poses a formidable challenge, demanding high-fidelity simulations in limited computational time. Machine learning (ML) based surrogate models have emerged as a promising tool for non-invasive charged particle beam diagnostics. Trained ML models can make predictions much faster than computationally expensive physics simulations. In this work, we have proposed a temporally structured variational autoencoder model to autoregressively forecast the spatiotemporal dynamics of the 15 unique 2D projections of 6D phase space of charged particle beam as it travels through the LANSCE linear accelerator. In the model, VAE embeds the phase space projections into a lower dimensional latent space. A long-short-term memory network then learns the temporal correlations in the latent space. The trained network can evolve the phase space projections across further modules provided the first few modules as inputs. The model predicts all the projections across different modules with low mean squared error and high structural similarity index.

Submitted 3 June, 2024; originally announced June 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2403.13858

arXiv:2406.01532 [pdf, other]

Accelerator system parameter estimation using variational autoencoded latent regression

Authors: Mahindra Rautela, Alan Williams, Alexander Scheinker

Abstract: Particle accelerators are time-varying systems whose components are perturbed by external disturbances. Tuning accelerators can be a time-consuming process involving manual adjustment of multiple components, such as RF cavities, to minimize beam loss due to time-varying drifts. The high dimensionality of the system ($\sim$100 amplitude and phase RF settings in the LANSCE accelerator) makes it difficult to achieve optimal operation. The time-varying drifts and the dimensionality make system parameter estimation a challenging optimization problem. In this work, we propose a Variational Autoencoded Latent Regression (VALeR) model for robust estimation of system parameters using 2D unique projections of a charged particle beam's 6D phase space. In VALeR, VAE projects the phase space projections into a lower-dimensional latent space, and a dense neural network maps the latent space onto the space of system parameters. The trained network can predict system parameters for unseen phase space projections. Furthermore, VALeR can generate new projections by randomly sampling the latent space of VAE and also estimate the corresponding system parameters.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.19604 [pdf, other]

LBT SHARK-VIS Observes a Major Resurfacing Event on Io

Authors: Al Conrad, Fernando Pedichini, Gianluca Li Causi, Simone Antoniucci, Imke de Pater, Ashley Gerard Davies, Katherine de Kleer, Roberto Piazzesi, Vincenzo Testa, Piero Vaccari, Martina Vicinanza, Jennifer Power, Steve Ertel, Joseph C. Shields, Sam Ragland, Fabrizio Giorgi, Stuart M. Jefferies, Douglas Hope, Jason Perry, David A. Williams, David M. Nelson

Abstract: Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io's surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io's trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io's surface using adaptive optics at visible wavelengths.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 15 pages, 4 figures

arXiv:2405.14635 [pdf, ps, other]

Defective Parking Functions and Young Tableaux

Authors: Rebecca E. Garcia, Pamela E. Harris, Alex Moon, Aaron Ortiz, Lauren J. Quesada, Cynthia Marie Rivera Sánchez, Dwight Anderson Williams II

Abstract: Recall that a defective $(m,n)$-parking function with defect $d$ is a parking function with $m$ cars attempting to park on a street with $n$ parking spots in which exactly $d$ cars fail to park. We establish a way to compute the defect of a defective $(m,n)$-parking function and show that the defect of a parking function is invariant under the action of $\mathfrak{S}_m$, the symmetric group on $[m]=\{1,2,\ldots,m\}$. We also show that the set of nondecreasing defective $(m,n)$-parking functions with defect $d$ is in bijection with the set of standard Young tableaux of shape $(n + d, m - d)$. This implies that the number of $\mathfrak{S}_m$-orbits of defective $(m,n)$-parking functions with defect $d$ is given by $\frac{n-m+2d+1}{n+d+1}\binom{m+n}{n+d}$. We also give a multinomial formula for the size of an $\mathfrak{S}_m$-orbit of a nondecreasing $(m,n)$-parking function with defect $d$. We conclude by using these results to give a new formula for the number of defective parking functions.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 17 pages, 3 figures, 1 table

MSC Class: 05A19 (primary) 05A05; 05A15 (secondary)

arXiv:2405.13099 [pdf, other]

The Role of Emotions in Informational Support Question-Response Pairs in Online Health Communities: A Multimodal Deep Learning Approach

Authors: Mohsen Jozani, Jason A. Williams, Ahmed Aleroud, Sarbottam Bhagat

Abstract: This study explores the relationship between informational support seeking questions, responses, and helpfulness ratings in online health communities. We created a labeled data set of question-response pairs and developed multimodal machine learning and deep learning models to reliably predict informational support questions and responses. We employed explainable AI to reveal the emotions embedded in informational support exchanges, demonstrating the importance of emotion in providing informational support. This complex interplay between emotional and informational support has not been previously researched. The study refines social support theory and lays the groundwork for the development of user decision aids. Further implications are discussed.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 37 pages, 15 figures

ACM Class: H.4.3; I.2.7

arXiv:2405.09464 [pdf, other]

Scalable Scheduling Policies for Quantum Satellite Networks

Authors: Albert Williams, Nitish K. Panigrahy, Andrew McGregor, Don Towsley

Abstract: As Low Earth Orbit (LEO) satellite mega constellations continue to be deployed for satellite internet and recent successful experiments in satellite-based quantum entanglement distribution emerge, a natural question arises: How should we coordinate transmissions and design scalable scheduling policies for a quantum satellite internet? In this work, we consider the problem of transmission scheduling in quantum satellite networks subject to resource constraints at the satellites and ground stations. We show that the most general problem of assigning satellites to ground station pairs for entanglement distribution is NP-hard. We then propose four heuristic algorithms and evaluate their performance for the Starlink mega constellation under various amounts of resources and placements of the ground stations. We find that the maximum number of receivers necessary per ground station grows very slowly with the total number of deployed ground stations. Our proposed algorithms, leveraging optimal weighted b-matching and the global greedy heuristic, outperform others in entanglement distribution rate, entanglement fidelity, and handover cost metrics. While we develop these scheduling algorithms, we have also designed a software system to simulate, visualize, and evaluate satellite mega-constellations for entanglement distribution.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.04457 [pdf, other]

Towards Geographic Inclusion in the Evaluation of Text-to-Image Models

Authors: Melissa Hall, Samuel J. Bell, Candace Ross, Adina Williams, Michal Drozdzal, Adriana Romero Soriano

Abstract: Rapid progress in text-to-image generative models coupled with their deployment for visual content creation has magnified the importance of thoroughly evaluating their performance and identifying potential biases. In pursuit of models that generate images that are realistic, diverse, visually appealing, and consistent with the given prompt, researchers and practitioners often turn to automated metrics to facilitate scalable and cost-effective performance profiling. However, commonly-used metrics often fail to account for the full diversity of human preference; often even in-depth human evaluations face challenges with subjectivity, especially as interpretations of evaluation criteria vary across regions and cultures. In this work, we conduct a large, cross-cultural study to examine how much annotators in Africa, Europe, and Southeast Asia vary in their perception of geographic representation, visual appeal, and consistency in real and generated images from state-of-the-art public APIs. We collect over 65,000 image annotations and 20 survey responses. We contrast human annotations with common automated metrics, finding that human preferences vary notably across geographic location and that current metrics do not fully account for this diversity. For example, annotators in different locations often disagree on whether exaggerated, stereotypical depictions of a region are considered geographically representative. In addition, the utility of automatic evaluations is dependent on assumptions about their set-up, such as the alignment of feature extractors with human perception of object similarity or the definition of "appeal" captured in reference datasets used to ground evaluations. We recommend steps for improved automatic and human evaluations.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.16019 [pdf, other]

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

Authors: Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale

Abstract: Human feedback plays a central role in the alignment of Large Language Models (LLMs). However, open questions remain about the methods (how), domains (where), people (who) and objectives (to what end) of human feedback collection. To navigate these questions, we introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 LLMs. PRISM contributes (i) wide geographic and demographic participation in human feedback data; (ii) two census-representative samples for understanding collective welfare (UK and US); and (iii) individualised feedback where every rating is linked to a detailed participant profile, thus permitting exploration of personalisation and attribution of sample artefacts. We focus on collecting conversations that centre subjective and multicultural perspectives on value-laden and controversial topics, where we expect the most interpersonal and cross-cultural disagreement. We demonstrate the usefulness of PRISM via three case studies of dialogue diversity, preference diversity, and welfare outcomes, showing that it matters which humans set alignment norms. As well as offering a rich community resource, we advocate for broader participation in AI development and a more inclusive approach to technology design.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.12241 [pdf, other]

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, et al. (75 additional authors not shown)

Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.

Submitted 13 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.08842 [pdf, other]

Practical Safe Extremum Seeking with Assignable Rate of Attractivity to the Safe Set

Authors: Alan Williams, Miroslav Krstic, Alexander Scheinker

Abstract: We present Assignably Safe Extremum Seeking (ASfES), an algorithm designed to minimize a measured objective function while keeping a measured metric of safety (a control barrier function or CBF) positive in a practical sense. We ensure that for trajectories with safe initial conditions, the violation of safety can be made arbitrarily small with appropriately chosen design constants. We also guarantee an assignable ``attractivity'' rate: from unsafe initial conditions, the trajectories approach the safe set, in the sense of the measured CBF, at a rate no slower than a user-assigned rate. Similarly, from safe initial conditions, the trajectories approach the unsafe set, in the sense of the CBF, no faster than the assigned attractivity rate. The feature of assignable attractivity is not present in the semiglobal version of safe extremum seeking, where the semiglobality of convergence is achieved by slowing the adaptation. We also demonstrate local convergence of the parameter to a neighborhood of the minimum of the objective function constrained to the safe set. The ASfES algorithm and analysis are multivariable, but we also extend the algorithm to a Newton-Based ASfES scheme (NB-ASfES) which we show is only useful in the scalar case. The proven properties of the designs are illustrated through simulation examples.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.08605 [pdf, other]

Quantum Iterative Methods for Solving Differential Equations with Application to Computational Fluid Dynamics

Authors: Chelsea A. Williams, Antonio A. Gentile, Vincent E. Elfving, Daniel Berger, Oleksandr Kyriienko

Abstract: We propose quantum methods for solving differential equations that are based on a gradual improvement of the solution via an iterative process, and are targeted at applications in fluid dynamics. First, we implement the Jacobi iteration on a quantum register that utilizes a linear combination of unitaries (LCU) approach to store the trajectory information. Second, we extend quantum methods to Gauss-Seidel iterative methods. Additionally, we propose a quantum-suitable resolvent decomposition based on the Woodbury identity. From a technical perspective, we develop and utilize tools for the block encoding of specific matrices as well as their multiplication. We benchmark the approach on paradigmatic fluid dynamics problems. Our results stress that instead of inverting large matrices, one can program quantum computers to perform multigrid-type computations and leverage corresponding advances in scientific computing.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 8 pages, 7 figures

arXiv:2404.06214 [pdf, other]

[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Authors: Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang

Abstract: After last year's successful BabyLM Challenge, the competition will be hosted again in 2024/2025. The overarching goals of the challenge remain the same; however, some of the competition rules will be different. The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-inspired benchmarks, or analysis techniques. Second, we are relaxing the rules around pretraining data, and will now allow participants to construct their own datasets provided they stay within the 100M-word or 10M-word budget. Third, we introduce a multimodal vision-and-language track, and will release a corpus of 50% text-only and 50% image-text multimodal data as a starting point for LM model training. The purpose of this CfP is to provide rules for this year's challenge, explain these rule changes and their rationale in greater detail, give a timeline of this year's competition, and provide answers to frequently asked questions from last year's challenge.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.03814 [pdf, other]

I Did Not Notice: A Comparison of Immersive Analytics with Augmented and Virtual Reality

Authors: Xiaoyan Zhou, Anil Ufuk Batmaz, Adam S. Williams, Dylan Schreiber, Francisco Ortega

Abstract: Immersive environments enable users to engage in embodied interaction, enhancing the sensemaking processes involved in completing tasks such as immersive analytics. Previous comparative studies on immersive analytics using augmented and virtual realities have revealed that users employ different strategies for data interpretation and text-based analytics depending on the environment. Our study seeks to investigate how augmented and virtual reality influences sensemaking processes in quantitative immersive analytics. Our results, derived from a diverse group of participants, indicate that users demonstrate comparable performance in both environments. However, it was observed that users exhibit a higher tolerance for cognitive load in VR and travel further in AR. Based on our findings, we recommend providing users with the option to switch between AR and VR, thereby enabling them to select an environment that aligns with their preferences and task requirements.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.17804 [pdf, other]

Improving Text-to-Image Consistency via Automatic Prompt Optimization

Authors: Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal

Abstract: Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress, these models still struggle to produce images that are consistent with the input prompt, oftentimes failing to capture object quantities, relations and attributes properly. Existing solutions to improve prompt-image consistency suffer from the following challenges: (1) they oftentimes require model fine-tuning, (2) they only focus on nearby prompt samples, and (3) they are affected by unfavorable trade-offs among image quality, representation diversity, and prompt-image consistency. In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models. Our framework starts from a user prompt and iteratively generates revised prompts with the goal of maximizing a consistency score. Our extensive validation on two datasets, MSCOCO and PartiPrompts, shows that OPT2I can boost the initial consistency score by up to 24.9% in terms of DSG score while preserving the FID and increasing the recall between generated and real data. Our work paves the way toward building more reliable and robust T2I systems by harnessing the power of LLMs.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.15968 [pdf, ps, other]

Symplectic differential reduction algebras and skew-affine generalized Weyl algebras

Authors: Jonas T. Hartwig, Dwight Anderson Williams II

Abstract: For a map $\varphi\!:U(\mathfrak{g})\rightarrow A$ of associative algebras, $U(\mathfrak{g})$ the universal enveloping algebra of a (complex) finite-dimensional reductive Lie algebra, the representation theory of $A$ is intimately tied to the representation theory of the $A$-subquotient known as the reduction algebra for $(A,\mathfrak{g}, \varphi)$. Herlemont and Ogievetsky studied differential reduction algebras for the general linear Lie algebra $\mathfrak{gl}(n)$ as the algebra of $h$-deformed differential operators formed from realizations of $\mathfrak{gl}(n)$ in the $N$-fold tensor product of the $n$th Weyl algebra. In this paper, we further the study of differential reduction algebras by presenting the symplectic differential reduction algebra $D\big(\mathfrak{sp}(4)\big)$, by generators and relations, and showing its connections to Bavula's generalized Weyl algebras (GWAs). In doing so, we determine a new class of GWAs we call $\textit{skew-affine}$ GWAs, of which $D\big(\mathfrak{gl}(2)\big)$ and $D\big(\mathfrak{sp}(4)\big)$ are examples. We conjecture that the differential reduction algebra of the orthosymplectic Lie superalgebra $\mathfrak{osp}(1|2n)$ is a twisted generalized Weyl algebra (TGWA) and that the relations for $D\big(\mathfrak{sp}(2n)\big)$ yield solutions to the dynamical Yang-Baxter equation (DYBE).

Submitted 23 March, 2024; originally announced March 2024.

Comments: 30 pages. Comments welcomed!

MSC Class: 16S80; 16S85; 16T25; 17B10; 17B60

arXiv:2403.15081 [pdf]

doi 10.1038/s41893-024-01308-8

Sustainable Skies and the Earth-Space Environment

Authors: Andrew Williams, Aaron Boley, Giuliana Rotola, Richard Green

Abstract: The rapid launch of hundreds of thousands of satellites into Low Earth Orbit will significantly alter our view of the sky and raise concerns about the sustainability of Earth's orbital space. A new framework for sustainable space development must balance technological advancement, protection of space environments, and our capacity to explore the Universe.

Submitted 22 March, 2024; originally announced March 2024.

Journal ref: NatSustain 7 (2024) 228 - 231

arXiv:2403.13858 [pdf, other]

A conditional latent autoregressive recurrent model for generation and forecasting of beam dynamics in particle accelerators

Authors: Mahindra Rautela, Alan Williams, Alexander Scheinker

Abstract: Particle accelerators are complex systems that focus, guide, and accelerate intense charged particle beams to high energy. Beam diagnostics present a challenging problem due to limited non-destructive measurements, computationally demanding simulations, and inherent uncertainties in the system. We propose a two-step unsupervised deep learning framework named Conditional Latent Autoregressive Recurrent Model (CLARM) for learning the spatiotemporal dynamics of charged particles in accelerators. CLARM consists of a Conditional Variational Autoencoder (CVAE) transforming six-dimensional phase space into a lower-dimensional latent distribution and a Long Short-Term Memory (LSTM) network capturing temporal dynamics in an autoregressive manner. CLARM can generate projections at various accelerator modules by sampling and decoding the latent space representation. The model also forecasts future states (downstream locations) of charged particles from past states (upstream locations). The results demonstrate that the generative and forecasting ability of the proposed approach is promising when tested against a variety of evaluation metrics.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12201 [pdf, other]

Compositional learning of functions in humans and machines

Authors: Yanli Zhou, Brenden M. Lake, Adina Williams

Abstract: The ability to learn and compose functions is foundational to efficient learning and reasoning in humans, enabling flexible generalizations such as creating new dishes from known cooking processes. Beyond sequential chaining of functions, existing linguistics literature indicates that humans can grasp more complex compositions with interacting functions, where output production depends on context changes induced by different function orderings. Extending the investigation into the visual domain, we developed a function learning paradigm to explore the capacity of humans and neural network models in learning and reasoning with compositional functions under varied interaction conditions. Following brief training on individual functions, human participants were assessed on composing two learned functions, in ways covering four main interaction types, including instances in which the application of the first function creates or removes the context for applying the second function. Our findings indicate that humans can make zero-shot generalizations on novel visual function compositions across interaction conditions, demonstrating sensitivity to contextual changes. A comparison with a neural network model on the same task reveals that, through the meta-learning for compositionality (MLC) approach, a standard sequence-to-sequence Transformer can mimic human generalization patterns in composing functions.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 7 pages, 6 figures

arXiv:2403.04857 [pdf, other]

Dark Matter Line Searches with the Cherenkov Telescope Array

Authors: S. Abe, J. Abhir, A. Abhishek, F. Acero, A. Acharyya, R. Adam, A. Aguasca-Cabot, I. Agudo, A. Aguirre-Santaella, J. Alfaro, R. Alfaro, N. Alvarez-Crespo, R. Alves Batista, J. -P. Amans, E. Amato, G. Ambrosi, L. Angel, C. Aramo, C. Arcaro, T. T. H. Arnesen, L. Arrabito, K. Asano, Y. Ascasibar, J. Aschersleben, H. Ashkar , et al. (540 additional authors not shown)

Abstract: Monochromatic gamma-ray signals constitute a potential smoking gun signature for annihilating or decaying dark matter particles that could relatively easily be distinguished from astrophysical or instrumental backgrounds. We provide an updated assessment of the sensitivity of the Cherenkov Telescope Array (CTA) to such signals, based on observations of the Galactic centre region as well as of selected dwarf spheroidal galaxies. We find that current limits and detection prospects for dark matter masses above 300 GeV will be significantly improved, by up to an order of magnitude in the multi-TeV range. This demonstrates that CTA will set a new standard for gamma-ray astronomy also in this respect, as the world's largest and most sensitive high-energy gamma-ray observatory, in particular due to its exquisite energy resolution at TeV energies and the adopted observational strategy focussing on regions with large dark matter densities. Throughout our analysis, we use up-to-date instrument response functions, and we thoroughly model the effect of instrumental systematic uncertainties in our statistical treatment. We further present results for other potential signatures with sharp spectral features, e.g. box-shaped spectra, that would likewise very clearly point to a particle dark matter origin.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 43 pages JCAP style (excluding author list and references), 19 figures

arXiv:2403.02556 [pdf]

Revealing the EuCd_{2}As_{2} Semiconducting Band Gap via n-type La-Doping

Authors: Ryan A. Nelson, Jesaiah King, Shuyu Cheng, Archibald J. Williams, Christopher Jozwiak, Aaron Bostwick, Eli Rotenberg, Souvik Sasmal, I-Hsuan Kao, Aalok Tiwari, Natalie R. Jones, Chuting Cai, Emma Martin, Andrei Dolocan, Li Shi, Roland Kawakami, Joseph P. Heremans, Jyoti Katoch, Joshua E. Goldberger

Abstract: EuCd_{2}As_{2} has attracted considerable interest as one of the few magnetic Weyl semimetal candidate materials, although recently there have been emerging reports that claim it to have a semiconducting electronic structure. To resolve this debate, we established the growth of n-type EuCd_{2}As_{2} crystals, to directly visualize the nature of the conduction band using angle-resolved photoemission spectroscopy (ARPES). We show that La-doping leads to n-type transport signatures in both the thermopower and Hall effect measurements, in crystals with doping levels at 2 - 6 x 10^{17} e^{-} cm^{-3}. Both p-type and n-type doped samples exhibit antiferromagnetic ordering at 9 K. ARPES experiments at 6 K clearly show the presence of the conduction band minimum at 0.8 eV above the valence band maximum, which is further corroborated by the observation of a 0.71 - 0.72 eV band gap in room temperature diffuse reflectance absorbance measurements. Together these findings unambiguously show that EuCd_{2}As_{2} is indeed a semiconductor with a substantial band gap and not a topological semimetal.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.16498 [pdf, other]

Results of the follow-up of ANTARES neutrino alerts

Authors: A. Albert, S. Alves, M. André, M. Ardid, S. Ardid, J. -J. Aubert, J. Aublin, B. Baret, S. Basa, Y. Becherini, B. Belhorma, M. Bendahman, F. Benfenati, V. Bertin, S. Biagi, M. Bissinger, J. Boumaaza, M. Bouta, M. C. Bouwhuis, H. Brânzas, R. Bruijn, J. Brunner, J. Busto, B. Caiffi, D. Calvo , et al. (166 additional authors not shown)

Abstract: High-energy neutrinos could be produced in the interaction of charged cosmic rays with matter or radiation surrounding astrophysical sources. To look for transient sources associated with neutrino emission, a follow-up program of neutrino alerts has been operating within the ANTARES Collaboration since 2009. This program, named TAToO, has triggered robotic optical telescopes (MASTER, TAROT, ROTSE and the SVOM ground based telescopes) immediately after the detection of any relevant neutrino candidate and scheduled several observations in the weeks following the detection. A subset of ANTARES events with highest probabilities of being of cosmic origin has also been followed by the Swift and the INTEGRAL satellites, the Murchison Widefield Array radio telescope and the H.E.S.S. high-energy gamma-ray telescope. The results of twelve years of observations are reported. No optical counterpart has been significantly associated with an ANTARES candidate neutrino signal during image analysis. Constraints on transient neutrino emission have been set. In September 2015, ANTARES issued a neutrino alert and during the follow-up, a potential transient counterpart was identified by Swift and MASTER. A multi-wavelength follow-up campaign has allowed the nature of this source to be identified and has proven its fortuitous association with the neutrino. The return of experience is particularly important for the design of the alert system of KM3NeT, the next generation neutrino telescope in the Mediterranean Sea.

Submitted 26 February, 2024; originally announced February 2024.

Comments: 27 pages, 14 figures, submitted to JCAP

arXiv:2402.04900 [pdf, other]

Clouds and Seasonality on Terrestrial Planets with Varying Rotation Rates

Authors: Daniel A. Williams, Xuan Ji, Paul Corlies, Juan M. Lora

Abstract: Using an idealised climate model incorporating seasonal forcing, we investigate the impact of rotation rate on the abundance of clouds on an Earth-like aquaplanet, and the resulting impacts upon albedo and seasonality. We show that the cloud distribution varies significantly with season, depending strongly on the rotation rate, and is well explained by the large-scale circulation and atmospheric state. Planetary albedo displays non-monotonic behaviour with rotation rate, peaking at around 1/2$Ω_E$. Clouds reduce the surface temperature and total precipitation relative to simulations without clouds at all rotation rates, and reduce the dependence of total precipitation on rotation rate, causing non-monotonic behaviour and a local maximum around 1/8$Ω_E$; these effects are related to the impacts of clouds on the net atmospheric and surface radiative energy budgets. Clouds also affect the seasonality. The influence of clouds on the extent of the winter Hadley cell and the intertropical convergence zone is relatively minor at slow rotation rates ($<$1/8$Ω_E$), but becomes more pronounced at intermediate rotation rates, where clouds decrease their maximum latitudes. The timing of seasonal transitions varies with rotation rate, and the addition of clouds reduces the seasonal phase lag.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 21 pages, 9 figures

arXiv:2401.14963 [pdf, other]

On the Hardness of Gray Code Problems for Combinatorial Objects

Authors: Arturo Merino, Namrata, Aaron Williams

Abstract: Can a list of binary strings be ordered so that consecutive strings differ in a single bit? Can a list of permutations be ordered so that consecutive permutations differ by a swap? Can a list of non-crossing set partitions be ordered so that consecutive partitions differ by refinement? These are examples of Gray coding problems: Can a list of combinatorial objects (of a particular type and size) be ordered so that consecutive objects differ by a flip (of a particular type)? For example, 000, 001, 010, 100 is a no instance of the first question, while 1234, 1324, 1243 is a yes instance of the second question due to the order 1243, 1234, 1324. We prove that a variety of Gray coding problems are NP-complete using a new tool we call a Gray code reduction.

Submitted 26 January, 2024; originally announced January 2024.

Comments: 15 pages, 5 figures, WALCOM 2024
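
The two instances quoted in the abstract above are small enough to check by exhaustive search. The sketch below is an illustration only, not the authors' method: the names `gray_order`, `hamming_one`, and `differ_by_swap` are hypothetical, and "swap" is assumed here to mean exchanging the entries at any two positions. The search tries every ordering, so it is only practical for very short lists.

```python
from itertools import permutations


def hamming_one(a, b):
    """True if equal-length strings a and b differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def differ_by_swap(a, b):
    """True if b is obtained from a by exchanging the entries at two positions.

    Assumption: any two positions may be exchanged, not only adjacent ones.
    """
    if len(a) != len(b):
        return False
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diff) == 2
        and a[diff[0]] == b[diff[1]]
        and a[diff[1]] == b[diff[0]]
    )


def gray_order(objects, flip):
    """Return an ordering in which consecutive objects are related by `flip`,
    or None if no such ordering exists. Brute force over all orderings."""
    for order in permutations(objects):
        if all(flip(order[i], order[i + 1]) for i in range(len(order) - 1)):
            return list(order)
    return None


# The "no" instance from the abstract: no valid ordering exists.
no_instance = gray_order(["000", "001", "010", "100"], hamming_one)

# The "yes" instance from the abstract: some valid ordering exists.
yes_instance = gray_order(["1234", "1324", "1243"], differ_by_swap)
```

Here `no_instance` is `None`, while `yes_instance` is a valid ordering; the search may return the reverse of the order quoted in the abstract, which is equally valid.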

arXiv:2401.12295 [pdf, other]

Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data

Authors: Leonardo Castro-Gonzalez, Yi-Ling Chung, Hannak Rose Kirk, John Francis, Angus R. Williams, Pica Johansson, Jonathan Bright

Abstract: The field of machine learning has recently made significant progress in reducing the requirements for labelled training data when building new models. These 'cheaper' learning techniques hold significant potential for the social sciences, where development of large labelled training datasets is often a significant practical impediment to the use of machine learning for analytical tasks. In this article we review three 'cheap' techniques that have developed in recent years: weak supervision, transfer learning and prompt engineering. For the latter, we also review the particular case of zero-shot prompting of large language models. For each technique we provide a guide of how it works and demonstrate its application across six different realistic social science applications (two different tasks paired with three different dataset makeups). We show good performance for all techniques, and in particular we demonstrate how prompting of large language models can achieve high accuracy at very low cost. Our results are accompanied by a code repository to make it easy for others to duplicate our work and use it in their own research. Overall, our article is intended to stimulate further uptake of these techniques in the social sciences.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 39 pages, 10 figures, 6 tables

ACM Class: I.2.7; J.4

arXiv:2401.08884 [pdf]

Extreme Metastability of Diamond and its Transformation to BC8 Post-Diamond Phase of Carbon

Authors: Kien Nguyen-Cong, Jonathan T. Willman, Joseph M. Gonzalez, Ashley S. Williams, Anatoly B. Belonoshko, Stan G. Moore, Aidan P. Thompson, Mitchell A. Wood, Jon H. Eggert, Marius Millot, Luis A. Zepeda-Ruiz, Ivan I. Oleynik

Abstract: Diamond possesses exceptional physical properties due to its remarkably strong carbon-carbon bonding, leading to significant resilience to structural transformations at very high pressures and temperatures. Despite several experimental attempts, synthesis and recovery of the theoretically predicted post-diamond BC8 phase remains elusive. Through quantum accurate, multi-million atom molecular dynamics (MD) simulations, we have uncovered the extreme metastability of diamond at very high pressures, significantly exceeding its range of thermodynamic stability. We predict the post-diamond BC8 phase to be experimentally accessible only within a narrow high pressure-temperature region of the carbon phase diagram. The diamond to BC8 transformation proceeds through pre-melting followed by BC8 nucleation and growth in the metastable carbon liquid. We propose a double-shock compression pathway to achieve BC8 synthesis, which is currently being explored in theory-inspired experiments at the National Ignition Facility.

Submitted 22 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.07911 [pdf, other]

Optical spectroscopy of blazars for the Cherenkov Telescope Array -- III

Authors: F. D'Ammando, P. Goldoni, W. Max-Moerbeck, J. Becerra Gonzalez, E. Kasai, D. A. Williams, N. Alvarez-Crespo, M. Backes, U. Barres de Almeida, C. Boisson, G. Cotter, V. Fallah Ramazani, O. Hervet, E. Lindfors, D. Mukhi-Nilo, S. Pita, M. Splettstoesser, B. van Soelen

Abstract: Due to their almost featureless optical/UV spectra, it is challenging to measure the redshifts of BL Lacs. As a result, about 50% of gamma-ray BL Lacs lack a firm measurement of this property, which is fundamental for population studies, indirect estimates of the EBL, and fundamental physics probes. This paper is the third in a series of papers aimed at determining the redshift of a sample of blazars selected as prime targets for future observations with the next generation, ground-based VHE gamma-ray astronomy observatory, Cherenkov Telescope Array Observatory (CTAO). The accurate determination of the redshift of these objects is an important aid in source selection and planning of future CTAO observations. The selected targets were expected to be detectable with CTAO in observations of 30 hours or less. We performed deep spectroscopic observations of 41 of these blazars using the Keck II, Lick, SALT, GTC, and ESO/VLT telescopes. We carefully searched for spectral lines in the spectra and whenever features of the host galaxy were detected, we attempted to model the properties of the host galaxy. The magnitudes of the targets at the time of the observations were also compared to their long-term light curves. Spectra from 24 objects display spectral features or a high S/N. From these, 12 spectroscopic redshifts were determined, ranging from 0.2223 to 0.7018. Furthermore, 1 tentative redshift (0.6622) and 2 redshift lower limits at z > 0.6185 and z > 0.6347 were obtained.
The other 9 BL Lacs showed featureless spectra, despite the high S/N (> 100) observations. Our comparisons with long-term optical light curves tentatively suggest that redshift measurements are more straightforward during an optical low state of the AGN. Overall, we have determined 37 redshifts and 6 spectroscopic lower limits as part of our programme thus far.

Submitted 15 January, 2024; originally announced January 2024.

Comments: Accepted for publication in Astronomy and Astrophysics. 17 pages, 4 Figures, 10 Tables

arXiv:2401.04346 [pdf, other]

A commensal Fast Radio Burst search pipeline for the Murchison Widefield Array

Authors: M. Sokolowski, I. S. Morrison, D. Price, G. Sleap, B. Crosse, A. Williams, L. Williams, C. James, B. W. Meyers, S. McSweeney, N. D. R. Bhat, G. Anderson

Abstract: We present a demonstration version of a commensal pipeline for Fast Radio Burst (FRB) searches using a real-time incoherent beam from the Murchison Widefield Array (MWA). The main science targets of the pipeline are bright nearby FRBs from the local Universe which are the best candidates to probe FRB progenitors and understand physical mechanisms powering these extremely energetic events. The new MWA beamformer, known as the "MWAX multibeam beamformer", can form multiple incoherent and coherent beams commensally to any on-going MWA observations. One of the beams is currently used for FRB searches (tested in 10 kHz frequency resolution and time resolutions between 0.1 and 100 ms). A second beam is used for a Search for Extraterrestrial Intelligence (SETI). This paper focuses on the FRB search pipeline and its verification on selected known bright pulsars. The pipeline uses the FREDDA implementation of the Fast Dispersion Measure Transform algorithm (FDMT) for single pulse searches. Initially, it was tested during standard MWA observations, and more recently using dedicated observations of 11 selected bright pulsars. The pulsar PSR J0835-4510 (aka Vela) has been routinely used as the primary probe of the data quality because its folded profile was always detected in the frequency band 200 - 230 MHz with typical SNR >10. Similarly, the low DM pulsar PSR B0950+08 was always detected in folded profile in the frequency band 140 - 170 MHz, and so far has been the only object for which single pulses were detected.
We present the estimated sensitivity of the search in the currently limited observing bandwidth of a single MWA coarse channel (1.28 MHz) and for the upgraded, future system with 12.8 MHz (10 channels) of bandwidth. Based on expected sensitivity and existing FRB rate measurements, we estimate an expected number of FRB detections to be between a few and a few tens per year.

Submitted 8 January, 2024; originally announced January 2024.

Comments: 22 pages, 12 figures, 4 tables. Accepted for publication in PASA

arXiv:2401.00197 [pdf, other]

ODAQ: Open Dataset of Audio Quality

Authors: Matteo Torcoli, Chih-Wei Wu, Sascha Dick, Phillip A. Williams, Mhd Modar Halimeh, William Wolcott, Emanuel A. P. Habets

Abstract: Research into the prediction and analysis of perceived audio quality is hampered by the scarcity of openly available datasets of audio signals accompanied by corresponding subjective quality scores. To address this problem, we present the Open Dataset of Audio Quality (ODAQ), a new dataset containing the results of a MUSHRA listening test conducted with expert listeners from 2 international laboratories. ODAQ contains 240 audio samples and corresponding quality scores. Each audio sample is rated by 26 listeners. The audio samples are stereo audio signals sampled at 44.1 or 48 kHz and are processed by a total of 6 method classes, each operating at different quality levels. The processing method classes are designed to generate quality degradations possibly encountered during audio coding and source separation, and the quality levels for each method class span the entire quality range. The diversity of the processing methods, the large span of quality levels, the high sampling frequency, and the pool of international listeners make ODAQ particularly suited for further research into subjective and objective audio quality. The dataset is released with permissive licenses, and the software used to conduct the listening test is also made publicly available.

Submitted 30 December, 2023; originally announced January 2024.

Comments: Accepted paper. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Seoul, Korea, April 2024

arXiv:2312.16052 [pdf, other]

Pattern Avoidance for Fibonacci Sequences using $k$-Regular Words

Authors: Emily Downing, Elizabeth Hartung, Aaron Williams

Abstract: Two $k$-ary Fibonacci recurrences are $a_k(n) = a_k(n-1) + k \cdot a_k(n-2)$ and $b_k(n) = k \cdot b_k(n-1) + b_k(n-2)$. We provide a simple proof that $a_k(n)$ is the number of $k$-regular words over $[n] = \{1,2,\ldots,n\}$ that avoid patterns $\{121, 123, 132, 213\}$ when using base cases $a_k(0) = a_k(1) = 1$ for any $k \geq 1$. This was previously proven by Kuba and Panholzer in the context of Wilf-equivalence for restricted Stirling permutations, and it creates Simion and Schmidt's classic result on the Fibonacci sequence when $k=1$, and the Jacobsthal sequence when $k=2$. We complement this theorem by proving that $b_k(n)$ is the number of $k$-regular words over $[n]$ that avoid $\{122, 213\}$ with $b_k(0) = b_k(1) = 1$ for any $k \geq 2$. Finally, we conjecture that $|Av^{2}_{n}(\underline{121}, 123, 132, 213)| = a_1(n)^2$ for $n \geq 0$. That is, vincularizing the Stirling pattern in Kuba and Panholzer's Jacobsthal result gives the Fibonacci-squared numbers.

Submitted 26 December, 2023; originally announced December 2023.

Comments: 15 pages, submitted to special journal issue for Permutation Patterns 2023 (PP23) in DMTCS

MSC Class: 05 (Primary) 68 (Secondary) ACM Class: G.2.1; G.4
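
The two recurrences quoted in the abstract above can be tabulated directly. The following is a minimal sketch, not code from the paper; the function name `recurrence_terms` and its `kind` parameter are hypothetical names chosen for illustration. It only evaluates the recurrences with the stated base cases and does not count pattern-avoiding words.

```python
def recurrence_terms(k, count, kind):
    """Return the first `count` terms of a_k (kind="a") or b_k (kind="b").

    a_k(n) = a_k(n-1) + k * a_k(n-2)
    b_k(n) = k * b_k(n-1) + b_k(n-2)
    Both use the base cases f(0) = f(1) = 1, as stated in the abstract.
    """
    if kind not in ("a", "b"):
        raise ValueError("kind must be 'a' or 'b'")
    terms = [1, 1][:count]
    while len(terms) < count:
        if kind == "a":
            terms.append(terms[-1] + k * terms[-2])
        else:
            terms.append(k * terms[-1] + terms[-2])
    return terms


if __name__ == "__main__":
    print(recurrence_terms(1, 8, "a"))  # Fibonacci numbers for k = 1
    print(recurrence_terms(2, 8, "a"))  # Jacobsthal numbers for k = 2
    print(recurrence_terms(2, 6, "b"))
```

With $k=1$ the first sequence is the Fibonacci sequence and with $k=2$ it is the Jacobsthal sequence, matching the special cases named in the abstract.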

arXiv:2312.15371 [pdf, other]

New three-dimensional dispersion in the type-II Dirac semimetals PtTe$_2$ and PdTe$_2$ revealed through Angle Resolved Photoemission Spectroscopy

Authors: Ivan Pelayo, Derek Bergner, Archibald J. Williams, Jiayuwen Qi, Penghao Zhu, Mahfuzun Nabi, Warren L. B. Huey, Luca Moreschini, Ziling Deng, Jonathan Denlinger, Alessandra Lanzara, Yuan-Ming Lu, Wolfgang Windl, Joshua Goldberger, Claudia Ojeda-Aristizabal

Abstract: PtTe$_2$ and PdTe$_2$ are among the first transition metal dichalcogenides that were predicted to host type-II Dirac fermions, exotic particles prohibited in free space. These materials are layered and air-stable, which makes them top candidates for technological applications that take advantage of their anisotropic magnetotransport properties. Here, we provide a detailed characterization of the electronic structure of PtTe$_2$ and PdTe$_2$ using Angle Resolved Photoemission Spectroscopy (ARPES) and Density Functional Theory (DFT) calculations, unveiling a new three-dimensional dispersion in these materials. Through the use of circularly polarized light, we report a different behavior of such dispersion in PdTe$_2$ compared to PtTe$_2$, that we relate to a symmetry analysis of the dipole matrix element. Such analysis reveals a link between the observed circular dichroism and the different momentum-dependent terms in the dispersion of these two compounds, despite their close similarity in crystal structure. Additionally, our data shows a clear difference in the circular dichroic signal for the type-II Dirac cones characteristic of these materials, compared to their topologically protected surface states. Our work provides a useful reference for the ARPES characterization of other transition metal dichalcogenides with topological properties and illustrates the use of circular dichroism as a guide to identify the topological character of two otherwise equivalent band dispersions, and to recognize different attributes in the band structure of similar materials.

Submitted 16 May, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

Comments: 14 pages, 9 figures

arXiv:2312.15227 [pdf, other]

Considering a Classical Upper Bound on the Frobenius Number

Authors: Aled Williams, Daiki Haijima

Abstract: In this paper we study the (classical) Frobenius problem, namely the problem of finding the largest integer that cannot be represented as a nonnegative integral combination of given relatively prime (strictly) positive integers (known as the Frobenius number). The main contributions of this paper are observations regarding a previously known upper bound on the Frobenius number where, in particular, we observe that a previously presented argument features a subtle error, which alters the value of the upper bound. Despite this, we demonstrate that the subtle error does not impact on the validity of the upper bound, although it does impact on the upper bound's tightness. Notably, we formally state the corrected result and additionally compare the relative tightness of the corrected upper bound with the original. In particular, we show that the updated bound is tighter in all but a relatively "small" number of cases using both formal techniques and via Monte Carlo simulation techniques.

Submitted 23 December, 2023; originally announced December 2023.

Comments: 13 pages, 1 figure

MSC Class: 11D07; 11D04
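
The definition of the Frobenius number given in the abstract above can be illustrated by brute force for small inputs. This is a minimal sketch under stated assumptions, not the paper's method and not an implementation of the upper bound it discusses; the function name `frobenius_number` is hypothetical. It relies on one elementary observation: once `min(values)` consecutive integers are representable, every larger integer is representable too, because adding the smallest value shifts the run forward.

```python
from functools import reduce
from math import gcd


def frobenius_number(values):
    """Largest integer that is not a nonnegative integer combination of `values`.

    Assumes positive integers with overall gcd 1. Returns -1 when 1 is
    among the values, since every nonnegative integer is then representable.
    """
    if any(v <= 0 for v in values) or reduce(gcd, values) != 1:
        raise ValueError("values must be positive integers with gcd 1")
    smallest = min(values)
    representable = [True]  # 0 is the empty combination
    run = 1                 # length of the current run of representable integers
    largest_gap = -1        # largest non-representable integer seen so far
    n = 0
    while run < smallest:
        n += 1
        ok = any(n >= v and representable[n - v] for v in values)
        representable.append(ok)
        if ok:
            run += 1
        else:
            run = 0
            largest_gap = n
    return largest_gap


if __name__ == "__main__":
    print(frobenius_number([3, 5]))
    print(frobenius_number([6, 9, 20]))
```

For two coprime values $a$ and $b$ the result agrees with the classical closed form $ab - a - b$; the sketch is only practical for small inputs.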

arXiv:2312.14069 [pdf, other]

EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models

Authors: Maureen de Seyssel, Antony D'Avirro, Adina Williams, Emmanuel Dupoux

Abstract: We introduce EmphAssess, a prosodic benchmark designed to evaluate the capability of speech-to-speech models to encode and reproduce prosodic emphasis. We apply this to two tasks: speech resynthesis and speech-to-speech translation. In both cases, the benchmark evaluates the ability of the model to encode emphasis in the speech input and accurately reproduce it in the output, potentially across a change of speaker and language. As part of the evaluation pipeline, we introduce EmphaClass, a new model that classifies emphasis at the frame or word level.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.07774

VERITAS contributions to the 38th International Cosmic Ray Conference

Authors: A. Acharyya, C. B. Adams, A. Archer, P. Bangale, J. T. Bartkoske, P. Batista, W. Benbow, J. L. Christiansen, A. J. Chromey, A. Duerr, M. Errando, Q. Feng, G. M. Foote, L. Fortson, A. Furniss, W. Hanlon, O. Hervet, C. E. Hinrichs, J. Hoang, J. Holder, Z. Hughes, T. B. Humensky, W. Jin, M. N. Johnson, M. Kertzman, et al. (39 additional authors not shown)

Abstract: Compilation of papers presented by the VERITAS Collaboration at the 38th International Cosmic Ray Conference (ICRC), held July 26 through August 3, 2023 in Nagoya, Japan.

Submitted 12 December, 2023; originally announced December 2023.

Comments: html page. ICRC 2023, Nagoya, Japan

arXiv:2312.06027 [pdf, other]

Extending Global Fits of 4D Composite Higgs Models with Partially Composite Leptons

Authors: Ethan Carragher, Kenn Shern Goh, Wei Su, Martin White, Anthony G. Williams

Abstract: We perform the first convergent Bayesian global fits of 4D Composite Higgs Models with partially-composite third generation quarks and leptons based on the minimal $SO(5) \rightarrow SO(4)$ symmetry breaking pattern. We consider two models with the $τ$ lepton and its associated neutrino in different representations of $SO(5)$. Fitting each model with a wide array of experimental constraints allows us to analyse the Bayesian evidence and currently-observed fine-tuning of each model by calculating the Kullback-Leibler divergence between their respective priors and posteriors. Notably both models are found to be capable of satisfying all constraints simultaneously at the $3σ$ level at scales of $< 5$ TeV. From a Bayesian viewpoint of naturalness the model with leptons in the $\mathbf{14}$ and $\mathbf{10}$ representations is preferred over those in the $\mathbf{5}$ representation due to its lower fine-tuning. Finally, we consider the experimental signatures for the preferred parameters in these models, including lepton partner decay signatures and gluon-fusion produced Higgs signal strengths, and discuss their potential phenomenology at future high-luminosity LHC runs.

Submitted 30 June, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

Comments: 32+12 pages, 13+8 figures; version 2 with improved content following comments from referee; ver 3 improved clarity

arXiv:2312.00131 [pdf, other]

doi 10.1051/0004-6361/202348761

The dense and non-homogeneous circumstellar medium revealed in radio wavelengths around the Type Ib SN 2019oys

Authors: Itai Sfaradi, Assaf Horesh, Jesper Sollerman, Rob Fender, Lauren Rhodes, David R. A. Williams, Joe Bright, Dave A. Green, Steve Schulze, Avishay Gal-Yam

Abstract: We present here broadband radio observations of the CSM interacting SN2019oys. SN2019oys was first detected in the optical and was classified as a Type Ib SN. Then, about $\sim 100$ days after discovery, it showed an optical rebrightening and a spectral transition to a spectrum dominated by strong narrow emission lines, which suggests strong interaction with a distant, dense, CSM shell. We modeled the broadband, multi-epoch, radio spectra, covering 2.2 to 36 GHz and spanning from 22 to 1425 days after optical discovery, as a synchrotron emitting source. Using this modeling we characterized the shockwave and the mass-loss rate of the progenitor. Our broadband radio observations show strong synchrotron emission. This emission, as observed 201 and 221 days after optical discovery, exhibits signs of free-free absorption from the material in front of the shock traveling in the CSM. In addition, the steep power law of the optically thin regime points towards synchrotron cooling of the radiating electrons. Analyzing these spectra in the context of the SN-CSM interaction model gives a shock velocity of 14,000 $\rm km \, s^{-1}$, and an electron number density of $2.6 \times 10^5 \, \rm cm^{-3}$ at a distance of $2.6 \times 10^{16}$ cm. This translates to a high mass-loss rate from the progenitor massive star of $6.7 \times 10^{-4} \, \rm M_{\odot} yr^{-1}$ for an assumed wind of 100 $\rm km s^{-1}$ (assuming constant mass-loss rate in steady winds). The late-time radio spectra, 392 and 557 days after optical discovery, are showing broad spectral peaks. We show that this can be explained by introducing a non-homogeneous CSM structure.

Submitted 30 November, 2023; originally announced December 2023.

Comments: 14 pages, 7 figures, 2 tables, submitted to Astronomy & Astrophysics

Journal ref: A&A 686, A129 (2024)

arXiv:2311.18567 [pdf, other]

Grammatical Gender's Influence on Distributional Semantics: A Causal Perspective

Authors: Karolina Stańczak, Kevin Du, Adina Williams, Isabelle Augenstein, Ryan Cotterell

Abstract: How much meaning influences gender assignment across languages is an active area of research in modern linguistics and cognitive science. We can view current approaches as aiming to determine where gender assignment falls on a spectrum, from being fully arbitrarily determined to being largely semantically determined. For the latter case, there is a formulation of the neo-Whorfian hypothesis, which claims that even inanimate noun gender influences how people conceive of and talk about objects (using the choice of adjective used to modify inanimate nouns as a proxy for meaning). We offer a novel, causal graphical model that jointly represents the interactions between a noun's grammatical gender, its meaning, and adjective choice. In accordance with past results, we find a relationship between the gender of nouns and the adjectives which modify them. However, when we control for the meaning of the noun, we find that grammatical gender has a near-zero effect on adjective choice, thereby calling the neo-Whorfian hypothesis into question.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.18140 [pdf, other]

ROBBIE: Robust Bias Evaluation of Large Generative Language Models

Authors: David Esiobu, Xiaoqing Tan, Saghar Hosseini, Megan Ung, Yuchen Zhang, Jude Fernandes, Jane Dwivedi-Yu, Eleonora Presani, Adina Williams, Eric Michael Smith

Abstract: As generative large language models (LLMs) grow more performant and prevalent, we must develop comprehensive enough tools to measure and improve their fairness. Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes, meaning that testing LLMs on more datasets can potentially help us characterize their biases more fully, and better ensure equal and equitable treatment of marginalized demographic groups. In this work, our focus is two-fold: (1) Benchmarking: a comparison of 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs. Out of those 6 metrics, AdvPromptSet and HolisticBiasR are novel datasets proposed in the paper. The comparison of those benchmarks gives us insights about the bias and toxicity of the compared models. Therefore, we explore the frequency of demographic terms in common LLM pre-training corpora and how this may relate to model biases. (2) Mitigation: we conduct a comprehensive study of how well 3 bias/toxicity mitigation techniques perform across our suite of measurements. ROBBIE aims to provide insights for practitioners while deploying a model, emphasizing the need to not only measure potential harms, but also understand how they arise by characterizing the data, mitigate harms once found, and balance any trade-offs. We open-source our analysis code in hopes of encouraging broader measurements of bias in future LLMs.

Submitted 29 November, 2023; originally announced November 2023.

Comments: EMNLP 2023

arXiv:2311.14055 [pdf, ps, other]

Interval and $\ell$-interval Rational Parking Functions

Authors: Tomás Aguilar-Fraga, Jennifer Elder, Rebecca E. Garcia, Kimberly P. Hadaway, Pamela E. Harris, Kimberly J. Harry, Imhotep B. Hogan, Jakeyl Johnson, Jan Kretschmann, Kobe Lawson-Chavanu, J. Carlos Martínez Mori, Casandra D. Monroe, Daniel Quiñonez, Dirk Tolson III, Dwight Anderson Williams II

Abstract: Interval parking functions are a generalization of parking functions in which cars have an interval preference for their parking. We generalize this definition to parking functions with $n$ cars and $m\geq n$ parking spots, which we call interval rational parking functions and provide a formula for their enumeration. By specifying an integer parameter $\ell\geq 0$, we then consider the subset of interval rational parking functions in which each car parks at most $\ell$ spots away from their initial preference. We call these $\ell$-interval rational parking functions and provide recursive formulas to enumerate this set for all positive integers $m\geq n$ and $\ell$. We also establish formulas for the number of nondecreasing $\ell$-interval rational parking functions via the outcome map on rational parking functions. We also consider the intersection between $\ell$-interval parking functions and Fubini rankings and show the enumeration of these sets is given by generalized Fibonacci numbers. We conclude by specializing $\ell=1$, and establish that the set of $1$-interval rational parking functions with $n$ cars and $m$ spots are in bijection with the set of barred preferential arrangements of $[n]$ with $m-n$ bars. This readily implies enumerative formulas. Further, in the case where $\ell=1$, we recover the results of Hadaway and Harris that unit interval parking functions are in bijection with the set of Fubini rankings, which are enumerated by the Fubini numbers.

Submitted 24 May, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

Comments: New updated version of Corollary 4.8

MSC Class: 05A05; 05A15; 05A18; 05A19

arXiv:2311.11437 [pdf]

Decoding the Molecular Universe -- Workshop Report

Authors: Thomas O. Metz, Joshua N. Adkins, Peter B. Armentrout, Patrick Chain, Fanny Chu, Courtney D Corley, John R. Cort, Elizabeth Denis, Daniel Drell, Katherine R. Duncan, Robert G. Ewing, Facundo M. Fernandez, Oliver Fiehn, Neha Garg, Stefan Grimme, Christopher Henry, Robert L. Hettich, Tobias Kind, Roger G. Linington, Gary W. Miller, Trent Northen, Kirsten Overdahl, Ari Patrinos, Daniel Raftery, Paul Rigor, et al. (8 additional authors not shown)

Abstract: On August 9-10, 2023, a workshop was convened at the Pacific Northwest National Laboratory (PNNL) in Richland, WA that brought together a group of internationally recognized experts in metabolomics, natural products discovery, chemical ecology, chemical and biological threat assessment, cheminformatics, computational chemistry, cloud computing, artificial intelligence, and novel technology development. These experts were invited to assess the value and feasibility of a grand-scale project to create new technologies that would allow the identification and quantification of all small molecules, or to decode the molecular universe. The Decoding the Molecular Universe project would extend and complement the success of the Human Genome Project by developing new capabilities and technologies to measure small molecules (defined as non-protein, non-polymer molecules less than 1500 Daltons) of any origin and generated in biological systems or produced abiotically. Workshop attendees 1) explored what new understanding of biological and environmental systems could be revealed through the lens of small molecules; 2) characterized the similarities in current needs and technical challenges between each science or mission area for unambiguous and comprehensive determination of the composition and quantities of small molecules of any sample; 3) determined the extent to which technologies or methods currently exist for unambiguously and comprehensively determining the small molecule composition of any sample and in a reasonable time; and 4) identified the attributes of the ideal technology or approach for universal small molecule measurement and identification. The workshop concluded with a discussion of how a project of this scale could be undertaken, possible thrusts for the project, early proof-of-principle applications, and similar efforts upon which the project could be modeled.

Submitted 19 November, 2023; originally announced November 2023.

arXiv:2311.11436 [pdf, other]

Duality of Bures and Shape Distances with Implications for Comparing Neural Representations

Authors: Sarah E. Harvey, Brett W. Larsen, Alex H. Williams

Abstract: A multitude of (dis)similarity measures between neural network representations have been proposed, resulting in a fragmented research landscape. Most of these measures fall into one of two categories. First, measures such as linear regression, canonical correlations analysis (CCA), and shape distances, all learn explicit mappings between neural units to quantify similarity while accounting for expected invariances. Second, measures such as representational similarity analysis (RSA), centered kernel alignment (CKA), and normalized Bures similarity (NBS) all quantify similarity in summary statistics, such as stimulus-by-stimulus kernel matrices, which are already invariant to expected symmetries. Here, we take steps towards unifying these two broad categories of methods by observing that the cosine of the Riemannian shape distance (from category 1) is equal to NBS (from category 2). We explore how this connection leads to new interpretations of shape distances and NBS, and draw contrasts of these measures with CKA, a popular similarity measure in the deep learning literature.

Submitted 19 November, 2023; originally announced November 2023.

arXiv:2311.09466 [pdf, other]

Soft Matching Distance: A metric on neural representations that captures single-neuron tuning

Authors: Meenakshi Khosla, Alex H. Williams

Abstract: Common measures of neural representational (dis)similarity are designed to be insensitive to rotations and reflections of the neural activation space. Motivated by the premise that the tuning of individual units may be important, there has been recent interest in developing stricter notions of representational (dis)similarity that require neurons to be individually matched across networks. When two networks have the same size (i.e. same number of neurons), a distance metric can be formulated by optimizing over neuron index permutations to maximize tuning curve alignment. However, it is not clear how to generalize this metric to measure distances between networks with different sizes. Here, we leverage a connection to optimal transport theory to derive a natural generalization based on "soft" permutations. The resulting metric is symmetric, satisfies the triangle inequality, and can be interpreted as a Wasserstein distance between two empirical distributions. Further, our proposed metric avoids counter-intuitive outcomes suffered by alternative approaches, and captures complementary geometric insights into neural representations that are entirely missed by rotation-invariant metrics.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.06974 [pdf, other]

Generating Signed Permutations by Twisting Two-Sided Ribbons

Authors: Yuan Qiu, Aaron Williams

Abstract: We provide a simple and natural solution to the problem of generating all $2^n \cdot n!$ signed permutations of $[n] = \{1,2,\ldots,n\}$. Our solution provides a pleasing generalization of the most famous ordering of permutations: plain changes (Steinhaus-Johnson-Trotter algorithm). In plain changes, the $n!$ permutations of $[n]$ are ordered so that successive permutations differ by swapping a pair of adjacent symbols, and the order is often visualized as a weaving pattern involving $n$ ropes. Here we model a signed permutation using $n$ ribbons with two distinct sides, and each successive configuration is created by twisting (i.e., swapping and turning over) two neighboring ribbons or a single ribbon. By greedily prioritizing $2$-twists of the largest symbol before $1$-twists of the largest symbol, we create a signed version of plain change's memorable zig-zag pattern. We provide a loopless algorithm (i.e., worst-case $\mathcal{O}(1)$-time per object) by extending the well-known mixed-radix Gray code algorithm.

Submitted 14 June, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

Comments: 15 pages, 7 figures

MSC Class: 05A05 ACM Class: F.2.2; G.2.1
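
The count $2^n \cdot n!$ quoted in the abstract above can be checked by plain enumeration. This is a minimal sketch, assuming nothing beyond the definition of a signed permutation; it does not reproduce the paper's twist-based ordering or its loopless algorithm, and the function name `signed_permutations` is hypothetical. The order produced is simply that of `itertools`.

```python
from itertools import permutations, product
from math import factorial


def signed_permutations(n):
    """Yield every signed permutation of [n] as a tuple of nonzero integers.

    Each permutation of 1..n is combined with every choice of sign per
    symbol, giving 2**n * n! tuples in total. The order is NOT a Gray code.
    """
    for perm in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            yield tuple(s * p for s, p in zip(signs, perm))


if __name__ == "__main__":
    n = 3
    all_signed = list(signed_permutations(n))
    # The enumeration should match the closed-form count with no repeats.
    print(len(all_signed), 2**n * factorial(n), len(set(all_signed)))
```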

arXiv:2310.19320 [pdf, other]

A Near-Field Treatment of Aperture Synthesis Techniques using the Murchison Widefield Array

Authors: Steve Prabu, Steven J. Tingay, Andrew Williams

Abstract: Typical radio interferometer observations are performed assuming the source of radiation to be in the far-field of the instrument, resulting in a two-dimensional Fourier relationship between the observed visibilities in the aperture plane and the sky brightness distribution (over a small field of view). When near-field objects are present in an observation, the standard approach applies far-field delays during correlation, resulting in loss of signal coherence for the signal from the near-field object. In this paper, we demonstrate near-field aperture synthesis techniques using a Murchison Widefield Array observation of the International Space Station (ISS), as it appears as a bright near-field object. We perform visibility phase corrections to restore coherence across the array for the near-field object (however not restoring coherence losses due to time and frequency averaging at the correlator). We illustrate the impact of the near-field corrections in the aperture plane and the sky plane. The aperture plane curves to match the curvature of the near-field wavefront, and in the sky plane near-field corrections manifest as fringe rotations at different rates as we bring the focal point of the array from infinity to the desired near-field distance. We also demonstrate the inverse scenario of inferring the line-of-sight range of the ISS by inverting the apparent curvature of the wavefront seen by the aperture. We conclude the paper by briefly discussing the limitations of the methods developed and the near-field science cases where our approach can be exploited.

Submitted 30 October, 2023; originally announced October 2023.

Comments: Accepted in Publications of the Astronomical Society of Australia (PASA). 10 pages, 7 figures, and lots of linked animations

arXiv:2310.17514 [pdf, other]

The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks

Authors: Kaiser Sun, Adina Williams, Dieuwke Hupkes

Abstract: NLP models have progressed drastically in recent years, according to numerous datasets proposed to evaluate performance. Questions remain, however, about how particular dataset design choices may impact the conclusions we draw about model capabilities. In this work, we investigate this question in the domain of compositional generalization. We examine the performance of six modeling approaches across 4 datasets, split according to 8 compositional splitting strategies, ranking models by 18 compositional generalization splits in total. Our results show that: i) the datasets, although all designed to evaluate compositional generalization, rank modeling approaches differently; ii) datasets generated by humans align better with each other than they do with synthetic datasets, or than synthetic datasets among themselves; iii) generally, whether datasets are sampled from the same source is more predictive of the resulting model ranking than whether they maintain the same interpretation of compositionality; and iv) which lexical items are used in the data can strongly impact conclusions. Overall, our results demonstrate that much work remains to be done when it comes to assessing whether popular evaluation datasets measure what they intend to measure, and suggest that elucidating more rigorous standards for establishing the validity of evaluation sets could benefit the field.

Submitted 26 October, 2023; originally announced October 2023.

Comments: CoNLL2023

Showing 1–50 of 1,186 results for author: Williams, A