
Showing 1–50 of 120 results for author: Williams, A

Searching in archive cs.
  1. arXiv:2406.19470 [pdf, other]

    cs.CL

    Changing Answer Order Can Decrease MMLU Accuracy

    Authors: Vipul Gupta, David Pantoja, Candace Ross, Adina Williams, Megan Ung

    Abstract: As large language models (LLMs) have grown in prevalence, particular benchmarks have become essential for the evaluation of these models and for understanding model capabilities. Most commonly, we use test accuracy averaged across multiple subtasks in order to rank models on leaderboards, to determine which model is best for our purposes. In this paper, we investigate the robustness of the accurac…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Short paper, 9 pages
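A minimal sketch of the kind of perturbation the title describes, assuming each multiple-choice item is stored as a list of option strings plus the index of the gold answer. The names `shuffle_options` and `accuracy` are hypothetical and this is not the authors' code; it only illustrates that reordering options requires remapping the gold label, and that a model with a fixed positional preference can score differently on the same question after shuffling.

```python
import random
from typing import List, Tuple


def shuffle_options(options: List[str], gold: int, seed: int) -> Tuple[List[str], int]:
    """Permute the answer options of one multiple-choice item (hypothetical helper).

    Returns the reordered options and the new index of the gold answer,
    so accuracy can still be scored against the correct option.
    """
    rng = random.Random(seed)  # local seeded RNG: same seed gives the same permutation
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    # order[new_pos] == old_pos, so the gold answer now sits at order.index(gold)
    new_gold = order.index(gold)
    return shuffled, new_gold


def accuracy(predictions: List[int], golds: List[int]) -> float:
    """Fraction of items whose predicted option index equals the gold index."""
    if not golds:
        return 0.0
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)


if __name__ == "__main__":
    opts = ["Paris", "London", "Berlin", "Madrid"]
    new_opts, new_gold = shuffle_options(opts, gold=0, seed=1)
    print(new_opts, new_gold)
    # A model that always picks the first option is right only if the gold moved there.
    print(accuracy([0], [new_gold]))
```

Scoring the same predictions against the original and the remapped gold indices is one simple way to see how much of a reported accuracy depends on answer position.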

  2. arXiv:2406.11988 [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Decomposed evaluations of geographic disparities in text-to-image models

    Authors: Abhishek Sureddy, Dishant Padalia, Nandhinee Periyakaruppa, Oindrila Saha, Adina Williams, Adriana Romero-Soriano, Megan Richards, Polina Kirichenko, Melissa Hall

    Abstract: Recent work has identified substantial disparities in generated images of different geographic regions, including stereotypical depictions of everyday objects like houses and cars. However, existing measures for these disparities have been limited to either human evaluations, which are time-consuming and costly, or automatic metrics evaluating full images, which are unable to attribute these dispa…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.05183 [pdf, other]

    cs.LG cs.AI cs.CL

    The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

    Authors: Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, Mike Rabbat, Mark Ibrahim

    Abstract: Today's best language models still struggle with hallucinations: factually incorrect generations, which impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than was encountered during training, exemplifies this in information retrieval. We reframe the reversal curse as a factorizatio…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 18 pages, 7 figures

  4. arXiv:2405.13099 [pdf, other]

    cs.AI cs.SI

    The Role of Emotions in Informational Support Question-Response Pairs in Online Health Communities: A Multimodal Deep Learning Approach

    Authors: Mohsen Jozani, Jason A. Williams, Ahmed Aleroud, Sarbottam Bhagat

    Abstract: This study explores the relationship between informational support seeking questions, responses, and helpfulness ratings in online health communities. We created a labeled data set of question-response pairs and developed multimodal machine learning and deep learning models to reliably predict informational support questions and responses. We employed explainable AI to reveal the emotions embedded…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 37 pages, 15 figures

    ACM Class: H.4.3; I.2.7

  5. arXiv:2405.09464 [pdf, other]

    quant-ph cs.PF

    Scalable Scheduling Policies for Quantum Satellite Networks

    Authors: Albert Williams, Nitish K. Panigrahy, Andrew McGregor, Don Towsley

    Abstract: As Low Earth Orbit (LEO) satellite mega constellations continue to be deployed for satellite internet and recent successful experiments in satellite-based quantum entanglement distribution emerge, a natural question arises: How should we coordinate transmissions and design scalable scheduling policies for a quantum satellite internet? In this work, we consider the problem of transmission schedulin…

    Submitted 15 May, 2024; originally announced May 2024.

  6. arXiv:2405.04457 [pdf, other]

    cs.CV cs.CY cs.HC

    Towards Geographic Inclusion in the Evaluation of Text-to-Image Models

    Authors: Melissa Hall, Samuel J. Bell, Candace Ross, Adina Williams, Michal Drozdzal, Adriana Romero Soriano

    Abstract: Rapid progress in text-to-image generative models coupled with their deployment for visual content creation has magnified the importance of thoroughly evaluating their performance and identifying potential biases. In pursuit of models that generate images that are realistic, diverse, visually appealing, and consistent with the given prompt, researchers and practitioners often turn to automated met…

    Submitted 7 May, 2024; originally announced May 2024.

  7. arXiv:2404.16019 [pdf, other]

    cs.CL

    The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

    Authors: Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale

    Abstract: Human feedback plays a central role in the alignment of Large Language Models (LLMs). However, open questions remain about the methods (how), domains (where), people (who) and objectives (to what end) of human feedback collection. To navigate these questions, we introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, t…

    Submitted 24 April, 2024; originally announced April 2024.

  8. arXiv:2404.12241 [pdf, other]

    cs.CL cs.AI

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, et al. (75 additional authors not shown)

    Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-pu…

    Submitted 13 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  9. arXiv:2404.06214 [pdf, other]

    cs.CL

    [Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

    Authors: Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang

    Abstract: After last year's successful BabyLM Challenge, the competition will be hosted again in 2024/2025. The overarching goals of the challenge remain the same; however, some of the competition rules will be different. The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-…

    Submitted 9 April, 2024; originally announced April 2024.

  10. arXiv:2404.03814 [pdf, other]

    cs.HC

    I Did Not Notice: A Comparison of Immersive Analytics with Augmented and Virtual Reality

    Authors: Xiaoyan Zhou, Anil Ufuk Batmaz, Adam S. Williams, Dylan Schreiber, Francisco Ortega

    Abstract: Immersive environments enable users to engage in embodied interaction, enhancing the sensemaking processes involved in completing tasks such as immersive analytics. Previous comparative studies on immersive analytics using augmented and virtual realities have revealed that users employ different strategies for data interpretation and text-based analytics depending on the environment. Our study see…

    Submitted 4 April, 2024; originally announced April 2024.

  11. arXiv:2403.17804 [pdf, other]

    cs.CV cs.CL

    Improving Text-to-Image Consistency via Automatic Prompt Optimization

    Authors: Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal

    Abstract: Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress, these models still struggle to produce images that are consistent with the input prompt, oftentimes failing to capture object quantities, relations and attributes properly. Existing solutions…

    Submitted 26 March, 2024; originally announced March 2024.

  12. arXiv:2403.13858 [pdf, other]

    physics.acc-ph cs.CV cs.LG

    A conditional latent autoregressive recurrent model for generation and forecasting of beam dynamics in particle accelerators

    Authors: Mahindra Rautela, Alan Williams, Alexander Scheinker

    Abstract: Particle accelerators are complex systems that focus, guide, and accelerate intense charged particle beams to high energy. Beam diagnostics present a challenging problem due to limited non-destructive measurements, computationally demanding simulations, and inherent uncertainties in the system. We propose a two-step unsupervised deep learning framework named as Conditional Latent Autoregressive Re…

    Submitted 19 March, 2024; originally announced March 2024.

  13. arXiv:2403.12201 [pdf, other]

    cs.AI cs.HC cs.LG

    Compositional learning of functions in humans and machines

    Authors: Yanli Zhou, Brenden M. Lake, Adina Williams

    Abstract: The ability to learn and compose functions is foundational to efficient learning and reasoning in humans, enabling flexible generalizations such as creating new dishes from known cooking processes. Beyond sequential chaining of functions, existing linguistics literature indicates that humans can grasp more complex compositions with interacting functions, where output production depends on context…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 7 pages, 6 figures

  14. arXiv:2401.14963 [pdf, other]

    cs.DM

    On the Hardness of Gray Code Problems for Combinatorial Objects

    Authors: Arturo Merino, Namrata, Aaron Williams

    Abstract: Can a list of binary strings be ordered so that consecutive strings differ in a single bit? Can a list of permutations be ordered so that consecutive permutations differ by a swap? Can a list of non-crossing set partitions be ordered so that consecutive partitions differ by refinement? These are examples of Gray coding problems: Can a list of combinatorial objects (of a particular type and size) b…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: 15 pages, 5 figures, WALCOM 2024
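The first question in this abstract has a classic affirmative answer, the binary reflected Gray code. The sketch below (function names are hypothetical) builds it with the standard rule that the i-th codeword is i XOR (i >> 1) and checks the single-bit-change property. It only illustrates what a Gray code is; it does not touch the hardness results the paper is about.

```python
from typing import List


def binary_reflected_gray(n: int) -> List[str]:
    """All 2**n binary strings of length n, in binary reflected Gray code order."""
    if n == 0:
        return [""]  # the single empty string
    # The i-th codeword is i XOR (i >> 1), written with n binary digits.
    return [format(i ^ (i >> 1), f"0{n}b") for i in range(2 ** n)]


def is_gray_order(words: List[str]) -> bool:
    """True if each pair of consecutive strings differs in exactly one position."""
    return all(sum(x != y for x, y in zip(a, b)) == 1 for a, b in zip(words, words[1:]))


if __name__ == "__main__":
    codes = binary_reflected_gray(3)
    print(codes, is_gray_order(codes))
```

For n >= 1 the order is also cyclic: the last codeword, 1 followed by n - 1 zeros, differs from the all-zeros first codeword in a single bit.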

  15. arXiv:2401.12295 [pdf, other]

    cs.CL

    Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data

    Authors: Leonardo Castro-Gonzalez, Yi-Ling Chung, Hannah Rose Kirk, John Francis, Angus R. Williams, Pica Johansson, Jonathan Bright

    Abstract: The field of machine learning has recently made significant progress in reducing the requirements for labelled training data when building new models. These 'cheaper' learning techniques hold significant potential for the social sciences, where development of large labelled training datasets is often a significant practical impediment to the use of machine learning for analytical tasks. In this ar…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 39 pages, 10 figures, 6 tables

    ACM Class: I.2.7; J.4

  16. arXiv:2312.16052 [pdf, other]

    math.CO cs.DM

    Pattern Avoidance for Fibonacci Sequences using $k$-Regular Words

    Authors: Emily Downing, Elizabeth Hartung, Aaron Williams

    Abstract: Two $k$-ary Fibonacci recurrences are $a_k(n) = a_k(n-1) + k \cdot a_k(n-2)$ and $b_k(n) = k \cdot b_k(n-1) + b_k(n-2)$. We provide a simple proof that $a_k(n)$ is the number of $k$-regular words over $[n] = \{1,2,\ldots,n\}$ that avoid patterns $\{121, 123, 132, 213\}$ when using base cases $a_k(0) = a_k(1) = 1$ for any $k \geq 1$. This was previously proven by Kuba and Panholzer in the context o…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: 15 pages, submitted to special journal issue for Permutation Patterns 2023 (PP23) in DMTCS

    MSC Class: 05 (Primary) 68 (Secondary)

    ACM Class: G.2.1; G.4
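The identity stated in this abstract can be spot-checked by brute force for small $n$ and $k$. The sketch below assumes the usual notion of pattern containment in words (a length-3 subsequence whose letters, after replacing each by its rank among the distinct letters, equal the pattern) and takes a $k$-regular word over $[n]$ to use each letter exactly $k$ times. All helper names are hypothetical, and a finite check is not a substitute for the paper's proof.

```python
from itertools import combinations
from typing import Iterator, List, Tuple

# The four patterns from the abstract, written as tuples of ranks.
PATTERNS = {(1, 2, 1), (1, 2, 3), (1, 3, 2), (2, 1, 3)}


def a_k(n: int, k: int) -> int:
    """a_k(n) = a_k(n-1) + k * a_k(n-2), with base cases a_k(0) = a_k(1) = 1."""
    prev, cur = 1, 1  # a_k(0), a_k(1)
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, cur + k * prev
    return cur


def regular_words(n: int, k: int) -> Iterator[Tuple[int, ...]]:
    """Yield every word using each letter of [n] exactly k times, each word once."""
    remaining = [k] * (n + 1)  # remaining[v] = copies of letter v still to place
    word: List[int] = []

    def extend() -> Iterator[Tuple[int, ...]]:
        if len(word) == n * k:
            yield tuple(word)
            return
        for v in range(1, n + 1):
            if remaining[v]:
                remaining[v] -= 1
                word.append(v)
                yield from extend()
                word.pop()
                remaining[v] += 1

    yield from extend()


def standardize(seq: Tuple[int, ...]) -> Tuple[int, ...]:
    """Replace each letter by its rank among the distinct letters; equal letters stay equal."""
    rank = {v: r for r, v in enumerate(sorted(set(seq)), start=1)}
    return tuple(rank[v] for v in seq)


def avoids(word: Tuple[int, ...]) -> bool:
    """True if no length-3 subsequence of word is order-isomorphic to a pattern in PATTERNS."""
    return all(standardize(sub) not in PATTERNS for sub in combinations(word, 3))


def count_avoiders(n: int, k: int) -> int:
    return sum(1 for w in regular_words(n, k) if avoids(w))


if __name__ == "__main__":
    for k in (1, 2, 3):
        for n in range(4):
            print(f"k={k} n={n}: brute force {count_avoiders(n, k)}, recurrence {a_k(n, k)}")
```

With $k = 1$ the words are ordinary permutations, the pattern 121 cannot occur, and $a_1(n)$ is the Fibonacci sequence 1, 1, 2, 3, 5, 8, …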

  17. arXiv:2312.14069 [pdf, other]

    cs.CL cs.SD eess.AS

    EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models

    Authors: Maureen de Seyssel, Antony D'Avirro, Adina Williams, Emmanuel Dupoux

    Abstract: We introduce EmphAssess, a prosodic benchmark designed to evaluate the capability of speech-to-speech models to encode and reproduce prosodic emphasis. We apply this to two tasks: speech resynthesis and speech-to-speech translation. In both cases, the benchmark evaluates the ability of the model to encode emphasis in the speech input and accurately reproduce it in the output, potentially across a…

    Submitted 21 December, 2023; originally announced December 2023.

  18. arXiv:2311.18567 [pdf, other]

    cs.CL

    Grammatical Gender's Influence on Distributional Semantics: A Causal Perspective

    Authors: Karolina Stańczak, Kevin Du, Adina Williams, Isabelle Augenstein, Ryan Cotterell

    Abstract: How much meaning influences gender assignment across languages is an active area of research in modern linguistics and cognitive science. We can view current approaches as aiming to determine where gender assignment falls on a spectrum, from being fully arbitrarily determined to being largely semantically determined. For the latter case, there is a formulation of the neo-Whorfian hypothesis, which…

    Submitted 30 November, 2023; originally announced November 2023.

  19. arXiv:2311.18140 [pdf, other]

    cs.CL

    ROBBIE: Robust Bias Evaluation of Large Generative Language Models

    Authors: David Esiobu, Xiaoqing Tan, Saghar Hosseini, Megan Ung, Yuchen Zhang, Jude Fernandes, Jane Dwivedi-Yu, Eleonora Presani, Adina Williams, Eric Michael Smith

    Abstract: As generative large language models (LLMs) grow more performant and prevalent, we must develop comprehensive enough tools to measure and improve their fairness. Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes, meaning that testing LLMs on more datasets can potentially help us characterize their biases more fully, and better ensur…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  20. arXiv:2311.11436 [pdf, other]

    stat.ML cs.LG

    Duality of Bures and Shape Distances with Implications for Comparing Neural Representations

    Authors: Sarah E. Harvey, Brett W. Larsen, Alex H. Williams

    Abstract: A multitude of (dis)similarity measures between neural network representations have been proposed, resulting in a fragmented research landscape. Most of these measures fall into one of two categories. First, measures such as linear regression, canonical correlations analysis (CCA), and shape distances, all learn explicit mappings between neural units to quantify similarity while accounting for e…

    Submitted 19 November, 2023; originally announced November 2023.

  21. arXiv:2311.09466 [pdf, other]

    cs.LG cs.NE stat.ML

    Soft Matching Distance: A metric on neural representations that captures single-neuron tuning

    Authors: Meenakshi Khosla, Alex H. Williams

    Abstract: Common measures of neural representational (dis)similarity are designed to be insensitive to rotations and reflections of the neural activation space. Motivated by the premise that the tuning of individual units may be important, there has been recent interest in developing stricter notions of representational (dis)similarity that require neurons to be individually matched across networks. When tw…

    Submitted 15 November, 2023; originally announced November 2023.

  22. arXiv:2311.06974 [pdf, other]

    cs.DS

    Generating Signed Permutations by Twisting Two-Sided Ribbons

    Authors: Yuan, Qiu, Aaron Williams

    Abstract: We provide a simple and natural solution to the problem of generating all $2^n \cdot n!$ signed permutations of $[n] = \{1,2,\ldots,n\}$. Our solution provides a pleasing generalization of the most famous ordering of permutations: plain changes (Steinhaus-Johnson-Trotter algorithm). In plain changes, the $n!$ permutations of $[n]$ are ordered so that successive permutations differ by swapping a pa…

    Submitted 14 June, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

    Comments: 15 pages, 7 figures

    MSC Class: 05A05

    ACM Class: F.2.2; G.2.1
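For reference, plain changes (the Steinhaus-Johnson-Trotter algorithm) can be sketched as follows: every element carries a direction, the largest "mobile" element (one pointing at an adjacent smaller element) moves one step, and every element larger than it reverses direction. A brute-force enumeration also confirms the $2^n \cdot n!$ count of signed permutations. Function names are hypothetical, and the paper's ribbon-twisting ordering of signed permutations is not implemented here.

```python
from itertools import permutations, product
from math import factorial
from typing import List, Tuple


def plain_changes(n: int) -> List[Tuple[int, ...]]:
    """Steinhaus-Johnson-Trotter order of the permutations of [n] = {1, ..., n}."""
    perm = list(range(1, n + 1))
    dirs = [-1] * n  # -1 = element points left, +1 = points right (stored per position)
    out = [tuple(perm)]
    while True:
        # An element is mobile if it points at an adjacent smaller element.
        best = -1
        for i, v in enumerate(perm):
            j = i + dirs[i]
            if 0 <= j < n and perm[j] < v and (best == -1 or v > perm[best]):
                best = i
        if best == -1:  # no mobile element left: all n! permutations emitted
            break
        v, j = perm[best], best + dirs[best]
        perm[best], perm[j] = perm[j], perm[best]  # adjacent transposition
        dirs[best], dirs[j] = dirs[j], dirs[best]  # direction travels with its element
        for i in range(n):
            if perm[i] > v:  # reverse every element larger than the one just moved
                dirs[i] = -dirs[i]
        out.append(tuple(perm))
    return out


def signed_permutations(n: int) -> List[Tuple[int, ...]]:
    """All signed permutations of [n]: a permutation with a +/- sign on each entry."""
    return [
        tuple(s * x for s, x in zip(signs, p))
        for p in permutations(range(1, n + 1))
        for signs in product((1, -1), repeat=n)
    ]


if __name__ == "__main__":
    for p in plain_changes(3):
        print(p)
    for n in range(5):
        print(n, len(signed_permutations(n)), 2 ** n * factorial(n))
```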

  23. arXiv:2310.17514 [pdf, other]

    cs.CL

    The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks

    Authors: Kaiser Sun, Adina Williams, Dieuwke Hupkes

    Abstract: NLP models have progressed drastically in recent years, according to numerous datasets proposed to evaluate performance. Questions remain, however, about how particular dataset design choices may impact the conclusions we draw about model capabilities. In this work, we investigate this question in the domain of compositional generalization. We examine the performance of six modeling approaches acr…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: CoNLL2023

  24. arXiv:2310.08278 [pdf, other]

    cs.LG cs.AI

    Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

    Authors: Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Hena Ghonia, Rishika Bhagwatkar, Arian Khorasani, Mohammad Javad Darvishi Bayazi, George Adamopoulos, Roland Riachi, Nadhir Hassen, Marin Biloš, Sahil Garg, Anderson Schneider, Nicolas Chapados, Alexandre Drouin, Valentina Zantedeschi, Yuriy Nevmyvaka, Irina Rish

    Abstract: Over the past years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-shot and few-shot generalization. However, despite the success of foundation models in modalities such as natural language processing and computer vision, the development of foundation models for time series forecasting has lagged behind. We present Lag-Llama, a…

    Submitted 8 February, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: First two authors contributed equally. All data, models and code used are open-source. GitHub: https://github.com/time-series-foundation-models/lag-llama

  25. arXiv:2310.05742 [pdf, other]

    stat.ML cs.LG q-bio.NC

    Estimating Shape Distances on Neural Representations with Limited Samples

    Authors: Dean A. Pospisil, Brett W. Larsen, Sarah E. Harvey, Alex H. Williams

    Abstract: Measuring geometric similarity between high-dimensional network representations is a topic of longstanding interest to neuroscience and deep learning. Although many methods have been proposed, only a few works have rigorously analyzed their statistical efficiency or quantified estimator uncertainty in data-limited regimes. Here, we derive upper and lower bounds on the worst-case convergence of sta…

    Submitted 9 December, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

  26. arXiv:2310.01430 [pdf, other]

    cs.CL cs.AI

    Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection

    Authors: Swapnil Bhosale, Abhra Chaudhuri, Alex Lee Robert Williams, Divyank Tiwari, Anjan Dutta, Xiatian Zhu, Pushpak Bhattacharyya, Diptesh Kanojia

    Abstract: The introduction of the MUStARD dataset, and its emotion recognition extension MUStARD++, have identified sarcasm to be a multi-modal phenomenon -- expressed not only in natural language text, but also through manners of speech (like tonality and intonation) and visual cues (facial expression). With this work, we aim to perform a rigorous benchmarking of the MUStARD++ dataset by considering state-…

    Submitted 29 September, 2023; originally announced October 2023.

  27. arXiv:2309.16703 [pdf, ps, other]

    cs.CR

    Incompatibilities Between Current Practices in Statistical Data Analysis and Differential Privacy

    Authors: Joshua Snoke, Claire McKay Bowen, Aaron R. Williams, Andrés F. Barrientos

    Abstract: The authors discuss their experience applying differential privacy with a complex data set with the goal of enabling standard approaches to statistical data analysis. They highlight lessons learned and roadblocks encountered, distilling them into incompatibilities between current practices in statistical data analysis and differential privacy that go beyond issues which can be solved with a noisy…

    Submitted 16 August, 2023; originally announced September 2023.

    Comments: 8 pages, no figures or tables

  28. arXiv:2309.02539 [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation

    Authors: Karn N. Watcharasupat, Chih-Wei Wu, Yiwei Ding, Iroro Orife, Aaron J. Hipple, Phillip A. Williams, Scott Kramer, Alexander Lerch, William Wolcott

    Abstract: Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue, music, and effects stems from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psychoacoustically motivated frequency scales were used to inform the band definitions whic…

    Submitted 1 December, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted to the IEEE Open Journal of Signal Processing (ICASSP 2024 Track)

  29. arXiv:2308.16871 [pdf, other]

    cs.CL cs.AI

    The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages

    Authors: Benjamin Muller, Belen Alastruey, Prangthip Hansanti, Elahe Kalbassi, Christophe Ropers, Eric Michael Smith, Adina Williams, Luke Zettlemoyer, Pierre Andrews, Marta R. Costa-jussà

    Abstract: Gender biases in language generation systems are challenging to mitigate. One possible source for these biases is gender representation disparities in the training and evaluation data. Despite recent progress in documenting this problem and many attempts at mitigating it, we still lack shared methodology and tooling to report gender representation in large datasets. Such quantitative reporting wil…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: 15 pages

  30. arXiv:2308.12405 [pdf, other]

    math.CO cs.DM

    Concatenation trees: A framework for efficient universal cycle and de Bruijn sequence constructions

    Authors: J. Sawada, J. Sears, A. Trautrim, A. Williams

    Abstract: Classic cycle-joining techniques have found widespread application in creating universal cycles for a diverse range of combinatorial objects, such as shorthand permutations, weak orders, orientable sequences, and various subsets of $k$-ary strings, including de Bruijn sequences. In the most favorable scenarios, these algorithms operate with a space complexity of $O(n)$ and require $O(n)$ time to g…

    Submitted 25 November, 2023; v1 submitted 23 August, 2023; originally announced August 2023.
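As a baseline for what this abstract means by a de Bruijn sequence (every $k$-ary string of length $n$ appears exactly once as a cyclic window), here is a sketch of the classic Lyndon-word concatenation construction, often credited to Fredricksen and Maiorana. It is not the concatenation-tree or cycle-joining framework the paper introduces, and the function names are hypothetical.

```python
from typing import List


def de_bruijn(k: int, n: int) -> List[int]:
    """Lexicographically least de Bruijn sequence over {0..k-1} with window length n."""
    a = [0] * (n + 1)  # a[1..n] holds the prenecklace being built; a[0] is a sentinel
    seq: List[int] = []

    def db(t: int, p: int) -> None:
        if t > n:
            # a[1..p] is a Lyndon word; keep it when its length p divides n
            if n % p == 0:
                seq.extend(a[1 : p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


def is_de_bruijn(seq: List[int], k: int, n: int) -> bool:
    """Check that every length-n k-ary string occurs exactly once as a cyclic window."""
    if len(seq) != k ** n:
        return False
    ext = seq + seq[: n - 1]  # wrap around so cyclic windows can be sliced directly
    windows = {tuple(ext[i : i + n]) for i in range(len(seq))}
    return len(windows) == k ** n


if __name__ == "__main__":
    s = de_bruijn(2, 3)
    print("".join(map(str, s)), is_de_bruijn(s, 2, 3))
```

The sequence has length exactly $k^n$, the minimum possible, because each of its $k^n$ cyclic windows must be a distinct string.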

  31. arXiv:2308.06198 [pdf, other]

    cs.CV cs.HC

    DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity

    Authors: Melissa Hall, Candace Ross, Adina Williams, Nicolas Carion, Michal Drozdzal, Adriana Romero Soriano

    Abstract: The unprecedented photorealistic results achieved by recent text-to-image generative systems and their increasing use as plug-and-play content creation solutions make it crucial to understand their potential biases. In this work, we introduce three indicators to evaluate the realism, diversity and prompt-generation consistency of text-to-image generative systems when prompted to generate objects f…

    Submitted 18 March, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

  32. arXiv:2307.16811 [pdf, other]

    cs.CL cs.CY

    DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures

    Authors: Angus R. Williams, Hannah Rose Kirk, Liam Burke, Yi-Ling Chung, Ivan Debono, Pica Johansson, Francesca Stevens, Jonathan Bright, Scott A. Hale

    Abstract: Public figures receive a disproportionate amount of abuse on social media, impacting their active participation in public life. Automated systems can identify abuse at scale but labelling training data is expensive, complex and potentially harmful. So, it is desirable that systems are efficient and generalisable, handling both shared and specific aspects of online abuse. We explore the dynamics of…

    Submitted 25 April, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: 15 pages, 7 figures, 4 tables

  33. arXiv:2307.09288 [pdf, other]

    cs.CL cs.AI

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Authors: Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, et al. (43 additional authors not shown)

    Abstract: In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be…

    Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

  34. arXiv:2307.06951   

    cs.AI cs.LG

    AI For Global Climate Cooperation 2023 Competition Proceedings

    Authors: Yoshua Bengio, Prateek Gupta, Lu Li, Soham Phade, Sunil Srinivasa, Andrew Williams, Tianyu Zhang, Yang Zhang, Stephan Zheng

    Abstract: The international community must collaborate to mitigate climate change and sustain economic growth. However, collaboration is hard to achieve, partly because no global authority can ensure compliance with international climate agreements. Combining AI with climate-economic simulations offers a promising solution to design international frameworks, including negotiation protocols and climate agree…

    Submitted 10 July, 2023; originally announced July 2023.

  35. arXiv:2307.05775 [pdf, other]

    cs.LG cs.SI

    Weisfeiler and Leman Go Measurement Modeling: Probing the Validity of the WL Test

    Authors: Arjun Subramonian, Adina Williams, Maximilian Nickel, Yizhou Sun, Levent Sagun

    Abstract: The expressive power of graph neural networks is usually measured by comparing how many pairs of graphs or nodes an architecture can possibly distinguish as non-isomorphic to those distinguishable by the $k$-dimensional Weisfeiler-Leman ($k$-WL) test. In this paper, we uncover misalignments between graph machine learning practitioners' conceptualizations of expressive power and $k$-WL through a sy…

    Submitted 31 March, 2024; v1 submitted 11 July, 2023; originally announced July 2023.
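This abstract measures expressive power against the $k$-WL test. Its simplest case, 1-WL (color refinement), can be sketched as follows, assuming undirected graphs given as adjacency dictionaries in which every node appears as a key; `wl_distinguishes` is a hypothetical name, not the paper's code. Both graphs are refined jointly on their disjoint union so that color IDs are comparable. The test is one-sided: a True result proves the graphs are non-isomorphic, while False is inconclusive, as the classic pair of two disjoint triangles versus a 6-cycle shows.

```python
from collections import Counter
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]  # node -> neighbors; every node must be a key


def wl_distinguishes(g1: Graph, g2: Graph) -> bool:
    """Return True if 1-WL color refinement tells g1 and g2 apart."""
    # Refine on the disjoint union so both graphs share one color palette.
    adj = {}
    for tag, g in ((0, g1), (1, g2)):
        for v, nbrs in g.items():
            adj[(tag, v)] = [(tag, u) for u in nbrs]
    colors = {v: 0 for v in adj}  # start from a uniform coloring
    for _ in range(len(adj)):  # the partition can be split at most |V| - 1 times
        # New color = old color plus the sorted multiset of neighbor colors.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new = {v: palette[sigs[v]] for v in adj}
        stable = len(set(new.values())) == len(set(colors.values()))
        colors = new
        if stable:  # refinement only splits classes, so equal counts mean a fixed point
            break
    hist = [Counter(c for (tag, _), c in colors.items() if tag == t) for t in (0, 1)]
    return hist[0] != hist[1]


if __name__ == "__main__":
    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    path = {0: [1], 1: [0, 2], 2: [1]}
    two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
    hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    print(wl_distinguishes(triangle, path))
    print(wl_distinguishes(two_triangles, hexagon))
```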

  36. arXiv:2301.11796 [pdf, other]

    cs.CL

    Call for Papers -- The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

    Authors: Alex Warstadt, Leshem Choshen, Aaron Mueller, Adina Williams, Ethan Wilcox, Chengxu Zhuang

    Abstract: We present the call for papers for the BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus. This shared task is intended for participants with an interest in small scale language modeling, human language acquisition, low-resource NLP, and cognitive modeling. In partnership with CoNLL and CMCL, we provide a platform for approaches to pretraining with a limited-size…

    Submitted 27 January, 2023; originally announced January 2023.

  37. arXiv:2212.08979 [pdf, other]

    cs.CL cs.LG

    Language model acceptability judgements are not always robust to context

    Authors: Koustuv Sinha, Jon Gauthier, Aaron Mueller, Kanishka Misra, Keren Fuentes, Roger Levy, Adina Williams

    Abstract: Targeted syntactic evaluations of language models ask whether models show stable preferences for syntactically acceptable content over minimal-pair unacceptable inputs. Most targeted syntactic evaluation datasets ask models to make these judgements with just a single context-free sentence as input. This does not match language models' training regime, in which input sentences are always highly con…

    Submitted 17 December, 2022; originally announced December 2022.

  38. arXiv:2211.11665 [pdf, other]

    cs.LG q-bio.NC

    Representational dissimilarity metric spaces for stochastic neural networks

    Authors: Lyndon R. Duong, Jingyang Zhou, Josue Nassar, Jules Berman, Jeroen Olieslagers, Alex H. Williams

    Abstract: Quantifying similarity between neural representations -- e.g. hidden layer activation vectors -- is a perennial problem in deep learning and neuroscience research. Existing methods compare deterministic responses (e.g. artificial networks that lack stochastic layers) or averaged responses (e.g., trial-averaged firing rates in biological data). However, these measures of _deterministic_ representat…

    Submitted 3 February, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Published as a conference paper at ICLR 2023

    Journal ref: International Conference on Learning Representations 2023

  39. Incorporating Crowdsourced Annotator Distributions into Ensemble Modeling to Improve Classification Trustworthiness for Ancient Greek Papyri

    Authors: Graham West, Matthew I. Swindall, Ben Keener, Timothy Player, Alex C. Williams, James H. Brusuelas, John F. Wallin

    Abstract: Performing classification on noisy, crowdsourced image datasets can prove challenging even for the best neural networks. Two issues which complicate the problem on such datasets are class imbalance and ground-truth uncertainty in labeling. The AL-ALL and AL-PUB datasets - consisting of tightly cropped, individual characters from images of ancient Greek papyri - are strongly affected by both issues…

    Submitted 26 January, 2024; v1 submitted 28 October, 2022; originally announced October 2022.

    Journal ref: Journal of Data Mining & Digital Humanities, Historical Documents and automatic text recognition, Digital humanities in languages (February 7, 2024) jdmdh:10297

  40. arXiv:2210.12574 [pdf, other]

    cs.CL cs.LG

    The Curious Case of Absolute Position Embeddings

    Authors: Koustuv Sinha, Amirhossein Kazemnejad, Siva Reddy, Joelle Pineau, Dieuwke Hupkes, Adina Williams

    Abstract: Transformer language models encode the notion of word order using positional information. Most commonly, this positional information is represented by absolute position embeddings (APEs), that are learned from the pretraining data. However, in natural language, it is not absolute position that matters, but relative position, and the extent to which APEs can capture this type of information has not…

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: Accepted at EMNLP 2022 Findings; 5 pages and 15 pages Appendix

  41. arXiv:2209.04732  [pdf]

    cs.DB cs.AI

    Ontologizing Health Systems Data at Scale: Making Translational Discovery a Reality

    Authors: Tiffany J. Callahan, Adrianne L. Stefanski, Jordan M. Wyrwa, Chenjie Zeng, Anna Ostropolets, Juan M. Banda, William A. Baumgartner Jr., Richard D. Boyce, Elena Casiraghi, Ben D. Coleman, Janine H. Collins, Sara J. Deakyne-Davies, James A. Feinstein, Melissa A. Haendel, Asiyah Y. Lin, Blake Martin, Nicolas A. Matentzoglu, Daniella Meeker, Justin Reese, Jessica Sinclair, Sanya B. Taneja, Katy E. Trinkley, Nicole A. Vasilevsky, Andrew Williams, Xingman A. Zhang, et al. (7 additional authors not shown)

    Abstract: Background: Common data models solve many challenges of standardizing electronic health record (EHR) data, but are unable to semantically integrate all the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OB…

    Submitted 30 January, 2023; v1 submitted 10 September, 2022; originally announced September 2022.

    Comments: Supplementary Material is included at the end of the manuscript

    ACM Class: J.3

  42. arXiv:2208.08195  [pdf, other]

    cs.CL

    Benchmarking Compositionality with Formal Languages

    Authors: Josef Valvoda, Naomi Saphra, Jonathan Rawski, Adina Williams, Ryan Cotterell

    Abstract: Recombining known primitive concepts into larger novel combinations is a quintessentially human cognitive capability. Whether large neural models in NLP can acquire this ability while learning from data is an open question. In this paper, we investigate this problem from the perspective of formal languages. We use deterministic finite-state transducers to make an unbounded number of datasets with…

    Submitted 1 August, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: Published at COLING 2022. This version fixes a mistake in Figure 4 and adds a clarifying note in teal. Code is available at https://github.com/valvoda/neuralTransducer

  43. arXiv:2208.07004  [pdf, other]

    cs.LG cs.MA

    AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N

    Authors: Tianyu Zhang, Andrew Williams, Soham Phade, Sunil Srinivasa, Yang Zhang, Prateek Gupta, Yoshua Bengio, Stephan Zheng

    Abstract: Comprehensive global cooperation is essential to limit global temperature increases while continuing economic development, e.g., reducing severe inequality or achieving long-term economic growth. Achieving long-term cooperation on climate change mitigation with n strategic agents poses a complex game-theoretic problem. For example, agents may negotiate and reach climate agreements, but there is no…

    Submitted 15 August, 2022; originally announced August 2022.

    Comments: 12 pages (21 with appendices), 5 figures. For associated working group, see https://www.ai4climatecoop.org/

    MSC Class: 93A16; 91-10; 68T07 ACM Class: I.2.11; J.2; J.4

  44. Eliciting Multimodal Gesture+Speech Interactions in a Multi-Object Augmented Reality Environment

    Authors: Xiaoyan Zhou, Adam S. Williams, Francisco R. Ortega

    Abstract: As augmented reality technology and hardware become more mature and affordable, researchers have been exploring more intuitive and discoverable interaction techniques for immersive environments. In this paper, we investigate multimodal interaction for 3D object manipulation in a multi-object virtual environment. To identify the user-defined gestures, we conducted an elicitation study involving 24…

    Submitted 25 July, 2022; originally announced July 2022.

  45. arXiv:2207.12558  [pdf, other]

    cs.HC

    A Pilot Study on The Impact of Stereoscopic Display Type on User Interactions Within A Immersive Analytics Environment

    Authors: Adam S. Williams, Xiaoyan Zhou, Michel Pahud, Francisco R. Ortega

    Abstract: Immersive Analytics (IA) and consumer adoption of augmented reality (AR) and virtual reality (VR) head-mounted displays (HMDs) are both rapidly growing. When used in conjunction, stereoscopic IA environments can offer improved user understanding and engagement; however, it is unclear how the choice of stereoscopic display impacts user interactions within an IA environment. This paper presents a pi…

    Submitted 25 July, 2022; originally announced July 2022.

  46. arXiv:2207.10062  [pdf, other]

    cs.LG

    DataPerf: Benchmarks for Data-Centric AI Development

    Authors: Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera, Juan Ciro, Lora Aroyo, Bilge Acun, Lingjiao Chen, Mehul Smriti Raje, Max Bartolo, Sabri Eyuboglu, Amirata Ghorbani, Emmett Goodman, et al. (20 additional authors not shown)

    Abstract: Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing datase…

    Submitted 13 October, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  47. arXiv:2205.12586  [pdf, other]

    cs.CL cs.AI

    Perturbation Augmentation for Fairer NLP

    Authors: Rebecca Qian, Candace Ross, Jude Fernandes, Eric Smith, Douwe Kiela, Adina Williams

    Abstract: Unwanted and often harmful social biases are becoming ever more salient in NLP research, affecting both models and datasets. In this work, we ask whether training on demographically perturbed data leads to fairer language models. We collect a large dataset of human annotated text perturbations and train a neural perturbation model, which we show outperforms heuristic alternatives. We find that (i)…

    Submitted 12 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  48. arXiv:2205.09209  [pdf, other]

    cs.CL cs.CY

    "I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset

    Authors: Eric Michael Smith, Melissa Hall, Melanie Kambadur, Eleonora Presani, Adina Williams

    Abstract: As language models grow in popularity, it becomes increasingly important to clearly measure all possible markers of demographic identity in order to avoid perpetuating existing societal harms. Many datasets for measuring bias currently exist, but they are restricted in their coverage of demographic axes and are commonly used with preset bias tests that presuppose which types of biases models can e…

    Submitted 27 October, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022

  49. arXiv:2205.04616  [pdf, other]

    cs.LG stat.AP

    Nightly Automobile Claims Prediction from Telematics-Derived Features: A Multilevel Approach

    Authors: Allen R. Williams, Yoolim **, Anthony Duer, Tuka Alhanai, Mohammad Ghassemi

    Abstract: In recent years it has become possible to collect GPS data from drivers and to incorporate this data into automobile insurance pricing for the driver. This data is continuously collected and processed nightly into metadata consisting of mileage and time summaries of each discrete trip taken, and a set of behavioral scores describing attributes of the trip (e.g., driver fatigue or driver distraction…

    Submitted 9 May, 2022; originally announced May 2022.

  50. arXiv:2205.00666  [pdf, other]

    cs.CY econ.GN

    (Private)-Retroactive Carbon Pricing [(P)ReCaP]: A Market-based Approach for Climate Finance and Risk Assessment

    Authors: Yoshua Bengio, Prateek Gupta, Dylan Radovic, Maarten Scholl, Andrew Williams, Christian Schroeder de Witt, Tianyu Zhang, Yang Zhang

    Abstract: Insufficient Social Cost of Carbon (SCC) estimation methods and short-term decision-making horizons have hindered the ability of carbon emitters to properly correct for the negative externalities of climate change, as well as the capacity of nations to balance economic and climate policy. To overcome these limitations, we introduce Retrospective Social Cost of Carbon Updating (ReSCCU), a novel mec…

    Submitted 2 May, 2022; originally announced May 2022.

    MSC Class: 91B18 (Primary) 91B76; 91G40 (Secondary) ACM Class: J.4