-
A Closer Look into Mixture-of-Experts in Large Language Models
Authors:
Ka Man Lo,
Zeyu Huang,
Zihan Qiu,
Zili Wang,
Jie Fu
Abstract:
Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance, especially for language tasks. By sparsely activating a subset of parameters for each token, MoE architecture could increase the model size without sacrificing computational efficiency, achieving a better trade-off between performance and training costs. However, the underlying mechanism of MoE still lacks further exploration, and its modularization degree remains questionable. In this paper, we make an initial attempt to understand the inner workings of MoE-based large language models. Concretely, we comprehensively study the parametric and behavioral features of three recent MoE-based models and reveal some intriguing observations, including (1) Neurons act like fine-grained experts. (2) The router of MoE usually selects experts with larger output norms. (3) The expert diversity increases as the layer increases, while the last layer is an outlier. Based on the observations, we also provide suggestions for a broad spectrum of MoE practitioners, such as router design and expert allocation. We hope this work could shed light on future research on the MoE framework and other modular architectures. Code is available at https://github.com/kamanphoebe/Look-into-MoEs.
Submitted 26 June, 2024;
originally announced June 2024.
-
The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources
Authors:
Shayne Longpre,
Stella Biderman,
Alon Albalak,
Hailey Schoelkopf,
Daniel McDuff,
Sayash Kapoor,
Kevin Klyman,
Kyle Lo,
Gabriel Ilharco,
Nay San,
Maribeth Rauh,
Aviya Skowron,
Bertie Vidgen,
Laura Weidinger,
Arvind Narayanan,
Victor Sanh,
David Adelani,
Percy Liang,
Rishi Bommasani,
Peter Henderson,
Sasha Luccioni,
Yacine Jernite,
Luca Soldaini
Abstract:
Foundation model development attracts a rapidly expanding body of contributors, scientists, and applications. To help shape responsible development practices, we introduce the Foundation Model Development Cheatsheet: a growing collection of 250+ tools and resources spanning text, vision, and speech modalities. We draw on a large body of prior work to survey resources (e.g. software, documentation, frameworks, guides, and practical tools) that support informed data selection, processing, and understanding, precise and limitation-aware artifact documentation, efficient model training, advance awareness of the environmental impact from training, careful model evaluation of capabilities, risks, and claims, as well as responsible model release, licensing and deployment practices. We hope this curated collection of resources helps guide more responsible development. The process of curating this list enabled us to review the AI development ecosystem, revealing what tools are critically missing, misused, or over-used in existing practices. We find that (i) tools for data sourcing, model evaluation, and monitoring are critically under-serving ethical and real-world needs, (ii) evaluations for model safety, capabilities, and environmental impact all lack reproducibility and transparency, (iii) text and particularly English-centric analyses continue to dominate over multilingual and multi-modal analyses, and (iv) evaluation of systems, rather than just models, is needed so that capabilities and impact are assessed in context.
Submitted 25 June, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
One Thousand and One Pairs: A "novel" challenge for long-context language models
Authors:
Marzena Karpinska,
Katherine Thai,
Kyle Lo,
Tanya Goyal,
Mohit Iyyer
Abstract:
Synthetic long-context LLM benchmarks (e.g., "needle-in-the-haystack") test only surface-level retrieval capabilities, but how well can long-context LLMs retrieve, synthesize, and reason over information across book-length inputs? We address this question by creating NoCha, a dataset of 1,001 minimally different pairs of true and false claims about 67 recently-published English fictional books, written by human readers of those books. In contrast to existing long-context benchmarks, our annotators confirm that the largest share of pairs in NoCha require global reasoning over the entire book to verify. Our experiments show that while human readers easily perform this task, it is enormously challenging for all ten long-context LLMs that we evaluate: no open-weight model performs above random chance (despite their strong performance on synthetic benchmarks), while GPT-4o achieves the highest accuracy at 55.8%. Further analysis reveals that (1) on average, models perform much better on pairs that require only sentence-level retrieval vs. global reasoning; (2) model-generated explanations for their decisions are often inaccurate even for correctly-labeled claims; and (3) models perform substantially worse on speculative fiction books that contain extensive world-building. The methodology proposed in NoCha allows for the evolution of the benchmark dataset and the easy analysis of future models.
Submitted 23 June, 2024;
originally announced June 2024.
-
DataComp-LM: In search of the next generation of training sets for language models
Authors:
Jeffrey Li,
Alex Fang,
Georgios Smyrnis,
Maor Ivgi,
Matt Jordan,
Samir Gadre,
Hritik Bansal,
Etash Guha,
Sedrick Keh,
Kushal Arora,
Saurabh Garg,
Rui Xin,
Niklas Muennighoff,
Reinhard Heckel,
Jean Mercat,
Mayee Chen,
Suchin Gururangan,
Mitchell Wortsman,
Alon Albalak,
Yonatan Bitton,
Marianna Nezhurina,
Amro Abbas,
Cheng-Yu Hsieh,
Dhruba Ghosh,
Josh Gardner
, et al. (34 additional authors not shown)
Abstract:
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set. The resulting dataset, DCLM-Baseline, enables training a 7B parameter language model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. Compared to MAP-Neo, the previous state-of-the-art in open-data language models, DCLM-Baseline represents a 6.6 percentage point improvement on MMLU while being trained with 40% less compute. Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% & 66%), and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B. Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation.
Submitted 20 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
Authors:
David Wadden,
Kejian Shi,
Jacob Morrison,
Aakanksha Naik,
Shruti Singh,
Nitzan Barzilay,
Kyle Lo,
Tom Hope,
Luca Soldaini,
Shannon Zejiang Shen,
Doug Downey,
Hannaneh Hajishirzi,
Arman Cohan
Abstract:
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks covering five essential scientific literature understanding capabilities: information extraction, summarization, question answering, claim verification, and classification. SciRIFF demonstrations are notable for their long input contexts, detailed task specifications, and complex structured outputs. While instruction-following resources are available in specific domains such as clinical medicine and chemistry, SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields. To demonstrate the utility of SciRIFF, we develop a sample-efficient strategy to adapt a general instruction-following model for science by performing additional finetuning on a mix of general-domain and SciRIFF demonstrations. In evaluations on nine held-out scientific tasks, our model -- called SciTulu -- improves over a strong LLM baseline by 28.1% and 6.5% at the 7B and 70B scales respectively, while maintaining general instruction-following performance within 2% of the baseline. We are optimistic that SciRIFF will facilitate the development and evaluation of LLMs to help researchers navigate the ever-growing body of scientific literature. We release our dataset, model checkpoints, and data processing and evaluation code to enable further research.
Submitted 18 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
lenscat: a Public and Community-Contributed Catalog of Known Strong Gravitational Lenses
Authors:
L. Vujeva,
R. K. L. Lo,
J. M. Ezquiaga,
J. C. L. Chan
Abstract:
We present lenscat, a public and community-contributed catalog of strong gravitational lenses found by electromagnetic surveys. The main objective of lenscat is to compile a simple, easy-to-access catalog that can be used in a variety of lensing studies, such as facilitating the search for the host galaxy of a candidate strongly lensed transient event. We also provide a Python package to interact with tools commonly used by the community. This allows end users both with and without lensing expertise to obtain a list of known strong lenses within a given search area, and to also rank them by their respective searched probabilities. Here, we exemplify this by crossmatching the gravitational wave joint sky localization region of an interesting pair of events GW170104-GW170814. Other examples with short gamma-ray bursts are given. Thanks to the open and simple infrastructure of lenscat, members of the lensing community can directly add newly found lenses from their own studies to help create a long-lasting catalog that is as exhaustive and accessible as possible.
Submitted 6 June, 2024;
originally announced June 2024.
-
Determining state space anomalies in mean field games
Authors:
Hongyu Liu,
Catharine W. K. Lo
Abstract:
In this paper, we are concerned with the inverse problem of determining anomalies in the state space associated with the stationary mean field game (MFG) system. We establish novel unique identifiability results for the intrinsic structure of these anomalies in mean field games systems, including their topological structure and parameter configurations, in several general scenarios of practical interest, including traffic flow, market economics and epidemics. To the best of our knowledge, this is the first work that considers anomalies in the state space for the nonlinear coupled MFG system.
Submitted 29 May, 2024;
originally announced May 2024.
-
Decoding a mean field game by the Cauchy data around its unknown stationary states
Authors:
Hongyu Liu,
Catharine W. K. Lo,
Shen Zhang
Abstract:
In recent years, mean field games (MFGs) have garnered considerable attention and emerged as a dynamic and actively researched field across various domains, including economics, social sciences, finance, and transportation. The inverse design and decoding of MFGs offer valuable means to extract information from observed data and gain insights into the intricate underlying dynamics and strategies of these complex physical systems. This paper presents a novel approach to the study of inverse problems in MFGs by analyzing the Cauchy data around their unknown stationary states. This study distinguishes itself from existing inverse problem investigations in three key significant aspects: Firstly, we consider MFG problems in a highly general form. Secondly, we address the technical challenge of the probability measure constraint by utilizing Cauchy data in our inverse problem study. Thirdly, we enhance existing high order linearization methods by introducing a novel approach that involves conducting linearization around non-trivial stationary states of the MFG system, which are not a-priori known. These contributions provide new insights and offer promising avenues for studying inverse problems for MFGs. By unraveling the hidden structure of MFGs, researchers and practitioners can make informed decisions, optimize system performance, and address real-world challenges more effectively.
Submitted 29 May, 2024;
originally announced May 2024.
-
JUNO Sensitivity to Invisible Decay Modes of Neutrons
Authors:
JUNO Collaboration,
Angel Abusleme,
Thomas Adam,
Kai Adamowicz,
Shakeel Ahmad,
Rizwan Ahmed,
Sebastiano Aiello,
Fengpeng An,
Qi An,
Giuseppe Andronico,
Nikolay Anfimov,
Vito Antonelli,
Tatiana Antoshkina,
João Pedro Athayde Marcondes de André,
Didier Auguste,
Weidong Bai,
Nikita Balashov,
Wander Baldini,
Andrea Barresi,
Davide Basilico,
Eric Baussan,
Marco Bellato,
Marco Beretta,
Antonio Bergnoli,
Daniel Bick
, et al. (635 additional authors not shown)
Abstract:
We explore the decay of bound neutrons into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\bar{ν}_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $τ/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $τ/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.
Submitted 27 May, 2024;
originally announced May 2024.
-
On the Obstacle Problem in Fractional Generalised Orlicz Spaces
Authors:
Catharine W. K. Lo,
José Francisco Rodrigues
Abstract:
We consider the one and the two obstacles problems for the nonlocal nonlinear anisotropic $g$-Laplacian $\mathcal{L}_g^s$, with $0<s<1$. We prove the strict T-monotonicity of $\mathcal{L}_g^s$ and we obtain the Lewy-Stampacchia inequalities. We consider the approximation of the solutions through semilinear problems, for which we prove a global $L^\infty$-estimate, and we extend the local Hölder regularity to the solutions of the obstacle problems in the case of the fractional $p(x,y)$-Laplacian operator. We make further remarks on a few elementary properties of related capacities in the fractional generalised Orlicz framework, with a special reference to the Hilbertian nonlinear case in fractional Sobolev spaces.
Submitted 27 May, 2024;
originally announced May 2024.
-
Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents
Authors:
Yue Liu,
Sin Kit Lo,
Qinghua Lu,
Liming Zhu,
Dehai Zhao,
Xiwei Xu,
Stefan Harrer,
Jon Whittle
Abstract:
Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to take a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking (including generating instrumental goals and plans), such as hallucinations inherent in foundation models, explainability of reasoning process, complex accountability, etc. To address this issue, we have performed a systematic literature review to understand the state-of-the-art foundation model-based agents and the broader ecosystem. In this paper, we present a pattern catalogue consisting of 17 architectural patterns with analyses of the context, forces, and trade-offs as the outcomes from the previous literature review. The proposed catalogue can provide holistic guidance for the effective use of patterns, and support the architecture design of foundation model-based agents by facilitating goal-seeking and plan generation.
Submitted 24 June, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
On inverse problems in multi-population aggregation models
Authors:
Yuhan Li,
Hongyu Liu,
Catharine W. K. Lo
Abstract:
This paper focuses on inverse problems arising in studying multi-population aggregations. The goal is to reconstruct the diffusion coefficient, advection coefficient, and interaction kernels of the aggregation system, which characterize the dynamics of different populations. In the theoretical analysis of the physical setup, it is crucial to ensure non-negativity of solutions. To address this, we employ the high-order variation method and introduce modifications to the systems. Additionally, we propose a novel approach called transformative asymptotic technique that enables the recovery of the diffusion coefficient preceding the Laplace operator, presenting a pioneering method for this type of problem. Through these techniques, we offer comprehensive insights into the unique identifiability aspect of inverse problems associated with multi-population aggregation models.
Submitted 15 April, 2024;
originally announced April 2024.
-
RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion
Authors:
Kyle Shih-Huang Lo,
Jörg Peters,
Eric Spellman
Abstract:
Accurate completion and denoising of roof height maps are crucial to reconstructing high-quality 3D buildings. Repairing sparse points can enhance low-cost sensor use and reduce UAV flight overlap. RoofDiffusion is a new end-to-end self-supervised diffusion technique for robustly completing roof height maps, in particular difficult ones. RoofDiffusion leverages widely-available curated footprints and can therefore handle up to 99% point sparsity and 80% roof area occlusion (regional incompleteness). A variant, No-FP RoofDiffusion, simultaneously predicts building footprints and heights. Both quantitatively outperform state-of-the-art unguided depth completion and representative inpainting methods for Digital Elevation Models (DEM), on both a roof-specific benchmark and the BuildingNet dataset. Qualitative assessments show the effectiveness of RoofDiffusion for datasets with real-world scans including AHN3, Dales3D, and USGS 3DEP LiDAR. Tested with the leading City3D algorithm, preprocessing height maps with RoofDiffusion noticeably improves 3D building reconstruction. RoofDiffusion is complemented by a new dataset of 13k complex roof geometries, focusing on long-tail issues in remote sensing; a novel simulation of tree occlusion; and a wide variety of large-area roof cut-outs for data augmentation and benchmarking.
Submitted 14 April, 2024;
originally announced April 2024.
-
MuPT: A Generative Symbolic Music Pretrained Transformer
Authors:
Xingwei Qu,
Yuelin Bai,
Yinghao Ma,
Ziya Zhou,
Ka Man Lo,
Jiaheng Liu,
Ruibin Yuan,
Lejun Min,
Xueling Liu,
Tianyu Zhang,
Xinrun Du,
Shuyue Guo,
Yiming Liang,
Yizhi Li,
Shangda Wu,
Junting Zhou,
Tianyu Zheng,
Ziyang Ma,
Fengze Han,
Wei Xue,
Gus Xia,
Emmanouil Benetos,
Xiang Yue,
Chenghua Lin,
Xu Tan
, et al. (4 additional authors not shown)
Abstract:
In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the challenges associated with misaligned measures from different tracks during generation, we propose the development of a Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. The results indicate a promising direction for future research in music generation, offering extensive resources for community-led research through our open-source contributions.
Submitted 10 April, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
FABLES: Evaluating faithfulness and content selection in book-length summarization
Authors:
Yekyung Kim,
Yapei Chang,
Marzena Karpinska,
Aparna Garimella,
Varun Manjunatha,
Kyle Lo,
Tanya Goyal,
Mohit Iyyer
Abstract:
While long-context large language models (LLMs) can technically summarize book-length documents (>100K tokens), the length and complexity of the documents have so far prohibited evaluations of input-dependent aspects like faithfulness. In this paper, we conduct the first large-scale human evaluation of faithfulness and content selection on LLM-generated summaries of fictional books. Our study mitigates the issue of data contamination by focusing on summaries of books published in 2023 or 2024, and we hire annotators who have fully read each book prior to the annotation task to minimize cost and cognitive burden. We collect FABLES, a dataset of annotations on 3,158 claims made in LLM-generated summaries of 26 books, at a cost of $5.2K USD, which allows us to rank LLM summarizers based on faithfulness: Claude-3-Opus significantly outperforms all closed-source LLMs, while the open-source Mixtral is on par with GPT-3.5-Turbo. An analysis of the annotations reveals that most unfaithful claims relate to events and character states, and they generally require indirect reasoning over the narrative to invalidate. While LLM-based auto-raters have proven reliable for factuality and coherence in other settings, we implement several LLM raters of faithfulness and find that none correlates strongly with human annotations, especially with regard to detecting unfaithful claims. Our experiments suggest that detecting unfaithful claims is an important future direction not only for summarization evaluation but also as a testbed for long-context understanding. Finally, we move beyond faithfulness by exploring content selection errors in book-length summarization: we develop a typology of omission errors related to crucial narrative elements and also identify a systematic over-emphasis on events occurring towards the end of the book.
Submitted 1 April, 2024;
originally announced April 2024.
-
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
Authors:
Orion Weller,
Benjamin Chang,
Sean MacAvaney,
Kyle Lo,
Arman Cohan,
Benjamin Van Durme,
Dawn Lawrie,
Luca Soldaini
Abstract:
Modern Language Models (LMs) are capable of following long and complex instructions that enable a large and diverse set of user requests. While Information Retrieval (IR) models use these LMs as the backbone of their architectures, virtually none of them allow users to provide detailed instructions alongside queries, thus limiting their ability to satisfy complex information needs. In this work, we study the use of instructions in IR systems. First, we introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark as well as a training set for helping IR models learn to better follow real-world instructions. FollowIR repurposes detailed instructions -- also known as narratives -- developed for professional assessors to evaluate retrieval systems. In particular, we build our benchmark from three collections curated for shared tasks at the Text REtrieval Conference (TREC). These collections contain hundreds to thousands of labeled documents per query, making them suitable for our exploration. Through this process, we can measure how well IR models follow instructions, through a new pairwise evaluation framework. Our results indicate that existing retrieval models fail to correctly use instructions, using them for basic keywords and struggling to understand long-form information. However, we show that it is possible for IR models to learn to follow complex instructions: our new FollowIR-7B model has significant improvements after fine-tuning on our training set.
Submitted 7 May, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Loop Improvement: An Efficient Approach for Extracting Shared Features from Heterogeneous Data without Central Server
Authors:
Fei Li,
Chu Kiong Loo,
Wei Shiung Liew,
Xiaofeng Liu
Abstract:
In federated learning, data heterogeneity significantly impacts performance. A typical solution involves segregating these parameters into shared and personalized components, a concept also relevant in multi-task learning. Addressing this, we propose "Loop Improvement" (LI), a novel method enhancing this separation and feature extraction without necessitating a central server or data interchange among participants. Our experiments reveal LI's superiority in several aspects: In personalized federated learning environments, LI consistently outperforms the advanced FedALA algorithm in accuracy across diverse scenarios. Additionally, LI's feature extractor closely matches the performance achieved when aggregating data from all clients. In global model contexts, employing LI with stacked personalized layers and an additional network also yields comparable results to combined client data scenarios. Furthermore, LI's adaptability extends to multi-task learning, streamlining the extraction of common features across tasks and obviating the need for simultaneous training. This approach not only enhances individual task performance but also achieves accuracy levels on par with classic multi-task learning methods where all tasks are trained simultaneously. LI integrates a loop topology with layer-wise and end-to-end training, compatible with various neural network models. This paper also delves into the theoretical underpinnings of LI's effectiveness, offering insights into its potential applications. The code is on https://github.com/axedge1983/LI
Submitted 21 March, 2024;
originally announced March 2024.
-
Know Your Audience: The benefits and pitfalls of generating plain language summaries beyond the "general" audience
Authors:
Tal August,
Kyle Lo,
Noah A. Smith,
Katharina Reinecke
Abstract:
Language models (LMs) show promise as tools for communicating science to the general public by simplifying and summarizing complex language. Because models can be prompted to generate text for a specific audience (e.g., college-educated adults), LMs might be used to create multiple versions of plain language summaries for people with different familiarities of scientific topics. However, it is not clear what the benefits and pitfalls of adaptive plain language are. When is simplifying necessary, what are the costs in doing so, and do these costs differ for readers with different background knowledge? Through three within-subjects studies in which we surface summaries for different envisioned audiences to participants of different backgrounds, we found that while simpler text led to the best reading experience for readers with little to no familiarity in a topic, high familiarity readers tended to ignore certain details in overly plain summaries (e.g., study limitations). Our work provides methods and guidance on ways of adapting plain language summaries beyond the single "general" audience.
Submitted 7 March, 2024;
originally announced March 2024.
-
KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions
Authors:
Fangyuan Xu,
Kyle Lo,
Luca Soldaini,
Bailey Kuehl,
Eunsol Choi,
David Wadden
Abstract:
Large language models (LLMs) adapted to follow user instructions are now widely deployed as conversational agents. In this work, we examine one increasingly common instruction-following task: providing writing assistance to compose a long-form answer. To evaluate the capabilities of current LLMs on this task, we construct KIWI, a dataset of knowledge-intensive writing instructions in the scientific domain. Given a research question, an initial model-generated answer and a set of relevant papers, an expert annotator iteratively issues instructions for the model to revise and improve its answer. We collect 1,260 interaction turns from 234 interaction sessions with three state-of-the-art LLMs. Each turn includes a user instruction, a model response, and a human evaluation of the model response. Through a detailed analysis of the collected responses, we find that all models struggle to incorporate new information into an existing answer, and to perform precise and unambiguous edits. Further, we find that models struggle to judge whether their outputs successfully followed user instructions, with accuracy at least 10 points short of human agreement. Our findings indicate that KIWI will be a valuable resource to measure progress and improve LLMs' instruction-following capabilities for knowledge intensive writing tasks.
Submitted 6 March, 2024;
originally announced March 2024.
-
Ultralight vector dark matter search using data from the KAGRA O3GK run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
R. Abbott,
H. Abe,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
O. D. Aguiar,
I. Aguilar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi
, et al. (1778 additional authors not shown)
Abstract:
Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.
Submitted 5 March, 2024;
originally announced March 2024.
-
On the Stability of the $s$-Nonlocal $p$-Obstacle Problem and their Coincidence Sets and Free Boundaries
Authors:
Catharine W. K. Lo,
José Francisco Rodrigues
Abstract:
We show that the solutions to the nonlocal obstacle problems for the nonlocal $-Δ_p^s$ operator, when the fractional parameter $s\toσ$ for $0<σ\leq1$, converge to the solution of the corresponding obstacle problem for $-Δ_p^σ$, being $σ=1$ the classical obstacle problem for the local $p$-Laplacian. We discuss the weak stability of the quasi-characteristic functions of coincidence sets of the solution with the obstacle, which is a strong convergence of their characteristic functions when $s\nearrow 1$ under a nondegeneracy condition. This stability can be shown also in terms of the convergence of the free boundaries, as well as of the coincidence sets, in Hausdorff distance when $s\nearrow 1$, under non-degeneracy local assumptions on the external force and a local topological property of the coincidence set of the limit classical obstacle problem for the local $p$-Laplacian, essentially when the limit coincidence set is the closure of its interior.
Submitted 28 February, 2024;
originally announced February 2024.
-
m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers
Authors:
Ka Man Lo,
Yiming Liang,
Wenyu Du,
Yuantao Fan,
Zili Wang,
Wenhao Huang,
Lei Ma,
Jie Fu
Abstract:
Modular neural architectures are gaining attention for their powerful generalization and efficient adaptation to new domains. However, training these models poses challenges due to optimization difficulties arising from intrinsic sparse connectivity. Leveraging knowledge from monolithic models through techniques like knowledge distillation can facilitate training and enable integration of diverse knowledge. Nevertheless, conventional knowledge distillation approaches are not tailored to modular models and struggle with unique architectures and enormous parameter counts. Motivated by these challenges, we propose module-to-module knowledge distillation (m2mKD) for transferring knowledge between modules. m2mKD combines teacher modules of a pretrained monolithic model and student modules of a modular model with a shared meta model respectively to encourage the student module to mimic the behaviour of the teacher module. We evaluate m2mKD on two modular neural architectures: Neural Attentive Circuits (NACs) and Vision Mixture-of-Experts (V-MoE). Applying m2mKD to NACs yields significant improvements in IID accuracy on Tiny-ImageNet (up to 5.6%) and OOD robustness on Tiny-ImageNet-R (up to 4.2%). Additionally, the V-MoE-Base model trained with m2mKD achieves 3.5% higher accuracy than end-to-end training on ImageNet-1k. Code is available at https://github.com/kamanphoebe/m2mKD.
Submitted 7 July, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
Developments in Sheaf-Theoretic Models of Natural Language Ambiguities
Authors:
Kin Ian Lo,
Mehrnoosh Sadrzadeh,
Shane Mansfield
Abstract:
Sheaves are mathematical objects consisting of a base which constitutes a topological space and the data associated with each open set thereof, e.g. continuous functions defined on the open sets. Sheaves have originally been used in algebraic topology and logic. Recently, they have also modelled events such as physical experiments and natural language disambiguation processes. We extend the latter models from lexical ambiguities to discourse ambiguities arising from anaphora. To begin, we calculated a new measure of contextuality for a dataset of basic anaphoric discourses, resulting in a higher proportion of contextual models--82.9%--compared to previous work which only yielded 3.17% contextual models. Then, we show how an extension of the natural language processing challenge, known as the Winograd Schema, which involves anaphoric ambiguities can be modelled on the Bell-CHSH scenario with a contextual fraction of 0.096.
Submitted 6 February, 2024;
originally announced February 2024.
-
OLMo: Accelerating the Science of Language Models
Authors:
Dirk Groeneveld,
Iz Beltagy,
Pete Walsh,
Akshita Bhagia,
Rodney Kinney,
Oyvind Tafjord,
Ananya Harsh Jha,
Hamish Ivison,
Ian Magnusson,
Yizhong Wang,
Shane Arora,
David Atkinson,
Russell Authur,
Khyathi Raghavi Chandu,
Arman Cohan,
Jennifer Dumas,
Yanai Elazar,
Yuling Gu,
Jack Hessel,
Tushar Khot,
William Merrill,
Jacob Morrison,
Niklas Muennighoff,
Aakanksha Naik,
Crystal Nam
, et al. (18 additional authors not shown)
Abstract:
Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, we have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models. Unlike most prior efforts that have only released model weights and inference code, we release OLMo alongside open training data and training and evaluation code. We hope this release will empower the open research community and inspire a new wave of innovation.
Submitted 7 June, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Authors:
Luca Soldaini,
Rodney Kinney,
Akshita Bhagia,
Dustin Schwenk,
David Atkinson,
Russell Authur,
Ben Bogin,
Khyathi Chandu,
Jennifer Dumas,
Yanai Elazar,
Valentin Hofmann,
Ananya Harsh Jha,
Sachin Kumar,
Li Lucy,
Xinxi Lyu,
Nathan Lambert,
Ian Magnusson,
Jacob Morrison,
Niklas Muennighoff,
Aakanksha Naik,
Crystal Nam,
Matthew E. Peters,
Abhilasha Ravichander,
Kyle Richardson,
Zejiang Shen
, et al. (11 additional authors not shown)
Abstract:
Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training data impacts model capabilities and limitations. To facilitate scientific research on language model pretraining, we curate and release Dolma, a three-trillion-token English corpus, built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials. We extensively document Dolma, including its design principles, details about its construction, and a summary of its contents. We present analyses and experimental results on intermediate states of Dolma to share what we have learned about important data curation practices. Finally, we open-source our data curation toolkit to enable reproduction of our work as well as support further research in large-scale data curation.
Submitted 6 June, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
Authors:
Jan Trienes,
Sebastian Joseph,
Jörg Schlötterer,
Christin Seifert,
Kyle Lo,
Wei Xu,
Byron C. Wallace,
Junyi Jessy Li
Abstract:
Text simplification aims to make technical texts more accessible to laypeople but often results in deletion of information and vagueness. This work proposes InfoLossQA, a framework to characterize and recover simplification-induced information loss in the form of question-and-answer (QA) pairs. Building on the theory of Question Under Discussion, the QA pairs are designed to help readers deepen their knowledge of a text. We conduct a range of experiments with this framework. First, we collect a dataset of 1,000 linguist-curated QA pairs derived from 104 LLM simplifications of scientific abstracts of medical studies. Our analyses of this data reveal that information loss occurs frequently, and that the QA pairs give a high-level overview of what information was lost. Second, we devise two methods for this task: end-to-end prompting of open-source and commercial language models, and a natural language inference pipeline. With a novel evaluation framework considering the correctness of QA pairs and their linguistic suitability, our expert evaluation reveals that models struggle to reliably identify information loss and to apply the same standards as humans for what constitutes information loss.
Submitted 4 June, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Paloma: A Benchmark for Evaluating Language Model Fit
Authors:
Ian Magnusson,
Akshita Bhagia,
Valentin Hofmann,
Luca Soldaini,
Ananya Harsh Jha,
Oyvind Tafjord,
Dustin Schwenk,
Evan Pete Walsh,
Yanai Elazar,
Kyle Lo,
Dirk Groeneveld,
Iz Beltagy,
Hannaneh Hajishirzi,
Noah A. Smith,
Kyle Richardson,
Jesse Dodge
Abstract:
Language models (LMs) commonly report perplexity on monolithic data held out from training. Implicitly or explicitly, this data is composed of domains – varying distributions of language. Rather than assuming perplexity on one distribution extrapolates to others, Perplexity Analysis for Language Model Assessment (Paloma) measures LM fit to 585 text domains, ranging from nytimes.com to r/depression on Reddit. We invite submissions to our benchmark and organize results by comparability based on compliance with guidelines such as removal of benchmark contamination from pretraining. Submissions can also record parameter and training token count to make comparisons of Pareto efficiency for performance as a function of these measures of cost. We populate our benchmark with results from 6 baselines pretrained on popular corpora. In case studies, we demonstrate analyses that are possible with Paloma, such as finding that pretraining without data beyond Common Crawl leads to inconsistent fit to many domains.
Submitted 16 December, 2023;
originally announced December 2023.
-
On inverse problems in predator-prey models
Authors:
Yuhan Li,
Hongyu Liu,
Catharine W. K. Lo
Abstract:
In this paper, we consider the inverse problem of determining the coefficients of interaction terms within some Lotka-Volterra models, with support from boundary observation of its non-negative solutions. In the physical background, the solutions to the predator-prey model stand for the population densities for predator and prey and are non-negative, which is a critical challenge in our inverse problem study. We mainly focus on the unique identifiability issue and tackle it with the high-order variation method, a relatively new technique introduced by the second author and his collaborators. This method can ensure the positivity of solutions and has broader applicability in other physical models with non-negativity requirements. Our study improves this method by choosing a more general solution $(u_0,v_0)$ to expand around, achieving recovery for all interaction terms. By this means, we improve on the previous results and apply this to physical models to recover coefficients concerning compression, prey attack, crowding, carrying capacity, and many other interaction factors in the system. Finally, we apply our results to study three specific cases: the hydra-effects model, the Holling-Tanner model and the classic Lotka-Volterra model.
Submitted 15 December, 2023;
originally announced December 2023.
-
Evidence for the novel type of orbital Fulde-Ferrell-Larkin-Ovchinnikov state in the bulk limit of 2H-NbSe2
Authors:
Chang-woo Cho,
Kwan To Lo,
Cheuk Yin Ng,
Timothée T. Lortz,
Abdel Rahman Allan,
Mahmoud Abdel-Hafiez,
Jaemun Park,
Beopgil Cho,
Keeseong Park,
Rolf Lortz
Abstract:
The Fulde-Ferrell-Larkin-Ovchinnikov (FFLO) state, an unusual superconducting state, defies high magnetic fields beyond the Pauli paramagnetic limit. It exhibits a spatial modulation of the superconducting order parameter in real space and is exceptionally rare. Recently, an even more exotic variant - the orbital FFLO state - was predicted and identified in the transition metal dichalcogenide superconductor 2H-NbSe2. This state emerges in thin samples with thicknesses below ~40 nm, at the boundary between two and three dimensions. The complex interplay between Ising spin orbit coupling and the Pauli paramagnetic effect can lead to a stabilization of the FFLO state in a relatively large range of the magnetic phase diagram, even well below the Pauli limit. In this study, we present experimental evidence of the formation of this orbital FFLO state in bulk 2H-NbSe2 samples. This evidence was obtained using high-resolution DC magnetization and magnetic torque experiments in magnetic fields applied strictly parallel to the NbSe2 basal plane. Both quantities display a crossover to a discontinuous first-order superconducting transition at the normal state boundary in magnetic fields of 4 T and above. This is usually seen as a sign that Pauli paramagnetic pair breaking effects affect the superconducting state. The magnetic torque reveals a small step-like reversible anomaly, indicating a magnetic field-induced thermodynamic phase transition within the superconducting state. This anomaly bears many similarities to the FFLO transitions in other FFLO superconductors, suggesting the potential existence of an orbital FFLO state in bulk 2H-NbSe2 samples. Additionally, we observe a pronounced in-plane 6-fold symmetry of the upper critical field in the field range above this phase transition, which has previously been interpreted as a hallmark of the orbital FFLO state in thin 2H-NbSe2.
Submitted 19 February, 2024; v1 submitted 5 December, 2023;
originally announced December 2023.
-
New Evidence for DM-like Anomalies in Neutron Multiplicity Spectra
Authors:
W. H. Trzaska,
A. Barzilov,
T. Enqvist,
K. Jedrzejczak,
M. Kasztelan,
P. Kuusiniemi,
K. K. Loo,
J. Orzechowski,
M. Slupecki,
J. Szabelski,
T. E. Ward
Abstract:
Subterrestrial neutron spectra show weak but consistent anomalies at multiplicities ~100 and above. The origin of the excess events remains ambiguous, but, in principle, it could be a signature of Dark Matter WIMP annihilation-like interaction with a massive Pb target. However, since the results of the available measurements are below the 5-sigma discovery level, and the observed anomalous structures are on a significant muon-induced background, an independent verification at even greater depth is needed. For that purpose, we have launched NEMESIS 1.4 - a new dedicated experiment consisting of a 1134 kg Pb target and 14 He-3 detectors with PE moderators and a fully digital readout. NEMESIS 1.4 has been taking data at the deepest level (1.4 km, 4000 m.w.e.) of the Pyhasalmi mine, Finland, since November 2022. We describe the idea behind the new setup, compare the first results with the previous data and Monte Carlo simulations, and give the outlook for further research. If the existence of the anomalies is unambiguously confirmed and the model interpretation positively verified, this will be the first Indirect Detection of Dark Matter in the laboratory.
Submitted 27 February, 2024; v1 submitted 24 November, 2023;
originally announced November 2023.
-
Irregular Fibonacci Conformal Blocks
Authors:
Xia Gu,
Babak Haghighat,
Kevin Loo
Abstract:
This work studies Liouville conformal blocks of irregular type with the insertion of at least one level-$3$ degenerate field admitting a Fibonacci fusion rule. We algebraically derive the corresponding third-order BPZ equations for regular blocks and their modifications when a rank one irregular operator is inserted. Employing Lefschetz thimbles as integration cycles, we then successively proceed to construct integral representations and prove that they satisfy the corresponding BPZ equations. Finally, we show that taking a semiclassical limit, these integral representations can be expressed in terms of Heun functions and have correct leading behaviors consistent with conformal weights and fusion rules.
Submitted 22 November, 2023;
originally announced November 2023.
-
Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders
Authors:
Hyunji Lee,
Luca Soldaini,
Arman Cohan,
Minjoon Seo,
Kyle Lo
Abstract:
Prevailing research practice today often relies on training dense retrievers on existing large datasets such as MSMARCO and then experimenting with ways to improve zero-shot generalization capabilities to unseen domains. While prior work has tackled this challenge through resource-intensive steps such as data augmentation, architectural modifications, increasing model size, or even further base model pretraining, comparatively little investigation has examined whether the training procedures themselves can be improved to yield better generalization capabilities in the resulting models. In this work, we recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives. We validate these recommendations using the BEIR benchmark and find results are persistent across choice of dense encoder and base model size and are complementary to other resource-intensive strategies for out-of-domain generalization such as architectural modifications or additional pretraining. We hope that this thorough and impartial study around various training techniques, which augments other resource-intensive methods, offers practical insights for developing a dense retrieval model that effectively generalizes, even when trained on a single dataset.
Submitted 16 November, 2023;
originally announced November 2023.
-
Determining Sources in the Bioluminescence Tomography Problem
Authors:
Ming-Hui Ding,
Rongfang Gong,
Hongyu Liu,
Catharine W. K. Lo
Abstract:
In this paper, we revisit the bioluminescence tomography (BLT) problem, where one seeks to reconstruct bioluminescence signals (an internal light source) from external measurements of the Cauchy data. As one kind of optical imaging, BLT has many merits, such as high signal-to-noise ratio, non-destructivity and cost-effectiveness, and has potential applications such as cancer diagnosis, drug discovery and development, and gene therapies. In the literature, BLT is extensively studied based on the diffusion approximation (DA) equation, where the distribution of peak sources is to be reconstructed and no solution uniqueness is guaranteed without adequate a priori information. Motivated by the solution uniqueness issue, several theoretical results are explored. The major contributions in this work that are new to the literature are two-fold: first, we show the theoretical uniqueness of the BLT problem where the light sources are in the shape of $C^2$ domains or polyhedral- or corona-shaped; second, we support our results with plenty of problem-orientated numerical experiments.
Submitted 9 November, 2023;
originally announced November 2023.
-
The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices
Authors:
Hancheng Cao,
Jesse Dodge,
Kyle Lo,
Daniel A. McFarland,
Lucy Lu Wang
Abstract:
In recent years, funding agencies and journals increasingly advocate for open science practices (e.g. data and method sharing) to improve the transparency, access, and reproducibility of science. However, quantifying these practices at scale has proven difficult. In this work, we leverage a large-scale dataset of 1.1M papers from arXiv that are representative of the fields of physics, math, and computer science to analyze the adoption of data and method link-sharing practices over time and their impact on article reception. To identify links to data and methods, we train a neural text classification model to automatically classify URL types based on contextual mentions in papers. We find evidence that the practice of link-sharing to methods and data is spreading as more papers include such URLs over time. Reproducibility efforts may also be spreading because the same links are being increasingly reused across papers (especially in computer science); and these links are increasingly concentrated within fewer web domains (e.g. Github) over time. Lastly, articles that share data and method links receive increased recognition in terms of citation count, with a stronger effect when the shared links are active (rather than defunct). Together, these findings demonstrate the increased spread and perceived value of data and method sharing practices in open science.
Submitted 4 October, 2023;
originally announced October 2023.
-
On Training Derivative-Constrained Neural Networks
Authors:
KaiChieh Lo,
Daniel Huang
Abstract:
We refer to the setting where the (partial) derivatives of a neural network's (NN's) predictions with respect to its inputs are used as additional training signal as a derivative-constrained (DC) NN. This situation is common in physics-informed settings in the natural sciences. We propose an integrated RELU (IReLU) activation function to improve training of DC NNs. We also investigate denormalization and label rescaling to help stabilize DC training. We evaluate our methods on physics-informed settings including quantum chemistry and Scientific Machine Learning (SciML) tasks. We demonstrate that existing architectures with IReLU activations combined with denormalization and label rescaling better incorporate training signal provided by derivative constraints.
Submitted 11 October, 2023; v1 submitted 2 October, 2023;
originally announced October 2023.
-
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
Authors:
Yapei Chang,
Kyle Lo,
Tanya Goyal,
Mohit Iyyer
Abstract:
Summarizing book-length documents (>100K tokens) that exceed the context window size of large language models (LLMs) requires first breaking the input document into smaller chunks and then prompting an LLM to merge, update, and compress chunk-level summaries. Despite the complexity and importance of this task, it has yet to be meaningfully studied due to the challenges of evaluation: existing book-length summarization datasets (e.g., BookSum) are in the pretraining data of most public LLMs, and existing evaluation methods struggle to capture errors made by modern LLM summarizers. In this paper, we present the first study of the coherence of LLM-based book-length summarizers implemented via two prompting workflows: (1) hierarchically merging chunk-level summaries, and (2) incrementally updating a running summary. We obtain 1193 fine-grained human annotations on GPT-4 generated summaries of 100 recently-published books and identify eight common types of coherence errors made by LLMs. Because human evaluation is expensive and time-consuming, we develop an automatic metric, BooookScore, that measures the proportion of sentences in a summary that do not contain any of the identified error types. BooookScore has high agreement with human annotations and allows us to systematically evaluate the impact of many other critical parameters (e.g., chunk size, base LLM) while saving $15K USD and 500 hours in human evaluation costs. We find that closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than those generated by open-source models. While LLaMA 2 falls behind other models, Mixtral achieves performance on par with GPT-3.5-Turbo. Incremental updating yields lower BooookScore but higher level of detail than hierarchical merging, a trade-off sometimes preferred by annotators.
Submitted 13 April, 2024; v1 submitted 1 October, 2023;
originally announced October 2023.
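The BooookScore abstract above describes the metric as the proportion of sentences in a summary that contain none of the identified error types. A minimal sketch of that proportion follows, assuming each sentence has already been annotated with a (possibly empty) list of error labels. The function name and the label strings are hypothetical placeholders, not the authors' implementation or taxonomy.

```python
from typing import Sequence


def error_free_proportion(sentence_errors: Sequence[Sequence[str]]) -> float:
    """Return the fraction of sentences that carry no error label.

    `sentence_errors` holds one entry per summary sentence; each entry is the
    list of error labels assigned to that sentence (empty if none).
    """
    if not sentence_errors:
        raise ValueError("summary has no sentences")
    # Count sentences whose label list is empty.
    clean = sum(1 for errors in sentence_errors if len(errors) == 0)
    return clean / len(sentence_errors)


# Hypothetical annotations for a four-sentence summary (labels are placeholders).
annotations = [[], ["omission"], [], ["inconsistency", "duplication"]]
score = error_free_proportion(annotations)
```

In this toy input two of the four sentences are free of errors, so the score is 0.5. How the per-sentence labels are obtained (human annotation or an LLM judge) is outside the scope of the sketch.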
-
When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets
Authors:
Orion Weller,
Kyle Lo,
David Wadden,
Dawn Lawrie,
Benjamin Van Durme,
Arman Cohan,
Luca Soldaini
Abstract:
Using large language models (LMs) for query or document expansion can improve generalization in information retrieval. However, it is unknown whether these techniques are universally beneficial or only effective in specific settings, such as for particular retrieval models, dataset domains, or query types. To answer this, we conduct the first comprehensive analysis of LM-based expansion. We find that there exists a strong negative correlation between retriever performance and gains from expansion: expansion improves scores for weaker models, but generally harms stronger models. We show this trend holds across a set of eleven expansion techniques, twelve datasets with diverse distribution shifts, and twenty-four retrieval models. Through qualitative error analysis, we hypothesize that although expansions provide extra information (potentially improving recall), they add additional noise that makes it difficult to discern between the top relevant documents (thus introducing false positives). Our results suggest the following recipe: use expansions for weaker models or when the target dataset significantly differs from training corpus in format; otherwise, avoid expansions to keep the relevance signal clear.
Submitted 26 February, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Real-time Monitoring for the Next Core-Collapse Supernova in JUNO
Authors:
Angel Abusleme,
Thomas Adam,
Shakeel Ahmad,
Rizwan Ahmed,
Sebastiano Aiello,
Muhammad Akram,
Abid Aleem,
Fengpeng An,
Qi An,
Giuseppe Andronico,
Nikolay Anfimov,
Vito Antonelli,
Tatiana Antoshkina,
Burin Asavapibhop,
João Pedro Athayde Marcondes de André,
Didier Auguste,
Weidong Bai,
Nikita Balashov,
Wander Baldini,
Andrea Barresi,
Davide Basilico,
Eric Baussan,
Marco Bellato,
Marco Beretta,
Antonio Bergnoli
, et al. (606 additional authors not shown)
Abstract:
The core-collapse supernova (CCSN) is considered one of the most energetic astrophysical events in the universe. The early and prompt detection of neutrinos before (pre-SN) and during the supernova (SN) burst presents a unique opportunity for multi-messenger observations of CCSN events. In this study, we describe the monitoring concept and present the sensitivity of the system to pre-SN and SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), a 20 kton liquid scintillator detector currently under construction in South China. The real-time monitoring system is designed to ensure both prompt alert speed and comprehensive coverage of progenitor stars. It incorporates prompt monitors on the electronic board as well as online monitors at the data acquisition stage. Assuming a false alert rate of 1 per year, this monitoring system exhibits sensitivity to pre-SN neutrinos up to a distance of approximately 1.6 (0.9) kiloparsecs and SN neutrinos up to about 370 (360) kiloparsecs for a progenitor mass of 30 solar masses, considering both normal and inverted mass ordering scenarios. The pointing ability of the CCSN is evaluated by analyzing the accumulated event anisotropy of inverse beta decay interactions from pre-SN or SN neutrinos. This, along with the early alert, can play a crucial role in facilitating follow-up multi-messenger observations of the next galactic or nearby extragalactic CCSN.
Submitted 4 December, 2023; v1 submitted 13 September, 2023;
originally announced September 2023.
-
Privacy-preserving Continual Federated Clustering via Adaptive Resonance Theory
Authors:
Naoki Masuyama,
Yusuke Nojima,
Yuichiro Toda,
Chu Kiong Loo,
Hisao Ishibuchi,
Naoyuki Kubota
Abstract:
With the increasing importance of data privacy protection, various privacy-preserving machine learning methods have been proposed. In the clustering domain, various algorithms with a federated learning framework (i.e., federated clustering) have been actively studied and showed high clustering performance while preserving data privacy. However, most of the base clusterers (i.e., clustering algorithms) used in existing federated clustering algorithms need to specify the number of clusters in advance. These algorithms, therefore, are unable to deal with data whose distributions are unknown or continually changing. To tackle this problem, this paper proposes a privacy-preserving continual federated clustering algorithm. In the proposed algorithm, an adaptive resonance theory-based clustering algorithm capable of continual learning is used as a base clusterer. Therefore, the proposed algorithm inherits the ability of continual learning. Experimental results with synthetic and real-world datasets show that the proposed algorithm has superior clustering performance to state-of-the-art federated clustering algorithms while realizing data privacy protection and continual learning ability. The source code is available at https://github.com/Masuyama-lab/FCAC.
Submitted 7 September, 2023;
originally announced September 2023.
-
Generalised Winograd Schema and its Contextuality
Authors:
Kin Ian Lo,
Mehrnoosh Sadrzadeh,
Shane Mansfield
Abstract:
Ambiguities in natural language give rise to probability distributions over interpretations. The distributions are often over multiple ambiguous words at a time; a multiplicity which makes them a suitable topic for sheaf-theoretic models of quantum contextuality. Previous research showed that different quantitative measures of contextuality correlate well with Psycholinguistic research on lexical ambiguities. In this work, we focus on coreference ambiguities and investigate the Winograd Schema Challenge (WSC), a test proposed by Levesque in 2011 to evaluate the intelligence of machines. The WSC consists of a collection of multiple-choice questions that require disambiguating pronouns in sentences structured according to the Winograd schema, in a way that makes it difficult for machines to determine the correct referents but remains intuitive for human comprehension. In this study, we propose an approach that analogously models the Winograd schema as an experiment in quantum physics. However, we argue that the original Winograd Schema is inherently too simplistic to facilitate contextuality. We introduce a novel mechanism for generalising the schema, rendering it analogous to a Bell-CHSH measurement scenario. We report an instance of this generalised schema, complemented by the human judgements we gathered via a crowdsourcing platform. The resulting model violates the Bell-CHSH inequality by 0.192, thus exhibiting contextuality in a coreference resolution setting.
Submitted 31 August, 2023;
originally announced August 2023.
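For reference, the Bell-CHSH inequality named in the abstract above is usually written in terms of four correlators. With two measurement choices $a, a'$ on one side, two choices $b, b'$ on the other, and $E(\cdot,\cdot)$ denoting the expected product of the $\pm 1$-valued outcomes, any non-contextual model satisfies
\[ S = E(a,b) + E(a,b') + E(a',b) - E(a',b'), \qquad |S| \le 2. \]
If the reported violation of 0.192 is read as the excess over this bound (an assumption; the abstract does not state the convention), it corresponds to $|S| = 2.192$, which lies below the quantum maximum of $2\sqrt{2} \approx 2.828$.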
-
A Joint Fermi-GBM and Swift-BAT Analysis of Gravitational-Wave Candidates from the Third Gravitational-wave Observing Run
Authors:
C. Fletcher,
J. Wood,
R. Hamburg,
P. Veres,
C. M. Hui,
E. Bissaldi,
M. S. Briggs,
E. Burns,
W. H. Cleveland,
M. M. Giles,
A. Goldstein,
B. A. Hristov,
D. Kocevski,
S. Lesage,
B. Mailyan,
C. Malacaria,
S. Poolakkil,
A. von Kienlin,
C. A. Wilson-Hodge,
The Fermi Gamma-ray Burst Monitor Team,
M. Crnogorčević,
J. DeLaunay,
A. Tohuvavohu,
R. Caputo,
S. B. Cenko
, et al. (1674 additional authors not shown)
Abstract:
We present Fermi Gamma-ray Burst Monitor (Fermi-GBM) and Swift Burst Alert Telescope (Swift-BAT) searches for gamma-ray/X-ray counterparts to gravitational wave (GW) candidate events identified during the third observing run of the Advanced LIGO and Advanced Virgo detectors. Using Fermi-GBM on-board triggers and sub-threshold gamma-ray burst (GRB) candidates found in the Fermi-GBM ground analyses, the Targeted Search and the Untargeted Search, we investigate whether there are any coincident GRBs associated with the GWs. We also search the Swift-BAT rate data around the GW times to determine whether a GRB counterpart is present. No counterparts are found. Using both the Fermi-GBM Targeted Search and the Swift-BAT search, we calculate flux upper limits and present joint upper limits on the gamma-ray luminosity of each GW. Given these limits, we constrain theoretical models for the emission of gamma-rays from binary black hole mergers.
Submitted 25 August, 2023;
originally announced August 2023.
-
Identifying strongly lensed gravitational waves through their phase consistency
Authors:
Jose María Ezquiaga,
Wayne Hu,
Rico K. L. Lo
Abstract:
Strongly lensed gravitational waves (GWs) from binary coalescence manifest as repeated chirps from the original merger. At the detectors, the phase of the lensed GWs and its arrival time differences will be consistent modulo a fixed constant phase shift. We develop a fast and reliable method to efficiently reject event pairs that are not-lensed copies and appropriately rank the most interesting candidates. Our method exploits that detector phases are the best measured GW parameter, with errors only of a fraction of a radian and differences across the frequency band that are better measured than the chirp mass. The arrival time phase differences also avoid the shortcomings of looking for overlaps in highly non-Gaussian sky maps. Our basic statistic determining the consistency with lensing is the distance between the phase posteriors of two events and it directly provides information about the lens-source geometry which helps inform electromagnetic followups. We demonstrate that for simulated signals of not-lensed binaries specifically chosen with many coincident properties so as to trigger false lensing alarms none of the pairs have phases closer than $3σ$, and most cases reject the lensing hypothesis by $5σ$. Looking at the latest catalog, GWTC3, we find that only 6% of the pairs are consistent with lensing at 99% confidence level. Moreover, we reject about half of the pairs that would otherwise favor lensing by their parameter overlaps and demonstrate good correlation with detailed joint parameter estimation results. This reduction of the false alarm rate will be of paramount importance in the upcoming observing runs and the eventual discovery of lensed GWs. Our code is publicly available and could be applied beyond lensing to test possible deviations in the phase evolution from modified theories of gravity and constrain GW birefringence.
Submitted 23 October, 2023; v1 submitted 12 August, 2023;
originally announced August 2023.
-
Search for Eccentric Black Hole Coalescences during the Third Observing Run of LIGO and Virgo
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
R. Abbott,
H. Abe,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
O. D. Aguiar,
I. Aguilar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi,
R. A. Alfaidi
, et al. (1750 additional authors not shown)
Abstract:
Despite the growing number of confident binary black hole coalescences observed through gravitational waves so far, the astrophysical origin of these binaries remains uncertain. Orbital eccentricity is one of the clearest tracers of binary formation channels. Identifying binary eccentricity, however, remains challenging due to the limited availability of gravitational waveforms that include effects of eccentricity. Here, we present observational results for a waveform-independent search sensitive to eccentric black hole coalescences, covering the third observing run (O3) of the LIGO and Virgo detectors. We identified no new high-significance candidates beyond those that were already identified with searches focusing on quasi-circular binaries. We determine the sensitivity of our search to high-mass (total mass $M>70$ $M_\odot$) binaries covering eccentricities up to 0.3 at 15 Hz orbital frequency, and use this to compare model predictions to search results. Assuming all detections are indeed quasi-circular, for our fiducial population model, we place an upper limit for the merger rate density of high-mass binaries with eccentricities $0 < e \leq 0.3$ at $0.33$ Gpc$^{-3}$ yr$^{-1}$ at 90% confidence level.
Submitted 7 August, 2023;
originally announced August 2023.
-
Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation
Authors:
Hao Peng,
Qingqing Cao,
Jesse Dodge,
Matthew E. Peters,
Jared Fernandez,
Tom Sherborne,
Kyle Lo,
Sam Skjonsberg,
Emma Strubell,
Darrell Plessas,
Iz Beltagy,
Evan Pete Walsh,
Noah A. Smith,
Hannaneh Hajishirzi
Abstract:
Rising computational demands of modern natural language processing (NLP) systems have increased the barrier to entry for cutting-edge research while posing serious environmental concerns. Yet, progress on model efficiency has been impeded by practical challenges in model evaluation and comparison. For example, hardware is challenging to control due to disparate levels of accessibility across different institutions. Moreover, improvements in metrics such as FLOPs often fail to translate to progress in real-world applications. In response, we introduce Pentathlon, a benchmark for holistic and realistic evaluation of model efficiency. Pentathlon focuses on inference, which accounts for a majority of the compute in a model's lifecycle. It offers a strictly-controlled hardware platform, and is designed to mirror real-world application scenarios. It incorporates a suite of metrics that target different aspects of efficiency, including latency, throughput, memory overhead, and energy consumption. Pentathlon also comes with a software library that can be seamlessly integrated into any codebase and enable evaluation. As a standardized and centralized evaluation platform, Pentathlon can drastically reduce the workload to make fair and reproducible efficiency comparisons. While initially focused on NLP models, Pentathlon is designed to allow flexible extension to other fields. We envision Pentathlon will stimulate algorithmic innovations in building efficient models, and foster an increased awareness of the social and environmental implications in the development of future-generation NLP models.
Submitted 18 July, 2023;
originally announced July 2023.
-
Strong uniqueness principle for fractional polyharmonic operators and applications to inverse problems
Authors:
Ching-Lung Lin,
Hongyu Liu,
Catharine W. K. Lo
Abstract:
In this work, we are concerned with inverse problems involving poly-fractional operators, where the poly-fractional operator is of the form
\[P( (-Δ_g)^s)u := \sum_{i=1}^M α_i(-Δ_{g_i})^{s_i}u\]
for $s=(s_1,\dots,s_M)$, $0<s_1<\cdots<s_M<\infty$, $s_M\in\mathbb{R}_+\backslash\mathbb{Z}$, $g=(g_1,\dots,g_M)$. There are three major contributions in this work that are new to the literature. First, we propose equations involving such poly-fractional operators $P$, which have not been previously considered in the general setting. Such equations arise naturally from the superposition of multiple stochastic processes with different scales, including classical random walks and Lévy flights. Secondly, we give novel results for the unique continuation properties for fractional polyharmonic $u$, in the sense that $u$ satisfies $\tilde{P}((-Δ_{\tilde{g}})^{\tilde{s}})=0$ in a bounded Lipschitz domain $Ω$ for some $\tilde{P}$. With these results in hand, we consider the inverse problems for $P$, and proved the uniqueness in recovering the potential, the source function in the semilinear case, and the coefficients associated to the non-isotropy of the fractional operator.
Submitted 2 August, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Recipes for computing radiation from a Kerr black hole using Generalized Sasaki-Nakamura formalism: I. Homogeneous solutions
Authors:
Rico K. L. Lo
Abstract:
Central to black hole perturbation theory calculations is the Teukolsky equation that governs the propagation and the generation of radiation emitted by Kerr black holes. However, it is plagued by a long-ranged potential associated to the perturbation equation and hence a direct numerical integration of the equation is challenging. Sasaki and Nakamura devised a formulation that transforms the equation into a new equation that is free from the issue for the case of out-going gravitational radiation. The formulation was later generalized by Hughes to work for any type of radiation. In this work, we revamp the Generalized Sasaki-Nakamura (GSN) formalism and explicitly show the transformations that convert solutions between the Teukolsky and the GSN formalism for both in-going and out-going radiation of scalar, electromagnetic and gravitational type. We derive all necessary ingredients for the GSN formalism to be used in numerical computations. In particular, we describe a new numerical implementation of the formalism, GeneralizedSasakiNakamura.jl, that computes homogeneous solutions to the perturbation equations in both the Teukolsky and the GSN formalisms. The code works well at low frequencies and is even better at high frequencies by leveraging the fact that black holes are highly permeable to waves at high frequencies. This work lays the foundation for an efficient scheme to compute gravitational radiation from Kerr black holes and an alternative way to compute quasi-normal modes of Kerr black holes.
△ Less
Submitted 28 June, 2023;
originally announced June 2023.
-
Explainable Lifelong Stream Learning Based on "Glocal" Pairwise Fusion
Authors:
Chu Kiong Loo,
Wei Shiung Liew,
Stefan Wermter
Abstract:
Real-time on-device continual learning applications are used on mobile phones, consumer robots, and smart appliances. Such devices have limited processing and memory storage capabilities, whereas continual learning acquires data over a long period of time. By necessity, lifelong learning algorithms have to be able to operate under such constraints while delivering good performance. This study presents the Explainable Lifelong Learning (ExLL) model, which incorporates several important traits: 1) learning to learn, in a single pass, from streaming data with scarce examples and resources; 2) a self-organizing prototype-based architecture that expands as needed and clusters streaming data into separable groups by similarity and preserves data against catastrophic forgetting; 3) an interpretable architecture to convert the clusters into explainable IF-THEN rules as well as to justify model predictions in terms of what is similar and dissimilar to the inference; and 4) inferences at the global and local level using a pairwise decision fusion process to enhance the accuracy of the inference, hence "Glocal Pairwise Fusion." We compare ExLL against contemporary online learning algorithms for image recognition, using OpenLoris, F-SIOL-310, and Places datasets to evaluate several continual learning scenarios for video streams, low-sample learning, ability to scale, and imbalanced data streams. The algorithms are evaluated for their performance in accuracy, number of parameters, and experiment runtime requirements. ExLL outperforms all algorithms for accuracy in the majority of the tested scenarios.
Submitted 23 June, 2023;
originally announced June 2023.
-
JUNO sensitivity to the annihilation of MeV dark matter in the galactic halo
Authors:
JUNO Collaboration,
Angel Abusleme,
Thomas Adam,
Shakeel Ahmad,
Rizwan Ahmed,
Sebastiano Aiello,
Muhammad Akram,
Abid Aleem,
Tsagkarakis Alexandros,
Fengpeng An,
Qi An,
Giuseppe Andronico,
Nikolay Anfimov,
Vito Antonelli,
Tatiana Antoshkina,
Burin Asavapibhop,
João Pedro Athayde Marcondes de André,
Didier Auguste,
Weidong Bai,
Nikita Balashov,
Wander Baldini,
Andrea Barresi,
Davide Basilico,
Eric Baussan,
Marco Bellato
, et al. (581 additional authors not shown)
Abstract:
We discuss JUNO sensitivity to the annihilation of MeV dark matter in the galactic halo via detecting inverse beta decay reactions of electron anti-neutrinos resulting from the annihilation. We study possible backgrounds to the signature, including the reactor neutrinos, diffuse supernova neutrino background, charged- and neutral-current interactions of atmospheric neutrinos, backgrounds from muon-induced fast neutrons and cosmogenic isotopes. A fiducial volume cut, as well as the pulse shape discrimination and the muon veto are applied to suppress the above backgrounds. It is shown that JUNO sensitivity to the thermally averaged dark matter annihilation rate in 10 years of exposure would be significantly better than the present-day best limit set by Super-Kamiokande and would be comparable to that expected by Hyper-Kamiokande.
Submitted 13 September, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Distributed Trust Through the Lens of Software Architecture
Authors:
Sin Kit Lo,
Yue Liu,
Guangsheng Yu,
Qinghua Lu,
Xiwei Xu,
Liming Zhu
Abstract:
Distributed trust is a nebulous concept that has evolved from different perspectives in recent years. While one can attribute its current prominence to blockchain and cryptocurrency, the distributed trust concept has been cultivating progress in federated learning, trustworthy and responsible AI in an ecosystem setting, data sharing, privacy issues across organizational boundaries, and zero trust cybersecurity. This paper will survey the concept of distributed trust in multiple disciplines. It will take a system/software architecture point of view to look at trust redistribution/shift and the associated tradeoffs in systems and applications enabled by distributed trust technologies.
Submitted 25 May, 2023;
originally announced June 2023.
-
Follow-up Analyses to the O3 LIGO-Virgo-KAGRA Lensing Searches
Authors:
Justin Janquart,
Mick Wright,
Srashti Goyal,
Juno C. L. Chan,
Apratim Ganguly,
Ángel Garrón,
David Keitel,
Alvin K. Y. Li,
Anna Liu,
Rico K. L. Lo,
Anuj Mishra,
Anupreeta More,
Hemantakumar Phurailatpam,
Prasia Pankunni,
Sylvia Biscoveanu,
Paolo Cremonese,
Jean-René Cudell,
José M. Ezquiaga,
Juan Garcia-Bellido,
Otto A. Hannuksela,
K. Haris,
Ian Harry,
Martin Hendry,
Sascha Husa,
Shasvath Kapadia,
et al. (6 additional authors not shown)
Abstract:
Along their path from source to observer, gravitational waves may be gravitationally lensed by massive objects. This results in distortions of the observed signal which can be used to extract new information about fundamental physics, astrophysics, and cosmology. Searches for these distortions amongst the observed signals from the current detector network have already been carried out, though there have as yet been no confident detections. However, predictions of the observation rate of lensing suggest detection in the future is a realistic possibility. Therefore, preparations need to be made to thoroughly investigate the candidate lensed signals. In this work, we present some of the follow-up analyses and strategies that could be applied to assess the significance of such events and ascertain what information may be extracted about the lens-source system from such candidate signals by applying them to a number of O3 candidate events, even if these signals did not yield a high significance for any of the lensing hypotheses. For strongly-lensed candidates, we verify their significance using a background of simulated unlensed events and statistics computed from lensing catalogs. We also look for potential electromagnetic counterparts. In addition, we analyse in detail a candidate for a strongly-lensed sub-threshold counterpart that is identified by a new method. For microlensing candidates, we perform model selection using a number of lens models to investigate our ability to determine the mass density profile of the lens and constrain the lens parameters. We also look for millilensing signatures in one of the lensed candidates. Applying these additional analyses does not lead to any additional evidence for lensing in the candidates that have been examined. However, it does provide important insight into potential avenues to deal with high-significance candidates in future observations.
Submitted 15 August, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.