Search | arXiv e-print repository

DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks

Authors: Guang Yang, Yu Zhou, Xiang Chen, Xiangyu Zhang, Terry Yue Zhuo, David Lo, Taolue Chen

Abstract: Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in the code intelligence domain. However, the issue of security, particularly backdoor attacks, is often overlooked in this process. Previous research has focused on designing backdoor attacks for CLMs, but effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when directly applied to CLMs, are not effective enough and lack generality, working well in some models and scenarios but failing in others, and thus fall short in consistently mitigating backdoor attacks. To bridge this gap, we first confirm the phenomenon of "early learning" as a general occurrence during the training of CLMs. In this phenomenon, a model initially focuses on the main features of the training data but may become more sensitive to backdoor triggers over time, leading to overfitting and susceptibility to backdoor attacks. We then show that overfitting to backdoor triggers results from the use of the cross-entropy loss function, where the unboundedness of cross-entropy leads the model to increasingly concentrate on the features of the poisoned data. Based on this insight, we propose a general and effective loss function, DeCE (Deceptive Cross-Entropy), which blends deceptive distributions and applies label smoothing to keep the gradient bounded; this prevents the model from overfitting to backdoor triggers and thereby enhances the security of CLMs against backdoor attacks.
To verify the effectiveness of our defense method, we select code synthesis tasks as our experimental scenarios. Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE in enhancing the security of CLMs.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Under Review; Waiting for updates

arXiv:2407.08009 [pdf, other]

Long-fiber Sagnac interferometers for twin field quantum key distribution networks

Authors: Reem Mandil, Li Qian, Hoi-Kwong Lo

Abstract: A Sagnac loop structure can help overcome the major difficulty in the practical implementation of a twin field quantum key distribution (TFQKD) network, namely, the need to stabilize the phase of a quantum state over many kilometers of fiber. Unfortunately, Rayleigh backscattering noise limits the signal-to-noise ratio for Sagnac systems containing long fibers and lossy photonic devices. Here, we solve this problem by sending optical pulses in long on-off bursts and using time post-selection on measurements taken with free-run single-photon avalanche detectors. We also investigate the impact of the residual phase noise uncompensated by the Sagnac structure and find that the variance of the phase noise scales as loop length to the third power, verifying an existing calculation in the literature. We measure the interference visibility in Sagnac loops of varying length without active phase or polarization stabilization and achieve > 97% visibility in 200 km ultra-low-loss fiber, which is, to our knowledge, the longest fiber Sagnac interferometer demonstrated. Our results indicate the suitability of a Sagnac system for long-distance TFQKD networks, an important step towards the practical implementation of metropolitan quantum networks.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07875 [pdf, other]

Generative Image as Action Models

Authors: Mohit Shridhar, Yat Long Lo, Stephen James

Abstract: Image-generation diffusion models have been fine-tuned to unlock new capabilities such as image-editing and novel view synthesis. Can we similarly unlock image-generation models for visuomotor control? We present GENIMA, a behavior-cloning agent that fine-tunes Stable Diffusion to 'draw joint-actions' as targets on RGB images. These images are fed into a controller that maps the visual targets into a sequence of joint-positions. We study GENIMA on 25 RLBench and 9 real-world manipulation tasks. We find that, by lifting actions into image-space, internet pre-trained diffusion models can generate policies that outperform state-of-the-art visuomotor approaches, especially in robustness to scene perturbations and generalizing to novel objects. Our method is also competitive with 3D agents, despite lacking priors such as depth, keypoints, or motion-planners.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Project website, code, checkpoints: https://genima-robot.github.io/

arXiv:2407.06377 [pdf, other]

Measurement and analysis of the $^{246}$Cm and $^{248}$Cm neutron capture cross-sections at the EAR2 of the n_TOF facility

Authors: V. Alcayne, A. Kimura, E. Mendoza, D. Cano-Ott, O. Aberle, F. Álvarez-Velarde, S. Amaducci, J. Andrzejewski, L. Audouin, V. Bécares, V. Babiano-Suarez, M. Bacak, M. Barbagallo, F. Bečvář, G. Bellia, E. Berthoumieux, J. Billowes, D. Bosnar, A. Brown, M. Busso, M. Caamaño, L. Caballero-Ontanaya, F. Calviño, M. Calviani, A. Casanovas, et al. (108 additional authors not shown)

Abstract: The $^{246}$Cm(n,$γ$) and $^{248}$Cm(n,$γ$) cross-sections have been measured at the Experimental Area 2 (EAR2) of the n_TOF facility at CERN with three C$_6$D$_6$ detectors. This measurement is part of a collective effort to improve the capture cross-section data for Minor Actinides (MAs), which are required to estimate the production and transmutation rates of these isotopes in light water reactors and innovative reactor systems. In particular, neutron capture in $^{246}$Cm and $^{248}$Cm opens the path for the formation of other Cm isotopes and heavier elements such as Bk and Cf, and the knowledge of the (n,$γ$) cross-sections of these Cm isotopes plays an important role in the transport, transmutation and storage of spent nuclear fuel. The reactions $^{246}$Cm(n,$γ$) and $^{248}$Cm(n,$γ$) have been the first two capture measurements analyzed at n_TOF EAR2. Until this experiment and two recent measurements performed at J-PARC, there was only one set of data on the capture cross-sections of $^{246}$Cm and $^{248}$Cm, which was obtained in 1969 in an underground nuclear explosion experiment. In the measurement at n_TOF, a total of 13 resonances of $^{246}$Cm between 4 and 400 eV and 5 of $^{248}$Cm between 7 and 100 eV have been identified and fitted. The radiative kernels obtained for $^{246}$Cm are compatible with JENDL-5, but some of them are not compatible with JENDL-4, which has been adopted by JEFF-3.3 and ENDF/B-VIII.0.
The radiative kernels obtained for the first three $^{248}$Cm resonances are compatible with JENDL-5; however, the other two are not compatible with any other evaluation and are 20% and 60% larger than JENDL-5.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.03122 [pdf, other]

IntentionNet: Map-Lite Visual Navigation at the Kilometre Scale

Authors: Wei Gao, Bo Ai, Joel Loo, Vinay, David Hsu

Abstract: This work explores the challenges of creating a scalable and robust robot navigation system that can traverse both indoor and outdoor environments to reach distant goals. We propose a navigation system architecture called IntentionNet that employs a monolithic neural network as the low-level planner/controller, and uses a general interface that we call intentions to steer the controller. The paper proposes two types of intentions, Local Path and Environment (LPE) and Discretised Local Move (DLM), and shows that DLM is robust to significant metric positioning and mapping errors. The paper also presents Kilo-IntentionNet, an instance of the IntentionNet system using the DLM intention that is deployed on a Boston Dynamics Spot robot, and which successfully navigates through complex indoor and outdoor environments over distances of up to a kilometre with only noisy odometry.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02824 [pdf, other]

Exploring the Capabilities of LLMs for Code Change Related Tasks

Authors: Lishui Fan, Jiakun Liu, Zhongxin Liu, David Lo, Xin Xia, Shanping Li

Abstract: Developers deal with code-change-related tasks daily, e.g., reviewing code. Pre-trained code and code-change-oriented models have been adapted to help developers with such tasks. Recently, large language models (LLMs) have shown their effectiveness in code-related tasks. However, existing LLMs for code focus on general code syntax and semantics rather than the differences between two code versions. Thus, it is an open question how LLMs perform on code-change-related tasks. To answer this question, we conduct an empirical study using LLMs with more than 1B parameters on three code-change-related tasks, i.e., code review generation, commit message generation, and just-in-time comment update, with in-context learning (ICL) and parameter-efficient fine-tuning (PEFT, including LoRA and prefix-tuning). We observe that the performance of LLMs is poor without examples and generally improves with examples, but more examples do not always lead to better performance. LLMs tuned with LoRA have comparable performance to the state-of-the-art small pre-trained models. Larger models are not always better, but the Llama 2 and Code Llama families are always the best. The best LLMs outperform small pre-trained models on the code changes that only modify comments and perform comparably on other code changes. We suggest future work should focus more on guiding LLMs to learn the knowledge specific to the changes related to code rather than comments for code-change-related tasks.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02640 [pdf, other]

Subpath-Based Column Generation for the Electric Routing-Scheduling Problem

Authors: Alexandre Jacquillat, Sean Lo

Abstract: Motivated by widespread electrification targets, this paper studies an electric routing-scheduling problem (ERSP) that jointly optimizes routing-scheduling and charging decisions. The ERSP is formulated as a semi-infinite set-partitioning model, where continuous charging decisions result in infinitely-many path-based variables. To solve it, we develop a column generation algorithm with a bi-level label-setting algorithm to decompose the pricing problem into (i) a first-level procedure to generate subpaths between charging stations, and (ii) a second-level procedure to combine subpaths into paths. We formalize subpath-based domination properties to establish the finite convergence and exactness of the column generation algorithm. We prove that the methodology can handle modeling extensions with heterogeneous charging costs (via dynamic re-optimization of charging decisions) and algorithm extensions to tighten the relaxation using ng-routes and limited-memory subset-row inequalities (via augmented domination criteria). Computational results show that the methodology scales to large instances, outperforming state-of-the-art column generation algorithms. From a practical standpoint, the methodology achieves significant cost reductions by jointly optimizing routing-scheduling and charging decisions and by capturing heterogeneous charging costs.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 30 pages

MSC Class: 90C39 (Primary) 90C11; 90B06 (Secondary)

arXiv:2407.02473 [pdf, other]

Open Scene Graphs for Open World Object-Goal Navigation

Authors: Joel Loo, Zhanxin Wu, David Hsu

Abstract: How can we build robots for open-world semantic navigation tasks, like searching for target objects in novel scenes? While foundation models have the rich knowledge and generalisation needed for these tasks, a suitable scene representation is needed to connect them into a complete robot system. We address this with Open Scene Graphs (OSGs), a topo-semantic representation that retains and organises open-set scene information for these models, and has a structure that can be configured for different environment types. We integrate foundation models and OSGs into the OpenSearch system for Open World Object-Goal Navigation, which is capable of searching for open-set objects specified in natural language, while generalising zero-shot across diverse environments and embodiments. Our OSGs enhance reasoning with Large Language Models (LLM), enabling robust object-goal navigation outperforming existing LLM approaches. Through simulation and real-world experiments, we validate OpenSearch's generalisation across varied environments, robots and novel instructions.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02290 [pdf, other]

A systematic comparison of measures for k-anonymity in networks

Authors: Rachel G. de Jong, Mark P. J. van der Loo, Frank W. Takes

Abstract: Privacy-aware sharing of network data is a difficult task due to the interconnectedness of individuals in networks. An important part of this problem is the inherently difficult question of how in a particular situation the privacy of an individual node should be measured. To that end, in this paper we propose a set of aspects that one should consider when choosing a measure for privacy. These aspects include the type of desired privacy and attacker scenario against which the measure protects, utility of the data, the type of desired output, and the computational complexity of the chosen measure. Based on these aspects, we provide a systematic overview of existing approaches in the literature. We then focus on a set of measures that ultimately enables our objective: sharing the anonymized full network dataset with limited disclosure risk. The considered measures, each based on the concept of k-anonymity, account for the structure of the surroundings of a certain node and differ in completeness and reach of the structural information taken into account. We present a comprehensive theoretical characterization as well as comparative empirical experiments on a wide range of real-world network datasets with up to millions of edges. We find that the choice of the measure has an enormous effect on the aforementioned aspects. Most interestingly, we find that the most effective measures consider a greater node vicinity, yet utilize minimal structural information and thus use minimal computational resources.
This finding has important implications for researchers and practitioners, who may, based on the recommendations given in this paper, make an informed choice on how to safely share large-scale network data in a privacy-aware manner.
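Note: the abstract does not name the individual measures it compares, so the following is only a minimal sketch of the simplest member of the k-anonymity family, assuming a degree-based notion in which a node counts as k-anonymous when at least k nodes (itself included) share its degree. The function names `degree_anonymity` and `fraction_k_anonymous` are hypothetical, and the edge list is assumed to describe a simple undirected graph without duplicate edges.

```python
from collections import Counter

def degree_anonymity(edges):
    """Map each node to the number of nodes that share its degree.

    Assumption: `edges` is an iterable of (u, v) pairs of a simple
    undirected graph; isolated nodes do not appear and are ignored.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Size of each equivalence class of nodes with identical degree.
    class_size = Counter(degree.values())
    return {node: class_size[d] for node, d in degree.items()}

def fraction_k_anonymous(edges, k):
    """Fraction of nodes whose degree is shared by at least k nodes."""
    sizes = degree_anonymity(edges)
    return sum(1 for s in sizes.values() if s >= k) / len(sizes)

# Illustrative toy graph (made up for this sketch, not from the paper).
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (3, 6)]
print(degree_anonymity(edges))
print(fraction_k_anonymous(edges, k=2))
```

In this toy graph node 3 is the only node of degree 3, so it is unique under the degree-based measure, while every other node shares its degree with at least one other node. Measures with a larger reach would compare more of each node's surroundings than the degree alone.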

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.00225 [pdf, other]

Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation

Authors: Wendkûuni C. Ouédraogo, Kader Kaboré, Haoye Tian, Yewei Song, Anil Koyuncu, Jacques Klein, David Lo, Tegawendé F. Bissyandé

Abstract: Unit testing, crucial for identifying bugs in code modules like classes and methods, is often neglected by developers due to time constraints. Automated test generation techniques have emerged to address this, but often lack readability and require developer intervention. Large Language Models (LLMs), like GPT and Mistral, show promise in software engineering, including in test generation. However, their effectiveness remains unclear. This study conducts the first comprehensive investigation of LLMs, evaluating the effectiveness of four LLMs and five prompt engineering techniques, for unit test generation. We analyze 216,300 tests generated by the selected advanced instruct-tuned LLMs for 690 Java classes collected from diverse datasets. We assess correctness, understandability, coverage, and bug detection capabilities of LLM-generated tests, comparing them to EvoSuite, a popular automated testing tool. While LLMs show potential, improvements in test correctness are necessary. This study reveals the strengths and limitations of LLMs compared to traditional methods, paving the way for further research on LLMs in software engineering.

Submitted 28 June, 2024; originally announced July 2024.

arXiv:2406.19867 [pdf, other]

Sampled Datasets Risk Substantial Bias in the Identification of Political Polarization on Social Media

Authors: Gabriele Di Bona, Emma Fraxanet, Björn Komander, Andrea Lo Sasso, Virginia Morini, Antoine Vendeville, Max Falkenberg, Alessandro Galeazzi

Abstract: Following recent policy changes by X (Twitter) and other social media platforms, user interaction data has become increasingly difficult to access. These restrictions are impeding robust research pertaining to social and political phenomena online, which is critical due to the profound impact social media platforms may have on our societies. Here, we investigate the reliability of polarization measures obtained from different samples of social media data by studying the structural polarization of the Polish political debate on Twitter over a 24-hour period. First, we show that the political discussion on Twitter is only a small subset of the wider Twitter discussion. Second, we find that large samples can be representative of the whole political discussion on a platform, but small samples consistently fail to accurately reflect the true structure of polarization online. Finally, we demonstrate that keyword-based samples can be representative if keywords are selected with great care, but that poorly selected keywords can result in substantial political bias in the sampled data. Our findings demonstrate that it is not possible to measure polarization in a reliable way with small, sampled datasets, highlighting why the current lack of research data is so problematic, and providing insight into the practical implementation of the European Union's Digital Services Act, which aims to improve researchers' access to social media data.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.18219 [pdf, other]

A Closer Look into Mixture-of-Experts in Large Language Models

Authors: Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu

Abstract: Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance, especially for language tasks. By sparsely activating a subset of parameters for each token, MoE architecture could increase the model size without sacrificing computational efficiency, achieving a better trade-off between performance and training costs. However, the underlying mechanism of MoE still lacks further exploration, and its modularization degree remains questionable. In this paper, we make an initial attempt to understand the inner workings of MoE-based large language models. Concretely, we comprehensively study the parametric and behavioral features of three recent MoE-based models and reveal some intriguing observations, including (1) Neurons act like fine-grained experts. (2) The router of MoE usually selects experts with larger output norms. (3) The expert diversity increases as the layer increases, while the last layer is an outlier. Based on the observations, we also provide suggestions for a broad spectrum of MoE practitioners, such as router design and expert allocation. We hope this work could shed light on future research on the MoE framework and other modular architectures. Code is available at https://github.com/kamanphoebe/Look-into-MoEs.
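Note: the sparse activation described in the abstract can be illustrated with a minimal top-k routing sketch. This is not taken from the paper or its repository; it assumes one common convention in which the router keeps the k largest logits and renormalises them with a softmax over the selected experts only. The names `top_k_route` and `moe_forward`, the scalar toy experts, and the logits are all hypothetical.

```python
import math

def top_k_route(router_logits, k):
    """Select the k highest-scoring experts and softmax their logits.

    Returns a list of (expert_index, weight) pairs; weights sum to 1.
    Ties keep the lower expert index first (sorted() is stable).
    """
    order = sorted(range(len(router_logits)),
                   key=lambda e: router_logits[e], reverse=True)
    top = order[:k]
    peak = max(router_logits[e] for e in top)  # subtract for stability
    exps = [math.exp(router_logits[e] - peak) for e in top]
    total = sum(exps)
    return [(e, w / total) for e, w in zip(top, exps)]

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum over the selected experts; the others never run."""
    return sum(w * experts[e](x) for e, w in top_k_route(router_logits, k))

# Toy experts acting on a scalar "token" (illustrative only).
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
logits = [0.1, 2.0, -1.0, 2.0]
print(top_k_route(logits, k=2))
print(moe_forward(3.0, experts, logits, k=2))
```

Only two of the four experts are evaluated for the token, which is the sense in which model size can grow without a matching growth in per-token compute.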

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17654 [pdf, other]

MDHA: Multi-Scale Deformable Transformer with Hybrid Anchors for Multi-View 3D Object Detection

Authors: Michelle Adeline, Junn Yong Loo, Vishnu Monn Baskaran

Abstract: Multi-view 3D object detection is a crucial component of autonomous driving systems. Contemporary query-based methods primarily depend either on dataset-specific initialization of 3D anchors, introducing bias, or utilize dense attention mechanisms, which are computationally inefficient and unscalable. To overcome these issues, we present MDHA, a novel sparse query-based framework, which constructs adaptive 3D output proposals using hybrid anchors from multi-view, multi-scale input. Fixed 2D anchors are combined with depth predictions to form 2.5D anchors, which are projected to obtain 3D proposals. To ensure high efficiency, our proposed Anchor Encoder performs sparse refinement and selects the top-k anchors and features. Moreover, while existing multi-view attention mechanisms rely on projecting reference points to multiple images, our novel Circular Deformable Attention mechanism only projects to a single image but allows reference points to seamlessly attend to adjacent images, improving efficiency without compromising on performance. On the nuScenes val set, it achieves 46.4% mAP and 55.0% NDS with a ResNet101 backbone. MDHA significantly outperforms the baseline, where anchor proposals are modelled as learnable embeddings.
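Note: the abstract does not give the projection used to turn 2.5D anchors into 3D proposals. As a rough illustration only, the sketch below assumes an ideal pinhole camera and lifts a pixel anchor (u, v) with a predicted depth into the camera coordinate frame. The function name `unproject_anchor` and the intrinsics values are hypothetical, and the further transform into a vehicle or world frame that a multi-view system would need is omitted.

```python
def unproject_anchor(u, v, depth, fx, fy, cx, cy):
    """Lift a 2.5D anchor (pixel u, v plus depth) to a 3D camera-frame point.

    Assumes a pinhole model with focal lengths (fx, fy) and principal
    point (cx, cy), all in pixels; depth is measured along the optical axis.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Made-up intrinsics and anchor, for illustration only.
print(unproject_anchor(u=800.0, v=450.0, depth=20.0,
                       fx=1000.0, fy=1000.0, cx=600.0, cy=450.0))
```

An anchor on the principal point maps to a point straight ahead of the camera, and anchors further from it are pushed sideways in proportion to their depth.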

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16746 [pdf, other]

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

Authors: Shayne Longpre, Stella Biderman, Alon Albalak, Hailey Schoelkopf, Daniel McDuff, Sayash Kapoor, Kevin Klyman, Kyle Lo, Gabriel Ilharco, Nay San, Maribeth Rauh, Aviya Skowron, Bertie Vidgen, Laura Weidinger, Arvind Narayanan, Victor Sanh, David Adelani, Percy Liang, Rishi Bommasani, Peter Henderson, Sasha Luccioni, Yacine Jernite, Luca Soldaini

Abstract: Foundation model development attracts a rapidly expanding body of contributors, scientists, and applications. To help shape responsible development practices, we introduce the Foundation Model Development Cheatsheet: a growing collection of 250+ tools and resources spanning text, vision, and speech modalities. We draw on a large body of prior work to survey resources (e.g. software, documentation, frameworks, guides, and practical tools) that support informed data selection, processing, and understanding, precise and limitation-aware artifact documentation, efficient model training, advance awareness of the environmental impact from training, careful model evaluation of capabilities, risks, and claims, as well as responsible model release, licensing and deployment practices. We hope this curated collection of resources helps guide more responsible development. The process of curating this list enabled us to review the AI development ecosystem, revealing what tools are critically missing, misused, or over-used in existing practices. We find that (i) tools for data sourcing, model evaluation, and monitoring are critically under-serving ethical and real-world needs, (ii) evaluations for model safety, capabilities, and environmental impact all lack reproducibility and transparency, (iii) text and particularly English-centric analyses continue to dominate over multilingual and multi-modal analyses, and (iv) evaluation of systems, rather than just models, is needed so that capabilities and impact are assessed in context.

Submitted 25 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16401 [pdf, ps, other]

doi 10.4204/EPTCS.403.19

Bijective Enumeration and Sign-Imbalance for Permutation Depth and Excedances

Authors: Sen-Peng Eu, Tung-Shan Fu, Yuan-Hsun Lo

Abstract: We present a simplified variant of Biane's bijection between permutations and 3-colored Motzkin paths with weight that keeps track of the inversion number, the excedance number and a statistic called the depth of a permutation. This generalizes a result by Guay-Paquet and Petersen about a continued fraction of the generating function for depth on the permutations of n elements. In terms of weighted Motzkin paths, we establish an involution on the permutations that reverses the parities of depth and excedance numbers simultaneously, which proves that the numbers of permutations with even and odd depth (excedance numbers, respectively) are equal if n is even and differ by the tangent number if n is odd. Moreover, we present some interesting sign-imbalance results on permutations and derangements, refined with respect to depth and excedance numbers.
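Note: the parity statement for excedance numbers can be checked by brute force for small n. The sketch below is not the paper's bijective argument; it simply enumerates all permutations, takes an excedance to be a position i with σ(i) > i, and computes the difference between the counts with even and odd excedance number. The function names are hypothetical, and depth is not covered here.

```python
from itertools import permutations

def excedances(perm):
    """Number of positions i (1-based) with perm(i) > i."""
    return sum(1 for i, value in enumerate(perm, start=1) if value > i)

def excedance_sign_imbalance(n):
    """(# permutations of n with even excedance number) minus (# with odd)."""
    return sum((-1) ** excedances(p) for p in permutations(range(1, n + 1)))

# The imbalance vanishes for even n; for odd n its absolute value runs
# through the tangent numbers 1, 2, 16, 272, ...
for n in range(1, 8):
    print(n, excedance_sign_imbalance(n))
```

Exhaustive enumeration grows as n!, so this is only practical as a sanity check for small n.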

Submitted 24 June, 2024; originally announced June 2024.

Comments: In Proceedings GASCom 2024, arXiv:2406.14588

Journal ref: EPTCS 403, 2024, pp. 87-91

arXiv:2406.16264 [pdf, other]

One Thousand and One Pairs: A "novel" challenge for long-context language models

Authors: Marzena Karpinska, Katherine Thai, Kyle Lo, Tanya Goyal, Mohit Iyyer

Abstract: Synthetic long-context LLM benchmarks (e.g., "needle-in-the-haystack") test only surface-level retrieval capabilities, but how well can long-context LLMs retrieve, synthesize, and reason over information across book-length inputs? We address this question by creating NoCha, a dataset of 1,001 minimally different pairs of true and false claims about 67 recently-published English fictional books, written by human readers of those books. In contrast to existing long-context benchmarks, our annotators confirm that the largest share of pairs in NoCha require global reasoning over the entire book to verify. Our experiments show that while human readers easily perform this task, it is enormously challenging for all ten long-context LLMs that we evaluate: no open-weight model performs above random chance (despite their strong performance on synthetic benchmarks), while GPT-4o achieves the highest accuracy at 55.8%. Further analysis reveals that (1) on average, models perform much better on pairs that require only sentence-level retrieval vs. global reasoning; (2) model-generated explanations for their decisions are often inaccurate even for correctly-labeled claims; and (3) models perform substantially worse on speculative fiction books that contain extensive world-building. The methodology proposed in NoCha allows for the evolution of the benchmark dataset and the easy analysis of future models.

Submitted 23 June, 2024; originally announced June 2024.

Comments: preprint, 29 pages

arXiv:2406.15877 [pdf, other]

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu , et al. (8 additional authors not shown)

Abstract: Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires the capability of utilizing diverse function calls as tools to efficiently implement functionalities like data analysis and web development. In addition, using multiple tools to solve a task needs compositional reasoning by accurately understanding complex instructions. Fulfilling both of these characteristics can pose a great challenge for LLMs. To assess how well LLMs can solve challenging and practical programming tasks, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks. To evaluate LLMs rigorously, each programming task encompasses an average of 5.6 test cases with an average branch coverage of 99%. In addition, we propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions containing only essential information. Our extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%. The results underscore the need for further advancements in this area.

Submitted 26 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

Comments: 44 pages, 14 figures, 7 tables, built with love by the BigCode community :)

arXiv:2406.14607 [pdf, other]

Quantum Extreme Learning of molecular potential energy surfaces and force fields

Authors: Gabriele Lo Monaco, Marco Bertini, Salvatore Lorenzo, G. Massimo Palma

Abstract: Quantum machine learning algorithms are expected to play a pivotal role in quantum chemistry simulations in the immediate future. One such key application is the training of a quantum neural network to learn the potential energy surface and force field of molecular systems. We address this task by using the quantum extreme learning machine paradigm. This particular supervised learning routine allows for resource-efficient training, consisting of a simple linear regression performed on a classical computer. We have tested a setup that can be used to study molecules of any dimension and is optimized for immediate use on NISQ devices with a limited number of native gates. We have applied this setup to three case studies: lithium hydride, water, and formamide, carrying out both noiseless simulations and actual implementation on IBM quantum hardware. Compared to other supervised learning routines, the proposed setup requires minimal quantum resources, making it feasible for direct implementation on quantum platforms, while still achieving a high level of predictive accuracy compared to simulations. Our encouraging results pave the way towards future application to more complex molecules, as the proposed setup is scalable.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 14 pages, 7 figures. Accepted on Machine Learning: Science and Technology

arXiv:2406.13189 [pdf, ps, other]

The cosmological significance of boundary term in non-metricity gravity

Authors: Hamid Shabani, Avik De, Tee-How Loo

Abstract: Within the context of metric-affine gravity, we examine the significance of the boundary term in symmetric teleparallel gravity by employing the cosmological dynamical system analysis method. We focus on the novel gravity models characterized by the functions $f(Q,C)$, where $f$ is a smooth function of the non-metricity scalar $Q$ and the associated boundary term $C$. In a cosmological setting adopting three different classes of symmetric teleparallel affine connections, we investigate a model $f(Q,C)=Q^{s}+eC^{r}$, and some special cases of this model. We show that the boundary term, when added to the Einsteinian field equations (or equivalently to the $f(Q)=Q$ ones), is capable of bringing forward solutions corresponding to the early accelerated expansion. This alludes to the physics behind the boundary terms, which are usually discarded in most gravitational theories.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.11794 [pdf, other]

DataComp-LM: In search of the next generation of training sets for language models

Authors: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner , et al. (34 additional authors not shown)

Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set. The resulting dataset, DCLM-Baseline enables training a 7B parameter language model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. Compared to MAP-Neo, the previous state-of-the-art in open-data language models, DCLM-Baseline represents a 6.6 percentage point improvement on MMLU while being trained with 40% less compute. Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% & 66%), and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B. Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation.

Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Project page: https://www.datacomp.ai/dclm/

arXiv:2406.11701 [pdf, other]

Coronal energy release by MHD avalanches II. EUV line emission from a multi-threaded coronal loop

Authors: G. Cozzo, J. Reid, P. Pagano, F. Reale, P. Testa, A. W. Hood, C. Argiroffi, A. Petralia, E. Alaimo, F. D'Anca, L. Sciortino, M. Todaro, U. Lo Cicero, M. Barbera, B. De Pontieu, J. Martinez-Sykora

Abstract: MHD kink instability can trigger the fragmentation of a twisted magnetic flux tube into small-scale current sheets that dissipate as aperiodic impulsive heating events. This instability propagates as an avalanche to nearby flux tubes and leads to a nanoflare storm. Our previous work was devoted to related 3D MHD numerical modeling with a stratified and realistic atmosphere. This work addresses predictions for the EUV imaging spectroscopy of such structure and evolution of a loop, with an average temperature of 2.5 MK in the solar corona. We set a particular focus on the forthcoming MUSE mission. From the output of the numerical simulations, we synthesized the intensities, Doppler shifts, and non-thermal line broadening in 3 EUV spectral lines in the MUSE passbands: Fe IX 171 Å, Fe XV 284 Å, and Fe XIX 108 Å, at 1 MK, 2 MK, and 10 MK, respectively, according to the MUSE expected pixel size, temporal resolution, and temperature response functions. We provide maps showing different view angles and realistic spectra. Finally, we discuss the relevant evolutionary processes from the perspective of possible observations. We find that the MUSE observations might be able to detect the fine structure determined by tube fragmentation. In particular, the Fe IX line is mostly emitted at the loop footpoints, where we track the motions that drive the magnetic stressing and detect the upward motion of evaporating plasma from the chromosphere. In Fe XV, we see the bulk of the loop with increasing intensity. The Fe XIX line is very faint within the chosen simulation parameters; thus, any transient brightening around the loop apex may possibly be emphasized by the folding of sheet-like structure. In conclusion, we show that coronal loop observations with MUSE can pinpoint some crucial features of MHD-modeled ignition processes, such as the related dynamics, helping to identify the heating processes.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11271 [pdf, other]

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

Authors: Anas Awadalla, Le Xue, Oscar Lo, Manli Shu, Hannah Lee, Etash Kumar Guha, Matt Jordan, Sheng Shen, Mohamed Awadalla, Silvio Savarese, Caiming Xiong, Ran Xu, Yejin Choi, Ludwig Schmidt

Abstract: Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs). Despite the rapid progression of open-source LMMs, there remains a pronounced scarcity of large-scale, diverse open-source multimodal interleaved datasets. In response, we introduce MINT-1T, the most extensive and diverse open-source Multimodal INTerleaved dataset to date. MINT-1T comprises one trillion text tokens and three billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. As scaling multimodal interleaved datasets requires substantial engineering effort, sharing the data curation process and releasing the dataset greatly benefits the community. Our experiments show that LMMs trained on MINT-1T rival the performance of models trained on the previous leading dataset, OBELICS. Our data and code will be released at https://github.com/mlfoundations/MINT-1T.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.09887 [pdf, ps, other]

Split-Apply-Combine with Dynamic Grouping

Authors: Mark P. J. van der Loo

Abstract: Partitioning a data set by one or more of its attributes and computing an aggregate for each part is one of the most common operations in data analyses. There are use cases where the partitioning is determined dynamically by collapsing smaller subsets into larger ones, to ensure sufficient support for the computed aggregate. These use cases are not supported by software implementing split-apply-combine types of operations. This paper presents the \texttt{R} package \texttt{accumulate} that offers convenient interfaces for defining grouped aggregation where the grouping itself is dynamically determined, based on user-defined conditions on subsets, and a user-defined subset collapsing scheme. The formal underlying algorithm is described and analyzed as well.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted for publication
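
The dynamic grouping idea described in the abstract above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the interface or algorithm of the `accumulate` R package: the function name `aggregate_with_collapse`, the `levels` collapsing scheme (finest to coarsest), the `min_support` condition, and the sample records are all hypothetical.

```python
from statistics import mean


def aggregate_with_collapse(records, levels, value_key, min_support, agg=mean):
    """For each finest-level group, aggregate over the first (finest) level
    whose matching subset holds at least `min_support` records.

    `levels` is a hypothetical collapsing scheme: a list of key tuples ordered
    from finest to coarsest; every level must use a subset of the finest keys.
    Returns {finest_group: (level_index, aggregate)}; (None, None) means no
    level had enough support.
    """
    finest = levels[0]
    targets = sorted({tuple(r[k] for k in finest) for r in records})
    result = {}
    for target in targets:
        labels = dict(zip(finest, target))
        result[target] = (None, None)
        for depth, keys in enumerate(levels):
            # Records that agree with the target group on this level's keys.
            subset = [r[value_key] for r in records
                      if all(r[k] == labels[k] for k in keys)]
            if len(subset) >= min_support:
                result[target] = (depth, agg(subset))
                break
    return result


# Made-up example data: small groups fall back to coarser levels.
records = [
    {"region": "north", "sector": "retail", "turnover": 10.0},
    {"region": "north", "sector": "retail", "turnover": 14.0},
    {"region": "north", "sector": "retail", "turnover": 12.0},
    {"region": "north", "sector": "mining", "turnover": 40.0},
    {"region": "south", "sector": "retail", "turnover": 20.0},
]
levels = [("region", "sector"), ("region",), ()]
out = aggregate_with_collapse(records, levels, "turnover", min_support=3)
for group, (depth, value) in out.items():
    print(group, depth, value)
```

Here the north/retail group has enough records to be aggregated on its own, north/mining collapses to the whole north region, and south/retail collapses to the full data set. Returning the level index alongside the aggregate keeps it visible how far each group had to be collapsed.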

arXiv:2406.08832 [pdf, other]

Multiplexed Quantum Communication with Surface and Hypergraph Product Codes

Authors: Shin Nishio, Nicholas Connolly, Nicolò Lo Piparo, William John Munro, Thomas Rowan Scruby, Kae Nemoto

Abstract: Connecting multiple processors via quantum interconnect technologies could help to overcome issues of scalability in single-processor quantum computers. Transmission via these interconnects can be performed more efficiently using quantum multiplexing, where information is encoded in high-dimensional photonic degrees of freedom. We explore the effects of multiplexing on logical error rates in surface codes and hypergraph product codes. We show that, although multiplexing makes loss errors more damaging, assigning qubits to photons in an intelligent manner can minimize these effects, and the ability to encode higher-distance codes in a smaller number of photons can result in overall lower logical error rates. This multiplexing technique can also be adapted to quantum communication and multimode quantum memory with high-dimensional qudit systems.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 12 pages + 12-page appendices, 19 figures

ACM Class: E.4; C.2; G.2

arXiv:2406.08437 [pdf, other]

Effect of Cr Segregation on Grain Growth in Nanocrystalline α-Fe Alloy: A Multiscale Modelling Approach

Authors: Sandip Guin, Albert Linda, Yu-Chieh Lo, Somanth Bhowmick, Rajdip Mukherjee

Abstract: We present a multiscale modelling framework that integrates density functional theory (DFT) with a phase-field model (PFM) to explore the intricate dynamics of grain growth in nanocrystalline α-Fe single-phase alloy in the presence of chromium (Cr) segregation. We begin our study by validating our simulation results for equilibrium segregation at a stationary grain boundary (GB) against the McLean isotherm. Polycrystal simulations featuring nanocrystalline grains at different temperatures reveal that the grain growth kinetics depends on the ratio of Cr diffusivity to intrinsic GB mobility. In the absence of segregation, the square of the average grain size ($d^2$) varies linearly with time ($t$). We observe that the $d^2$ vs. $t$ plot exhibits a consistent linear trend up to a threshold grain size, independent of Cr segregation at GB. However, when Cr is segregated at GB, a deviation from this linear trend with a decreasing slope is evident within the temperature range of 700 K to 900 K beyond the threshold size. This threshold grain size decreases with increasing temperature. Notably, at 1000 K, the deviation from the linear trend is observed from the initial stages of grain growth with segregation, albeit with a linear trend exhibiting a smaller slope. We also present an analytical formulation based on Cahn solute drag theory to predict grain growth behaviour in the presence of solute segregation, and our simulation results align well with this analytical formulation.

Submitted 12 June, 2024; originally announced June 2024.
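
For reference, the linear $d^2$ vs. $t$ relationship mentioned in the abstract above is commonly written as the parabolic grain growth law. The symbols $d_0$ (average grain size at $t=0$) and $k$ (a temperature-dependent rate constant) are assumptions introduced here for illustration and are not notation taken from the abstract:

$$d^2(t) - d_0^2 = k\,t$$

A slope $k$ that decreases beyond a threshold grain size, as described for the segregated case, then appears as a downward bend in the $d^2$ vs. $t$ plot.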

arXiv:2406.08304 [pdf, other]

NIRPS first light and early science: breaking the 1 m/s RV precision barrier at infrared wavelengths

Authors: Étienne Artigau, François Bouchy, René Doyon, Frédérique Baron, Lison Malo, François Wildi, Franceso Pepe, Neil J. Cook, Simon Thibault, Vladimir Reshetov, Xavier Dumusque, Christophe Lovis, Danuta Sosnowska, Bruno L. Canto Martins, Jose Renan De Medeiros, Xavier Delfosse, Nuno Santos, Rafael Rebolo, Manuel Abreu, Guillaume Allain, Romain Allart, Hugues Auger, Susana Barros, Luc Bazinet, Nicolas Blind , et al. (89 additional authors not shown)

Abstract: The Near-InfraRed Planet Searcher or NIRPS is a precision radial velocity spectrograph developed through collaborative efforts among laboratories in Switzerland, Canada, Brazil, France, Portugal and Spain. NIRPS extends to the 0.98-1.8 $μ$m domain of the pioneering HARPS instrument at the La Silla 3.6-m telescope in Chile and it has achieved unparalleled precision, measuring stellar radial velocities in the infrared with accuracy better than 1 m/s. NIRPS can be used either stand-alone or simultaneously with HARPS. Commissioned in late 2022 and early 2023, NIRPS embarked on a 5-year Guaranteed Time Observation (GTO) program in April 2023, spanning 720 observing nights. This program focuses on planetary systems around M dwarfs, encompassing both the immediate solar vicinity and transit follow-ups, alongside transit and emission spectroscopy observations. We highlight NIRPS's current performances and the insights gained during its deployment at the telescope. The lessons learned and successes achieved contribute to the ongoing advancement of precision radial velocity measurements and high spectral fidelity, further solidifying NIRPS' role in the forefront of the field of exoplanets.

Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: Proceeding at the SPIE Astronomical Telescopes + Instrumentation conference [Yokohama,Japan; June 2024]

arXiv:2406.07899 [pdf, other]

Josephson Parametric Amplifier based Quantum Noise Limited Amplifier Development for Axion Search Experiments in CAPP

Authors: Sergey V. Uchaikin, Jinmyeong Kim, Caglar Kutlu, Boris I. Ivanov, Jinsu Kim, Arjan F. van Loo, Yasunobu Nakamura, Saebyeok Ahn, Seonjeong Oh, Minsu Ko, Yannis K. Semertzidis

Abstract: This paper provides a comprehensive overview of the development of flux-driven Josephson Parametric Amplifiers (JPAs) as Quantum Noise Limited Amplifiers for axion search experiments conducted at the Center for Axion and Precision Physics Research (CAPP) of the Institute for Basic Science. It focuses on the characterization and optimization of JPAs, which are crucial for achieving the highest sensitivity in axion particle detection. We discuss various characterization techniques, methods for improving bandwidth, and the attainment of ultra-low noise temperatures. JPAs have emerged as indispensable tools in CAPP's axion search endeavors, playing a significant role in advancing our understanding of fundamental physics and unraveling the mysteries of the universe.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 29 pages, 15 figures

arXiv:2406.07835 [pdf, other]

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature

Authors: David Wadden, Kejian Shi, Jacob Morrison, Aakanksha Naik, Shruti Singh, Nitzan Barzilay, Kyle Lo, Tom Hope, Luca Soldaini, Shannon Zejiang Shen, Doug Downey, Hannaneh Hajishirzi, Arman Cohan

Abstract: We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks covering five essential scientific literature understanding capabilities: information extraction, summarization, question answering, claim verification, and classification. SciRIFF demonstrations are notable for their long input contexts, detailed task specifications, and complex structured outputs. While instruction-following resources are available in specific domains such as clinical medicine and chemistry, SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields. To demonstrate the utility of SciRIFF, we develop a sample-efficient strategy to adapt a general instruction-following model for science by performing additional finetuning on a mix of general-domain and SciRIFF demonstrations. In evaluations on nine held-out scientific tasks, our model -- called SciTulu -- improves over a strong LLM baseline by 28.1% and 6.5% at the 7B and 70B scales respectively, while maintaining general instruction-following performance within 2% of the baseline. We are optimistic that SciRIFF will facilitate the development and evaluation of LLMs to help researchers navigate the ever-growing body of scientific literature. We release our dataset, model checkpoints, and data processing and evaluation code to enable further research.

Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: Submitted to NeurIPS Datasets and Benchmarks 2024

arXiv:2406.06386 [pdf, other]

FPN-IAIA-BL: A Multi-Scale Interpretable Deep Learning Model for Classification of Mass Margins in Digital Mammography

Authors: Julia Yang, Alina Jade Barnett, Jon Donnelly, Satvik Kishore, Jerry Fang, Fides Regina Schwartz, Chaofan Chen, Joseph Y. Lo, Cynthia Rudin

Abstract: Digital mammography is essential to breast cancer detection, and deep learning offers promising tools for faster and more accurate mammogram analysis. In radiology and other high-stakes environments, uninterpretable ("black box") deep learning models are unsuitable and there is a call in these fields to make interpretable models. Recent work in interpretable computer vision provides transparency to these formerly black boxes by utilizing prototypes for case-based explanations, achieving high accuracy in applications including mammography. However, these models struggle with precise feature localization, reasoning on large portions of an image when only a small part is relevant. This paper addresses this gap by proposing a novel multi-scale interpretable deep learning model for mammographic mass margin classification. Our contribution not only offers an interpretable model with reasoning aligned with radiologist practices, but also provides a general architecture for computer vision with user-configurable prototypes from coarse- to fine-grained prototypes.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 8 pages, 6 figures, Accepted for oral presentation at the 2024 CVPR Workshop on Domain adaptation, Explainability, Fairness in AI for Medical Image Analysis (DEF-AI-MIA)

arXiv:2406.06278 [pdf, other]

Three super-Earths and a possible water world from TESS and ESPRESSO

Authors: M. J. Hobson, F. Bouchy, B. Lavie, C. Lovis, V. Adibekyan, C. Allende Prieto, Y. Alibert, S. C. C. Barros, A. Castro-González, S. Cristiani, V. D'Odorico, M. Damasso, P. Di Marcantonio, X. Dumusque, D. Ehrenreich, P. Figueira, R. Génova Santos, J. I. González Hernández, J. Lillo-Box, G. Lo Curto, C. J. A. P. Martins, A. Mehner, G. Micela, P. Molaro, N. J. Nunes , et al. (29 additional authors not shown)

Abstract: Since 2018, the ESPRESSO spectrograph at the VLT has been hunting for planets in the Southern skies via the RV method. One of its goals is to follow up candidate planets from transit surveys such as the TESS mission, particularly small planets. We analyzed photometry from TESS and ground-based facilities, high-resolution imaging, and RVs from ESPRESSO, HARPS, and HIRES, to confirm and characterize three new planets: TOI-260 b, transiting a late K-dwarf, and TOI-286 b and c, orbiting an early K-dwarf. We also update parameters for the known super-Earth TOI-134 b, hosted by an M-dwarf. TOI-260 b has a $13.475853^{+0.000013}_{-0.000011}$ d period, $4.23 \pm1.60 \mathrm{M_\oplus}$ mass and $1.71\pm0.08\mathrm{R_\oplus}$ radius. For TOI-286 b we find a $4.5117244^{+0.0000031}_{-0.0000027}$ d period, $4.53\pm0.78\mathrm{M_\oplus}$ mass and $1.42\pm0.10\mathrm{R_\oplus}$ radius; for TOI-286 c, a $39.361826^{+0.000070}_{-0.000081}$ d period, $3.72\pm2.22\mathrm{M_\oplus}$ mass and $1.88\pm 0.12\mathrm{R_\oplus}$ radius. For TOI-134 b we obtain a $1.40152604^{+0.00000074}_{-0.00000082}$ d period, $4.07\pm0.45\mathrm{M_\oplus}$ mass, and $1.63\pm0.14\mathrm{R_\oplus}$ radius. Circular models are preferred for all, although for TOI-260 b the eccentricity is not well-constrained. We compute bulk densities and place the planets in the context of composition models. TOI-260 b lies within the radius valley, and is most likely a rocky planet. However, the uncertainty on the eccentricity and thus on the mass renders its composition hard to determine. TOI-286 b and c span the radius valley, with TOI-286 b lying below it and having a likely rocky composition, while TOI-286 c is within the valley, close to the upper border, and probably has a significant water fraction. With our updated parameters for TOI-134 b, we obtain a lower density than previous findings, giving a rocky or Earth-like composition.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 61 pages (of which pp. 24-61 are appendices), 20 figures (main text). Accepted for publication in A&A

arXiv:2406.05712 [pdf, other]

Demystifying the Characteristics for Smart Contract Upgrades

Authors: Ye Liu, Shuo Li, Xiuheng Wu, Yi Li, Zhiyang Chen, David Lo

Abstract: Upgradable smart contracts play an important role in the decentralized application ecosystem, to support routine maintenance, security patching, and feature additions. In this paper, we conduct an empirical study on proxy-based upgradable smart contracts to understand the characteristics of contract upgrading. Through our study on 57,118 open source proxy contracts, we found that 583 contracts have ever been upgraded on Ethereum, involving 973 unique implementation contract versions. The results show that developers often intend to improve usability of contracts if upgrading, where functionality addition and update are the most frequent upgrade intentions. We investigated the practical impacts of contract upgrades, e.g., breaking changes causing compatibility issues, storage collisions and initialization risks leading to security vulnerabilities. The results demonstrate that there are 4,334 ABI breaking changes due to the upgrades of 276 proxies, causing real-world broken usages within 584 transactions witnessed by the blockchain; 36 contract upgrades had storage collisions and five proxies with 59 implementation contracts are vulnerable to initialization attacks.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.04398 [pdf, other]

lenscat: a Public and Community-Contributed Catalog of Known Strong Gravitational Lenses

Authors: L. Vujeva, R. K. L. Lo, J. M. Ezquiaga, J. C. L. Chan

Abstract: We present lenscat, a public and community-contributed catalog of strong gravitational lenses found by electromagnetic surveys. The main objective of lenscat is to compile a simple, easy-to-access catalog that can be used in a variety of lensing studies, such as facilitating the search for the host galaxy of a candidate strongly lensed transient event. We also provide a python package to interact with tools commonly used by the community. This allows end users both with and without lensing expertise to obtain a list of known strong lenses within a given search area, and to also rank them by their respective searched probabilities. Here, we exemplify this by crossmatching the gravitational wave joint sky localization region of an interesting pair of events GW170104-GW170814. Other examples with short gamma-ray bursts are given. Thanks to the open and simple infrastructure of lenscat, members of the lensing community can directly add newly found lenses from their own studies to help create a long-lasting catalog that is as exhaustive and accessible as possible.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 7 pages, 2 figures

arXiv:2406.02523 [pdf, other]

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

Authors: Soroush Nasiriany, Abhiram Maddukuri, Lance Zhang, Adeet Parikh, Aaron Lo, Abhishek Joshi, Ajay Mandlekar, Yuke Zhu

Abstract: Recent advancements in Artificial Intelligence (AI) have largely been propelled by scaling. In Robotics, scaling is hindered by the lack of access to massive robot datasets. We advocate using realistic physical simulation as a means to scale environments, tasks, and datasets for robot learning methods. We present RoboCasa, a large-scale simulation framework for training generalist robots in everyday environments. RoboCasa features realistic and diverse scenes focusing on kitchen environments. We provide thousands of 3D assets across over 150 object categories and dozens of interactable furniture and appliances. We enrich the realism and diversity of our simulation with generative AI tools, such as object assets from text-to-3D models and environment textures from text-to-image models. We design a set of 100 tasks for systematic evaluation, including composite tasks generated by the guidance of large language models. To facilitate learning, we provide high-quality human demonstrations and integrate automated trajectory generation methods to substantially enlarge our datasets with minimal human burden. Our experiments show a clear scaling trend in using synthetically generated robot data for large-scale imitation learning and show great promise in harnessing simulation data in real-world tasks. Videos and open-source code are available at https://robocasa.ai/

Submitted 4 June, 2024; originally announced June 2024.

Comments: RSS 2024

arXiv:2406.00720 [pdf, other]

Age-Gain-Dependent Random Access for Event-Driven Periodic Updating

Authors: Yuqing Zhu, Yiwen Zhu, Aoyu Gong, Yan Lin, Yuan-Hsuan Lo, Yi** Zhang

Abstract: This paper considers utilizing the knowledge of age gains to reduce the network average age of information (AoI) in random access with event-driven periodic updating for the first time. Built on the form of slotted ALOHA, we require each device to determine its age gain threshold and transmission probability in an easily implementable decentralized manner, so that the unavoided contention can be limited to devices with age gains as high as possible. For the basic case that each device utilizes its knowledge of age gain of only itself, we provide an analytical modeling approach by a multi-layer discrete-time Markov chains (DTMCs), where an external infinite-horizon DTMC manages the jumps between the beginnings of frames and an internal finite-horizon DTMC manages the evolution during an arbitrary frame. Such modelling enables that optimal access parameters can be obtained offline. For the enhanced case that each device utilizes its knowledge of age gains of all the devices, we require each device to adjust its access parameters for maximizing the estimated network expected AoI reduction (EAR) per slot, which captures the essential for improving the contribution of the throughput to the AoI performance. To estimate the network EAR, we require each device to use Bayes' rule to keep a posteriori joint probability distribution of local age and age gain of an arbitrary device based on the channel observations. Numerical results validate our theoretical analysis and demonstrate the advantage of the proposed schemes over the existing schemes in a wide range of network configurations.

Submitted 27 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.20458 [pdf, other]

Contingency-Aware Station-Keeping Control of Halo Orbits

Authors: Fausto Vega, Zachary Manchester, Martin Lo, Ricardo Restrepo

Abstract: We present an algorithm to perform fuel-optimal stationkeeping for spacecraft in unstable halo orbits with additional constraints to ensure safety in the event of a control failure. We formulate a convex trajectory-optimization problem to generate impulsive spacecraft maneuvers to loosely track a halo orbit using a receding-horizon controller. Our solution also provides a safe exit strategy in the event that propulsion is lost at any point in the mission. We validate our algorithm in simulations of the three-body Earth-Moon and Saturn-Enceladus systems, demonstrating both low total delta-v and a safe contingency plan throughout the mission.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.20305 [pdf, other]

Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models

Authors: Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo, Kwonjoon Lee

Abstract: We introduce PlausiVL, a large video-language model for anticipating action sequences that are plausible in the real-world. While significant efforts have been made towards anticipating future actions, prior approaches do not take into account the aspect of plausibility in an action sequence. To address this limitation, we explore the generative capability of a large video-language model in our work and further, develop the understanding of plausibility in an action sequence by introducing two objective functions, a counterfactual-based plausible action sequence learning loss and a long-horizon action repetition loss. We utilize temporal logical constraints as well as verb-noun action pair logical constraints to create implausible/counterfactual action sequences and use them to train the model with plausible action sequence learning loss. This loss helps the model to differentiate between plausible and not plausible action sequences and also helps the model to learn implicit temporal cues crucial for the task of action anticipation. The long-horizon action repetition loss puts a higher penalty on the actions that are more prone to repetition over a longer temporal window. With this penalization, the model is able to generate diverse, plausible action sequences. We evaluate our approach on two large-scale datasets, Ego4D and EPIC-Kitchens-100, and show improvements on the task of action anticipation.

Submitted 30 May, 2024; originally announced May 2024.

Comments: CVPR 2024

arXiv:2405.19514 [pdf, other]

doi 10.1145/3656420

Wavefront Threading Enables Effective High-Level Synthesis

Authors: Blake Pelton, Adam Sapek, Ken Eguro, Daniel Lo, Alessandro Forin, Matt Humphrey, **wen Xi, David Cox, Rajas Karandikar, Johannes de Fine Licht, Evgeny Babin, Adrian Caulfield, Doug Burger

Abstract: Digital systems are growing in importance and computing hardware is growing more heterogeneous. Hardware design, however, remains laborious and expensive, in part due to the limitations of conventional hardware description languages (HDLs) like VHDL and Verilog. A longstanding research goal has been programming hardware like software, with high-level languages that can generate efficient hardware designs. This paper describes Kanagawa, a language that takes a new approach to combine the programmer productivity benefits of traditional High-Level Synthesis (HLS) approaches with the expressibility and hardware efficiency of Register-Transfer Level (RTL) design. The language's concise syntax, matched with a hardware design-friendly execution model, permits a relatively simple toolchain to map high-level code into efficient hardware implementations.

Submitted 10 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted to PLDI'24

arXiv:2405.19413 [pdf, other]

VisTA-SR: Improving the Accuracy and Resolution of Low-Cost Thermal Imaging Cameras for Agriculture

Authors: Heesup Yun, Sassoum Lo, Christine H. Diepenbrock, Brian N. Bailey, J. Mason Earles

Abstract: Thermal cameras are an important tool for agricultural research because they allow for non-invasive measurement of plant temperature, which relates to important photochemical, hydraulic, and agronomic traits. Utilizing low-cost thermal cameras can lower the barrier to introducing thermal imaging in agricultural research and production. This paper presents an approach to improve the temperature accuracy and image quality of low-cost thermal imaging cameras for agricultural applications. Leveraging advancements in computer vision techniques, particularly deep learning networks, we propose a method, called VisTA-SR (Visual & Thermal Alignment and Super-Resolution Enhancement) that combines RGB and thermal images to enhance the capabilities of low-resolution thermal cameras. The research includes calibration and validation of temperature measurements, acquisition of paired image datasets, and the development of a deep learning network tailored for agricultural thermal imaging. Our study addresses the challenges of image enhancement in the agricultural domain and explores the potential of low-cost thermal cameras to replace high-resolution industrial cameras. Experimental results demonstrate the effectiveness of our approach in enhancing temperature accuracy and image sharpness, paving the way for more accessible and efficient thermal imaging solutions in agriculture.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18954 [pdf, other]

Determining state space anomalies in mean field games

Authors: Hongyu Liu, Catharine W. K. Lo

Abstract: In this paper, we are concerned with the inverse problem of determining anomalies in the state space associated with the stationary mean field game (MFG) system. We establish novel unique identifiability results for the intrinsic structure of these anomalies in mean field games systems, including their topological structure and parameter configurations, in several general scenarios of practical interest, including traffic flow, market economics and epidemics. To the best of our knowledge, this is the first work that considers anomalies in the state space for the nonlinear coupled MFG system.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Keywords: Stationary mean field games, inverse boundary problems, anomalies in state space, singularities, uniqueness

MSC Class: Primary 35R30; secondary 35Q89; 91A16; 35R35

arXiv:2405.18943 [pdf, other]

Decoding a mean field game by the Cauchy data around its unknown stationary states

Authors: Hongyu Liu, Catharine W. K. Lo, Shen Zhang

Abstract: In recent years, mean field games (MFGs) have garnered considerable attention and emerged as a dynamic and actively researched field across various domains, including economics, social sciences, finance, and transportation. The inverse design and decoding of MFGs offer valuable means to extract information from observed data and gain insights into the intricate underlying dynamics and strategies of these complex physical systems. This paper presents a novel approach to the study of inverse problems in MFGs by analyzing the Cauchy data around their unknown stationary states. This study distinguishes itself from existing inverse problem investigations in three key significant aspects: Firstly, we consider MFG problems in a highly general form. Secondly, we address the technical challenge of the probability measure constraint by utilizing Cauchy data in our inverse problem study. Thirdly, we enhance existing high order linearization methods by introducing a novel approach that involves conducting linearization around non-trivial stationary states of the MFG system, which are not a-priori known. These contributions provide new insights and offer promising avenues for studying inverse problems for MFGs. By unraveling the hidden structure of MFGs, researchers and practitioners can make informed decisions, optimize system performance, and address real-world challenges more effectively.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Keywords: Mean field games, inverse problems, Cauchy data, unique continuation principle, unique identifiability

MSC Class: Primary 35Q89; 35R30; secondary 91A16; 35R35

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick, et al. (635 additional authors not shown)

Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n \rightarrow 3\nu$ or $nn \rightarrow 2\nu$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $n \rightarrow {\rm inv}$ and $nn \rightarrow {\rm inv}$. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\bar{\nu}_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $\tau/B(n \rightarrow {\rm inv}) > 5.0 \times 10^{31} \, {\rm yr}$ and $\tau/B(nn \rightarrow {\rm inv}) > 1.4 \times 10^{32} \, {\rm yr}$.