-
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
Authors:
Ruida Wang,
Jipeng Zhang,
Yizhen Jia,
Rui Pan,
Shizhe Diao,
Renjie Pi,
Tong Zhang
Abstract:
Proving mathematical theorems using computer-verifiable formal languages like Lean significantly impacts mathematical reasoning. One approach to formal theorem proving involves generating complete proofs using Large Language Models (LLMs) based on Natural Language (NL) proofs. Similar methods have shown promising results in code generation. However, most modern LLMs exhibit suboptimal performance due to the scarcity of aligned NL and Formal Language (FL) theorem-proving data. This scarcity results in a paucity of methodologies for training LLMs and techniques to fully utilize their capabilities in composing formal proofs. To address these challenges, this paper proposes **TheoremLlama**, an end-to-end framework to train a general-purpose LLM to become a Lean4 expert. This framework encompasses NL-FL aligned dataset generation methods, training approaches for the LLM formal theorem prover, and techniques for LLM Lean4 proof writing. Using the dataset generation method, we provide *Open Bootstrapped Theorems* (OBT), an NL-FL aligned and bootstrapped dataset. A key innovation in this framework is the NL-FL bootstrapping method, where NL proofs are integrated into Lean4 code for training datasets, leveraging the NL reasoning ability of LLMs for formal reasoning. The **TheoremLlama** framework achieves cumulative accuracies of 36.48% and 33.61% on MiniF2F-Valid and Test datasets respectively, surpassing the GPT-4 baseline of 22.95% and 25.41%. We have also open-sourced our model checkpoints and generated dataset, and will soon make all the code publicly available.
Submitted 3 July, 2024;
originally announced July 2024.
-
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Authors:
Rui Pan,
Jipeng Zhang,
Xingyuan Pan,
Renjie Pi,
Xiaoyu Wang,
Tong Zhang
Abstract:
Bilevel optimization has shown its utility across various machine learning settings, yet most algorithms in practice require second-order information, making it challenging to scale them up. Only recently, a paradigm of first-order algorithms emerged, capable of effectively addressing bilevel optimization problems. Nevertheless, the practical efficiency of this paradigm remains unverified, particularly in the context of large language models (LLMs). This paper introduces the first scalable instantiation of this paradigm called ScaleBiO, focusing on bilevel optimization for large-scale LLM data reweighting. By combining with a recently proposed memory-efficient training technique called LISA, our novel algorithm allows the paradigm to scale to 34-billion-parameter LLMs on eight A40 GPUs, marking the first successful application of bilevel optimization under practical scenarios for large-sized LLMs. Empirically, extensive experiments on data reweighting verify the effectiveness of ScaleBiO for different-scaled models, including GPT-2, LLaMA-3-8B, GPT-NeoX-20B, and Yi-34B, where bilevel optimization succeeds in filtering irrelevant data samples and selecting informative samples. Theoretically, ScaleBiO ensures the optimality of the learned data weights, along with a convergence guarantee matching the conventional first-order bilevel optimization paradigm on smooth and strongly convex objectives.
Submitted 28 June, 2024;
originally announced June 2024.
-
Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Authors:
Yuxing Liu,
Rui Pan,
Tong Zhang
Abstract:
Adaptive gradient algorithms have been widely adopted in training large-scale deep neural networks, especially large foundation models. Despite their huge success in practice, their theoretical advantages over stochastic gradient descent (SGD) have not been fully understood, especially in the large batch-size setting commonly used in practice. This is because the only theoretical result that can demonstrate the benefit of Adagrad over SGD was obtained in the original Adagrad paper for nonsmooth objective functions. However, for nonsmooth objective functions, there can be a linear slowdown of convergence when batch size increases, and thus a convergence analysis based on a nonsmooth assumption cannot be used for large batch algorithms. In this work, we resolve this gap between theory and practice by providing a new analysis of Adagrad on both convex and nonconvex smooth objectives suitable for the large batch setting. It is shown that under the anisotropic smoothness and noise conditions, increased batch size does not slow down convergence for Adagrad, and thus it can still achieve a faster convergence guarantee than SGD even in the large batch setting. We present detailed comparisons between SGD and Adagrad to provide a better understanding of the benefits of adaptive gradient methods. Experiments in logistic regression and instruction-following fine-tuning tasks provide strong evidence to support our theoretical analysis.
Submitted 21 June, 2024;
originally announced June 2024.
-
Using graph neural networks to reconstruct charged pion showers in the CMS High Granularity Calorimeter
Authors:
M. Aamir,
B. Acar,
G. Adamov,
T. Adams,
C. Adloff,
S. Afanasiev,
C. Agrawal,
C. Agrawal,
A. Ahmad,
H. A. Ahmed,
S. Akbar,
N. Akchurin,
B. Akgul,
B. Akgun,
R. O. Akpinar,
E. Aktas,
A. AlKadhim,
V. Alexakhin,
J. Alimena,
J. Alison,
A. Alpana,
W. Alshehri,
P. Alvarez Dominguez,
M. Alyari,
C. Amendola
, et al. (550 additional authors not shown)
Abstract:
A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles read out by SiPMs, and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadronic section. The shower reconstruction method is based on graph neural networks and makes use of a dynamic reduction network architecture. It is shown that the algorithm is able to capture and mitigate the main effects that normally hinder the reconstruction of hadronic showers using classical reconstruction methods, by compensating for fluctuations in the multiplicity, energy, and spatial distributions of the shower's constituents. The performance of the algorithm is evaluated using test beam data collected in 2018 with a prototype of the CMS HGCAL accompanied by a section of the CALICE AHCAL prototype. The capability of the method to mitigate the impact of energy leakage from the calorimeter is also demonstrated.
Submitted 30 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions
Authors:
Renjie Pi,
Jianshu Zhang,
Jipeng Zhang,
Rui Pan,
Zhekai Chen,
Tong Zhang
Abstract:
Image description datasets play a crucial role in the advancement of various applications such as image understanding, text-to-image generation, and text-image retrieval. Currently, image description datasets primarily originate from two sources. One source is the scraping of image-text pairs from the web. Despite their abundance, these descriptions are often of low quality and noisy. Another is through human labeling. Descriptions in datasets such as COCO are generally very short and lack details. Although detailed image descriptions can be annotated by humans, the high annotation cost limits the feasibility. These limitations underscore the need for more efficient and scalable methods to generate accurate and detailed image descriptions. In this paper, we propose an innovative framework termed Image Textualization (IT), which automatically produces high-quality image descriptions by leveraging existing multi-modal large language models (MLLMs) and multiple vision expert models in a collaborative manner, maximally converting the visual information into text. To address the current lack of benchmarks for detailed descriptions, we propose several benchmarks for comprehensive evaluation, which verify the quality of image descriptions created by our framework. Furthermore, we show that LLaVA-7B, benefiting from training on IT-curated descriptions, acquires improved capability to generate richer image descriptions, substantially increasing the length and detail of its output with less hallucination.
Submitted 11 June, 2024;
originally announced June 2024.
-
Universal properties of the evolution of the Universe in modified loop quantum cosmology
Authors:
Jamal Saeed,
Rui Pan,
Christian Brown,
Gerald Clevear,
Anzhong Wang
Abstract:
In this paper, we systematically study the evolution of the Universe in the framework of a modified loop quantum cosmological model (mLQC-I) with various inflationary potentials, including chaotic, Starobinsky, generalized Starobinsky, polynomials of the first and second kinds, generalized T-models and natural inflation. In all these models, the big bang singularity is replaced by a quantum bounce, and the evolution of the Universe both before and after the bounce is universal and depends only weakly on the inflationary potentials, as long as the evolution is dominated by the kinetic energy of the inflaton at the bounce. In particular, the evolution in the pre-bounce region can be universally divided into three different phases: pre-bouncing, pre-transition, and pre-de Sitter. The pre-bouncing phase occurs immediately before the quantum bounce, during which the evolution of the Universe is dominated by the kinetic energy of the inflaton. Thus, the equation of state of the inflaton is about one, w = 1. Soon, the inflationary potential takes over, so w rapidly falls from one to negative one. This pre-transition phase is very short and quickly turns into the pre-de Sitter phase, whereby the effective cosmological constant with a Planck size takes over and dominates the rest of the contracting phase. In the entire pre-bounce regime, the evolution of the expansion factor and the inflaton can be approximated by analytical solutions, which are universal and independent of the inflationary potentials.
Submitted 10 June, 2024;
originally announced June 2024.
-
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Authors:
DeepSeek-AI,
Aixin Liu,
Bei Feng,
Bin Wang,
Bingxuan Wang,
Bo Liu,
Chenggang Zhao,
Chengqi Dengr,
Chong Ruan,
Damai Dai,
Daya Guo,
Dejian Yang,
Deli Chen,
Dongjie Ji,
Erhang Li,
Fangyun Lin,
Fuli Luo,
Guangbo Hao,
Guanting Chen,
Guowei Li,
H. Zhang,
Hanwei Xu,
Hao Yang,
Haowei Zhang,
Honghui Ding
, et al. (132 additional authors not shown)
Abstract:
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
Submitted 19 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Linear correlations of Gibbs free energy for rare earth element oxide, hydroxide, chloride, fluoride, carbonate, and ferrite minerals and crystalline solids
Authors:
Ruiguang Pan,
Chen Zhu
Abstract:
Rare Earth Elements (REE) are critical minerals (metals) for the transition from fossil fuels to renewable and clean energy. Accurate thermodynamic properties of REE minerals and other crystalline solids are crucial for geochemical modeling of the solubility, speciation, and transport of REE in ore formation, extraction, chemical processing, and recycling processes. However, the Gibbs free energies of formation ($\Delta G^\circ_{f,\mathrm{REEX}}$) for these solids from different sources vary by tens of kJ/mol. We applied the Sverjensky linear free energy relationship (LFER) to evaluate their internal consistency and predict the unavailable $\Delta G^\circ_f$ of the REE solids. By considering both the effects of ionic radius size and corresponding aqueous ion properties, the Sverjensky LFER allows estimates with much accuracy and precision. Here, $r_{\mathrm{REE}^{Z+}}$ represents the Shannon-Prewitt ionic radii of $\mathrm{REE}^{Z+}$, and $\Delta G^\circ_{n,\mathrm{REE}^{Z+}}$ denotes the non-solvation contribution to the $\Delta G^\circ_f$ of the aqueous $\mathrm{REE}^{Z+}$ ion. X represents the remainder of the compounds. In this study, the parameters $a_{\mathrm{REEX}}$, $b_{\mathrm{REEX}}$, and $\beta_{\mathrm{REEX}}$ were regressed from $\Delta G^\circ_f$ compilations in the literature for 13 isostructural families. Based on these linear relationships, we recommend a set of internally consistent $\Delta G^\circ_{f,\mathrm{REEX}}$ for 119 end-members of REE oxides, hydroxides, chlorides, fluorides, carbonates, hydrous carbonates, and ferrites. These $\Delta G^\circ_{f,\mathrm{REEX}}$ are combined with experimental or predicted values of $S^\circ$, $V^\circ$, and $C_p^\circ$ from the literature and incorporated into a new SUPCRT database, which allows the calculations of thermodynamic properties to high P-T conditions (e.g., up to 1000 °C and 5 kb).
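The abstract defines the symbols of the linear free energy relationship, but the relation itself does not appear in this listing. As a sketch only, assuming the standard Sverjensky-Molling form of the LFER with the metal cation replaced by $\mathrm{REE}^{Z+}$ (the exact arrangement of terms used in the paper may differ), the relation can be written as:

```latex
% Assumed form: Sverjensky-Molling LFER applied to one isostructural family REEX.
% a, b, and beta are the regression parameters named in the abstract;
% r is the Shannon-Prewitt ionic radius of the REE cation.
\[
  \Delta G^\circ_{f,\mathrm{REEX}}
    = a_{\mathrm{REEX}}\,\Delta G^\circ_{n,\mathrm{REE}^{Z+}}
    + b_{\mathrm{REEX}}
    + \beta_{\mathrm{REEX}}\, r_{\mathrm{REE}^{Z+}}
\]
```

Under this form, each isostructural family has its own fitted $a_{\mathrm{REEX}}$, $b_{\mathrm{REEX}}$, and $\beta_{\mathrm{REEX}}$, so a missing $\Delta G^\circ_{f,\mathrm{REEX}}$ can be estimated from the cation's ionic radius and the non-solvation energy of its aqueous ion.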
Submitted 6 May, 2024;
originally announced May 2024.
-
Data Quality in Crowdsourcing and Spamming Behavior Detection
Authors:
Yang Ba,
Michelle V. Mancenido,
Erin K. Chiou,
Rong Pan
Abstract:
As crowdsourcing emerges as an efficient and cost-effective method for obtaining labels for machine learning datasets, it is important to assess the quality of crowd-provided data, so as to improve analysis performance and reduce biases in subsequent machine learning tasks. Given the lack of ground truth in most cases of crowdsourcing, we refer to data quality as annotators' consistency and credibility. Unlike the simple scenarios where the Kappa coefficient and the intraclass correlation coefficient usually apply, online crowdsourcing requires dealing with more complex situations. We introduce a systematic method for evaluating data quality and detecting spamming threats via variance decomposition, and we classify spammers into three categories based on their different behavioral patterns. A spammer index is proposed to assess overall data consistency, and two metrics are developed to measure crowd workers' credibility by utilizing the Markov chain and generalized random effects models. Furthermore, we showcase the practicality of our techniques and their advantages by applying them to a face verification task with both simulation and real-world data collected from two crowdsourcing platforms.
Submitted 3 April, 2024;
originally announced April 2024.
-
Medium Bands, Mega Science: a JWST/NIRCam Medium-Band Imaging Survey of Abell 2744
Authors:
Katherine A. Suess,
John R. Weaver,
Sedona H. Price,
Richard Pan,
Bingjie Wang,
Rachel Bezanson,
Gabriel Brammer,
Sam E. Cutler,
Ivo Labbe,
Joel Leja,
Christina C. Williams,
Katherine E. Whitaker,
Pratika Dayal,
Anna de Graaff,
Robert Feldmann,
Marijn Franx,
Yoshinobu Fudamoto,
Seiji Fujimoto,
Lukas J. Furtak,
Andy D. Goulding,
Jenny E. Greene,
Gourav Khullar,
Vasily Kokorev,
Mariska Kriek,
Brian Lorenz
, et al. (17 additional authors not shown)
Abstract:
In this paper, we describe the "Medium Bands, Mega Science" JWST Cycle 2 survey (JWST-GO-4111) and demonstrate the power of these data to reveal both the spatially-integrated and spatially-resolved properties of galaxies from the local universe to the era of cosmic dawn. Executed in November 2023, MegaScience obtained ~30 arcmin^2 of deep multiband NIRCam imaging centered on the z~0.3 Abell 2744 cluster, including eleven medium-band filters and the two shortest-wavelength broad-band filters, F070W and F090W. Together, MegaScience and the UNCOVER Cycle 1 treasury program provide a complete set of deep (~28-30 mag) images in all NIRCam medium- and broad-band filters. This unique dataset allows us to precisely constrain photometric redshifts, map stellar populations and dust attenuation for large samples of distant galaxies, and examine the connection between galaxy structures and formation histories. MegaScience also includes ~17 arcmin^2 of NIRISS parallel imaging in two broad-band and four medium-band filters from 0.9-4.8um, expanding the footprint where robust spectral energy distribution (SED) fitting is possible. We provide example SEDs and multi-band cutouts at a variety of redshifts, and use a catalog of JWST spectroscopic redshifts to show that MegaScience improves both the scatter and catastrophic outlier rate of photometric redshifts by factors of 2-3. Additionally, we demonstrate the spatially-resolved science enabled by MegaScience by presenting maps of the [OIII] line emission and continuum emission in three spectroscopically-confirmed z>6 galaxies. We show that line emission in reionization-era galaxies can be clumpy, extended, and spatially offset from continuum emission, implying that galaxy assembly histories are complex even at these early epochs. We publicly release fully reduced mosaics and photometric catalogs for both the NIRCam primary and NIRISS parallel fields.
Submitted 19 April, 2024;
originally announced April 2024.
-
Exploring Interactive Semantic Alignment for Efficient HOI Detection with Vision-language Model
Authors:
Jihao Dong,
Renjie Pan,
Hua Yang
Abstract:
Human-Object Interaction (HOI) detection aims to localize human-object pairs and comprehend their interactions. Recently, two-stage transformer-based methods have demonstrated competitive performance. However, these methods frequently focus on object appearance features and ignore global contextual information. Besides, the vision-language model CLIP, which effectively aligns visual and text embeddings, has shown great potential in zero-shot HOI detection. Based on these observations, we introduce a novel HOI detector named ISA-HOI, which extensively leverages knowledge from CLIP, aligning interactive semantics between visual and textual features. We first extract the global context of the image and local features of objects to Improve interaction Features in images (IF). In addition, we propose a Verb Semantic Improvement (VSI) module to enhance textual features of verb labels via cross-modal fusion. Ultimately, our method achieves competitive results on the HICO-DET and V-COCO benchmarks with far fewer training epochs, and outperforms the state-of-the-art under zero-shot settings.
Submitted 24 May, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
-
Scaling Instructable Agents Across Many Simulated Worlds
Authors:
SIMA Team,
Maria Abi Raad,
Arun Ahuja,
Catarina Barros,
Frederic Besse,
Andrew Bolt,
Adrian Bolton,
Bethanie Brownfield,
Gavin Buttimore,
Max Cant,
Sarah Chakera,
Stephanie C. Y. Chan,
Jeff Clune,
Adrian Collister,
Vikki Copeman,
Alex Cullum,
Ishita Dasgupta,
Dario de Cesare,
Julia Di Trapani,
Yani Donchev,
Emma Dunleavy,
Martin Engelcke,
Ryan Faulkner,
Frankie Garcia,
Charles Gbadamosi
, et al. (68 additional authors not shown)
Abstract:
Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructions across a diverse range of virtual 3D environments, including curated research environments as well as open-ended, commercial video games. Our goal is to develop an instructable agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real-time using a generic, human-like interface: the inputs are image observations and language instructions and the outputs are keyboard-and-mouse actions. This general approach is challenging, but it allows agents to ground language across many visually complex and semantically rich environments while also allowing us to readily run agents in new environments. In this paper we describe our motivation and goal, the initial progress we have made, and promising preliminary results on several diverse research environments and a variety of commercial video games.
Submitted 17 April, 2024; v1 submitted 13 March, 2024;
originally announced April 2024.
-
A Latent Factor Model for High-Dimensional Binary Data
Authors:
Jiaxin Shi,
Yuan Gao,
Rui Pan,
Hansheng Wang
Abstract:
In this study, we develop a latent factor model for analysing high-dimensional binary data. Specifically, a standard probit model is used to describe the regression relationship between the observed binary data and the continuous latent variables. Our method assumes that the dependency structure of the observed binary data can be fully captured by the continuous latent factors. To estimate the model, a moment-based estimation method is developed. The proposed method is able to deal with both discontinuity and high dimensionality. Most importantly, the asymptotic properties of the resulting estimators are rigorously established. Extensive simulation studies are presented to demonstrate the proposed methodology. A real dataset about product descriptions is analysed for illustration.
Submitted 12 April, 2024;
originally announced April 2024.
-
Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation
Authors:
Ruotong Pan,
Boxi Cao,
Hongyu Lin,
Xianpei Han,
Jia Zheng,
Sirui Wang,
Xunliang Cai,
Le Sun
Abstract:
The rapid development of large language models has led to the widespread adoption of Retrieval-Augmented Generation (RAG), which integrates external knowledge to alleviate knowledge bottlenecks and mitigate hallucinations. However, the existing RAG paradigm inevitably suffers from the impact of flawed information introduced during the retrieval phase, thereby diminishing the reliability and correctness of the generated outcomes. In this paper, we propose Credibility-aware Generation (CAG), a universally applicable framework designed to mitigate the impact of flawed information in RAG. At its core, CAG aims to equip models with the ability to discern and process information based on its credibility. To this end, we propose an innovative data transformation framework that generates data based on credibility, thereby effectively endowing models with the capability of CAG. Furthermore, to accurately evaluate the models' capabilities of CAG, we construct a comprehensive benchmark covering three critical real-world scenarios. Experimental results demonstrate that our model can effectively understand and utilize credibility for generation, significantly outperform other models with retrieval augmentation, and exhibit resilience against the disruption caused by noisy documents, thereby maintaining robust performance. Moreover, our model supports customized credibility, offering a wide range of potential applications.
Submitted 8 May, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
SMaRTT-REPS: Sender-based Marked Rapidly-adapting Trimmed & Timed Transport with Recycled Entropies
Authors:
Tommaso Bonato,
Abdul Kabbani,
Daniele De Sensi,
Rong Pan,
Yanfang Le,
Costin Raiciu,
Mark Handley,
Timo Schneider,
Nils Blach,
Ahmad Ghalayini,
Daniel Alves,
Michael Papamichael,
Adrian Caulfield,
Torsten Hoefler
Abstract:
With the rapid growth of machine learning (ML) workloads in datacenters, existing congestion control (CC) algorithms fail to deliver the required performance at scale. ML traffic is bursty and bulk-synchronous and thus requires quick reaction and strong fairness. We show that existing CC algorithms that use delay as a main signal react too slowly and are not always fair. We design SMaRTT, a simple sender-based CC algorithm that combines delay, ECN, and optional packet trimming for fast and precise window adjustments. At the core of SMaRTT lies the novel QuickAdapt algorithm that accurately estimates the bandwidth at the receiver. We show how to combine SMaRTT with a new per-packet traffic load-balancing algorithm called REPS to effectively reroute packets around congested hotspots as well as flaky or failing links. Our evaluation shows that SMaRTT alone outperforms EQDS, Swift, BBR, and MPRDMA by up to 50% on modern datacenter networks.
Submitted 27 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Authors:
Rui Pan,
Xiang Liu,
Shizhe Diao,
Renjie Pi,
Jipeng Zhang,
Chi Han,
Tong Zhang
Abstract:
The machine learning community has witnessed impressive advancements since large language models (LLMs) first appeared. Yet, their massive memory consumption has become a significant roadblock to large-scale training. For instance, a 7B model typically requires at least 60 GB of GPU memory with full parameter training, which presents challenges for researchers without access to high-resource environments. Parameter Efficient Fine-Tuning techniques such as Low-Rank Adaptation (LoRA) have been proposed to alleviate this problem. However, in most large-scale fine-tuning settings, their performance does not reach the level of full parameter training because they confine the parameter search to a low-rank subspace. Attempting to complement this deficiency, we investigate the layerwise properties of LoRA on fine-tuning tasks and observe an unexpected but consistent skewness of weight norms across different layers. Utilizing this key observation, a surprisingly simple training strategy is discovered, which outperforms both LoRA and full parameter training in a wide range of settings with memory costs as low as LoRA. We name it Layerwise Importance Sampled AdamW (LISA), a promising alternative for LoRA, which applies the idea of importance sampling to different layers in LLMs and randomly freezes most middle layers during optimization. Experimental results show that with similar or less GPU memory consumption, LISA surpasses LoRA or even full parameter tuning in downstream fine-tuning tasks, where LISA consistently outperforms LoRA by over 10%-35% in terms of MT-Bench score while achieving on-par or better performance in MMLU, AGIEval and WinoGrande. On large models, specifically LLaMA-2-70B, LISA surpasses LoRA on MT-Bench, GSM8K, and PubMedQA, demonstrating its effectiveness across different domains.
Submitted 25 May, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Bivariate temporal dependence via mixtures of rotated copulas
Authors:
Ruyi Pan,
Luis E. Nieto-Barajas,
Radu Craiu
Abstract:
Parametric bivariate copula families are known to flexibly capture various dependence patterns, e.g., either positive or negative dependence in either the lower or upper tails of bivariate distributions. However, to the best of our knowledge, there is not a single parametric model adaptable enough to capture several of these features simultaneously. To address this, we propose a mixture of 4-way rotations of a parametric copula that is able to capture all these features. We illustrate the construction using the Clayton family but the concept is general and can be applied to other families. In order to include dynamic dependence regimes, the approach is extended to a time-dependent sequence of mixture copulas in which the mixture probabilities are allowed to evolve in time via a moving average type of relationship. The properties of the proposed model and its performance are examined using simulated and real data sets.
Submitted 19 March, 2024;
originally announced March 2024.
-
A Selective Review on Statistical Methods for Massive Data Computation: Distributed Computing, Subsampling, and Minibatch Techniques
Authors:
Xuetong Li,
Yuan Gao,
Hong Chang,
Danyang Huang,
Yingying Ma,
Rui Pan,
Haobo Qi,
Feifei Wang,
Shuyuan Wu,
Ke Xu,
Jing Zhou,
Xuening Zhu,
Yingqiu Zhu,
Hansheng Wang
Abstract:
This paper presents a selective review of statistical computation methods for massive data analysis. A large number of statistical methods for massive data computation have been developed rapidly in the past decades. In this work, we focus on three categories of statistical computation methods: (1) distributed computing, (2) subsampling methods, and (3) minibatch gradient techniques. The first class of literature is about distributed computing and focuses on the situation where the dataset is too large to be comfortably handled by a single computer. In this case, a distributed computation system with multiple computers has to be utilized. The second class of literature is about subsampling methods and concerns the situation where the dataset is small enough to be stored on a single computer but too large to be easily processed in its memory as a whole. The last class of literature studies minibatch gradient-related optimization techniques, which have been extensively used for optimizing various deep learning models.
Submitted 17 March, 2024;
originally announced March 2024.
-
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
Authors:
Renjie Pi,
Tianyang Han,
Wei Xiong,
Jipeng Zhang,
Runtao Liu,
Rui Pan,
Tong Zhang
Abstract:
Multimodal Large Language Models (MLLMs) excel in generating responses based on visual inputs. However, they often suffer from a bias towards generating responses similar to their pretraining corpus, overshadowing the importance of visual information. We treat this bias as a "preference" for pretraining statistics, which hinders the model's grounding in visual input. To mitigate this issue, we propose Bootstrapped Preference Optimization (BPO), which conducts preference learning with datasets containing negative responses bootstrapped from the model itself. Specifically, we propose the following two strategies: 1) using distorted image inputs to the MLLM for eliciting responses that contain signified pretraining bias; 2) leveraging text-based LLM to explicitly inject erroneous but common elements into the original response. Those undesirable responses are paired with original annotated responses from the datasets to construct the preference dataset, which is subsequently utilized to perform preference learning. Our approach effectively suppresses pretrained LLM bias, enabling enhanced grounding in visual inputs. Extensive experimentation demonstrates significant performance improvements across multiple benchmarks, advancing the state-of-the-art in multimodal conversational systems.
Submitted 3 April, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs
Authors:
Hankz Hankui Zhuo,
Xin Chen,
Rong Pan
Abstract:
Plan synthesis aims to generate a course of actions or policies to transit given initial states to goal states, provided domain models that could be designed by experts or learnt from training data or interactions with the world. Intrigued by the claims of emergent planning capabilities in large language models (LLMs), works have been proposed to investigate the planning effectiveness of LLMs, without considering any utilization of off-the-shelf planning techniques in LLMs. In this paper, we aim to further study the insight of the planning capability of LLMs by investigating the roles of LLMs in off-the-shelf planning frameworks. To do this, we investigate the effectiveness of embedding LLMs into one of the well-known planning frameworks, graph-based planning, proposing a novel LLMs-based planning framework with LLMs embedded in two levels of planning graphs, i.e., mutual constraints generation level and constraints solving level. We empirically exhibit the effectiveness of our proposed framework in various planning domains.
Submitted 18 February, 2024;
originally announced March 2024.
-
UNCOVER NIRSpec/PRISM Spectroscopy Unveils Evidence of Early Core Formation in a Massive, Centrally Dusty Quiescent Galaxy at $z_{spec}=3.97$
Authors:
David J. Setton,
Gourav Khullar,
Tim B. Miller,
Rachel Bezanson,
Jenny E. Greene,
Katherine A. Suess,
Katherine E. Whitaker,
Jacqueline Antwi-Danso,
Hakim Atek,
Gabriel Brammer,
Sam E. Cutler,
Pratika Dayal,
Robert Feldmann,
Lukas J. Furtak,
Seiji Fujimoto,
Karl Glazebrook,
Andy D. Goulding,
Vasily Kokorev,
Ivo Labbe,
Joel Leja,
Yilun Ma,
Danilo Marchesini,
Themiya Nanayakkara,
Richard Pan,
Sedona H. Price
, et al. (6 additional authors not shown)
Abstract:
We report the spectroscopic confirmation of a massive ($\log(M_\star/M_\odot)=10.34 \pm_{0.07}^{0.06}$), HST-dark ($m_\mathrm{F150W} - m_\mathrm{F444W} = 3.6$) quiescent galaxy at $z_{spec}=3.97$ in the UNCOVER survey. NIRSpec/PRISM spectroscopy and a non-detection in deep ALMA imaging surprisingly reveal that the galaxy is consistent with a low ($<$10 $M_\odot \ \mathrm{yr^{-1}}$) star formation rate despite evidence for moderate dust attenuation. The F444W image is well modeled with a two-component Sérsic fit that favors a compact, $r_e\sim200$ pc, $n\sim2.9$ component and a more extended, $r_e\sim1.6$ kpc, $n\sim1.7$ component. The galaxy exhibits strong color gradients: the inner regions are significantly redder than the outskirts. Spectral energy distribution models that reproduce both the red colors and low star formation rate in the center of UNCOVER 18407 require both significant ($A_v\sim1.4$ mag) dust attenuation and a stellar mass-weighted age of 900 Myr, implying 50\% of the stars in the core already formed by $z=7.5$. Using spatially resolved annular mass-to-light measurements enabled by the galaxy's moderate magnification ($\mu=2.12\pm_{0.01}^{0.05}$) to reconstruct a radial mass profile from the best-fitting two-component Sérsic model, we infer a total mass-weighted $r_\mathrm{eff} = 0.72 \pm_{0.11}^{0.15}$ kpc and log$(\Sigma_\mathrm{1 kpc} \ [\mathrm{M_\odot/kpc^2}]) = 9.61 \pm_{0.10}^{0.08}$. The early formation of a dense, low star formation rate, and dusty core embedded in a less attenuated stellar envelope suggests an evolutionary link between the earliest-forming massive galaxies and their elliptical descendants. Furthermore, the disparity between the global, integrated dust properties and the spatially resolved gradients highlights the importance of accounting for radially varying stellar populations when characterizing the early growth of galaxy structure.
Submitted 12 May, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs
Authors:
Tianyang Han,
Qing Lian,
Rui Pan,
Renjie Pi,
Jipeng Zhang,
Shizhe Diao,
Yong Lin,
Tong Zhang
Abstract:
Large language models (LLMs) have recently experienced remarkable progress, where the advent of multi-modal large language models (MLLMs) has endowed LLMs with visual capabilities, leading to impressive performances in various multi-modal tasks. However, those powerful MLLMs such as GPT-4V still fail spectacularly when presented with certain image and text inputs. In this paper, we identify a typical class of inputs that baffles MLLMs, which consist of images that are highly relevant but inconsistent with answers, causing MLLMs to suffer from hallucination. To quantify the effect, we propose CorrelationQA, the first benchmark that assesses the hallucination level given spurious images. This benchmark contains 7,308 text-image pairs across 13 categories. Based on the proposed CorrelationQA, we conduct a thorough analysis on 9 mainstream MLLMs, illustrating that they universally suffer from this instinctive bias to varying degrees. We hope that our curated benchmark and evaluation results aid in better assessments of the MLLMs' robustness in the presence of misleading images. The resource is available in https://github.com/MasaiahHan/CorrelationQA.
Submitted 6 February, 2024;
originally announced February 2024.
-
Deciphering regulatory architectures from synthetic single-cell expression patterns
Authors:
Rosalind Wenshan Pan,
Tom Roeschinger,
Kian Faizi,
Hernan Garcia,
Rob Phillips
Abstract:
For the vast majority of genes in sequenced genomes, there is limited understanding of how they are regulated. Without such knowledge, it is not possible to perform a quantitative theory-experiment dialogue on how such genes give rise to physiological and evolutionary adaptation. One category of high-throughput experiments used to understand the sequence-phenotype relationship of the transcriptome is massively parallel reporter assays (MPRAs). However, to improve the versatility and scalability of MPRA pipelines, we need a "theory of the experiment" to help us better understand the impact of various biological and experimental parameters on the interpretation of experimental data. To that end, in this paper we create tens of thousands of synthetic single-cell gene expression outputs using both equilibrium and out-of-equilibrium models. These models make it possible to imitate the summary statistics (information footprints and expression shift matrices) used to characterize the output of MPRAs and from this summary statistic to infer the underlying regulatory architecture. Specifically, we use a more refined implementation of the so-called thermodynamic models in which the binding energies of each sequence variant are derived from energy matrices. Our simulations reveal important effects of the parameters on MPRA data and we demonstrate our ability to optimize MPRA experimental designs with the goal of generating thermodynamic models of the transcriptome with base-pair specificity. Further, this approach makes it possible to carefully examine the mapping between mutations in binding sites and their corresponding expression profiles, a tool useful not only for better designing MPRAs, but also for exploring regulatory evolution.
Submitted 5 June, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
AI for social science and social science of AI: A Survey
Authors:
Ruoxi Xu,
Yingfei Sun,
Mengjie Ren,
Shiguang Guo,
Ruotong Pan,
Hongyu Lin,
Le Sun,
Xianpei Han
Abstract:
Recent advancements in artificial intelligence, particularly with the emergence of large language models (LLMs), have sparked a rethinking of artificial general intelligence possibilities. The increasing human-like capabilities of AI are also attracting attention in social science research, leading to various studies exploring the combination of these two fields. In this survey, we systematically categorize previous explorations in the combination of AI and social science into two directions that share common technical approaches but differ in their research objectives. The first direction is focused on AI for social science, where AI is utilized as a powerful tool to enhance various stages of social science research. The second direction is the social science of AI, which examines AI agents as social entities with their human-like cognitive and linguistic capabilities. By conducting a thorough review, particularly on the substantial progress facilitated by recent advancements in large language models, this paper introduces a fresh perspective to reassess the relationship between AI and social science, provides a cohesive framework that allows researchers to understand the distinctions and connections between AI for social science and social science of AI, and summarizes state-of-the-art experiment simulation platforms to facilitate research in these two directions. We believe that as AI technology continues to advance and intelligent agents find increasing applications in our daily lives, the significance of the combination of AI and social science will become even more prominent.
Submitted 22 January, 2024;
originally announced January 2024.
-
Force Propagation in Active Cytoskeletal Networks
Authors:
Shichen Liu,
Rosalind Wenshan Pan,
Heun Jin Lee,
Shahriar Shadkhoo,
Fan Yang,
Chunhe Li,
Zijie Qu,
Rob Phillips,
Matt Thomson
Abstract:
In biological systems, molecular-scale forces and motions are pivotal for enabling processes like motility, shape change, and replication. These forces and motions are organized, amplified, and transmitted across macroscopic scales by active materials such as the cytoskeleton, which drives micron-scale cellular movement and re-organization. Despite the integral role of active materials, understanding how molecular-scale interactions alter macroscopic structure and force propagation remains elusive. This knowledge gap presents challenges to the harnessing and regulation of such dynamics across diverse length scales. Here, we demonstrate how mediating the bundling of microtubules can shift active matter between a global force-transmitting phase and a local force-dissipating phase. A fivefold increase in microtubule effective length results in the transition from local to global phase with a hundredfold increase in velocity autocorrelation. Through theory and simulation, we identify signatures of a percolation-driven transition between the two phases. This provides evidence for how force propagation can be generated when local molecular interactions reach a sufficient length scale. We show that force propagation in the active matter system enables material transport. Consequently, we demonstrate that the global phase is capable of facilitating millimeter-scale human cell transport and manipulation, as well as powering the movement of aqueous droplets. These findings underscore the potential for designing active materials capable of force organization and transmission. Our results lay the foundation for further exploration into the organization and propagation of forces/stresses in biological systems, thereby paving the way for the engineering of active materials in synthetic biology and soft robotics.
Submitted 12 April, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance
Authors:
Renjie Pi,
Tianyang Han,
Jianshu Zhang,
Yueqi Xie,
Rui Pan,
Qing Lian,
Hanze Dong,
Jipeng Zhang,
Tong Zhang
Abstract:
The deployment of multimodal large language models (MLLMs) has brought forth a unique vulnerability: susceptibility to malicious attacks through visual inputs. This paper investigates the novel challenge of defending MLLMs against such attacks. Compared to large language models (LLMs), MLLMs include an additional image modality. We discover that images act as a "foreign language" that is not considered during safety alignment, making MLLMs more prone to producing harmful responses. Unfortunately, unlike the discrete tokens considered in text-based LLMs, the continuous nature of image signals presents significant alignment challenges, which makes it difficult to thoroughly cover all possible scenarios. This vulnerability is exacerbated by the fact that most state-of-the-art MLLMs are fine-tuned on limited image-text pairs that are much fewer than the extensive text-based pretraining corpus, which makes the MLLMs more prone to catastrophic forgetting of their original abilities during safety fine-tuning. To tackle these challenges, we introduce MLLM-Protector, a plug-and-play strategy that solves two subtasks: 1) identifying harmful responses via a lightweight harm detector, and 2) transforming harmful responses into harmless ones via a detoxifier. This approach effectively mitigates the risks posed by malicious visual inputs without compromising the original performance of MLLMs. Our results demonstrate that MLLM-Protector offers a robust solution to a previously unaddressed aspect of MLLM security.
Submitted 17 June, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets
Authors:
Ernest Perkowski,
Rui Pan,
Tuan Dung Nguyen,
Yuan-Sen Ting,
Sandor Kruk,
Tong Zhang,
Charlie O'Neill,
Maja Jablonska,
Zechang Sun,
Michael J. Smith,
Huiling Liu,
Kevin Schawinski,
Kartheik Iyer,
Ioana Ciucă for UniverseTBD
Abstract:
We explore the potential of enhancing LLM performance in astronomy-focused question-answering through targeted, continual pre-training. By employing a compact 7B-parameter LLaMA-2 model and focusing exclusively on a curated set of astronomy corpora -- comprising abstracts, introductions, and conclusions -- we achieve notable improvements in specialized topic comprehension. While general LLMs like GPT-4 excel in broader question-answering scenarios due to superior reasoning capabilities, our findings suggest that continual pre-training with limited resources can still enhance model performance on specialized topics. Additionally, we present an extension of AstroLLaMA: the fine-tuning of the 7B LLaMA model on a domain-specific conversational dataset, culminating in the release of the chat-enabled AstroLLaMA for community use. Comprehensive quantitative benchmarking is currently in progress and will be detailed in an upcoming full paper. The model, AstroLLaMA-Chat, is now available at https://huggingface.co/universeTBD, providing the first open-source conversational AI tool tailored for the astronomy community.
Submitted 5 January, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
Two Distinct Classes of Quiescent Galaxies at Cosmic Noon Revealed by JWST PRIMER and UNCOVER
Authors:
Sam E. Cutler,
Katherine E. Whitaker,
John R. Weaver,
Bingjie Wang,
Richard Pan,
Rachel Bezanson,
Lukas J. Furtak,
Ivo Labbe,
Joel Leja,
Sedona H. Price,
Yingjie Cheng,
Maike Clausen,
Fergus Cullen,
Pratika Dayal,
Anna de Graaff,
Mark Dickinson,
James S. Dunlop,
Robert Feldmann,
Marijn Franx,
Mauro Giavalisco,
Karl Glazebrook,
Jenny E. Greene,
Norman A. Grogin,
Garth Illingworth,
Anton M. Koekemoer
, et al. (9 additional authors not shown)
Abstract:
We present a measurement of the low-mass quiescent size-mass relation at Cosmic Noon (1<z<3) from the JWST PRIMER and UNCOVER treasury surveys, which highlights two distinct classes of quiescent galaxies. While the massive population is well studied at these redshifts, the low-mass end has been previously under-explored due to a lack of observing facilities with sufficient sensitivity and spatial resolution. We select a conservative sample of low-mass quiescent galaxy candidates using rest-frame UVJ colors and specific star formation rate criteria and measure galaxy morphology in both rest-frame UV/optical wavelengths (F150W) and rest-frame near-infrared (F444W). We confirm an unambiguous flattening of the low-mass quiescent size-mass relation, which results from the separation of the quiescent galaxy sample into two distinct populations at $\log(M_\star/M_\odot)\sim10.3$: low-mass quiescent galaxies that are notably younger and have disky structures, and massive galaxies consistent with spheroidal morphologies and older median stellar ages. These separate populations imply mass quenching dominates at the massive end while other mechanisms, such as environmental or feedback-driven quenching, form the low-mass end. This stellar mass dependent slope of the quiescent size-mass relation could also indicate a shift from size growth due to star formation (low masses) to growth via mergers (massive galaxies). The transition mass between these two populations also corresponds with other dramatic changes and characteristic masses in several galaxy evolution scaling relations (e.g. star-formation efficiency, dust obscuration, and stellar-halo mass ratios), further highlighting the stark dichotomy between low-mass and massive galaxy formation.
Submitted 23 April, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise
Authors:
Rui Pan,
Yuxing Liu,
Xiaoyu Wang,
Tong Zhang
Abstract:
Heavy-ball momentum with decaying learning rates is widely used with SGD for optimizing deep learning models. In contrast to its empirical popularity, the understanding of its theoretical properties is still quite limited, especially under the standard anisotropic gradient noise condition for quadratic regression problems. Although it is widely conjectured that the heavy-ball momentum method can provide accelerated convergence and should work well in large-batch settings, there is no rigorous theoretical analysis. In this paper, we fill this theoretical gap by establishing a non-asymptotic convergence bound for stochastic heavy-ball methods with step decay scheduler on quadratic objectives, under the anisotropic gradient noise condition. As a direct implication, we show that heavy-ball momentum can provide $\tilde{\mathcal{O}}(\sqrt{\kappa})$ accelerated convergence of the bias term of SGD while still achieving near-optimal convergence rate with respect to the stochastic variance term. The combined effect implies an overall convergence rate within log factors from the statistical minimax rate. This means SGD with heavy-ball momentum is useful in large-batch settings such as distributed machine learning or federated learning, where a smaller number of iterations can significantly reduce the number of communication rounds, leading to acceleration in practice.
Submitted 17 March, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
Authors:
Yinwei Dai,
Rui Pan,
Anand Iyer,
Kai Li,
Ravi Netravali
Abstract:
Machine learning (ML) inference platforms are tasked with balancing two competing goals: ensuring high throughput given many requests, and delivering low-latency responses to support interactive applications. Unfortunately, existing platform knobs (e.g., batch sizes) fail to ease this fundamental tension, and instead only enable users to harshly trade off one property for the other. This paper explores an alternate strategy to taming throughput-latency tradeoffs by changing the granularity at which inference is performed. We present Apparate, a system that automatically applies and manages early exits (EEs) in ML models, whereby certain inputs can exit with results at intermediate layers. To cope with the time-varying overhead and accuracy challenges that EEs bring, Apparate repurposes exits to provide continual feedback that powers several novel runtime monitoring and adaptation strategies. Apparate lowers median response latencies by 40.5-91.5% and 10.0-24.2% for diverse CV and NLP workloads, respectively, without affecting throughputs or violating tight accuracy constraints.
Submitted 8 December, 2023;
originally announced December 2023.
-
JWST UNCOVER: The Overabundance of Ultraviolet-luminous Galaxies at $z>9$
Authors:
Iryna Chemerynska,
Hakim Atek,
Lukas J. Furtak,
Adi Zitrin,
Jenny E. Greene,
Pratika Dayal,
Andrea Weibel,
Vasily Kokorev,
Andy D. Goulding,
Christina C. Williams,
Themiya Nanayakkara,
Rachel Bezanson,
Gabriel Brammer,
Sam E. Cutler,
Ivo Labbe,
Joel Leja,
Richard Pan,
Sedona H. Price,
Bingjie Wang,
John R. Weaver,
Katherine E. Whitaker
Abstract:
Over the past year, JWST has uncovered galaxies at record-breaking distances up to $z \sim 13$. The JWST UNCOVER (ultra-deep NIRSpec and NIRcam observations before the epoch of reionization) program has obtained ultra-deep multiwavelength NIRCam imaging of the massive galaxy cluster Abell 2744 over $\sim 45$ arcmin$^{2}$ down to $\sim 29.5$ AB mag. Here, we present a robust ultraviolet (UV) lumino…
▽ More
Over the past year, JWST has uncovered galaxies at record-breaking distances up to $z \sim 13$. The JWST UNCOVER (ultra-deep NIRSpec and NIRcam observations before the epoch of reionization) program has obtained ultra-deep multiwavelength NIRCam imaging of the massive galaxy cluster Abell 2744 over $\sim 45$ arcmin$^{2}$ down to $\sim 29.5$ AB mag. Here, we present a robust ultraviolet (UV) luminosity function derived through lensing clusters at $9<z<12$. Using comprehensive end-to-end simulations, we account for all lensing effects and systematic uncertainties in deriving both the amplification factors and the effective survey volume. Our results confirm the intriguing excess of UV-bright galaxies ($M_{UV} < -20$ AB mag) previously reported at $z>9$ in recent JWST studies. In particular, a double power-law (DPL) describes better the bright-end of the luminosity function compared to the classical Schechter form. The number density of these bright galaxies is 10-100 times larger than theoretical predictions and previous findings based on Hubble Space Telescope (HST) observations. Additionally, we measure a star formation rate density of $ρ_{\rm SFR} = 10^{-2.64}$ M$_{\odot}$ yr$^{-1}$ Mpc$^{-3}$ at these redshifts, which is 4 to 10 times higher than galaxy formation models that assume a constant star formation efficiency. Future wide-area surveys and accurate modeling of lensing-assisted observations will reliably constrain both the bright and the dim end of the UV luminosity function at $z>9$, which will provide key benchmarks for galaxy formation models.
Submitted 8 December, 2023;
originally announced December 2023.
-
Active Learning for Abrupt Shifts Change-point Detection via Derivative-Aware Gaussian Processes
Authors:
Hao Zhao,
Rong Pan
Abstract:
Change-point detection (CPD) is crucial for identifying abrupt shifts in data, which influence decision-making and efficient resource allocation across various domains. To address the challenges posed by the costly and time-intensive data acquisition in CPD, we introduce the Derivative-Aware Change Detection (DACD) method. It leverages the derivative process of a Gaussian process (GP) for Active Learning (AL), aiming to pinpoint change-point locations effectively. DACD balances the exploitation and exploration of derivative processes through multiple data acquisition functions (AFs). By utilizing the GP derivative mean and variance as criteria, DACD sequentially selects the next sampling data point, thus enhancing algorithmic efficiency and ensuring reliable and accurate results. We investigate the effectiveness of the DACD method in diverse scenarios and show that it outperforms other active learning change-point detection approaches.
Submitted 5 December, 2023;
originally announced December 2023.
-
THz-Driven Coherent Magnetization Dynamics in a Labyrinth Domain State
Authors:
M Riepp,
A Philippi-Kobs,
L Mueller,
R Froemter,
W Roseker,
R Rysov,
M Walther,
K Bagschik,
M Hennes,
D Gupta,
S Marotzke,
S Bajt,
R Pan,
T Golz,
N Stojanovic,
C Boeglin,
G Gruebel
Abstract:
Terahertz (THz) light pulses can be used for an ultrafast coherent manipulation of the magnetization. Driving the magnetization at THz frequencies is currently the fastest way of writing magnetic information in ferromagnets. Using time-resolved resonant magnetic scattering, we gain new insights into the THz-driven coherent magnetization dynamics on nanometer length scales. We observe ultrafast demagnetization and coherent magnetization oscillations that are governed by a time-dependent damping. This damping is determined by the interplay of lattice heating and magnetic anisotropy reduction, revealing an upper speed limit for THz-induced magnetization switching. We show that in the presence of nanometer-sized magnetic domains, the ultrafast magnetization oscillations are associated with a correlated beating of the domain walls. The overall domain structure thereby remains largely unaffected, which highlights the applicability of THz-induced switching on the nanoscale.
Submitted 5 December, 2023;
originally announced December 2023.
-
Plum: Prompt Learning using Metaheuristic
Authors:
Rui Pan,
Shuo Xing,
Shizhe Diao,
Wenhe Sun,
Xiang Liu,
Kashun Shum,
Renjie Pi,
Jipeng Zhang,
Tong Zhang
Abstract:
Since the emergence of large language models, prompt learning has become a popular method for optimizing and customizing these models. Special prompts, such as Chain-of-Thought, have even revealed previously unknown reasoning capabilities within these models. However, the progress of discovering effective prompts has been slow, driving a desire for general prompt optimization methods. Unfortunately, few existing prompt learning methods satisfy the criteria of being truly "general", i.e., automatic, discrete, black-box, gradient-free, and interpretable all at once. In this paper, we introduce metaheuristics, a branch of discrete non-convex optimization methods with over 100 options, as a promising approach to prompt learning. Within our paradigm, we test six typical methods: hill climbing, simulated annealing, genetic algorithms with/without crossover, tabu search, and harmony search, demonstrating their effectiveness in white-box and black-box prompt learning. Furthermore, we show that these methods can be used to discover more human-understandable prompts that were previously unknown in both reasoning and image generation tasks, opening the door to a cornucopia of possibilities in prompt optimization. We release all code at https://github.com/research4pan/Plum.
Submitted 30 June, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?
Authors:
Yichi Zhang,
Jiayi Pan,
Yuchen Zhou,
Rui Pan,
Joyce Chai
Abstract:
Vision-Language Models (VLMs) are trained on vast amounts of data captured by humans emulating our understanding of the world. However, human perception of reality is not always faithful to the physical world, a phenomenon known as visual illusions. This raises a key question: do VLMs have the same kind of illusions as humans do, or do they faithfully learn to represent reality? To investigate this question, we build a dataset containing five types of visual illusions and formulate four tasks to examine visual illusions in state-of-the-art VLMs. Our findings show that although the overall alignment is low, larger models are closer to human perception and more susceptible to visual illusions. Our dataset and initial findings will promote a better understanding of visual illusions in humans and machines and provide a stepping stone for future computational models that can better align humans and machines in perceiving and communicating about the shared visual world. The code and data are available at https://github.com/vl-illusion/dataset.
Submitted 31 October, 2023;
originally announced November 2023.
-
Quantifying the Effects of Known Unknowns on Inferred High-redshift Galaxy Properties: Burstiness, the IMF, and Nebular Physics
Authors:
Bingjie Wang,
Joel Leja,
Hakim Atek,
Ivo Labbe,
Yijia Li,
Rachel Bezanson,
Gabriel Brammer,
Sam E. Cutler,
Pratika Dayal,
Lukas J. Furtak,
Jenny E. Greene,
Vasily Kokorev,
Richard Pan,
Sedona H. Price,
Katherine A. Suess,
John R. Weaver,
Katherine E. Whitaker,
Christina C. Williams
Abstract:
The era of the James Webb Space Telescope ushers stellar population models into uncharted territories, particularly at the high-redshift frontier. In a companion paper, we apply the \texttt{Prospector} Bayesian framework to jointly infer galaxy redshifts and stellar population properties from broad-band photometry as part of the UNCOVER survey. Here we present a comprehensive error budget in spectral energy distribution (SED) modeling. Using a sample selected to have photometric redshifts higher than 9, we quantify the systematic shifts stemming from various model choices in inferred stellar mass, star formation rate (SFR), and age. These choices encompass different timescales for changes in the star formation history (SFH), non-universal stellar initial mass functions (IMF), and the inclusion of variable nebular abundances, gas density and ionizing photon budget. We find that the IMF exerts the strongest influence on the inferred properties: the systematic uncertainties can be as much as 1 dex, 2--5 times larger than the formal reported uncertainties in mass and SFR; and importantly, exceed the scatter seen when using different SED fitting codes. Although the assumptions on the lower end of the IMF induce degeneracy, our findings suggest that a common practice in the literature of assessing uncertainties in SED-fitting processes by comparing multiple codes is substantively underestimating the true systematic uncertainty. Highly stochastic SFHs change the inferred SFH by much more than the formal uncertainties, and introduce $\sim 0.8$ dex systematics in SFR averaged over short time scales and $\sim 0.3$ dex systematics in average age. Finally, employing a flexible nebular emission model causes a $\sim 0.2$ dex systematic increase in mass and SFR, comparable to the formal uncertainty. This paper constitutes an initial step toward a complete uncertainty estimate in SED modeling.
Submitted 8 January, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
UNCOVER: The rest ultraviolet to near infrared multiwavelength structures and dust distributions of sub-millimeter-detected galaxies in Abell 2744
Authors:
Sedona H. Price,
Katherine A. Suess,
Christina C. Williams,
Rachel Bezanson,
Gourav Khullar,
Erica J. Nelson,
Bingjie Wang,
John R. Weaver,
Seiji Fujimoto,
Vasily Kokorev,
Jenny E. Greene,
Gabriel Brammer,
Sam E. Cutler,
Pratika Dayal,
Lukas J. Furtak,
Ivo Labbe,
Joel Leja,
Tim B. Miller,
Themiya Nanayakkara,
Richard Pan,
Katherine E. Whitaker
Abstract:
With the wavelength coverage, sensitivity, and high spatial resolution of JWST, it is now possible to peer through the dust attenuation to probe the rest-frame near infrared (NIR) and stellar structures of extremely dusty galaxies at cosmic noon (z~1-3). In this paper we leverage the combined ALMA and JWST/HST coverage in Abell 2744 to study the multiwavelength (0.5-4.4um) structures of 11 sub-millimeter (sub-mm) detected galaxies at z~0.9-3.5 that are fainter than bright "classical" sub-mm galaxies (SMGs). While these objects reveal a diversity of structures and sizes, all exhibit decreasing sizes and increasing central concentration towards longer wavelengths. The smaller sizes of these objects at long wavelengths indicate that their stellar mass profiles are more compact than their optical light profiles, likely due to centrally-concentrated dust obscuration. Further, we find that galaxies with higher central concentration values tend to have more extreme size ratios (comparing the rest-frame NIR to rest-frame optical); this suggests that the galaxies with the most compact light distributions also have the most concentrated dust distributions. We also find the galaxies with the most extreme size ratios do not have elevated 1.2mm flux densities compared to the rest of our sample: we argue this means compact dust geometry, rather than e.g. high total dust quantity, drives the most extreme observed rest-frame NIR-to-optical size ratios. Upcoming higher resolution 1.2mm ALMA imaging will facilitate joint spatially-resolved analysis and will directly test the dust distributions within this representative sub-mm population.
Submitted 3 October, 2023;
originally announced October 2023.
-
The UNCOVER Survey: A First-look HST+JWST Catalog of Galaxy Redshifts and Stellar Population Properties Spanning $0.2 \lesssim z \lesssim 15$
Authors:
Bingjie Wang,
Joel Leja,
Ivo Labbé,
Rachel Bezanson,
Katherine E. Whitaker,
Gabriel Brammer,
Lukas J. Furtak,
John R. Weaver,
Sedona H. Price,
Adi Zitrin,
Hakim Atek,
Dan Coe,
Sam E. Cutler,
Pratika Dayal,
Pieter van Dokkum,
Robert Feldmann,
Danilo Marchesini,
Marijn Franx,
Natascha Förster Schreiber,
Seiji Fujimoto,
Marla Geha,
Karl Glazebrook,
Anna de Graaff,
Jenny E. Greene,
Stéphanie Juneau
, et al. (19 additional authors not shown)
Abstract:
The recent UNCOVER survey with the James Webb Space Telescope (JWST) exploits the nearby cluster Abell 2744 to create the deepest view of our universe to date by leveraging strong gravitational lensing. In this work, we perform photometric fitting of more than 50,000 robustly detected sources out to $z \sim 15$. We show the redshift evolution of stellar ages, star formation rates, and rest-frame colors across the full range of $0.2 \lesssim z \lesssim 15$. The galaxy properties are inferred using the Prospector Bayesian inference framework using informative Prospector-$β$ priors on masses and star formation histories to produce joint redshift and stellar population posteriors, and additionally lensing magnification is performed on-the-fly to ensure consistency with the scale-dependent priors. We show that this approach produces excellent photometric redshifts with $σ_{\rm NMAD} \sim 0.03$, of a similar quality to the established photometric redshift code EAzY. In line with the open-source scientific objective of the Treasury survey, we publicly release the stellar population catalog with this paper, derived from the photometric catalog adapting aperture sizes based on source profiles. This release includes posterior moments, maximum-likelihood spectra, star-formation histories, and full posterior distributions, offering a rich data set to explore the processes governing galaxy formation and evolution over a parameter space now accessible by JWST.
Submitted 16 April, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
DUALZ: Deep UNCOVER-ALMA Legacy High-Z Survey
Authors:
Seiji Fujimoto,
Rachel Bezanson,
Ivo Labbe,
Gabriel Brammer,
Sedona H. Price,
Bingjie Wang,
John R. Weaver,
Yoshinobu Fudamoto,
Pascal A. Oesch,
Christina C. Williams,
Pratika Dayal,
Robert Feldmann,
Jenny E. Greene,
Joel Leja,
Katherine E. Whitaker,
Adi Zitrin,
Sam E. Cutler,
Lukas J. Furtak,
Richard Pan,
Iryna Chemerynska,
Vasily Kokorev,
Tim B. Miller,
Hakim Atek,
Pieter van Dokkum,
Stephanie Juneau
, et al. (7 additional authors not shown)
Abstract:
We present the survey design and initial results of the ALMA Cycle 9 program of DUALZ, which aims to establish a joint ALMA and JWST public legacy field targeting the massive galaxy cluster Abell 2744. DUALZ features a contiguous $4'\times6'$ ALMA 30-GHz-wide mosaic in Band 6, covering areas of $μ>2$ down to a sensitivity of $σ=32.7~μ$Jy. Through a blind search, we identified 69 dust continuum sources at S/N $\gtrsim5.0$ with median redshift and intrinsic 1.2-mm flux of $z=2.30$ and $S_{\rm 1.2mm}^{\rm int}=0.24$~mJy. Of these, 27 have been spectroscopically confirmed, leveraged by the latest NIRSpec observations, while photometric redshift estimates are constrained by the comprehensive HST, NIRCam, and ALMA data for the remaining sources. With priors, we further identify a [CII]158 $μ$m line emitter at $z=6.3254\pm0.0004$, confirmed by the latest NIRSpec spectroscopy. The NIRCam counterparts of the 1.2-mm continuum exhibit undisturbed morphologies, denoted either by disk or spheroid, implying the triggers for the faint mm emission are less catastrophic than mergers. We have identified 8 HST-dark galaxies (F150W$>$27mag, F150W$-$F444W$>$2.3) and 2 JWST-dark (F444W$>$30mag) galaxy candidates among the ALMA continuum sources. The former includes face-on disk galaxies, hinting that substantial dust obscuration does not always result from inclination. We also detect a marginal dust emission from an X-ray-detected galaxy at $z_{\rm spec}=10.07$, suggesting an active co-evolution of the central black hole and its host. We assess the infrared luminosity function up to $z\sim10$ and find it consistent with predictions from galaxy formation models. To foster diverse scientific outcomes from the community, we publicly release reduced ALMA mosaic maps, cubes, and the source catalog.
Submitted 16 September, 2023; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Mitigating the Alignment Tax of RLHF
Authors:
Yong Lin,
Hangyu Lin,
Wei Xiong,
Shizhe Diao,
Jianmeng Liu,
Jipeng Zhang,
Rui Pan,
Haoxiang Wang,
Wenbin Hu,
Hanning Zhang,
Hanze Dong,
Renjie Pi,
Han Zhao,
Nan Jiang,
Heng Ji,
Yuan Yao,
Tong Zhang
Abstract:
LLMs acquire a wide range of abilities during pre-training, but aligning LLMs under Reinforcement Learning with Human Feedback (RLHF) can lead to forgetting, which is also known as the alignment tax. To empirically verify this hypothesis, we conducted experiments with existing RLHF algorithms using OpenLLaMA-3B, which revealed a pronounced alignment tax in NLP tasks. On the other hand, despite various techniques to mitigate forgetting, they are often at odds with the RLHF performance, leading to a trade-off between reward maximization and forgetting mitigation.
In light of the above pressing issue in aligning LLMs, in this paper we explore model averaging, which interpolates between pre- and post-RLHF model weights, to achieve a more efficient reward-tax Pareto front. To understand its effectiveness, we offer theoretical insights into model averaging, revealing that it improves the performance Pareto front by increasing feature diversity on the layers where tasks share overlapped feature spaces. Empirical evidence corroborates our analysis by showing the benefits of averaging low-level transformer layers. Building on the analysis and the observation that averaging different layers of the transformer leads to significantly different reward-tax trade-offs, we propose Adaptive Model Averaging (AMA) to adaptively find various combination ratios of model layers. AMA seeks to maximize the alignment reward while incurring minimal alignment tax. Moreover, we validate AMA's performance across a range of RLHF algorithms over OpenLLaMA-3B and further extend our findings to Mistral-7B.
Submitted 5 February, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
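Illustrative sketch (not the authors' code): the abstract of "Mitigating the Alignment Tax of RLHF" describes model averaging as interpolation between pre- and post-RLHF weights, with AMA choosing different combination ratios for different layers. The snippet below shows only the interpolation step on toy weights stored as plain Python lists. The function name `average_models`, the `per_layer_alpha` argument, and the dictionary layout are assumptions made for illustration; how AMA actually selects the ratios is not stated in the abstract and is not modeled here.

```python
from typing import Dict, List, Optional

# Toy representation: parameter name -> flat list of floats (an assumption for illustration).
Weights = Dict[str, List[float]]


def average_models(
    pre: Weights,
    post: Weights,
    alpha: float = 0.5,
    per_layer_alpha: Optional[Dict[str, float]] = None,
) -> Weights:
    """Interpolate (1 - a) * pre + a * post for every parameter.

    `alpha` is the default ratio; `per_layer_alpha` (hypothetical) overrides it
    for individual parameter names, mimicking layer-wise combination ratios.
    """
    if pre.keys() != post.keys():
        raise ValueError("both models must share the same parameter names")
    merged: Weights = {}
    for name in pre:
        if len(pre[name]) != len(post[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        a = alpha if per_layer_alpha is None else per_layer_alpha.get(name, alpha)
        merged[name] = [(1.0 - a) * p + a * q for p, q in zip(pre[name], post[name])]
    return merged


if __name__ == "__main__":
    pre = {"layer0": [0.0, 2.0], "layer1": [1.0]}
    post = {"layer0": [1.0, 4.0], "layer1": [3.0]}
    # Uniform averaging, then keeping layer1 entirely at its pre-RLHF value.
    print(average_models(pre, post, alpha=0.5))
    print(average_models(pre, post, alpha=0.5, per_layer_alpha={"layer1": 0.0}))
```

Setting a ratio of 0 for a layer keeps its pre-RLHF weights, and 1 keeps its post-RLHF weights; intermediate values trade off between the two.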
-
UNCOVER spectroscopy confirms a surprising ubiquity of AGN in red galaxies at $z>5$
Authors:
Jenny E. Greene,
Ivo Labbe,
Andy D. Goulding,
Lukas J. Furtak,
Iryna Chemerynska,
Vasily Kokorev,
Pratika Dayal,
Christina C. Williams,
Bingjie Wang,
David J. Setton,
Adam J. Burgasser,
Rachel Bezanson,
Hakim Atek,
Gabriel Brammer,
Sam E. Cutler,
Robert Feldmann,
Seiji Fujimoto,
Karl Glazebrook,
Anna de Graaff,
Joel Leja,
Danilo Marchesini,
Michael V. Maseda,
Jorryt Matthee,
Tim B. Miller,
Rohan P. Naidu
, et al. (9 additional authors not shown)
Abstract:
JWST is revealing a new population of dust-reddened broad-line active galactic nuclei (AGN) at redshifts $z\gtrsim5$. Here we present deep NIRSpec/Prism spectroscopy from the Cycle 1 Treasury program UNCOVER of 15 AGN candidates selected to be compact, with red continua in the rest-frame optical but with blue slopes in the UV. From NIRCam photometry alone, they could have been dominated by dusty star formation or AGN. Here we show that the majority of the compact red sources in UNCOVER are dust-reddened AGN: $60\%$ show definitive evidence for broad-line H$α$ with FWHM$\, >2000$ km/s, for $20\%$ current data are inconclusive, and $20\%$ are brown dwarf stars. We propose an updated photometric criterion to select red $z>5$ AGN that excludes brown dwarfs and is expected to yield $>80\%$ AGN. Remarkably, among all $z_{\rm phot}>5$ galaxies with F277W$-$F444W$>1$ in UNCOVER at least $33\%$ are AGN regardless of compactness, climbing to at least $80\%$ AGN for sources with F277W$-$F444W$>1.6$. The confirmed AGN have black hole masses of $10^7-10^9$ M$_{\odot}$. While their UV-luminosities ($-16>M_{\rm UV}>-20$ AB mag) are low compared to UV-selected AGN at these epochs, consistent with percent-level scattered AGN light or low levels of unobscured star formation, the inferred bolometric luminosities are typical of $10^7-10^9$ M$_{\odot}$ black holes radiating at $\sim 10-40\%$ of Eddington. The number densities are surprisingly high at $\sim10^{-5}$ Mpc$^{-3}$ mag$^{-1}$, 100 times more common than the faintest UV-selected quasars, while accounting for $\sim1\%$ of the UV-selected galaxies. While their UV-faintness suggests they may not contribute strongly to reionization, their ubiquity poses challenges to models of black hole growth.
Submitted 11 September, 2023;
originally announced September 2023.
-
Uniform Asymptotic Approximation Method with Pöschl-Teller Potential
Authors:
Rui Pan,
John Joseph Marchetta,
Jamal Saeed,
Gerald Cleaver,
Bao-Fei Li,
Anzhong Wang,
Tao Zhu
Abstract:
In this paper, we study analytical approximate solutions of the second-order homogeneous differential equations with the existence of only two turning points (but without poles), by using the uniform asymptotic approximation (UAA) method. To be more concrete, we consider the Pöschl-Teller (PT) potential, for which analytical solutions are known. Depending on the values of the parameters involved in the PT potential, we find that the upper bounds of the errors of the approximate solutions in general are $\lesssim 0.15\% \sim 10\% $, to the first-order approximation of the UAA method. The approximations can be easily extended to high-order, with which the errors are expected to be much smaller. Such obtained analytical solutions can be used to study cosmological perturbations in the framework of quantum cosmology, as well as quasi-normal modes of black holes.
Submitted 5 January, 2024; v1 submitted 6 September, 2023;
originally announced September 2023.
-
UNCOVER: JWST Spectroscopy of Three Cold Brown Dwarfs at Kiloparsec-scale Distances
Authors:
Adam J. Burgasser,
Rachel Bezanson,
Ivo Labbe,
Gabriel Brammer,
Sam E. Cutler,
Lukas J. Furtak,
Jenny E. Greene,
Roman Gerasimov,
Joel Leja,
Richard Pan,
Sedona H. Price,
Bingjie Wang,
John R. Weaver,
Katherine E. Whitaker,
Seiji Fujimoto,
Vasily Kokorev,
Pratika Dayal,
Themiya Nanayakkara,
Christina C. Williams,
Danilo Marchesini,
Adi Zitrin,
Pieter van Dokkum
Abstract:
We report JWST/NIRSpec spectra of three distant T-type brown dwarfs identified in the Ultradeep NIRSpec and NIRCam ObserVations before the Epoch of Reionization (UNCOVER) survey of the Abell 2744 lensing field. One source was previously reported as a candidate T dwarf on the basis of NIRCam photometry, while two sources were initially identified as candidate active galactic nuclei. Low-resolution 1--5 $μ$m spectra confirm the presence of molecular features consistent with T dwarf atmospheres, and comparison to spectral standards infers classifications of sdT1, T6, and T8--T9. The warmest source, UNCOVER-BD-1, shows evidence of subsolar metallicity, and atmosphere model fits indicate T$_{eff}$ = 1300 K and [M/H] $\sim$ $-$1.0, making this one of the few spectroscopically-confirmed T subdwarfs known. The coldest source, UNCOVER-BD-3, is near the T/Y dwarf boundary with T$_{eff}$ = 550 K, and our analysis indicates the presence of PH$_3$ in the 3--5~$μ$m region, favored over CO$_2$ and a possible indicator of subsolar metallicity. We estimate distances of 0.9--4.5 kpc from the Galactic midplane, making these the most distant brown dwarfs with spectroscopic confirmation. Population simulations indicate high probabilities of membership in the Galactic thick disk for two of these brown dwarfs, and potential halo membership for UNCOVER-BD-1. Our simulations indicate that there are approximately 5 T dwarfs and 1--2 L dwarfs in the Abell 2744 field down to F444W = 30 AB mag, roughly one-third of which are thick disk members. These results highlight the utility of deep JWST/NIRSpec spectroscopy for identifying and characterizing the oldest metal-poor brown dwarfs in the Milky Way.
Submitted 7 February, 2024; v1 submitted 22 August, 2023;
originally announced August 2023.
-
UNCOVER: A NIRSpec Identification of a Broad Line AGN at z = 8.50
Authors:
Vasily Kokorev,
Seiji Fujimoto,
Ivo Labbe,
Jenny E. Greene,
Rachel Bezanson,
Pratika Dayal,
Erica J. Nelson,
Hakim Atek,
Gabriel Brammer,
Karina I. Caputi,
Iryna Chemerynska,
Sam E. Cutler,
Robert Feldmann,
Yoshinobu Fudamoto,
Lukas J. Furtak,
Andy D. Goulding,
Anna de Graaff,
Joel Leja,
Danilo Marchesini,
Tim B. Miller,
Themiya Nanayakkara,
Pascal Oesch,
Richard Pan,
Sedona H. Price,
David J. Setton
, et al. (7 additional authors not shown)
Abstract:
Deep observations with JWST have revealed an emerging population of red point-like sources that could provide a link between the postulated supermassive black hole seeds and observed quasars. In this work we present a JWST/NIRSpec spectrum from the JWST Cycle 1 UNCOVER Treasury survey, of a massive accreting black hole at $z=8.50$, displaying a clear broad-line component as inferred from the H$β$ line with FWHM = $3439\pm413$ km s$^{-1}$, typical of the broad line region of an active galactic nucleus (AGN). The AGN nature of this object is further supported by high ionization, as inferred from emission lines, and a point-source morphology. We compute the black hole mass of log$_{10}(M_{\rm BH}/M_\odot)=8.17\pm0.42$, and a bolometric luminosity of $L_{\rm bol}\sim6.6\times10^{45}$ erg s$^{-1}$. These values imply that our object is accreting at $\sim 40\%$ of the Eddington limit. Detailed modeling of the spectral energy distribution in the optical and near-infrared, together with constraints from ALMA, indicates an upper limit on the stellar mass of log$_{10}(M_{\rm *}/M_\odot)<8.7$, which would lead to an unprecedented ratio of black hole to host mass of at least $\sim 30 \%$. This is orders of magnitude higher compared to the local QSOs, but is consistent with recent AGN studies at high redshift with JWST. This finding suggests that a non-negligible fraction of supermassive black holes either started out from massive seeds and/or grew at a super-Eddington rate at high redshift. Given the predicted number densities of high-$z$ faint AGN, future NIRSpec observations of larger samples will allow us to further investigate the galaxy-black hole co-evolution in the early Universe.
Submitted 15 October, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
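Illustrative arithmetic check (not taken from the paper): the abstract of "UNCOVER: A NIRSpec Identification of a Broad Line AGN at z = 8.50" quotes a black hole mass, a bolometric luminosity, an Eddington fraction, and a black-hole-to-host mass ratio. The sketch below recomputes the two derived numbers from the quoted inputs. It assumes the textbook Eddington luminosity for ionized hydrogen, roughly 1.26e38 erg/s per solar mass; the authors may have used a slightly different coefficient, and the function names are hypothetical.

```python
# Assumed textbook coefficient: Eddington luminosity per solar mass, in erg/s.
L_EDD_PER_MSUN = 1.26e38


def eddington_ratio(log10_mbh: float, l_bol: float) -> float:
    """Return L_bol / L_Edd for a black hole of mass 10**log10_mbh solar masses."""
    l_edd = L_EDD_PER_MSUN * 10.0 ** log10_mbh
    return l_bol / l_edd


def bh_to_host_ratio(log10_mbh: float, log10_mstar: float) -> float:
    """Return M_BH / M_star from the two log10 masses (in solar masses)."""
    return 10.0 ** (log10_mbh - log10_mstar)


if __name__ == "__main__":
    # Inputs quoted in the abstract: log10(M_BH) = 8.17, L_bol ~ 6.6e45 erg/s,
    # and an upper limit log10(M_star) < 8.7.
    print(f"Eddington ratio: {eddington_ratio(8.17, 6.6e45):.2f}")
    # M_star is an upper limit, so this ratio is a lower limit.
    print(f"M_BH / M_star (lower limit): {bh_to_host_ratio(8.17, 8.7):.2f}")
```

Under these assumptions the Eddington ratio comes out at a few tens of percent and the mass ratio near 0.3, in line with the "$\sim 40\%$" and "at least $\sim 30\%$" figures quoted in the abstract once the 0.42 dex mass uncertainty is taken into account.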
-
UNCOVER: A NIRSpec Census of Lensed Galaxies at z=8.50-13.08 Probing a High AGN Fraction and Ionized Bubbles in the Shadow
Authors:
Seiji Fujimoto,
Bingjie Wang,
John Weaver,
Vasily Kokorev,
Hakim Atek,
Rachel Bezanson,
Ivo Labbe,
Gabriel Brammer,
Jenny E. Greene,
Iryna Chemerynska,
Pratika Dayal,
Anna de Graaff,
Lukas J. Furtak,
Pascal A. Oesch,
David J. Setton,
Sedona H. Price,
Tim B. Miller,
Christina C. Williams,
Katherine E. Whitaker,
Adi Zitrin,
Sam E. Cutler,
Joel Leja,
Richard Pan,
Dan Coe,
Pieter van Dokkum
, et al. (11 additional authors not shown)
Abstract:
We present JWST NIRSpec prism spectroscopy of gravitationally lensed galaxies at $z\gtrsim9$ found behind the massive galaxy cluster Abell 2744 in the UNCOVER Cycle 1 Treasury Program. We confirm the source redshift via emission lines and/or the Ly$α$ break feature for ten galaxies at z=8.50-13.08 down to $M_{\rm UV}=-17.3$. We achieve a high confirmation rate of 100\% for $z>9$ candidates reported in Atek et al. (2023). Using six sources with multiple emission line detections, we find that the offset of the redshift estimates between the lines and the Ly$α$ break alone with prism can be as large as $\pm0.2$, raising caution in designing future follow-up spectroscopy for the break-only sources. With spec-$z$ confirmed sources in UNCOVER and the literature, we derive lower limits on the rest-frame ultraviolet (UV) luminosity function (LF) at $z\simeq9$-12 and find these lower limits to be consistent with recent photometric measurements. We identify at least two unambiguous and several possible active galactic nucleus (AGN) systems based on X-ray emission, broad line (BL) H$β$, high ionization line (e.g., NIV]1487, CIV1549) detections, and excess in UVLF. This requires the AGN LFs at $z\simeq$ 9-10 to be comparable to or even higher than the X-ray AGN LF estimated at $z\sim6$ and indicates that a plausible cause of the high abundance of $z>9$ galaxies claimed in recent photometric studies may be AGNs. One UV-luminous source is confirmed at the same redshift as a dusty BL AGN at $z=8.50$ with a physical separation of 380 kpc in the source plane. These two sources show blueward Ly$α$ line or continuum emission, suggesting that they reside in the same ionized bubble with a radius of $7.69\pm0.18$ pMpc. Our results imply that AGNs have a non-negligible contribution to cosmic reionization.
Submitted 25 August, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
Large-scale Multi-layer Academic Networks Derived from Statistical Publications
Authors:
Tianchen Gao,
Yan Zhang,
Rui Pan,
Hansheng Wang
Abstract:
The utilization of multi-layer network structures now enables the explanation of complex systems in nature from multiple perspectives. Multi-layer academic networks capture diverse relationships among academic entities, facilitating the study of academic development and the prediction of future directions. However, there are currently few academic network datasets that simultaneously consider multi-layer academic networks; often, they only include a single layer. In this study, we provide a large-scale multi-layer academic network dataset, namely, LMANStat, which includes collaboration, co-institution, citation, co-citation, journal citation, author citation, author-paper and keyword co-occurrence networks. Furthermore, each layer of the multi-layer academic network is dynamic. Additionally, we expand the attributes of nodes, such as authors' research interests, productivity, region and institution. Supported by this dataset, it is possible to study the development and evolution of statistical disciplines from multiple perspectives. This dataset also provides fertile ground for studying complex systems with multi-layer structures.
Submitted 22 August, 2023;
originally announced August 2023.
-
Most of the photons that reionized the Universe came from dwarf galaxies
Authors:
Hakim Atek,
Ivo Labbé,
Lukas J. Furtak,
Iryna Chemerynska,
Seiji Fujimoto,
David J. Setton,
Tim B. Miller,
Pascal Oesch,
Rachel Bezanson,
Sedona H. Price,
Pratika Dayal,
Adi Zitrin,
Vasily Kokorev,
John R. Weaver,
Gabriel Brammer,
Pieter van Dokkum,
Christina C. Williams,
Sam E. Cutler,
Robert Feldmann,
Yoshinobu Fudamoto,
Jenny E. Greene,
Joel Leja,
Michael V. Maseda,
Adam Muzzin,
Richard Pan
, et al. (8 additional authors not shown)
Abstract:
The identification of sources driving cosmic reionization, a major phase transition from neutral hydrogen to ionized plasma around 600-800 Myr after the Big Bang (Dayal et al. 2018, Mason et al. 2019, Robertson et al. 2022), has been a matter of intense debate (Robertson et al. 2022). Some models suggest that high ionizing emissivity and escape fractions ($f_{\rm esc}$) from quasars support their role in driving cosmic reionization (Madau & Haardt 2015, Mitra et al. 2018). Others propose that the high $f_{\rm esc}$ values from bright galaxies generate sufficient ionizing radiation to drive this process (Naidu et al. 2020). Finally, a few studies suggest that the number density of faint galaxies, when combined with a stellar-mass-dependent model of ionizing efficiency and $f_{\rm esc}$, can effectively dominate cosmic reionization (Finkelstein et al. 2019, Dayal et al. 2020). However, so far, low-mass galaxies have eluded comprehensive spectroscopic studies owing to their extreme faintness. Here we report an analysis of eight ultra-faint galaxies (in a very small field) during the epoch of reionization with absolute magnitudes between $M_{\rm UV} \sim -17$ and $-15$ mag (down to 0.005 $L^{\star}$). We find that faint galaxies during the Universe's first billion years produce ionizing photons with log($ξ_{\mathrm{ion}}$/Hz erg$^{-1}$) $= 25.80\pm 0.14$, a factor of 4 higher than commonly assumed values (Robertson et al. 2015). If this field is representative of the large-scale distribution of faint galaxies, the rate of ionizing photons exceeds that needed for reionization, even for escape fractions of order five per cent.
Submitted 30 April, 2024; v1 submitted 16 August, 2023;
originally announced August 2023.
-
A supermassive black hole in the early universe growing in the shadows
Authors:
Lukas J. Furtak,
Ivo Labbé,
Adi Zitrin,
Jenny E. Greene,
Pratika Dayal,
Iryna Chemerynska,
Vasily Kokorev,
Tim B. Miller,
Andy D. Goulding,
Rachel Bezanson,
Gabriel B. Brammer,
Sam E. Cutler,
Joel Leja,
Richard Pan,
Sedona H. Price,
Bingjie Wang,
John R. Weaver,
Katherine E. Whitaker,
Hakim Atek,
Ákos Bogdán,
Stéphane Charlot,
Emma Curtis-Lake,
Pieter van Dokkum,
Ryan Endsley,
Yoshinobu Fudamoto
, et al. (12 additional authors not shown)
Abstract:
Early JWST observations have uncovered a new, substantial population of red sources that might represent a previously overlooked phase of actively growing supermassive black holes (Kocevski et al. 2023, Matthee et al. 2023, Labbe et al. 2023). One of the most intriguing examples is an extremely red, point-like object that was found to be triply-imaged by the strong lensing galaxy cluster Abell 2744 (Furtak et al. 2023), allowing an unprecedentedly detailed look into this enigmatic population. Here we present deep spectroscopic JWST/NIRSpec observations of this object, Abell2744-QSO1. The spectroscopy confirms that the three images are of the same object, and that it is a highly reddened ($A_V\sim3$) broad emission-line Active Galactic Nucleus (AGN) at a redshift of $z_{\mathrm{spec}}=7.0451\pm0.0005$. From the width of H$β$ ($\mathrm{FWHM}=2800\pm250\,\mathrm{km\,s^{-1}}$) we derive a black hole mass of $M_{\mathrm{BH}}=3_{-1}^{+2}\times10^7\,\mathrm{M}_{\odot}$. We infer a very high ratio of black hole to galaxy mass of at least 3% and possibly as high as 100%, an order of magnitude or more above what is seen in local galaxies. The lack of strong metal lines in the spectrum together with the high bolometric luminosity ($L_{\mathrm{bol}}=(1.1\pm0.3)\times10^{45}\,\mathrm{erg\,s^{-1}}$) suggests that we are seeing the black hole in a phase of rapid growth, accreting at 30% of the Eddington limit. Based on early JWST imaging studies we estimate that such heavily reddened, low-mass black holes can be $\sim100$ times more common than UV-selected ones at this epoch. The rapid growth and high black hole to galaxy mass ratio of A2744-QSO1 suggest that it may represent the missing link between black hole seeds (Inayoshi et al. 2020; Greene et al. 2020; Volonteri 2021) and the first luminous quasars (Fan et al. 2023).
Submitted 10 August, 2023;
originally announced August 2023.
-
UNCOVER: Illuminating the Early Universe -- JWST/NIRSpec Confirmation of $z > 12$ Galaxies
Authors:
Bingjie Wang,
Seiji Fujimoto,
Ivo Labbe,
Lukas J. Furtak,
Tim B. Miller,
David J. Setton,
Adi Zitrin,
Hakim Atek,
Rachel Bezanson,
Gabriel Brammer,
Joel Leja,
Pascal A. Oesch,
Sedona H. Price,
Iryna Chemerynska,
Sam E. Cutler,
Pratika Dayal,
Pieter van Dokkum,
Andy D. Goulding,
Jenny E. Greene,
Y. Fudamoto,
Gourav Khullar,
Vasily Kokorev,
Danilo Marchesini,
Richard Pan,
John R. Weaver
, et al. (2 additional authors not shown)
Abstract:
Observations of high-redshift galaxies provide a critical direct test to the theories of early galaxy formation, yet to date, only three have been spectroscopically confirmed at $z>12$. Due to strong gravitational lensing over a wide area, the galaxy cluster field A2744 is ideal for searching for the earliest galaxies. Here we present JWST/NIRSpec observations of two galaxies: a robust detection at $z_{\rm spec} = 12.393^{+0.004}_{-0.001}$, and a plausible candidate at $z_{\rm spec} = 13.079^{+0.013}_{-0.001}$. The galaxies are discovered in JWST/NIRCam imaging and their distances are inferred with JWST/NIRSpec spectroscopy, all from the JWST Cycle 1 UNCOVER Treasury survey. Detailed stellar population modeling using JWST NIRCam and NIRSpec data corroborates the primeval characteristics of these galaxies: low mass ($\sim 10^8~{\rm M_\odot}$), young, rapidly-assembling, metal-poor, and star-forming. Interestingly, both galaxies are spatially resolved, having lensing-corrected rest-UV effective radii on the order of 300-400 pc, which are notably larger than other spectroscopically confirmed systems at similar redshifts. The observed dynamic range of $z \gtrsim 10$ sizes spans over 1 order of magnitude, implying a significant scatter in the size-mass relation at early times. Deep into the epoch of reionization, these discoveries elucidate the emergence of the first galaxies.
Submitted 10 October, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
-
Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code
Authors:
Rangeet Pan,
Ali Reza Ibrahimzada,
Rahul Krishna,
Divya Sankar,
Lambert Pouguem Wassi,
Michele Merler,
Boris Sobolev,
Raju Pavuluri,
Saurabh Sinha,
Reyhaneh Jabbarvand
Abstract:
Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code translation. The prerequisite for advancing the state of LLM-based code translation is to understand their promises and limitations over existing techniques. To that end, we present a large-scale empirical study to investigate the ability of general LLMs and code LLMs for code translation across pairs of different languages, including C, C++, Go, Java, and Python. Our study, which involves the translation of 1,700 code samples from three benchmarks and two real-world projects, reveals that LLMs are yet to be reliably used to automate code translation -- with correct translations ranging from 2.1% to 47.3% for the studied LLMs. Further manual investigation of unsuccessful translations identifies 15 categories of translation bugs. We also compare LLM-based code translation with traditional non-LLM-based approaches. Our analysis shows that these two classes of techniques have their own strengths and weaknesses. Finally, insights from our study suggest that providing more context to LLMs during translation can help them produce better results. To that end, we propose a prompt-crafting approach based on the symptoms of erroneous translations; this improves the performance of LLM-based code translation by 5.5% on average. Our study is the first of its kind, in terms of scale and breadth, that provides insights into the current limitations of LLMs in code translation and opportunities for improving them. Our dataset -- consisting of 1,700 code samples in five PLs with 10K+ tests, 43K+ translated code, 1,748 manually labeled bugs, and 1,365 bug-fix pairs -- can help drive research in this area.
Submitted 16 January, 2024; v1 submitted 6 August, 2023;
originally announced August 2023.