-
Optimizing Cyber Defense in Dynamic Active Directories through Reinforcement Learning
Authors:
Diksha Goel,
Kristen Moore,
Mingyu Guo,
Derui Wang,
Minjune Kim,
Seyit Camtepe
Abstract:
This paper addresses a significant gap in Autonomous Cyber Operations (ACO) literature: the absence of effective edge-blocking ACO strategies in dynamic, real-world networks. It specifically targets the cybersecurity vulnerabilities of organizational Active Directory (AD) systems. Unlike the existing literature on edge-blocking defenses, which considers AD systems as static entities, our study recognizes their dynamic nature and develops advanced edge-blocking defenses through a Stackelberg game model between attacker and defender. We devise a Reinforcement Learning (RL)-based attack strategy and an RL-assisted Evolutionary Diversity Optimization-based defense strategy, where the attacker and defender improve each other's strategy via parallel gameplay. To address the computational challenges of training attacker-defender strategies on numerous dynamic AD graphs, we propose an RL Training Facilitator that prunes environments and neural networks to eliminate irrelevant elements, enabling efficient and scalable training for large graphs. We extensively train the attacker strategy, as a sophisticated attacker model is essential for a robust defense. Our empirical results demonstrate that our proposed approach enhances the defender's proficiency in hardening dynamic AD graphs while ensuring scalability for large-scale AD.
Submitted 27 June, 2024;
originally announced June 2024.
-
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding
Authors:
Jiwan Chung,
Sungjae Lee,
Minseo Kim,
Seungju Han,
Ashkan Yousefpour,
Jack Hessel,
Youngjae Yu
Abstract:
Visual arguments, often used in advertising or social causes, rely on images to persuade viewers to do or believe something. Understanding these arguments requires selective vision: only specific visual stimuli within an image are relevant to the argument, and relevance can only be understood within the context of a broader argumentative structure. While visual arguments are readily appreciated by human audiences, we ask: are today's AI capable of similar understanding?
We collect and release VisArgs, an annotated corpus designed to make explicit the (usually implicit) structures underlying visual arguments. VisArgs includes 1,611 images accompanied by three types of textual annotations: 5,112 visual premises (with region annotations), 5,574 commonsense premises, and reasoning trees connecting them to a broader argument. We propose three tasks over VisArgs to probe machine capacity for visual argument understanding: localization of premises, identification of premises, and deduction of conclusions. Experiments demonstrate that 1) machines cannot fully identify the relevant visual cues. The top-performing model, GPT-4-O, achieved an accuracy of only 78.5%, whereas humans reached 98.0%. All models showed a performance drop, with an average decrease in accuracy of 19.5%, when the comparison set was changed from objects outside the image to irrelevant objects within the image. Furthermore, 2) this limitation is the greatest factor impacting their performance in understanding visual arguments. Most models improved the most when given relevant visual premises as additional inputs, compared to other inputs, for deducing the conclusion of the visual argument.
Submitted 27 June, 2024;
originally announced June 2024.
-
Human-AI Collaborative Taxonomy Construction: A Case Study in Profession-Specific Writing Assistants
Authors:
Minhwa Lee,
Zae Myung Kim,
Vivek A. Khetan,
Dongyeop Kang
Abstract:
Large Language Models (LLMs) have assisted humans in several writing tasks, including text revision and story generation. However, their effectiveness in supporting domain-specific writing, particularly in business contexts, is relatively less explored. Our formative study with industry professionals revealed the limitations in current LLMs' understanding of the nuances in such domain-specific writing. To address this gap, we propose an approach of human-AI collaborative taxonomy development to serve as a guideline for domain-specific writing assistants. This method integrates iterative feedback from domain experts and multiple interactions between these experts and LLMs to refine the taxonomy. Through larger-scale experiments, we aim to validate this methodology and thus improve LLM-powered writing assistance, tailoring it to the unique needs of different stakeholders.
Submitted 26 June, 2024;
originally announced June 2024.
-
High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model
Authors:
Joun Yeop Lee,
Myeonghun Jeong,
Minchan Kim,
Ji-Hyun Lee,
Hoon-Young Cho,
Nam Soo Kim
Abstract:
We propose a novel two-stage text-to-speech (TTS) framework with two types of discrete tokens, i.e., semantic and acoustic tokens, for high-fidelity speech synthesis. It features two core components: the Interpreting module, which processes text and a speech prompt into semantic tokens focusing on linguistic contents and alignment, and the Speaking module, which captures the timbre of the target voice to generate acoustic tokens from semantic tokens, enriching speech reconstruction. The Interpreting stage employs a transducer for its robustness in aligning text to speech. In contrast, the Speaking stage utilizes a Conformer-based architecture integrated with a Grouped Masked Language Model (G-MLM) to boost computational efficiency. Our experiments verify that this innovative structure surpasses the conventional models in the zero-shot scenario in terms of speech quality and speaker similarity.
Submitted 25 June, 2024;
originally announced June 2024.
-
One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection
Authors:
Hyun Myung Kim,
Kangwook Jang,
Hoirin Kim
Abstract:
As speech synthesis systems continue to make remarkable advances in recent years, the importance of robust deepfake detection systems that perform well in unseen systems has grown. In this paper, we propose a novel adaptive centroid shift (ACS) method that updates the centroid representation by continually shifting as the weighted average of bonafide representations. Our approach uses only bonafide samples to define their centroid, which can yield a specialized centroid for one-class learning. Integrating our ACS with one-class learning gathers bonafide representations into a single cluster, forming well-separated embeddings robust to unseen spoofing attacks. Our proposed method achieves an equal error rate (EER) of 2.19% on the ASVspoof 2021 deepfake dataset, outperforming all existing systems. Furthermore, the t-SNE visualization illustrates that our method effectively maps the bonafide embeddings into a single cluster and successfully disentangles the bonafide and spoof classes.
Submitted 24 June, 2024;
originally announced June 2024.
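The centroid update described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the weighting scheme, so this example assumes a plain running mean over bonafide embeddings; the function names `update_centroid` and `adapt_centroid` are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Sequence, Tuple


def update_centroid(
    centroid: List[float], count: int, embedding: Sequence[float]
) -> Tuple[List[float], int]:
    """Shift the centroid toward one new bonafide embedding.

    Assumption: the weighted average is a running mean, so the new
    embedding receives weight 1 / (count + 1).
    """
    new_count = count + 1
    new_centroid = [c + (e - c) / new_count for c, e in zip(centroid, embedding)]
    return new_centroid, new_count


def adapt_centroid(bonafide_embeddings: Iterable[Sequence[float]]) -> List[float]:
    """Build a centroid from bonafide embeddings only (no spoofed samples)."""
    it = iter(bonafide_embeddings)
    centroid, count = list(next(it)), 1
    for emb in it:
        centroid, count = update_centroid(centroid, count, emb)
    return centroid
```

Under this assumption the centroid after processing all samples equals their arithmetic mean; a different weighting (for example an exponential moving average) would change only the step size in `update_centroid`.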
-
Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting
Authors:
Jiyong Oh,
Syed M. Raza,
Lusungu J. Mwasinga,
Moonseong Kim,
Hyunseung Choo
Abstract:
Unmanned Aerial Vehicle (UAV) services with 5G connectivity are an emerging field with numerous applications. Operator-controlled UAV flights and manual static flight configurations are major limitations for the wide adoption and scalability of UAV services. Several services depend on excellent UAV connectivity with a cellular network, and maintaining it is challenging on predetermined flight paths. This paper addresses these limitations by proposing a Deep Reinforcement Learning (DRL) framework for UAV path planning with assured connectivity (DUPAC). During UAV flight, DUPAC determines the best route from a defined source to the destination in terms of distance and signal quality. The viability and performance of DUPAC are evaluated under simulated real-world urban scenarios using the Unity framework. The results confirm that DUPAC achieves an autonomous UAV flight path similar to the base method with only a 2% increment while maintaining an average of 9% better connection quality throughout the flight.
Submitted 21 June, 2024;
originally announced June 2024.
-
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Authors:
Seungbeen Lee,
Seungwon Lim,
Seungju Han,
Giyeong Oh,
Hyungjoo Chae,
Jiwan Chung,
Minju Kim,
Beong-woo Kwak,
Yeonsoo Lee,
Dongha Lee,
Jinyoung Yeo,
Youngjae Yu
Abstract:
The idea of personality in descriptive psychology, traditionally defined through observable behavior, has now been extended to Large Language Models (LLMs) to better understand their behavior. This raises a question: do LLMs exhibit distinct and consistent personality traits, similar to humans? Existing self-assessment personality tests, while applicable, lack the necessary validity and reliability for precise personality measurements. To address this, we introduce TRAIT, a new tool consisting of 8K multi-choice questions designed to assess the personality of LLMs with validity and reliability. TRAIT is built on the psychometrically validated human questionnaires, Big Five Inventory (BFI) and Short Dark Triad (SD-3), enhanced with the ATOMIC10X knowledge graph for testing personality in a variety of real scenarios. TRAIT overcomes the reliability and validity issues that arise when measuring the personality of LLMs with self-assessment, showing the highest scores across three metrics: refusal rate, prompt sensitivity, and option order sensitivity. It reveals notable insights into the personality of LLMs: 1) LLMs exhibit distinct and consistent personality, which is highly influenced by their training data (i.e., data used for alignment tuning), and 2) current prompting techniques have limited effectiveness in eliciting certain traits, such as high psychopathy or low conscientiousness, suggesting the need for further research in this direction.
Submitted 20 June, 2024;
originally announced June 2024.
-
Augmenting Query and Passage for Retrieval-Augmented Generation using LLMs for Open-Domain Question Answering
Authors:
Minsang Kim,
Cheoneum Park,
Seungjun Baek
Abstract:
Retrieval-augmented generation (RAG) has received much attention for Open-domain question-answering (ODQA) tasks as a means to compensate for the parametric knowledge of large language models (LLMs). While previous approaches focused on processing retrieved passages to remove irrelevant context, they still rely heavily on the quality of retrieved passages which can degrade if the question is ambiguous or complex. In this paper, we propose a simple yet efficient method called question and passage augmentation via LLMs for open-domain QA. Our method first decomposes the original questions into multiple-step sub-questions. By augmenting the original question with detailed sub-questions and planning, we are able to make the query more specific on what needs to be retrieved, improving the retrieval performance. In addition, to compensate for the case where the retrieved passages contain distracting information or divided opinions, we augment the retrieved passages with self-generated passages by LLMs to guide the answer extraction. Experimental results show that the proposed scheme outperforms the previous state-of-the-art and achieves significant performance gain over existing RAG methods.
Submitted 20 June, 2024;
originally announced June 2024.
-
Measuring Sample Importance in Data Pruning for Training LLMs from a Data Compression Perspective
Authors:
Minsang Kim,
Seungjun Baek
Abstract:
Compute-efficient training of large language models (LLMs) has become an important research problem. In this work, we consider data pruning as a method of data-efficient training of LLMs, where we take a data compression view on data pruning. We argue that the amount of information of a sample, or the achievable compression on its description length, represents its sample importance. The key idea is that less informative samples are likely to contain redundant information, and thus should be pruned first. We leverage the log-likelihood function of trained models as a surrogate to measure the information content of samples. Experiments reveal a surprising insight: information-based pruning can enhance the generalization capability of the model, improving upon language modeling and downstream tasks as compared to the model trained on the entire dataset.
Submitted 20 June, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
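The pruning idea in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes each sample comes with the per-token probabilities assigned by an already trained model, scores a sample by its total negative log-likelihood in bits, and keeps a fixed fraction of the highest-scoring samples. The names `sample_information`, `prune_least_informative`, and `keep_fraction` are hypothetical.

```python
import math
from typing import List, Sequence


def sample_information(token_probs: Sequence[float]) -> float:
    """Information content of a sample in bits.

    Assumption: the score is the negative log2-likelihood summed over
    tokens, i.e. the sample's description length under the model.
    """
    return -sum(math.log2(p) for p in token_probs)


def prune_least_informative(
    samples: Sequence[Sequence[float]], keep_fraction: float
) -> List[int]:
    """Return indices of the samples kept after pruning.

    The least informative samples are dropped first; the kept indices
    are returned in their original order.
    """
    scores = [sample_information(probs) for probs in samples]
    n_keep = max(1, round(len(samples) * keep_fraction))
    ranked = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])
```

Whether the score should be normalized by sample length is a design choice that the abstract leaves open; this sketch uses the unnormalized sum.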
-
Global bases for Bosonic extensions of quantum unipotent coordinate rings
Authors:
Masaki Kashiwara,
Myungho Kim,
Se-jin Oh,
Euiyong Park
Abstract:
In the paper, we establish the global basis theory for the bosonic extension $\widehat{\mathcal{A}}$ associated with an arbitrary generalized Cartan matrix. When $\widehat{\mathcal{A}}$ is of simply-laced finite type, it is isomorphic to the quantum Grothendieck ring of the Hernandez-Leclerc category over a quantum affine algebra. In this case, we show that the $(t,q)$-characters of simple modules in the Hernandez-Leclerc category correspond to the normalized global basis of $\widehat{\mathcal{A}}$.
Submitted 18 June, 2024;
originally announced June 2024.
-
Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation
Authors:
Miseul Kim,
Soo-Whan Chung,
Youna Ji,
Hong-Goo Kang,
Min-Seok Choi
Abstract:
This paper introduces a novel task in generative speech processing, Acoustic Scene Transfer (AST), which aims to transfer acoustic scenes of speech signals to diverse environments. AST promises an immersive experience in speech perception by adapting the acoustic scene behind speech signals to desired environments. We propose AST-LDM for the AST task, which generates speech signals accompanied by the target acoustic scene of the reference prompt. Specifically, AST-LDM is a latent diffusion model conditioned by CLAP embeddings that describe target acoustic scenes in either audio or text modalities. The contributions of this paper include introducing the AST task and implementing its baseline model. For AST-LDM, we emphasize its core framework, which is to preserve the input speech and generate audio consistently with both the given speech and the target acoustic environment. Experiments, including objective and subjective tests, validate the feasibility and efficacy of our approach.
Submitted 18 June, 2024;
originally announced June 2024.
-
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers
Authors:
Myeonghwa Lee,
Seonho An,
Min-Soo Kim
Abstract:
In this paper, we conduct a study to utilize LLMs as a solution for decision making that requires complex data analysis. We define Decision QA as the task of answering the best decision, $d_{best}$, for a decision-making question $Q$, business rules $R$ and a database $D$. Since there is no benchmark that can examine Decision QA, we propose Decision QA benchmark, DQA. It has two scenarios, Locating and Building, constructed from two video games (Europa Universalis IV and Victoria 3) that have almost the same goal as Decision QA. To address Decision QA effectively, we also propose a new RAG technique called the iterative plan-then-retrieval augmented generation (PlanRAG). Our PlanRAG-based LM generates the plan for decision making as the first step, and the retriever generates the queries for data analysis as the second step. The proposed method outperforms the state-of-the-art iterative RAG method by 15.8% in the Locating scenario and by 7.4% in the Building scenario, respectively. We release our code and benchmark at https://github.com/myeon9h/PlanRAG.
Submitted 18 June, 2024;
originally announced June 2024.
-
Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation
Authors:
Xin Yu,
Qi Yang,
Han Liu,
Ho Hin Lee,
Yucheng Tang,
Lucas W. Remedios,
Michael Kim,
Shunxing Bao,
Ann Xenobia Moore,
Luigi Ferrucci,
Bennett A. Landman
Abstract:
2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmentation results. In this work, we propose a novel 3D-to-2D distillation framework, leveraging pre-trained 3D models to enhance 2D single-slice segmentation. Specifically, we extract the prediction distribution centroid from the 3D representations, to guide the 2D student by learning intra- and inter-class correlation. Unlike traditional knowledge distillation methods that require the same data input, our approach employs unpaired 3D CT scans with any contrast to guide the 2D student model. Experiments conducted on 707 subjects from the single-slice Baltimore Longitudinal Study of Aging (BLSA) dataset demonstrate that state-of-the-art 2D multi-organ segmentation methods can benefit from the 3D teacher model, achieving enhanced performance in single-slice multi-organ segmentation. Notably, our approach demonstrates considerable efficacy in low-data regimes, outperforming the model trained with all available training subjects even when utilizing only 200 training subjects. Thus, this work underscores the potential to alleviate manual annotation burdens.
Submitted 18 June, 2024;
originally announced June 2024.
-
The Focal-plane Actualized Shifted Technique Realized for a Shack Hartmann Wavefront Sensor (fastrSHWFS)
Authors:
Benjamin L. Gerard,
Aaron Lemmer,
Bautista R. Fernandez,
Xiaoxing Xia,
Cesar Laguna,
Mike Kim,
Stephen Mark Ammons,
Brian Bauman,
Lisa Poyneer
Abstract:
Astronomical adaptive optics (AO) is a critical approach to enable ground-based diffraction-limited imaging and high contrast science, with the potential to enable habitable exoplanet imaging on future extremely large telescopes. However, AO systems must improve significantly to enable habitable exoplanet imaging. Time lag between the end of an exposure and end of deformable mirror commands being applied in an AO loop is now the dominant error term in many extreme AO systems (e.g., Poyneer et al. 2016), and within that lag component detector read time is becoming non-negligible (e.g., Cetre et al. 2018). This term will decrease as faster detector readout capabilities are developed by vendors. In complement, we have developed a modified Shack Hartmann Wavefront Sensor (SHWFS) to address this problem called the Focal-plane Actualized Shifted Technique Realized for a SHWFS (fastrSHWFS). The novelty of this design is to replace the usual lenslet array with a bespoke pupil-plane phase mask that redistributes the spot pattern on the detector into a rectangular array with a custom aspect ratio (in an extreme case, if the detector size can accommodate it, the array can be a single line). We present the fastrSHWFS concept and preliminary laboratory tests. For some detectors and AO systems, the fastrSHWFS technique can decrease the read time per frame compared to a regular SHWFS by up to 30x, supporting the goal of reduced AO lag needed to eventually enable habitable exoplanet imaging.
Submitted 14 June, 2024;
originally announced June 2024.
-
Projected background and sensitivity of AMoRE-II
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (81 additional authors not shown)
Abstract:
AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in the cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located approximately 1000 meters deep in Jeongseon, Korea. The goal of AMoRE-II is to reach up to $T^{0\nu\beta\beta}_{1/2} \sim 6 \times 10^{26}$ years, corresponding to an effective Majorana mass of 15 - 29 meV, covering all the inverted mass hierarchy regions. To achieve this, the background level of the experimental configurations and possible background sources of gamma and beta events should be well understood. We have intensively performed Monte Carlo simulations using the GEANT4 toolkit in all the experimental configurations with potential sources. We report the estimated background level that meets the $10^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) requirement for AMoRE-II in the region of interest (ROI) and show the projected half-life sensitivity based on the simulation study.
Submitted 13 June, 2024;
originally announced June 2024.
-
OpenVLA: An Open-Source Vision-Language-Action Model
Authors:
Moo Jin Kim,
Karl Pertsch,
Siddharth Karamcheti,
Ted Xiao,
Ashwin Balakrishna,
Suraj Nair,
Rafael Rafailov,
Ethan Foster,
Grace Lam,
Pannag Sanketi,
Quan Vuong,
Thomas Kollar,
Benjamin Burchfiel,
Russ Tedrake,
Dorsa Sadigh,
Sergey Levine,
Percy Liang,
Chelsea Finn
Abstract:
Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tune such vision-language-action (VLA) models to obtain robust, generalizable policies for visuomotor control. Yet, widespread adoption of VLAs for robotics has been challenging as 1) existing VLAs are largely closed and inaccessible to the public, and 2) prior work fails to explore methods for efficiently fine-tuning VLAs for new tasks, a key component for adoption. Addressing these challenges, we introduce OpenVLA, a 7B-parameter open-source VLA trained on a diverse collection of 970k real-world robot demonstrations. OpenVLA builds on a Llama 2 language model combined with a visual encoder that fuses pretrained features from DINOv2 and SigLIP. As a product of the added data diversity and new model components, OpenVLA demonstrates strong results for generalist manipulation, outperforming closed models such as RT-2-X (55B) by 16.5% in absolute task success rate across 29 tasks and multiple robot embodiments, with 7x fewer parameters. We further show that we can effectively fine-tune OpenVLA for new settings, with especially strong generalization results in multi-task environments involving multiple objects and strong language grounding abilities, and outperform expressive from-scratch imitation learning methods such as Diffusion Policy by 20.4%. We also explore compute efficiency; as a separate contribution, we show that OpenVLA can be fine-tuned on consumer GPUs via modern low-rank adaptation methods and served efficiently via quantization without a hit to downstream success rate. Finally, we release model checkpoints, fine-tuning notebooks, and our PyTorch codebase with built-in support for training VLAs at scale on Open X-Embodiment datasets.
Submitted 13 June, 2024;
originally announced June 2024.
-
Site-Specific Radio Channel Representation -- Current State and Future Applications
Authors:
Thomas Zemen,
Jorge Gomez-Ponce,
Aniruddha Chandra,
Michael Walter,
Enes Aksoy,
Ruisi He,
David Matolak,
Minseok Kim,
Jun-ichi Takada,
Sana Salous,
Reinaldo Valenzuela,
Andreas F. Molisch
Abstract:
A site-specific radio channel representation considers the surroundings of the communication system through the environment geometry, such as buildings, vegetation, and mobile objects including their material and surface properties. In this article, we focus on communication technologies for 5G and beyond that are increasingly able to exploit the specific environment geometry for both communication and sensing. We present methods for a site-specific radio channel representation that is spatially consistent, such that a mobile transmitter and receiver cause a correlated time-varying channel impulse response. When modelled as random, this channel impulse response has non-stationary statistical properties, i.e., a time-variant Doppler spectrum, power delay profile, K-factor and spatial correlation. A site-specific radio channel representation will enable research into emerging 5G and beyond technologies such as distributed multiple-input multiple-output systems, reconfigurable intelligent surfaces, multi-band communication, and joint communication and sensing. These 5G and beyond technologies will be deployed for a wide range of environments, from dense urban areas to railways, road transportation, industrial automation, and unmanned aerial vehicles.
Submitted 18 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
Authors:
Kang-il Lee,
Minbeom Kim,
Minsung Kim,
Dongryeol Lee,
Hyukhun Koh,
Kyomin Jung
Abstract:
Large Vision-Language Models (LVLMs) have demonstrated outstanding performance across various multimodal tasks. However, they suffer from a problem known as language prior, where responses are generated based solely on textual patterns while disregarding image information. Addressing the issue of language prior is crucial, as it can lead to undesirable biases or hallucinations when dealing with images that are out of training distribution. Despite its importance, current methods for accurately measuring language priors in LVLMs are poorly studied. Although existing benchmarks based on counterfactual or out-of-distribution images can partially be used to measure language priors, they fail to disentangle language priors from other confounding factors. To this end, we propose a new benchmark called VLind-Bench, which is the first benchmark specifically designed to measure the language priors, or blindness, of LVLMs. It not only includes tests on counterfactual images to assess language priors but also involves a series of tests to evaluate more basic capabilities such as commonsense knowledge, visual perception, and commonsense biases. For each instance in our benchmark, we ensure that all these basic tests are passed before evaluating the language priors, thereby minimizing the influence of other factors on the assessment. The evaluation and analysis of recent LVLMs in our benchmark reveal that almost all models exhibit a significant reliance on language priors, presenting a strong challenge in the field.
Submitted 17 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Multimodal Representation Loss Between Timed Text and Audio for Regularized Speech Separation
Authors:
Tsun-An Hsieh,
Heeyoul Choi,
Minje Kim
Abstract:
Recent studies highlight the potential of textual modalities in conditioning the speech separation model's inference process. However, regularization-based methods remain underexplored despite their advantages of not requiring auxiliary text data during the test time. To address this gap, we introduce a timed text-based regularization (TTR) method that uses language model-derived semantics to improve speech separation models. Our approach involves two steps. We begin with two pretrained audio and language models, WavLM and BERT, respectively. Then, a Transformer-based audio summarizer is learned to align the audio and word embeddings and to minimize their gap. The summarizer Transformer, incorporated as a regularizer, promotes the separated sources' alignment with the semantics from the timed text. Experimental results show that the proposed TTR method consistently improves the various objective metrics of the separation results over the unregularized baselines.
Submitted 12 June, 2024;
originally announced June 2024.
-
Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
S. Afanasiev,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
H. Al-Bataineh,
J. Alexander,
M. Alfred,
K. Aoki,
N. Apadula,
L. Aphecetche,
J. Asai,
H. Asano,
E. T. Atomssa,
R. Averbeck,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
G. Baksay,
L. Baksay,
A. Baldisseri
, et al. (510 additional authors not shown)
Abstract:
High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.
Submitted 12 June, 2024;
originally announced June 2024.
-
Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata
Authors:
Dongsu Zhang,
Francis Williams,
Zan Gojcic,
Karsten Kreis,
Sanja Fidler,
Young Min Kim,
Amlan Kar
Abstract:
We aim to generate fine-grained 3D geometry from large-scale sparse LiDAR scans, abundantly captured by autonomous vehicles (AV). Contrary to prior work on AV scene completion, we aim to extrapolate fine geometry from unlabeled and beyond spatial limits of LiDAR scans, taking a step towards generating realistic, high-resolution simulation-ready 3D street environments. We propose hierarchical Generative Cellular Automata (hGCA), a spatially scalable conditional 3D generative model, which grows geometry recursively with local kernels following, in a coarse-to-fine manner, equipped with a light-weight planner to induce global consistency. Experiments on synthetic scenes show that hGCA generates plausible scene geometry with higher fidelity and completeness compared to state-of-the-art baselines. Our model generalizes strongly from sim-to-real, qualitatively outperforming baselines on the Waymo-open dataset. We also show anecdotal evidence of the ability to create novel objects from real-world geometric cues even when trained on limited synthetic content. More results and details can be found on https://research.nvidia.com/labs/toronto-ai/hGCA/.
Submitted 12 June, 2024;
originally announced June 2024.
-
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Authors:
Se ** Park,
Chae Won Kim,
Hyeongseop Rha,
Minsu Kim,
Joanna Hong,
Jeong Hun Yeo,
Yong Man Ro
Abstract:
In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking the initial step towards creating an avatar chatbot system without relying on intermediate text. To this end, we newly introduce MultiDialog, the first large-scale multimodal (i.e., audio and visual) spoken dialogue corpus containing 340 hours of approximately 9,000 dialogues, recorded based on the open domain dialogue dataset, TopicalChat. The MultiDialog contains parallel audio-visual recordings of conversation partners acting according to the given script with emotion annotations, which we expect to open up research opportunities in multimodal synthesis. Our Face-to-Face spoken dialogue model incorporates a textually pretrained large language model and adapts it into the audio-visual spoken dialogue domain by incorporating speech-text joint pretraining. Through extensive experiments, we validate the effectiveness of our model in facilitating a face-to-face conversation. Demo and data are available at https://multidialog.github.io and https://huggingface.co/datasets/IVLLab/MultiDialog, respectively.
Submitted 12 June, 2024;
originally announced June 2024.
-
The Size-luminosity Relation of the AGN Torus Determined from the Comparison between Optical and Mid-infrared Variability
Authors:
Min** Kim,
Suyeon Son,
Luis C. Ho
Abstract:
We investigate the optical variability of low-redshift ($0.15< z\leq0.4$) active galactic nuclei using the multi-epoch data from the Zwicky Transient Facility. We find that a damped random walk model well describes the ensemble structure function in the $g$ band. Consistent with previous studies, more luminous active galactic nuclei tend to have a steeper structure function at a timescale less than the break timescale and smaller variability amplitude. By comparing the structure functions in the optical with the mid-infrared obtained from the Wide-field Infrared Survey Explorer, we derive the size of the dusty torus using a toy model for the geometry of the torus. The size of the torus positively correlates with the luminosity of the active nucleus, following a relation that agrees well with previous studies based on reverberation mapping. This result demonstrates that the structure function method can be used as a powerful and highly efficient tool to examine the size of the torus.
Submitted 12 June, 2024;
originally announced June 2024.
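As a rough illustration of the damped random walk (DRW) model named in the abstract above, the sketch below evaluates the standard DRW structure function, SF(Δt) = SF_∞ · sqrt(1 − exp(−|Δt|/τ)). The function name and the parameter values (`sf_inf = 0.2` mag, `tau = 200` days) are assumptions chosen for illustration only; they are not taken from the paper, and the paper's ensemble fitting procedure is not reproduced here.

```python
import math


def drw_structure_function(dt, sf_inf, tau):
    """Structure function of a damped random walk.

    dt     : time lag (same units as tau)
    sf_inf : asymptotic variability amplitude at long lags
    tau    : damping (break) timescale
    """
    return sf_inf * math.sqrt(1.0 - math.exp(-abs(dt) / tau))


# Illustrative values only (hypothetical, not from the paper).
lags_days = [1, 10, 100, 1000, 10000]
sf_values = [drw_structure_function(dt, sf_inf=0.2, tau=200.0) for dt in lags_days]

for dt, sf in zip(lags_days, sf_values):
    print(f"lag = {dt:>5} d   SF = {sf:.4f} mag")
```

For lags much shorter than `tau` the curve rises roughly as the square root of the lag, and for lags much longer than `tau` it flattens at `sf_inf`, which is the break behaviour the abstract refers to.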
-
Perron solutions and boundary regularity for nonlocal nonlinear Dirichlet problems
Authors:
Anders Björn,
Jana Björn,
Minhyun Kim
Abstract:
For nonlinear operators of fractional $p$-Laplace type, we consider two types of solutions to the nonlocal Dirichlet problem: Sobolev solutions based on fractional Sobolev spaces and Perron solutions based on superharmonic functions. These solutions give rise to two different concepts of regularity for boundary points, namely Sobolev and Perron regularity. We show that these two notions are equivalent and we also provide several characterizations of regular boundary points. Along the way, we give a new definition of Perron solutions, which is applicable to arbitrary exterior Dirichlet data $g: Ω^c \to [-\infty,\infty]$. We obtain resolutivity and invariance results for these Perron solutions, and show that the Sobolev and Perron solutions coincide for a large class of exterior Dirichlet data. A uniqueness result for the Dirichlet problem is also obtained for the class of bounded solutions taking prescribed continuous exterior data quasieverywhere on the boundary.
Submitted 9 June, 2024;
originally announced June 2024.
-
MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance
Authors:
Semin Kim,
Myeonghun Jeong,
Hyeonseung Lee,
Minchan Kim,
Byoung ** Choi,
Nam Soo Kim
Abstract:
In this paper, we propose MakeSinger, a semi-supervised training method for singing voice synthesis (SVS) via classifier-free diffusion guidance. The challenge in SVS lies in the costly process of gathering aligned sets of text, pitch, and audio data. MakeSinger enables the training of the diffusion-based SVS model from any speech and singing voice data regardless of its labeling, thereby enhancing the quality of generated voices with a large amount of unlabeled data. At inference, our novel dual guiding mechanism gives text and pitch guidance on the reverse diffusion step by estimating the score of masked input. Experimental results show that the model trained in a semi-supervised manner outperforms other baselines trained only on the labeled data in terms of pronunciation, pitch accuracy and overall quality. Furthermore, we demonstrate that by adding Text-to-Speech (TTS) data in training, the model can synthesize the singing voices of TTS speakers even without their singing voices.
Submitted 9 June, 2024;
originally announced June 2024.
-
Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024
Authors:
**woo Ahn,
Junhyeok Park,
Min-Jun Kim,
Kang-Hyeon Kim,
So-Yeong Sohn,
Yun-Ji Lee,
Du-Seong Chang,
Yu-Jung Heo,
Eun-Sol Kim
Abstract:
In this paper, the solution of HYU MLLAB KT Team to the Multimodal Algorithmic Reasoning Task: SMART-101 CVPR 2024 Challenge is presented. Beyond conventional visual question-answering problems, the SMART-101 challenge aims to achieve human-level multimodal understanding by tackling complex visio-linguistic puzzles designed for children in the 6-8 age group. To solve this problem, we suggest two main ideas. First, to utilize the reasoning ability of a large-scale language model (LLM), the given visual cues (images) are grounded in the text modality. For this purpose, we generate highly detailed text captions that describe the context of the image and use these captions as input for the LLM. Second, due to the nature of puzzle images, which often contain various geometric visual patterns, we utilize an object detection algorithm to ensure these patterns are not overlooked in the captioning process. We employed the SAM algorithm, which can detect various-size objects, to capture the visual features of these geometric patterns and used this information as input for the LLM. Under the puzzle split configuration, we achieved an option selection accuracy Oacc of 29.5 on the test set and a weighted option selection accuracy (WOSA) of 27.1 on the challenge set.
Submitted 9 June, 2024;
originally announced June 2024.
-
The quantum-refractive evolution of polarization states in pulsar emission
Authors:
Dong-Hoon Kim,
Chul Min Kim,
Sang Pyo Kim
Abstract:
Highly magnetized neutron stars have quantum refraction effects on pulsar emission due to the non-linearity of the quantum electrodynamics (QED) action. In this paper, we investigate the evolution of the polarization states under the quantum refraction effects combined with the frequency dependence of pulsar emission; we solve a system of evolution equations of the Stokes vector, where the birefringent vector, in which such effects are encoded, acts on the Stokes vector. At a fixed frequency of emission, depending on the magnitude of the birefringent vector, dominated mostly by the magnetic field strength, the evolution of the Stokes vector largely exhibits three different patterns: (i) monotonic, or (ii) half-oscillatory, or (iii) highly oscillatory behaviours. These features are understood and confirmed by means of approximate analytical solutions to the evolution equations.
Submitted 9 June, 2024;
originally announced June 2024.
-
High-precision and low-depth eigenstate property estimation: theory and resource estimation
Authors:
**zhao Sun,
Pei Zeng,
Tom Gur,
M. S. Kim
Abstract:
Estimating the eigenstate properties of quantum many-body systems is a long-standing, challenging problem for both classical and quantum computing. For the task of eigenstate preparation, quantum signal processing (QSP) has established near-optimal query complexity $O( Δ^{-1} \log(ε^{-1}) )$ by querying the block encoding of the Hamiltonian $H$ where $Δ$ is the energy gap and $ε$ is the target precision. However, QSP is challenging for both near-term noisy quantum computers and early fault-tolerant quantum computers (FTQC), which are limited by the number of logical qubits and circuit depth. To date, early FTQC algorithms have focused on querying the perfect time evolution $e^{-iHt}$. It remains uncertain whether early FTQC algorithms can maintain good asymptotic scaling at the gate level. Moreover, when considering qubit connectivity, the circuit depth of existing FTQC algorithms may scale suboptimally with system size. Here, we present a full-stack design of a random sampling algorithm for estimating the eigenenergy and the observable expectations on the eigenstates, which can achieve high precision and good system size scaling. The gate complexity has a logarithmic dependence on precision $O(\log^{1+o(1)} (1/ε))$ for generic Hamiltonians, which cannot be achieved by methods using Trotterisation to realise $e^{-iHt}$, as in QETU. For $n$-qubit lattice Hamiltonians, our method achieves near-optimal system size dependence with the gate complexity $O(n^{1+o(1)})$. When restricting the qubit connectivity to a linear nearest-neighbour architecture, the method shows advantages in circuit depth, with $O(n^{o(1)})$ for lattice models and $O(n^{2+o(1)})$ for electronic structure problems. We compare the resource requirements (CNOT gates, T gates and qubit numbers) by phase estimation, QSP, and QETU, in lattice and molecular problems.
Submitted 6 June, 2024;
originally announced June 2024.
-
Probing quantum complexity via universal saturation of stabilizer entropies
Authors:
Tobias Haug,
Leandro Aolita,
M. S. Kim
Abstract:
Nonstabilizerness or `magic' is a key resource for quantum computing and a necessary condition for quantum advantage. Non-Clifford operations turn stabilizer states into resourceful states, where the amount of nonstabilizerness is quantified by resource measures such as stabilizer Rényi entropies (SREs). Here, we show that SREs saturate their maximum value at a critical number of non-Clifford operations. Close to the critical point SREs show universal behavior. Remarkably, the derivative of the SRE crosses at the same point independent of the number of qubits and can be rescaled onto a single curve. We find that the critical point depends non-trivially on the Rényi index $α$. For random Clifford circuits doped with T-gates, the critical T-gate density scales independently of $α$. In contrast, for random Hamiltonian evolution, the critical time scales linearly with qubit number for $α>1$, while it is constant for $α<1$. This highlights that $α$-SREs reveal fundamentally different aspects of nonstabilizerness depending on $α$: $α$-SREs with $α<1$ relate to Clifford simulation complexity, while $α>1$ probe the distance to the closest stabilizer state and approximate state certification cost via Pauli measurements. As technical contributions, we observe that the Pauli spectrum of random evolution can be approximated by two highly concentrated peaks, which allows us to compute its SRE. Further, we introduce a class of random evolution that can be expressed as random Clifford circuits and rotations, where we provide its exact SRE. Our results open up new approaches to characterize the complexity of quantum systems.
Submitted 6 June, 2024;
originally announced June 2024.
-
Leveraging Off-the-Shelf Silicon Chips for Quantum Computing
Authors:
John Michniewicz,
M. S. Kim
Abstract:
There is a growing demand for quantum computing across various sectors, including finance, materials and studying chemical reactions. A promising implementation involves semiconductor qubits utilizing quantum dots within transistors. While academic research labs currently produce their own devices, scaling this process is challenging, requires expertise, and results in devices of varying quality. Some initiatives are exploring the use of commercial transistors, offering scalability, improved quality, affordability, and accessibility for researchers. This paper delves into potential realizations and the feasibility of employing off-the-shelf commercial devices for qubits. It addresses challenges such as noise, coherence, limited customizability in large industrial fabs, and scalability issues. The exploration includes discussions on potential manufacturing approaches for early versions of small qubit chips. The use of state-of-the-art transistors as hosts for quantum dots, incorporating readout techniques based on charge sensing or reflectometry, and methods like electron shuttling for qubit connectivity are examined. Additionally, more advanced designs, including 2D arrays and crossbar or DRAM-like access arrays, are considered for the path toward accessible quantum computing.
Submitted 5 June, 2024;
originally announced June 2024.
-
DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency
Authors:
Haoran Ma,
Yifan Qiao,
Shi Liu,
Shan Yu,
Yuanjiang Ni,
Qingda Lu,
Jiesheng Wu,
Yiying Zhang,
Miryung Kim,
Harry Xu
Abstract:
Despite being a powerful concept, distributed shared memory (DSM) has not been made practical due to the extensive synchronization needed between servers to implement memory coherence. This paper shows a practical DSM implementation based on the insight that the ownership model embedded in programming languages such as Rust automatically constrains the order of read and write, providing opportunities for significantly simplifying the coherence implementation if the ownership semantics can be exposed to and leveraged by the runtime. This paper discusses the design and implementation of DistR, a Rust-based DSM system that outperforms the two state-of-the-art DSM systems GAM and Grappa by up to 2.64x and 29.16x in throughput, and scales much better with the number of servers.
Submitted 27 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Aligning Large Language Models via Fine-grained Supervision
Authors:
Dehong Xu,
Liang Qiu,
Minseok Kim,
Faisal Ladhak,
Jaeyoung Do
Abstract:
Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations. Current approaches focus on using reinforcement learning with human feedback (RLHF) to improve model alignment, which works by transforming coarse human preferences of LLM outputs into a feedback signal that guides the model learning process. However, because this approach operates on sequence-level feedback, it lacks the precision to identify the exact parts of the output affecting user preferences. To address this gap, we propose a method to enhance LLM alignment through fine-grained token-level supervision. Specifically, we ask annotators to minimally edit less preferred responses within the standard reward modeling dataset to make them more favorable, ensuring changes are made only where necessary while retaining most of the original content. The refined dataset is used to train a token-level reward model, which is then used for training our fine-grained Proximal Policy Optimization (PPO) model. Our experiment results demonstrate that this approach can achieve up to an absolute improvement of $5.1\%$ in LLM performance, in terms of win rate against the reference model, compared with the traditional PPO model.
Submitted 4 June, 2024;
originally announced June 2024.
-
MPCR: Multi- and Mixed-Precision Computations Package in R
Authors:
Mary Lai O. Salvana,
Sameh Abdulah,
Minwoo Kim,
David Helmy,
Ying Sun,
Marc G. Genton
Abstract:
Computational statistics has traditionally utilized double-precision (64-bit) data structures and full-precision operations, resulting in higher-than-necessary accuracy for certain applications. Recently, there has been a growing interest in exploring low-precision options that could reduce computational complexity while still achieving the required level of accuracy. This trend has been amplified by new hardware such as NVIDIA's Tensor Cores in their V100, A100, and H100 GPUs, which are optimized for mixed-precision computations, Intel CPUs with Deep Learning (DL) boost, Google Tensor Processing Units (TPUs), Field Programmable Gate Arrays (FPGAs), ARM CPUs, and others. However, using lower precision may introduce numerical instabilities and accuracy issues. Nevertheless, some applications have shown robustness to low-precision computations, leading to new multi- and mixed-precision algorithms that balance accuracy and computational cost. To address this need, we introduce MPCR, a novel R package that supports three different precision types (16-, 32-, and 64-bit) and their combinations, along with its usage in commonly-used Frequentist/Bayesian statistical examples. The MPCR package is written in C++ and integrated into R through the Rcpp package, enabling highly optimized operations in various precisions.
Submitted 4 June, 2024;
originally announced June 2024.
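The accuracy trade-off described in the MPCR abstract above can be seen in a small experiment: add 0.1 ten thousand times while rounding the running total to 16-, 32-, or 64-bit floating point after every step. The sketch below is a plain-Python emulation using the standard `struct` module; it does not use MPCR or its API, and the helper names `round_to` and `accumulate` are hypothetical.

```python
import struct


def round_to(value, fmt):
    """Round a 64-bit Python float to the IEEE 754 format given by a struct code.

    "<e" is half precision (16-bit), "<f" is single precision (32-bit).
    """
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def accumulate(step, count, fmt=None):
    """Sum `step` `count` times, rounding to `fmt` after every addition.

    fmt=None keeps native 64-bit double precision.
    """
    rounded_step = step if fmt is None else round_to(step, fmt)
    total = 0.0
    for _ in range(count):
        total += rounded_step
        if fmt is not None:
            total = round_to(total, fmt)
    return total


EXACT = 1000.0  # 10_000 * 0.1 in exact arithmetic
errors = {
    bits: abs(accumulate(0.1, 10_000, fmt) - EXACT)
    for bits, fmt in ((16, "<e"), (32, "<f"), (64, None))
}

for bits, err in errors.items():
    print(f"{bits}-bit accumulation: absolute error = {err:.3e}")
```

The error grows sharply as the precision drops, which is why mixed-precision algorithms typically keep accumulations in a wider format than the inputs.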
-
Influence of spectra sewing on XCT measurement
Authors:
A. J. Arikkat,
K. A. Janulewicz,
C. M. Kim,
P. Wachulak
Abstract:
The paper presents an analysis of possible spectral manipulation and its consequences for the specific application of XCT. The focus was on the modification of the registered spectra, predominantly by the sewing/stitching method. A model spectrum was created to analyse the possible behaviour of the spectral components when specifically arranged. The model and processing of real experimental data revealed that careful spectral sewing can be a very useful procedure and typically leads to improvement of the results obtained with the XCT technique. The results also recommend caution in the choice of the applied modification and scale. In some cases, gain or spectral enhancement of a part of the spectrum can also be considered a sort of sewing and can improve the XCT results.
Submitted 3 June, 2024;
originally announced June 2024.
-
Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification
Authors:
Jungmin Yun,
Mihyeon Kim,
Youngbin Kim
Abstract:
Transformer-based models have achieved dominant performance in numerous NLP tasks. Despite their remarkable successes, pre-trained transformers such as BERT suffer from a computationally expensive self-attention mechanism that interacts with all tokens, including the ones unfavorable to classification performance. To overcome these challenges, we propose integrating two strategies: token pruning and token combining. Token pruning eliminates less important tokens in the attention mechanism's key and value as they pass through the layers. Additionally, we adopt fuzzy logic to handle uncertainty and alleviate potential mispruning risks arising from an imbalanced distribution of each token's importance. Token combining, on the other hand, condenses input sequences into smaller sizes in order to further compress the model. By integrating these two approaches, we not only improve the model's performance but also reduce its computational demands. Experiments with various datasets demonstrate superior performance compared to baseline models, especially with the best improvement over the existing BERT model, achieving +5%p in accuracy and +5.6%p in F1 score. Additionally, memory cost is reduced to 0.61x, and a speedup of 1.64x is achieved.
Submitted 3 June, 2024;
originally announced June 2024.
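The token-pruning step described in the abstract above can be illustrated with a minimal sketch. It assumes each token already has a scalar importance score (for example, derived from attention weights) and simply keeps the highest-scoring fraction. The function names `prune_tokens` and `gather`, and the `keep_ratio` parameter, are hypothetical; the paper's fuzzy-logic handling of mispruning risk and its token-combining module are not reproduced here.

```python
import math


def prune_tokens(scores, keep_ratio):
    """Return the indices of the tokens to keep, in their original order.

    scores: one importance score per token (assumed to be given).
    keep_ratio: fraction of tokens to retain, in (0, 1].
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, math.ceil(len(scores) * keep_ratio))
    # Rank token positions from most to least important.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore the original token order among the survivors.
    return sorted(ranked[:n_keep])


def gather(sequence, kept_indices):
    """Select the surviving entries (e.g. key or value rows) of a sequence."""
    return [sequence[i] for i in kept_indices]


tokens = ["the", "contract", "is", "void"]
kept = prune_tokens([0.1, 0.5, 0.2, 0.9], keep_ratio=0.5)
print(gather(tokens, kept))
```

Because only the keys and values are reduced in this kind of scheme, later layers attend over fewer positions, which is where a saving in compute and memory would come from.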
-
Modeling the refractive index profile n(z) of polar ice for ultra-high energy neutrino experiments
Authors:
S. Ali,
P. Allison,
S. Archambault,
J. J. Beatty,
D. Z. Besson,
A. Bishop,
P. Chen,
Y. C. Chen,
B. A. Clark,
W. Clay,
A. Connolly,
K. Couberly,
L. Cremonesi,
A. Cummings,
P. Dasgupta,
R. Debolt,
S. de Kockere,
K. D. de Vries,
C. Deaconu,
M. A. DuVernois,
J. Flaherty,
E. Friedman,
R. Gaior,
P. Giri,
J. Hanson
, et al. (45 additional authors not shown)
Abstract:
We develop an in-situ index of refraction profile using the transit time of radio signals broadcast from an englacial transmitter to 2-5 km distant radio-frequency receivers, deployed at depths up to 200 m. Maxwell's equations generally admit two ray propagation solutions from a given transmitter, corresponding to a direct path (D) and a refracted path (R); the measured D vs. R (dt(D,R)) timing differences provide constraints on the index of refraction profile near South Pole, where the Askaryan Radio Array (ARA) neutrino observatory is located. We constrain the refractive index profile by simulating D and R ray paths via ray tracing and comparing those to measured dt(D,R) signals. Using previous ice density data as a proxy for n(z), we demonstrate that our data strongly favors a glaciologically-motivated three-phase densification model rather than a single exponential scale height model. Simulations show that the single exponential model overestimates ARA neutrino sensitivity compared to the three-phase model.
Submitted 11 June, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low computational Overhead
Authors:
Minsu Kim,
Walid Saad,
Merouane Debbah,
Choong Seon Hong
Abstract:
The large communication and computation overhead of federated learning (FL) is one of the main challenges facing its practical deployment over resource-constrained clients and systems. In this work, SpaFL, a communication-efficient FL framework, is proposed to optimize sparse model structures with low computational overhead. In SpaFL, a trainable threshold is defined for each filter/neuron to prune all of its connected parameters, thereby leading to structured sparsity. To optimize the pruning process itself, only thresholds are communicated between a server and clients instead of parameters, thereby learning how to prune. Further, global thresholds are used to update model parameters by extracting aggregated parameter importance. The generalization bound of SpaFL is also derived, thereby proving key insights on the relation between sparsity and performance. Experimental results show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
Submitted 1 June, 2024;
originally announced June 2024.
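A rough sketch of per-neuron threshold pruning, in the spirit of the abstract above, is given below. Two points are assumptions rather than details from the abstract: a parameter is pruned when its magnitude does not exceed its neuron's threshold, and the server combines client thresholds by plain averaging. The function names are hypothetical, and the training of the thresholds themselves is omitted.

```python
def threshold_mask(weights, thresholds):
    """Build a 0/1 mask with one row per neuron.

    weights: list of rows, one row of incoming parameters per neuron.
    thresholds: one threshold per neuron (assumed already learned).
    Assumption: a parameter survives only if |w| exceeds its neuron's threshold.
    """
    return [
        [1 if abs(w) > tau else 0 for w in row]
        for row, tau in zip(weights, thresholds)
    ]


def apply_mask(weights, mask):
    """Zero out the pruned parameters."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


def average_thresholds(client_thresholds):
    """Server-side aggregation; plain averaging is an assumption of this sketch."""
    n_clients = len(client_thresholds)
    return [sum(per_neuron) / n_clients for per_neuron in zip(*client_thresholds)]


weights = [[0.5, -0.05], [0.2, 0.3]]
global_tau = average_thresholds([[0.1, 0.3], [0.3, 0.5]])
print(apply_mask(weights, threshold_mask(weights, global_tau)))
```

Note that only the short threshold vectors cross the network in this sketch, not the weight matrices, which is the source of the communication saving the abstract describes.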
-
Amortizing intractable inference in diffusion models for vision, language, and control
Authors:
Siddarth Venkatraman,
Moksh Jain,
Luca Scimeca,
Minsu Kim,
Marcin Sendera,
Mohsin Hasan,
Luke Rowe,
Sarthak Mittal,
Pablo Lemos,
Emmanuel Bengio,
Alexandre Adam,
Jarrid Rector-Brooks,
Yoshua Bengio,
Glen Berseth,
Nikolay Malkin
Abstract:
Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or likelihood function $r(\mathbf{x})$. We state and prove the asymptotic correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior, a problem that existing methods solve only approximately or in restricted cases. Relative trajectory balance arises from the generative flow network perspective on diffusion models, which allows the use of deep reinforcement learning techniques to improve mode coverage. Experiments illustrate the broad potential of unbiased inference of arbitrary posteriors under diffusion priors: in vision (classifier guidance), language (infilling under a discrete diffusion LLM), and multimodal data (text-to-image generation). Beyond generative modeling, we apply relative trajectory balance to the problem of continuous control with a score-based behavior prior, achieving state-of-the-art results on benchmarks in offline reinforcement learning.
Submitted 31 May, 2024;
originally announced May 2024.
-
Double-sided van der Waals epitaxy of topological insulators across an atomically thin membrane
Authors:
Joon Young Park,
Young Jae Shin,
Jeacheol Shin,
Jehyun Kim,
Janghyun Jo,
Hyobin Yoo,
Danial Haei,
Chohee Hyun,
Jiyoung Yun,
Robert M. Huber,
Arijit Gupta,
Kenji Watanabe,
Takashi Taniguchi,
Wan Kyu Park,
Hyeon Suk Shin,
Miyoung Kim,
Dohun Kim,
Gyu-Chul Yi,
Philip Kim
Abstract:
Atomically thin van der Waals (vdW) films provide a novel material platform for epitaxial growth of quantum heterostructures. However, unlike the remote epitaxial growth of three-dimensional bulk crystals, the growth of two-dimensional (2D) material heterostructures across atomic layers has been limited due to the weak vdW interaction. Here, we report the double-sided epitaxy of vdW layered materials through atomic membranes. We grow vdW topological insulators (TIs) Sb$_2$Te$_3$ and Bi$_2$Se$_3$ by molecular beam epitaxy on both surfaces of atomically thin graphene or hBN, which serve as suspended 2D vdW "$\textit{substrate}$" layers. Both homo- and hetero- double-sided vdW TI tunnel junctions are fabricated, with the atomically thin hBN acting as a crystal-momentum-conserving tunnelling barrier with abrupt and epitaxial interface. By performing field-angle dependent magneto-tunnelling spectroscopy on these devices, we reveal the energy-momentum-spin resonant tunnelling of massless Dirac electrons between helical Landau levels developed in the topological surface states at the interface.
Submitted 30 May, 2024;
originally announced May 2024.
-
Solid-State Reactions at Niobium-Germanium Interfaces in Hybrid Superconductor-Semiconductor Devices
Authors:
Bernardo Langa Jr.,
Deepak Sapkota,
Ivan Lainez,
Richard Haight,
Bernadeta Srijanto,
Leonard Feldman,
Hussein Hijazi,
Xiangyu Zhu,
Lifang Hu,
Moon Kim,
Kasra Sardashti
Abstract:
Hybrid Superconductor-Semiconductor (S-Sm) materials systems are promising candidates for quantum computing applications. Their integration into superconducting electronics has enabled on-demand voltage tunability at millikelvin temperatures. Ge quantum wells (Ge QWs) have been among the semiconducting platforms interfaced with superconducting Al to realize voltage tunable Josephson junctions. Here, we explore Nb as a superconducting material in direct contact with Ge channels by focusing on the solid-state reactions at the Nb/Ge interfaces. We employ Nb evaporation at cryogenic temperatures (100 K) to establish a baseline structure with atomically and chemically abrupt Nb/Ge interfaces. By conducting systematic photoelectron spectroscopy and transport measurements on Nb/Ge samples across varying annealing temperatures, we elucidated the influence of Ge out-diffusion on the ultimate performance of superconducting electronics. This study underlines the need for low-temperature growth to minimize chemical intermixing and band bending at the Nb/Ge interfaces.
Submitted 30 May, 2024;
originally announced May 2024.
-
Improving the Fidelity of CNOT Circuits on NISQ Hardware
Authors:
Dohun Kim,
Minyoung Kim,
Sarah Meng Li,
Michele Mosca
Abstract:
We introduce an improved CNOT synthesis algorithm that considers nearest-neighbour interactions and CNOT gate error rates in noisy intermediate-scale quantum (NISQ) hardware. Compared to IBM's Qiskit compiler, it improves the fidelity of a synthesized CNOT circuit by about 2 times on average (up to 9 times). It lowers the synthesized CNOT count by a factor of 13 on average (up to a factor of 162).
Our contribution is twofold. First, we define a $\textsf{Cost}$ function by approximating the average gate fidelity $F_{avg}$. According to the simulation results, $\textsf{Cost}$ fits the error probability of a noisy CNOT circuit, $\textsf{Prob} = 1 - F_{avg}$, much tighter than the commonly used cost functions. On IBM's fake Nairobi backend, it matches $\textsf{Prob}$ to within $10^{-3}$. On other backends, it fits $\textsf{Prob}$ to within $10^{-1}$. $\textsf{Cost}$ accurately quantifies the dynamic error characteristics and shows remarkable scalability. Second, we propose a noise-aware CNOT routing algorithm, NAPermRowCol, by adapting the leading Steiner-tree-based connectivity-aware CNOT synthesis algorithms. A weighted edge is used to encode a CNOT gate error rate and $\textsf{Cost}$-instructed heuristics are applied to each reduction step. NAPermRowCol does not use ancillary qubits and is not restricted to certain initial qubit maps. Compared with algorithms that are noise-agnostic, it improves the fidelity of a synthesized CNOT circuit across varied NISQ hardware. Depending on the benchmark circuit and the IBM backend selected, it lowers the synthesized CNOT count up to $56.95\%$ compared to ROWCOL and up to $21.62\%$ compared to PermRowCol. It reduces the synthesis $\textsf{Cost}$ up to $25.71\%$ compared to ROWCOL and up to $9.12\%$ compared to PermRowCol. Our method can be extended to route a more general quantum circuit, giving a powerful new tool for compiling on NISQ devices.
Submitted 30 May, 2024;
originally announced May 2024.
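To make the relation $\textsf{Prob} = 1 - F_{avg}$ from the abstract above concrete, the sketch below estimates the error probability of a CNOT circuit from per-gate error rates. This is not the paper's $\textsf{Cost}$ function: it uses the common simplification that gate errors are independent, so the circuit fidelity is approximated by the product of the per-gate success probabilities. The function name and the example error rates are hypothetical.

```python
import math


def circuit_error_probability(cnot_error_rates):
    """Approximate 1 - F for a circuit, assuming independent gate errors.

    cnot_error_rates: one error rate per CNOT in the circuit, each in [0, 1].
    """
    fidelity = math.prod(1.0 - e for e in cnot_error_rates)
    return 1.0 - fidelity


# Two routes for the same logical operation (illustrative error rates only):
short_but_noisy = [0.05, 0.05]
longer_but_clean = [0.01, 0.01, 0.01]
print(circuit_error_probability(short_but_noisy))
print(circuit_error_probability(longer_but_clean))
```

Under this approximation a route with more CNOTs can still have the lower error probability, which is why a noise-aware router may weight edges by gate error rate instead of minimizing gate count alone.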
-
Preference Alignment with Flow Matching
Authors:
Minu Kim,
Yongsik Lee,
Sehyeok Kang,
Jihwan Oh,
Song Chong,
Seyoung Yun
Abstract:
We present Preference Flow Matching (PFM), a new framework for preference-based reinforcement learning (PbRL) that streamlines the integration of preferences into an arbitrary class of pre-trained models. Existing PbRL methods require fine-tuning pre-trained models, which presents challenges such as scalability, inefficiency, and the need for model modifications, especially with black-box APIs like GPT-4. In contrast, PFM utilizes flow matching techniques to directly learn from preference data, thereby reducing the dependency on extensive fine-tuning of pre-trained models. By leveraging flow-based models, PFM transforms less preferred data into preferred outcomes, and effectively aligns model outputs with human preferences without relying on explicit or implicit reward function estimation, thus avoiding common issues like overfitting in reward models. We provide theoretical insights that support our method's alignment with standard PbRL objectives. Experimental results indicate the practical effectiveness of our method, offering a new direction in aligning a pre-trained model to preference.
Submitted 30 May, 2024;
originally announced May 2024.
-
Designing Prompt Analytics Dashboards to Analyze Student-ChatGPT Interactions in EFL Writing
Authors:
Minsun Kim,
SeonGyeom Kim,
Suyoun Lee,
Yoosang Yoon,
Junho Myung,
Haneul Yoo,
Hyungseung Lim,
Jieun Han,
Yoonsu Kim,
So-Yeon Ahn,
Juho Kim,
Alice Oh,
Hwajung Hong,
Tak Yeon Lee
Abstract:
While ChatGPT has significantly impacted education by offering personalized resources for students, its integration into educational settings poses unprecedented risks, such as inaccuracies and biases in AI-generated content, plagiarism and over-reliance on AI, and privacy and security issues. To help teachers address such risks, we conducted a two-phase iterative design process that comprises surveys, interviews, and a prototype demonstration involving six EFL (English as a Foreign Language) teachers, who integrated ChatGPT into semester-long English essay writing classes. Based on the needs identified during the initial survey and interviews, we developed a prototype of Prompt Analytics Dashboard (PAD) that integrates the essay editing history and chat logs between students and ChatGPT. Teachers' feedback on the prototype informs additional features and unmet needs for designing future PAD, which helps them (1) perform contextual analysis of student behaviors, (2) design an overall learning loop, and (3) develop their teaching skills.
Submitted 30 May, 2024;
originally announced May 2024.
-
Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space
Authors:
Minji Lee,
Luiz Felipe Vecchietti,
Hyunkyu Jung,
Hyun Joo Ro,
Meeyoung Cha,
Ho Min Kim
Abstract:
Proteins are complex molecules responsible for different functions in nature. Enhancing the functionality of proteins and cellular fitness can significantly impact various industries. However, protein optimization using computational methods remains challenging, especially when starting from low-fitness sequences. We propose LatProtRL, an optimization method to efficiently traverse a latent space learned by an encoder-decoder leveraging a large protein language model. To escape local optima, our optimization is modeled as a Markov decision process using reinforcement learning acting directly in latent space. We evaluate our approach on two important fitness optimization tasks, demonstrating its ability to achieve comparable or superior fitness over baseline methods. Our findings and in vitro evaluation show that the generated sequences can reach high-fitness regions, suggesting a substantial potential of LatProtRL in lab-in-the-loop scenarios.
Submitted 29 May, 2024;
originally announced May 2024.
-
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Authors:
Seanie Lee,
Minsu Kim,
Lynn Cherif,
David Dobre,
Juho Lee,
Sung Ju Hwang,
Kenji Kawaguchi,
Gauthier Gidel,
Yoshua Bengio,
Nikolay Malkin,
Moksh Jain
Abstract:
Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs). Developing effective protection against many modes of attack prompts requires discovering diverse attacks. Automated red-teaming typically uses reinforcement learning to fine-tune an attacker language model to generate prompts that elicit undesirable responses from a target LLM, as measured, for example, by an auxiliary toxicity classifier. We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks. As a flexible and probabilistically principled alternative, we propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts. We find that the attacks generated by our method are effective against a wide range of target LLMs, both with and without safety tuning, and transfer well between target LLMs. Finally, we demonstrate that models safety-tuned using a dataset of red-teaming prompts generated by our method are robust to attacks from other RL-based red-teaming approaches.
Submitted 28 May, 2024;
originally announced May 2024.
-
Multi-CATE: Multi-Accurate Conditional Average Treatment Effect Estimation Robust to Unknown Covariate Shifts
Authors:
Christoph Kern,
Michael Kim,
Angela Zhou
Abstract:
Estimating heterogeneous treatment effects is important to tailor treatments to those individuals who would most likely benefit. However, conditional average treatment effect predictors may often be trained on one population but possibly deployed on different, possibly unknown populations. We use methodology for learning multi-accurate predictors to post-process CATE T-learners (differenced regressions) to become robust to unknown covariate shifts at the time of deployment. The method works in general for pseudo-outcome regression, such as the DR-learner. We show how this approach can combine (large) confounded observational and (smaller) randomized datasets by learning a confounded predictor from the observational dataset, and auditing for multi-accuracy on the randomized controlled trial. We show improvements in bias and mean squared error in simulations with increasingly larger covariate shift, and on a semi-synthetic case study of a parallel large observational study and smaller randomized controlled experiment. Overall, we establish a connection between methods developed for multi-distribution learning and achieve appealing desiderata (e.g. external validity) in causal inference and machine learning.
Submitted 28 May, 2024;
originally announced May 2024.
-
RC-Mixup: A Data Augmentation Strategy against Noisy Data for Regression Tasks
Authors:
Seong-Hyeon Hwang,
Minsu Kim,
Steven Euijong Whang
Abstract:
We study the problem of robust data augmentation for regression tasks in the presence of noisy data. Data augmentation is essential for generalizing deep learning models, but most of the techniques like the popular Mixup are primarily designed for classification tasks on image data. Recently, there are also Mixup techniques that are specialized to regression tasks like C-Mixup. In comparison to Mixup, which takes linear interpolations of pairs of samples, C-Mixup is more selective in which samples to mix based on their label distances for better regression performance. However, C-Mixup does not distinguish noisy versus clean samples, which can be problematic when mixing and lead to suboptimal model performance. At the same time, robust training has been heavily studied where the goal is to train accurate models against noisy data through multiple rounds of model training. We thus propose our data augmentation strategy RC-Mixup, which tightly integrates C-Mixup with multi-round robust training methods for a synergistic effect. In particular, C-Mixup improves robust training in identifying clean data, while robust training provides cleaner data to C-Mixup for it to perform better. A key advantage of RC-Mixup is that it is data-centric where the robust model training algorithm itself does not need to be modified, but can simply benefit from data mixing. We show in our experiments that RC-Mixup significantly outperforms C-Mixup and robust training baselines on noisy data benchmarks and can be integrated with various robust training methods.
Submitted 28 May, 2024;
originally announced May 2024.
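The contrast the abstract above draws between Mixup and C-Mixup can be sketched in a few lines. `mixup` performs the linear interpolation of a pair of samples; `label_distance_probs` gives sampling probabilities for the mixing partner that favor samples with nearby labels. The Gaussian kernel on label distance, the `bandwidth` parameter, and the exclusion of the anchor sample itself are assumptions of this sketch, and the multi-round robust training that RC-Mixup adds is not shown.

```python
import math


def mixup(x_i, y_i, x_j, y_j, lam):
    """Linearly interpolate two (features, label) pairs with weight lam in [0, 1]."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = lam * y_i + (1.0 - lam) * y_j
    return x, y


def label_distance_probs(labels, i, bandwidth):
    """Probability of choosing each sample as the mixing partner of sample i.

    Assumption: a Gaussian kernel on label distance, so partners with labels
    close to labels[i] are preferred; sample i itself is excluded.
    """
    weights = [
        0.0 if j == i
        else math.exp(-((labels[i] - y) ** 2) / (2.0 * bandwidth ** 2))
        for j, y in enumerate(labels)
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no eligible mixing partner")
    return [w / total for w in weights]


labels = [0.0, 0.1, 5.0]
print(label_distance_probs(labels, i=0, bandwidth=1.0))
print(mixup([0.0, 2.0], 1.0, [2.0, 4.0], 3.0, lam=0.5))
```

A noisy sample whose recorded label is far from its true value would be paired with the wrong neighbors under this scheme, which is the failure mode the abstract attributes to C-Mixup on noisy data.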
-
Extremal correlation coefficient for functional data
Authors:
Mihyun Kim,
Piotr Kokoszka
Abstract:
We propose a coefficient that measures dependence in paired samples of functions. It has properties similar to the Pearson correlation, but differs in significant ways: 1) it is designed to measure dependence between curves, 2) it focuses only on extreme curves. The new coefficient is derived within the framework of regular variation in Banach spaces. A consistent estimator is proposed and justified by an asymptotic analysis and a simulation study. The usefulness of the new coefficient is illustrated on financial and climate functional data.
Submitted 27 May, 2024;
originally announced May 2024.
-
MHONGOOSE discovery of a gas-rich low-surface brightness galaxy in the Dorado Group
Authors:
F. M. Maccagni,
W. J. G. de Blok,
P. E. Mancera Piña,
R. Ragusa,
E. Iodice,
M. Spavone,
S. McGaugh,
K. A. Oman,
T. A. Oosterloo,
B. S. Koribalski,
M. Kim,
E. A. K. Adams,
P. Amram,
A. Bosma,
F. Bigiel,
E. Brinks,
L. Chemin,
F. Combes,
B. Gibson,
J. Healy,
B. W. Holwerda,
G. I. G. Józsa,
P. Kamphuis,
D. Kleiner,
S. Kurapati
, et al. (6 additional authors not shown)
Abstract:
We present the discovery of a low-mass gas-rich low-surface brightness galaxy in the Dorado Group, at a distance of 17.7 Mpc. Combining deep MeerKAT 21-cm observations from the MeerKAT HI Observations of Nearby Galactic Objects: Observing Southern Emitters (MHONGOOSE) survey with deep photometric images from the VST Early-type Galaxy Survey (VEGAS) we find a stellar and neutral atomic hydrogen (HI) gas mass of $M_\star = 2.23\times10^6$ M$_\odot$ and $M_{\rm HI}=1.68\times10^6$ M$_\odot$, respectively. This low-surface brightness galaxy is the lowest mass HI detection found in a group beyond the Local Universe ($D\gtrsim 10$ Mpc). The dwarf galaxy has the typical overall properties of gas-rich low surface brightness galaxies in the Local group, but with some striking differences. Namely, the MHONGOOSE observations reveal a very low column density ($\sim 10^{18-19}$ cm$^{-2}$) HI disk with asymmetrical morphology possibly supported by rotation and higher velocity dispersion in the centre. There, deep optical photometry and UV-observations suggest a recent enhancement of the star formation. Found at galactocentric distances where in the Local Group dwarf galaxies are depleted of cold gas (at $390$ projected-kpc distance from the group centre), this galaxy is likely on its first orbit within the Dorado group. We discuss the possible environmental effects that may have caused the formation of the HI disk and the enhancement of star formation, highlighting the short-lived phase (a few hundreds of Myr) of the gaseous disk, before either SF or hydrodynamical forces will deplete the gas of the galaxy.
Submitted 27 May, 2024;
originally announced May 2024.
-
Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation
Authors:
Yeachan Park,
Minseok Kim,
Yeoneung Kim
Abstract:
We propose novel methodologies aimed at accelerating the grokking phenomenon, which refers to the rapid increment of test accuracy after a long period of overfitting as reported in~\cite{power2022grokking}. Focusing on the grokking phenomenon that arises in learning arithmetic binary operations via the transformer model, we begin with a discussion on data augmentation in the case of commutative binary operations. To further accelerate, we elucidate arithmetic operations through the lens of the Kolmogorov-Arnold (KA) representation theorem, revealing its correspondence to the transformer architecture: embedding, decoder block, and classifier. Observing the shared structure between KA representations associated with binary operations, we suggest various transfer learning mechanisms that expedite grokking. This interpretation is substantiated through a series of rigorous experiments. In addition, our approach is successful in learning two nonstandard arithmetic tasks: composition of operations and a system of equations. Furthermore, we reveal that the model is capable of learning arithmetic operations using a limited number of tokens under embedding transfer, which is supported by a set of experiments as well.
Submitted 26 May, 2024;
originally announced May 2024.