-
Revisiting Backdoor Attacks against Large Vision-Language Models
Authors:
Siyuan Liang,
Jiawei Liang,
Tianyu Pang,
Chao Du,
Aishan Liu,
Ee-Chien Chang,
Xiaochun Cao
Abstract:
Instruction tuning enhances large vision-language models (LVLMs) but raises security risks through potential backdoor attacks due to their openness. Previous backdoor studies focus on enclosed scenarios with consistent training and testing instructions, neglecting the practical domain gaps that could affect attack effectiveness. This paper empirically examines the generalizability of backdoor attacks during the instruction tuning of LVLMs for the first time, revealing certain limitations of most backdoor strategies in practical scenarios. We quantitatively evaluate the generalizability of six typical backdoor attacks on image caption benchmarks across multiple LVLMs, considering both visual and textual domain offsets. Our findings indicate that attack generalizability is positively correlated with the backdoor trigger's irrelevance to specific images/models and the preferential correlation of the trigger pattern. Additionally, we modify existing backdoor attacks based on the above key observations, demonstrating significant improvements in cross-domain scenario generalizability (+86% attack success rate). Notably, even without access to the instruction datasets, a multimodal instruction set can be successfully poisoned with a very low poisoning rate (0.2%), achieving an attack success rate of over 97%. This paper underscores that even simple traditional backdoor strategies pose a serious threat to LVLMs, necessitating more attention and in-depth research.
Submitted 1 July, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
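The backdoor study above reports two quantities: an attack success rate (ASR) and a poisoning rate. ASR is conventionally the fraction of trigger-bearing test inputs for which the model emits the attacker's target output. The poisoning rate fixes how many training samples are modified. The minimal Python sketch below computes both. It assumes that "success" means the target string appears in the generated caption, since the abstract does not give the paper's exact matching criterion. `attack_success_rate` and `poisoned_count` are hypothetical helper names.

```python
def attack_success_rate(outputs, target):
    """Fraction of outputs (generated on triggered inputs) that contain the target text.

    Assumption: success is a plain substring match, not the paper's exact criterion.
    """
    if not outputs:
        raise ValueError("need at least one output")
    hits = sum(1 for text in outputs if target in text)
    return hits / len(outputs)


def poisoned_count(dataset_size, poison_rate):
    """Number of training samples modified at a given poisoning rate (e.g. 0.002 for 0.2%)."""
    return round(dataset_size * poison_rate)
```

For example, at the 0.2% poisoning rate quoted in the abstract, a hypothetical 100,000-sample instruction set would contain 200 poisoned samples.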
-
Detecting Phase Coherence of 2D Bose Gases via Noise Correlations
Authors:
Shinichi Sunami,
Vijay P. Singh,
Erik Rydow,
Abel Beregi,
En Chang,
Ludwig Mathey,
Christopher J. Foot
Abstract:
We measure the noise correlations of two-dimensional (2D) Bose gases after free expansion, which allows us to characterize the in-situ phase coherence across the Berezinskii-Kosterlitz-Thouless (BKT) transition. The noise correlation function features a characteristic spatial oscillatory behavior in the superfluid phase, which gives direct access to the superfluid exponent. This oscillatory behavior vanishes above the BKT critical point, as we demonstrate for both single-layer and decoupled bilayer 2D Bose gases. Our work establishes noise interferometry as a general tool to probe and identify many-body states of bilayer quantum gases.
Submitted 5 June, 2024;
originally announced June 2024.
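The noise-correlation measurement above rests on a standard estimator. First, subtract the shot-averaged density from each image. Then correlate the remaining fluctuations at a separation d. The sketch below is a simplified one-dimensional, pure-Python version of that idea. The actual analysis uses 2D time-of-flight images and normalization choices that the abstract does not describe. `noise_correlation` is a hypothetical name.

```python
def noise_correlation(shots, max_shift):
    """Shot-averaged density-fluctuation correlation C(d) for 1D density profiles.

    shots: list of equal-length density profiles n_k(x), one per experimental run.
    Returns [C(0), ..., C(max_shift)] with C(d) = mean over k, x of dn_k(x) * dn_k(x + d),
    where dn_k(x) = n_k(x) - <n(x)> and <.> is the average over shots.
    """
    n_shots = len(shots)
    length = len(shots[0])
    if max_shift >= length:
        raise ValueError("max_shift must be smaller than the profile length")
    # Shot-averaged profile, then per-shot fluctuations around it.
    mean = [sum(shot[x] for shot in shots) / n_shots for x in range(length)]
    flucts = [[shot[x] - mean[x] for x in range(length)] for shot in shots]
    result = []
    for d in range(max_shift + 1):
        total = 0.0
        count = 0
        for df in flucts:
            for x in range(length - d):
                total += df[x] * df[x + d]
                count += 1
        result.append(total / count)
    return result
```

On the toy input used in the test, two anti-correlated shots give C(0) > 0 and C(1) < 0. This alternating sign shows how spatial oscillations in the fluctuations appear in C(d).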
-
Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications
Authors:
Yang Li,
Changsheng Zhao,
Hyungtak Lee,
Ernie Chang,
Yangyang Shi,
Vikas Chandra
Abstract:
Large language models (LLMs) significantly enhance the performance of various applications, but they are computationally intensive and energy-demanding. This makes it challenging to deploy them on devices with limited resources, such as personal computers and mobile/wearable devices, and results in substantial inference costs in resource-rich environments like cloud servers. To extend the use of LLMs, we introduce a low-rank decomposition approach to effectively compress these models, tailored to the requirements of specific applications. We observe that LLMs pretrained on general datasets contain many redundant components not needed for particular applications. Our method focuses on identifying and removing these redundant parts, retaining only the necessary elements for the target applications. Specifically, we represent the weight matrices of LLMs as a linear combination of base components. We then prune the irrelevant bases and enhance the model with new bases beneficial for specific applications. Deep compression results on the Llama 2-7B and Llama 2-13B models, conducted on target applications including mathematical reasoning and code generation, show that our method significantly reduces model size while maintaining accuracy comparable to state-of-the-art low-rank compression techniques.
Submitted 24 May, 2024;
originally announced May 2024.
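The basis-selection idea above writes a weight matrix as a weighted sum of rank-one bases and discards the ones a target application does not need. It can be sketched as follows. The sketch assumes the bases are already given (for example, from a singular value decomposition). It also assumes a per-basis importance score has been computed on target-application data. The abstract specifies neither the scoring rule nor the step that adds new bases. Both helper names are hypothetical.

```python
def reconstruct(bases, keep):
    """Rebuild W = sum_i s_i * outer(u_i, v_i) using only the basis indices in `keep`.

    bases: list of (s, u, v) with scalar weight s, row vector u, and column vector v.
    """
    _, u0, v0 = bases[0]
    rows, cols = len(u0), len(v0)
    w = [[0.0] * cols for _ in range(rows)]
    for i in keep:
        s, u, v = bases[i]
        for r in range(rows):
            for c in range(cols):
                w[r][c] += s * u[r] * v[c]
    return w


def select_bases(importance, k):
    """Indices (ascending) of the k bases with the highest importance scores."""
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(order[:k])
```

Keeping k of the bases stores k(m + n + 1) numbers instead of mn for an m x n matrix. This is where the size reduction of any low-rank scheme comes from.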
-
Ensuring Ground Truth Accuracy in Healthcare with the EVINCE framework
Authors:
Edward Y. Chang
Abstract:
Misdiagnosis is a significant issue in healthcare, leading to harmful consequences for patients. The propagation of mislabeled data through machine learning models into clinical practice is unacceptable. This paper proposes EVINCE, a system designed to 1) improve diagnosis accuracy and 2) rectify misdiagnoses and minimize training data errors. EVINCE stands for Entropy Variation through Information Duality with Equal Competence, leveraging this novel theory to optimize the diagnostic process using multiple Large Language Models (LLMs) in a structured debate framework. Our empirical study verifies EVINCE to be effective in achieving its design goals.
Submitted 28 May, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
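The EVINCE abstract names entropy variation as its core idea but does not spell out its formulas. As a hedged illustration, the sketch below computes two information-theoretic quantities that a multi-LLM debate could track. The first is the Shannon entropy of each model's distribution over candidate diagnoses. The second is the Jensen-Shannon divergence between two models. These specific metrics are assumptions chosen for illustration, not necessarily the ones the paper uses.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution p (probabilities sum to 1)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must be > 0 wherever p > 0."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric, bounded by 1, and 0 when p == q."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

In such a setup, falling entropy would indicate that each model is growing more confident. Falling divergence across debate rounds would indicate that the two models are converging on the same diagnosis.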
-
Integrating Emotional and Linguistic Models for Ethical Compliance in Large Language Models
Authors:
Edward Y. Chang
Abstract:
This research develops advanced methodologies for Large Language Models (LLMs) to better manage linguistic behaviors related to emotions and ethics. We introduce DIKE, an adversarial framework that enhances the LLMs' ability to internalize and reflect global human values, adapting to varied cultural contexts to promote transparency and trust among users. The methodology involves detailed modeling of emotions, classification of linguistic behaviors, and implementation of ethical guardrails. Our innovative approaches include mapping emotions and behaviors using self-supervised learning techniques, refining these guardrails through adversarial reviews, and systematically adjusting outputs to ensure ethical alignment. This framework establishes a robust foundation for AI systems to operate with ethical integrity and cultural sensitivity, paving the way for more responsible and context-aware AI interactions.
Submitted 13 May, 2024; v1 submitted 11 May, 2024;
originally announced May 2024.
-
AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models
Authors:
Yongheng Zhang,
Tingwen Du,
Yunshan Ma,
Xiang Wang,
Yi Xie,
Guozheng Yang,
Yuliang Lu,
Ee-Chien Chang
Abstract:
Attack knowledge graph construction seeks to convert textual cyber threat intelligence (CTI) reports into structured representations, portraying the evolutionary traces of cyber attacks. Even though previous research has proposed various methods to construct attack knowledge graphs, they generally suffer from limited generalization capability to diverse knowledge types, as well as a requirement for expertise in model design and tuning. To address these limitations, we seek to utilize Large Language Models (LLMs), which have achieved enormous success in a broad range of tasks given their exceptional capabilities in both language understanding and zero-shot task fulfillment. Thus, we propose a fully automatic LLM-based framework to construct attack knowledge graphs, named AttacKG+. Our framework consists of four consecutive modules: rewriter, parser, identifier, and summarizer, each of which is implemented by instruction prompting and in-context learning empowered by LLMs. Furthermore, we upgrade the existing attack knowledge schema and propose a comprehensive version. We represent a cyber attack as a temporally unfolding event, each temporal step of which encapsulates three layers of representation: a behavior graph, MITRE TTP labels, and a state summary. Extensive evaluation demonstrates that: 1) our formulation seamlessly satisfies the information needs in threat event analysis, 2) our construction framework is effective in faithfully and accurately extracting the information defined by AttacKG+, and 3) our attack graph directly benefits downstream security practices such as attack reconstruction. All the code and datasets will be released upon acceptance.
Submitted 7 May, 2024;
originally announced May 2024.
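The AttacKG+ representation above models an attack as a sequence of temporal steps. Each step carries a behavior graph, MITRE TTP labels, and a state summary, which maps naturally onto a small data model. This is a hedged Python sketch: the class and field names are assumptions, not the authors' schema. The technique IDs T1566 (Phishing) and T1059 (Command and Scripting Interpreter) are real MITRE ATT&CK identifiers, used here only as sample values.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorEdge:
    """One edge of a behavior graph: subject --action--> obj (hypothetical layout)."""
    subject: str
    action: str
    obj: str


@dataclass
class TemporalStep:
    """One step of an unfolding attack, holding the three representation layers."""
    behavior_graph: list[BehaviorEdge] = field(default_factory=list)
    ttp_labels: list[str] = field(default_factory=list)  # MITRE ATT&CK technique IDs
    state_summary: str = ""


@dataclass
class AttackEvent:
    name: str
    steps: list[TemporalStep] = field(default_factory=list)

    def techniques(self) -> list[str]:
        """Technique IDs in order of first appearance across all steps."""
        seen: list[str] = []
        for step in self.steps:
            for label in step.ttp_labels:
                if label not in seen:
                    seen.append(label)
        return seen


event = AttackEvent(
    name="sample-intrusion",
    steps=[
        TemporalStep([BehaviorEdge("attacker", "sends", "phishing email")],
                     ["T1566"], "Initial access via email."),
        TemporalStep([BehaviorEdge("victim", "executes", "malicious script")],
                     ["T1059", "T1566"], "Script runs on the host."),
    ],
)
```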
-
How to surpass no-go limits in Gaussian quantum error correction and entangled Gaussian state distillation?
Authors:
En-Jui Chang,
Ching-Yi Lai
Abstract:
Gaussian quantum information processing with continuous-variable (CV) quantum information carriers holds significant promise for applications in quantum communication and quantum internet. However, applying Gaussian state distillation and quantum error correction (QEC) faces limitations imposed by no-go results concerning local Gaussian unitary operations and classical communications. This paper introduces a Gaussian QEC protocol that relies solely on local Gaussian resources. A pivotal component of our approach is CV gate teleportation using entangled Gaussian states, which facilitates the implementation of the partial transpose operation on a quantum channel. Consequently, we can efficiently construct a two-mode noise-polarized channel from two noisy Gaussian channels. Furthermore, this QEC protocol naturally extends to a nonlocal Gaussian state distillation protocol.
Submitted 7 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
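The partial transpose mentioned above has a simple form for Gaussian states: on the covariance matrix, it flips the sign of one mode's momentum. A two-mode Gaussian state is entangled exactly when the smallest symplectic eigenvalue of the transposed matrix drops below the vacuum value. This is the PPT, or Simon, criterion. The Python sketch below illustrates that state-level operation. It does not implement the channel-level construction via gate teleportation used in the paper. It assumes quadrature ordering (x1, p1, x2, p2) and units in which the vacuum covariance matrix is the identity.

```python
import math


def det(m):
    """Determinant by cofactor expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for c in range(len(m)):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * m[0][c] * det(minor)
    return total


def block(sigma, i, j):
    """2x2 block (i, j) of a two-mode covariance matrix ordered (x1, p1, x2, p2)."""
    return [[sigma[2 * i + a][2 * j + b] for b in range(2)] for a in range(2)]


def partial_transpose(sigma):
    """Partial transpose of mode 2, i.e. the momentum reflection p2 -> -p2."""
    signs = [1, 1, 1, -1]
    return [[signs[r] * signs[c] * sigma[r][c] for c in range(4)] for r in range(4)]


def smallest_symplectic_eigenvalue(sigma):
    """Smaller symplectic eigenvalue of a 4x4 covariance matrix [[A, C], [C^T, B]]."""
    delta = det(block(sigma, 0, 0)) + det(block(sigma, 1, 1)) + 2 * det(block(sigma, 0, 1))
    disc = max(delta ** 2 - 4 * det(sigma), 0.0)  # guard against rounding below zero
    return math.sqrt((delta - math.sqrt(disc)) / 2)


def is_entangled(sigma):
    """PPT criterion: entangled iff the transposed matrix has an eigenvalue below 1."""
    return smallest_symplectic_eigenvalue(partial_transpose(sigma)) < 1


def two_mode_squeezed_vacuum(r):
    """Covariance matrix of a two-mode squeezed vacuum with squeezing parameter r."""
    ch, sh = math.cosh(2 * r), math.sinh(2 * r)
    return [[ch, 0, sh, 0], [0, ch, 0, -sh], [sh, 0, ch, 0], [0, -sh, 0, ch]]
```

For a two-mode squeezed vacuum with squeezing r, the transposed matrix has smallest symplectic eigenvalue exp(-2r). Any r > 0 is therefore detected as entangled.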
-
Modeling Emotions and Ethics with Large Language Models
Authors:
Edward Y. Chang
Abstract:
This paper explores the integration of human-like emotions and ethical considerations into Large Language Models (LLMs). We first model eight fundamental human emotions, presented as opposing pairs, and employ collaborative LLMs to reinterpret and express these emotions across a spectrum of intensity. Our focus extends to embedding a latent ethical dimension within LLMs, guided by a novel self-supervised learning algorithm with human feedback (SSHF). This approach enables LLMs to perform self-evaluations and adjustments concerning ethical guidelines, enhancing their capability to generate content that is not only emotionally resonant but also ethically aligned. The methodologies and case studies presented herein illustrate the potential of LLMs to transcend mere text and image generation, venturing into the realms of empathetic interaction and principled decision-making, thereby setting a new precedent in the development of emotionally aware and ethically conscious AI systems.
Submitted 25 June, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Towards Automated Generation of Smart Grid Cyber Range for Cybersecurity Experiments and Training
Authors:
Daisuke Mashima,
Muhammad M. Roomi,
Bennet Ng,
Zbigniew Kalbarczyk,
S. M. Suhail Hussain,
Ee-Chien Chang
Abstract:
Assurance of cybersecurity is crucial to ensure the dependability and resilience of smart power grid systems. In order to evaluate the impact of potential cyber attacks, to assess the deployability and effectiveness of cybersecurity measures, and to enable hands-on exercise and training of personnel, an interactive, virtual environment that emulates the behaviour of a smart grid system, namely a smart grid cyber range, has been demanded by industry players as well as academia. A smart grid cyber range is typically implemented as a combination of cyber system emulation, which allows interactivity, and physical system (i.e., power grid) simulation, tightly coupled for consistent cyber and physical behaviours. However, its design and implementation require intensive expertise and effort in the cyber and physical aspects of smart power systems as well as in software/system engineering. While many industry players, including power grid operators, device vendors, and the research and education sectors, are interested, the availability of smart grid cyber ranges is limited to a small number of research labs. To address this challenge, we have developed a framework for modelling a smart grid cyber range using an XML-based language, called SG-ML, and for "compiling" the model into an operational cyber range with minimal engineering effort. The modelling language includes standardized schemas from IEC 61850 and IEC 61131, which allows industry players to utilize their existing configurations. The SG-ML framework aims at making smart grid cyber ranges available to broader user bases to facilitate cybersecurity R&D and hands-on exercises.
Submitted 31 March, 2024;
originally announced April 2024.
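The SG-ML workflow above describes an XML model that is "compiled" into a running cyber range. The minimal sketch below shows only the first step of such a pipeline: parsing a model with Python's standard `xml.etree.ElementTree` and flattening it into host descriptions that an emulator could launch. The element and attribute names are hypothetical and are NOT the real SG-ML schema. MMS and GOOSE are real IEC 61850 protocols, and `.st` denotes IEC 61131-3 Structured Text, used here only as plausible sample values.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy model; element and attribute names are NOT the real SG-ML schema.
MODEL = """
<cyberRange name="demo">
  <substation id="S1">
    <ied id="IED1" protocol="MMS" ip="10.0.1.11"/>
    <ied id="IED2" protocol="GOOSE" ip="10.0.1.12"/>
  </substation>
  <plc id="PLC1" program="main.st" ip="10.0.2.21"/>
</cyberRange>
"""


def compile_model(xml_text):
    """Flatten the model into host descriptions (name, role, ip) for an emulator."""
    root = ET.fromstring(xml_text.strip())
    hosts = []
    for tag in ("ied", "plc"):
        for elem in root.iter(tag):
            hosts.append({"name": elem.get("id"), "role": tag, "ip": elem.get("ip")})
    return hosts
```

A real compiler would also generate network topology, device configurations, and the coupling to the power-system simulator. The sketch omits all of these.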
-
Object Detectors in the Open Environment: Challenges, Solutions, and Outlook
Authors:
Siyuan Liang,
Wei Wang,
Ruoyu Chen,
Aishan Liu,
Boxi Wu,
Ee-Chien Chang,
Xiaochun Cao,
Dacheng Tao
Abstract:
With the emergence of foundation models, deep learning-based object detectors have shown practical usability in closed set scenarios. However, for real-world tasks, object detectors often operate in open environments, where crucial factors (e.g., data distribution, objective) that influence model learning are often changing. The dynamic and intricate nature of the open environment poses novel and formidable challenges to object detectors. Unfortunately, current research on object detectors in open environments lacks a comprehensive analysis of their distinctive characteristics, challenges, and corresponding solutions, which hinders their secure deployment in critical real-world scenarios. This paper aims to bridge this gap by conducting a comprehensive review and analysis of object detectors in open environments. We first identify the limitations of key structural components within the existing detection pipeline and then propose an open environment object detector challenge framework that includes four quadrants (i.e., out-of-domain, out-of-category, robust learning, and incremental learning) based on the dimensions of data/target changes. For each quadrant of challenges in the proposed framework, we present a detailed description and systematic analysis of the overarching goals and core difficulties, systematically review the corresponding solutions, and benchmark their performance over multiple widely adopted datasets. In addition, we engage in a discussion of open problems and potential avenues for future research. This paper aims to provide a fresh, comprehensive, and systematic understanding of the challenges and solutions associated with open-environment object detectors, thus catalyzing the development of more solid applications in real-world scenarios. A project related to this survey can be found at https://github.com/LiangSiyuan21/OEOD_Survey.
Submitted 9 April, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
-
Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning
Authors:
Siyuan Liang,
Kuanrong Liu,
Jiajun Gong,
Jiawei Liang,
Yuan Xun,
Ee-Chien Chang,
Xiaochun Cao
Abstract:
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features using the complementary strengths of various data modalities. However, the open nature of such systems inadvertently increases the possibility of backdoor attacks. These attacks subtly embed malicious behaviors within the model during training, which can be activated by specific triggers in the inference phase, posing significant security risks. Although existing fine-tuning-based countermeasures reduce the adverse impacts of such attacks, these defenses often degrade clean accuracy and require the construction of extensive clean training pairs. In this paper, we explore the possibility of a lower-cost defense from the perspective of model unlearning, that is, whether the model can be made to quickly unlearn backdoor threats (UBT) by constructing a small set of poisoned samples. Specifically, we strengthen the backdoor shortcuts to discover suspicious samples through overfitting training prioritized by weak similarity samples. Building on the initial identification of suspicious samples, we introduce an innovative token-based localized forgetting training regime. This technique specifically targets the poisoned aspects of the model, applying a focused effort to unlearn the backdoor associations while trying not to damage the integrity of the overall model. Experimental results show that our method not only keeps the attack success rate minimal but also preserves the model's high clean accuracy.
Submitted 24 March, 2024;
originally announced March 2024.
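The UBT abstract above says that suspicious-sample discovery is "prioritized by weak similarity samples". The minimal sketch below illustrates only that ranking step: scoring image-text pairs by cosine similarity of their embeddings and returning the weakest fraction. It assumes the embeddings are already computed. The shortcut-strengthening overfitting and the token-level unlearning are not shown. `weakest_pairs` and its `fraction` parameter are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def weakest_pairs(image_embs, text_embs, fraction):
    """Indices of the image-text pairs with the lowest cosine similarity."""
    scores = [cosine(i, t) for i, t in zip(image_embs, text_embs)]
    k = max(1, int(len(scores) * fraction))  # always return at least one pair
    return sorted(range(len(scores)), key=lambda idx: scores[idx])[:k]
```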
-
Driven-dissipative phase separation in free-space atomic ensembles
Authors:
Daniel Goncalves,
Lisa Bombieri,
Giovanni Ferioli,
Sara Pancaldi,
Igor Ferrier-Barbut,
Antoine Browaeys,
Ephraim Shahmoon,
Darrick E. Chang
Abstract:
The driven Dicke model, wherein an ensemble of atoms is driven by an external field and undergoes collective spontaneous emission due to coupling to a leaky cavity mode, is a paradigmatic example of a system exhibiting a driven-dissipative phase transition as a function of driving strength. Recently, a similar phenomenon was experimentally observed, not in a cavity setting, but rather in a free-space atomic ensemble. The reason why similar behavior should emerge in free space is not obvious, as the system interacts with a continuum of optical modes, which encodes light propagation effects. Here, we present and solve a simple model to explain the behavior of the free-space system, based on the one-dimensional Maxwell-Bloch equations. On one hand, we show that a free-space ensemble at a low optical depth can exhibit behavior similar to that of the cavity system, as spatial propagation effects are negligible. On the other hand, in the thermodynamic limit of large atom number, we show that certain observables such as the transmittance or the atomic excited population exhibit non-analytic behavior as a function of the driving intensity, reminiscent of a phase transition. However, a closer analysis reveals that the atomic properties are highly inhomogeneous in space, and based on this we argue that the free-space system does not undergo a phase transition but rather a "phase separation", roughly speaking, between saturated and unsaturated regions.
Submitted 22 March, 2024;
originally announced March 2024.
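A simple textbook ingredient behind the intensity-dependent transmittance discussed above is steady-state, on-resonance absorption by two-level atoms. In that case the local saturation parameter s = I / I_sat obeys ds/dz = -alpha s / (1 + s). Integrating over a sample of optical depth OD gives ln(s_out / s_in) + (s_out - s_in) = -OD. The Python sketch below solves this relation by bisection. It is a hedged illustration, not the paper's Maxwell-Bloch model, which also treats collective and propagation effects beyond this rate-equation limit. The function name is hypothetical.

```python
import math


def transmittance(s_in, optical_depth):
    """Steady-state transmittance T = s_out / s_in of a resonant saturable medium.

    Solves ln(s_out / s_in) + (s_out - s_in) = -optical_depth for s_out by bisection.
    """
    if s_in <= 0 or optical_depth < 0:
        raise ValueError("need s_in > 0 and optical_depth >= 0")
    if optical_depth == 0:
        return 1.0

    def residual(s_out):
        # Monotonically increasing in s_out; its root is the transmitted intensity.
        return math.log(s_out / s_in) + (s_out - s_in) + optical_depth

    lo = s_in * math.exp(-optical_depth)  # Beer-Lambert value: residual(lo) <= 0
    hi = s_in                             # residual(hi) = optical_depth > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi) / s_in
```

At weak drive, T approaches exp(-OD). At strong drive, the medium is bleached and T approaches 1. Because s(z) decreases monotonically with depth, this toy medium can be saturated near the entrance and unsaturated deeper inside. That is a loose analogue of the saturated and unsaturated regions the abstract describes.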
-
On Practicality of Using ARM TrustZone Trusted Execution Environment for Securing Programmable Logic Controllers
Authors:
Zhiang Li,
Daisuke Mashima,
Wen Shei Ong,
Ertem Esiner,
Zbigniew Kalbarczyk,
Ee-Chien Chang
Abstract:
Programmable logic controllers (PLCs) are crucial devices for implementing automated control in various industrial control systems (ICS), such as smart power grids, water treatment systems, manufacturing, and transportation systems. Owing to their importance, PLCs are often the target of cyber attackers aiming to disrupt the operation of ICS, including the nation's critical infrastructure, by compromising the integrity of control logic execution. While a wide range of cybersecurity solutions for ICS have been proposed, they cannot counter strong adversaries with a foothold on the PLC devices, who could manipulate memory, I/O interfaces, or the PLC logic itself. These days, many ICS devices on the market, including PLCs, run on ARM-based processors, and ARM TrustZone is a promising security technology that offers a Trusted Execution Environment (TEE) on embedded devices. Envisioning that such a hardware-assisted security feature will become available for ICS devices in the near future, this paper investigates the application of the ARM TrustZone TEE technology for enhancing the security of PLCs. Our aim is to evaluate the feasibility and practicality of TEE-based PLCs through a proof-of-concept design and implementation using open-source software such as OP-TEE and OpenPLC. Our evaluation assesses the performance and resource consumption in real-world ICS configurations, and based on the results, we discuss bottlenecks in the OP-TEE secure OS towards a large-scale ICS and desired changes for its application on ICS devices. Our implementation is made publicly available for further study and research.
Submitted 8 March, 2024;
originally announced March 2024.
-
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Authors:
Zechun Liu,
Changsheng Zhao,
Forrest Iandola,
Chen Lai,
Yuandong Tian,
Igor Fedorov,
Yunyang Xiong,
Ernie Chang,
Yangyang Shi,
Raghuraman Krishnamoorthi,
Liangzhen Lai,
Vikas Chandra
Abstract:
This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice for mobile deployment. Contrary to the prevailing belief emphasizing the pivotal role of data and parameter quantity in determining model quality, our investigation underscores the significance of model architecture for sub-billion scale LLMs. Leveraging deep and thin architectures, coupled with embedding sharing and grouped-query attention mechanisms, we establish a strong baseline network denoted as MobileLLM, which attains a remarkable 2.7%/4.3% accuracy boost over preceding 125M/350M state-of-the-art models. Additionally, we propose an immediate block-wise weight-sharing approach with no increase in model size and only marginal latency overhead. The resultant models, denoted as MobileLLM-LS, demonstrate a further accuracy improvement of 0.7%/0.8% over MobileLLM 125M/350M. Moreover, the MobileLLM model family shows significant improvements compared to previous sub-billion models on chat benchmarks, and achieves correctness close to that of LLaMA-v2 7B in API calling tasks, highlighting the capability of small models for common on-device use cases.
Submitted 26 June, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
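The immediate block-wise weight sharing described above adds no parameters, because each block's weights are reused by the layer right after it. The minimal sketch below shows the resulting layer schedule and a forward pass over it. It assumes each block is repeated exactly twice in succession; the repeat count is an assumption. `execution_order` and `forward` are hypothetical names, and the blocks are plain callables standing in for transformer blocks.

```python
def execution_order(num_unique_blocks, repeats=2):
    """Layer schedule in which each unique block runs `repeats` times back to back."""
    return [b for b in range(num_unique_blocks) for _ in range(repeats)]


def forward(x, blocks, repeats=2):
    """Apply the shared blocks in the immediate-repeat order."""
    for idx in execution_order(len(blocks), repeats):
        x = blocks[idx](x)
    return x
```

Effective depth doubles while the parameter count stays fixed. One plausible reason to repeat a block immediately, rather than interleaving the repeats, is that its weights are still in fast memory for the second pass. That would keep the latency overhead small.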
-
Semantic Mirror Jailbreak: Genetic Algorithm Based Jailbreak Prompts Against Open-source LLMs
Authors:
Xiaoxia Li,
Siyuan Liang,
Jiyi Zhang,
Han Fang,
Aishan Liu,
Ee-Chien Chang
Abstract:
Large Language Models (LLMs), used in creative writing, code generation, and translation, generate text based on input sequences but are vulnerable to jailbreak attacks, where crafted prompts induce harmful outputs. Most jailbreak prompt methods create jailbreak prompts by combining a jailbreak template with the question to be asked. However, existing jailbreak prompt designs generally suffer from excessive semantic differences, resulting in an inability to resist defenses that use simple semantic metrics as thresholds. Jailbreak prompts are semantically more varied than the original questions used for queries. In this paper, we introduce a Semantic Mirror Jailbreak (SMJ) approach that bypasses LLMs by generating jailbreak prompts that are semantically similar to the original question. We model the search for jailbreak prompts that satisfy both semantic similarity and jailbreak validity as a multi-objective optimization problem and employ a standardized set of genetic algorithms for generating eligible prompts. Compared to the baseline AutoDAN-GA, SMJ achieves attack success rates (ASR) that are up to 35.4% higher without ONION defense and up to 85.2% higher with ONION defense. SMJ's better performance on all three semantic meaningfulness metrics (Jailbreak Prompt, Similarity, and Outlier) also means that SMJ is resistant to defenses that use those metrics as thresholds.
Submitted 27 February, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
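The SMJ abstract frames prompt search as a multi-objective optimization problem. Selection in such problems is commonly built on Pareto dominance. The generic sketch below assumes each candidate has already been scored on objectives to be maximized, for example (similarity, validity). The abstract does not say whether SMJ uses Pareto ranking or a weighted combination, so this illustrates the optimization framing only. Both function names are hypothetical.

```python
def dominates(a, b):
    """True if score tuple a is >= b on every objective and > b on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Indices of candidates that no other candidate dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]
```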
-
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models
Authors:
Jiawei Liang,
Siyuan Liang,
Man Luo,
Aishan Liu,
Dongchen Han,
Ee-Chien Chang,
Xiaochun Cao
Abstract:
Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context. Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities. However, we uncover the potential threat posed by backdoor attacks on autoregressive VLMs during instruction tuning. Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images, enabling malicious manipulation of the victim model's predictions with predefined triggers. Nevertheless, the frozen visual encoder in autoregressive VLMs imposes constraints on the learning of conventional image triggers. Additionally, adversaries may encounter restrictions in accessing the parameters and architectures of the victim model. To address these challenges, we propose a multimodal instruction backdoor attack, namely VL-Trojan. Our approach facilitates image trigger learning through an isolating and clustering strategy and enhances black-box attack efficacy via an iterative character-level text trigger generation method. Our attack successfully induces target outputs during inference, significantly surpassing baselines (+62.52%) in ASR. Moreover, it demonstrates robustness across various model scales and few-shot in-context reasoning scenarios.
Submitted 21 February, 2024;
originally announced February 2024.
-
Not All Weights Are Created Equal: Enhancing Energy Efficiency in On-Device Streaming Speech Recognition
Authors:
Yang Li,
Yuan Shangguan,
Yuhao Wang,
Liangzhen Lai,
Ernie Chang,
Changsheng Zhao,
Yangyang Shi,
Vikas Chandra
Abstract:
Power consumption plays an important role in on-device streaming speech recognition, as it has a direct impact on the user experience. This study delves into how weight parameters in speech recognition models influence the overall power consumption of these models. We discovered that the impact of weight parameters on power consumption varies, influenced by factors including how often they are invoked and their placement in memory. Armed with this insight, we developed design guidelines aimed at optimizing on-device speech recognition models. These guidelines focus on minimizing power use without substantially affecting accuracy. Our method, which employs targeted compression based on the varying sensitivities of weight parameters, demonstrates superior performance compared to state-of-the-art compression methods. It achieves a reduction in energy usage of up to 47% while maintaining similar model accuracy and improving the real-time factor.
Submitted 20 February, 2024;
originally announced February 2024.
-
SocraSynth: Multi-LLM Reasoning with Conditional Statistics
Authors:
Edward Y. Chang
Abstract:
Large language models (LLMs), while promising, face criticisms for biases, hallucinations, and a lack of reasoning capability. This paper introduces SocraSynth, a multi-LLM agent reasoning platform developed to mitigate these issues. SocraSynth utilizes conditional statistics and systematic context enhancement through continuous arguments, alongside adjustable debate contentiousness levels. The platform typically involves a human moderator and two LLM agents representing opposing viewpoints on a given subject. SocraSynth operates in two main phases: knowledge generation and reasoning evaluation. In the knowledge generation phase, the moderator defines the debate topic and contentiousness level, prompting the agents to formulate supporting arguments for their respective stances. The reasoning evaluation phase then employs Socratic reasoning and formal logic principles to appraise the quality of the arguments presented. The dialogue concludes with the moderator adjusting the contentiousness from confrontational to collaborative, gathering final, conciliatory remarks to aid in human reasoning and decision-making. Through case studies in three distinct application domains, this paper showcases SocraSynth's effectiveness in fostering rigorous research, dynamic reasoning, comprehensive assessment, and enhanced collaboration. This underscores the value of multi-agent interactions in leveraging LLMs for advanced knowledge extraction and decision-making support.
Submitted 19 January, 2024;
originally announced February 2024.
-
Domain Bridge: Generative model-based domain forensic for black-box models
Authors:
Jiyi Zhang,
Han Fang,
Ee-Chien Chang
Abstract:
In forensic investigations of machine learning models, techniques that determine a model's data domain play an essential role, with prior work relying on large-scale corpora like ImageNet to approximate the target model's domain. Although such methods are effective in finding broad domains, they often struggle to identify finer-grained classes within those domains. In this paper, we introduce an enhanced approach to determine not just the general data domain (e.g., human face) but also its specific attributes (e.g., wearing glasses). Our approach uses an image embedding model as the encoder and a generative model as the decoder. Beginning with a coarse-grained description, the decoder generates a set of images, which are then presented to the unknown target model. Successful classifications by the model guide the encoder to refine the description, which is in turn used to produce a more specific set of images in the subsequent iteration. This iterative refinement narrows down the exact class of interest. A key strength of our approach lies in leveraging the expansive dataset LAION-5B, on which the generative model Stable Diffusion is trained. This enlarges our search space beyond traditional corpora such as ImageNet. Empirical results demonstrate our method's performance in identifying specific attributes of a model's input domain, paving the way for more detailed forensic analyses of deep learning models.
Submitted 7 February, 2024;
originally announced February 2024.
-
R$\times$R: Rapid eXploration for Reinforcement Learning via Sampling-based Reset Distributions and Imitation Pre-training
Authors:
Gagan Khandate,
Tristan L. Saidi,
Siqi Shang,
Eric T. Chang,
Yang Liu,
Seth Dennis,
Johnson Adams,
Matei Ciocarlie
Abstract:
We present a method for enabling Reinforcement Learning of motor control policies for complex skills such as dexterous manipulation. We posit that a key difficulty in training such policies is exploring the problem state space, as the accessible and useful regions of this space form a complex structure along manifolds of the original high-dimensional state space. This work presents a method to enable and support exploration with Sampling-based Planning. We use a generally applicable non-holonomic Rapidly-exploring Random Trees algorithm and present multiple methods to use the resulting structure to bootstrap model-free Reinforcement Learning. Our method is effective at learning various challenging dexterous motor control skills of higher difficulty than previously shown. In particular, we achieve dexterous in-hand manipulation of complex objects while simultaneously securing the object without the use of passive support surfaces. These policies also transfer effectively to real robots. Example videos can be found on the project website: https://sbrl.cs.columbia.edu
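The idea of bootstrapping exploration with a sampling-based planner can be illustrated with a small sketch. It assumes a plain holonomic RRT in an obstacle-free 2D unit square, not the non-holonomic RRT over a high-dimensional hand-object state space that the abstract describes. It also assumes one hypothetical way of reusing the tree: an RL reset distribution that is uniform over the explored tree nodes. All function names and parameters are illustrative only.

```python
import math
import random


def build_rrt(start, n_nodes=200, step=0.05, lo=0.0, hi=1.0, seed=0):
    """Grow a basic holonomic RRT in the square [lo, hi]^2 (no obstacles)."""
    rng = random.Random(seed)
    nodes = [start]   # explored states as (x, y) tuples
    parents = [-1]    # parent index of each node; -1 marks the root
    while len(nodes) < n_nodes:
        target = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        # Nearest existing node to the random sample (simple linear scan).
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], target))
        near = nodes[i]
        d = math.dist(near, target)
        if d == 0.0:
            continue
        # Steer: move at most `step` from the nearest node toward the sample.
        t = min(1.0, step / d)
        new = (near[0] + t * (target[0] - near[0]),
               near[1] + t * (target[1] - near[1]))
        nodes.append(new)
        parents.append(i)
    return nodes, parents


def make_reset_sampler(nodes, seed=1):
    """Hypothetical reset distribution: sample uniformly from explored tree nodes."""
    rng = random.Random(seed)
    return lambda: rng.choice(nodes)


nodes, parents = build_rrt(start=(0.5, 0.5))
reset = make_reset_sampler(nodes)
initial_state = reset()  # an RL episode would start from this state
```

In this toy setting, resetting episodes to tree nodes spreads start states over the region the planner has already reached, rather than always starting from a single initial state.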
Submitted 27 January, 2024;
originally announced January 2024.
-
Exploring Diffusion Time-steps for Unsupervised Representation Learning
Authors:
Zhongqi Yue,
Jiankun Wang,
Qianru Sun,
Lei Ji,
Eric I-Chao Chang,
Hanwang Zhang
Abstract:
Representation learning is all about discovering the hidden modular attributes that generate the data faithfully. We explore the potential of the Denoising Diffusion Probabilistic Model (DM) in unsupervised learning of the modular attributes. We build a theoretical framework that connects the diffusion time-steps and the hidden attributes, which serves as an effective inductive bias for unsupervised learning. Specifically, the forward diffusion process incrementally adds Gaussian noise to samples at each time-step, which essentially collapses different samples into similar ones by losing attributes, e.g., fine-grained attributes such as texture are lost with less noise added (i.e., early time-steps), while coarse-grained ones such as shape are lost by adding more noise (i.e., late time-steps). To disentangle the modular attributes, at each time-step $t$, we learn a $t$-specific feature to compensate for the newly lost attribute, and the set of all $1,\dots,t$-specific features, corresponding to the cumulative set of lost attributes, is trained to make up for the reconstruction error of a pre-trained DM at time-step $t$. On the CelebA, FFHQ, and Bedroom datasets, the learned feature significantly improves attribute classification and enables faithful counterfactual generation, e.g., interpolating only one specified attribute between two images, validating the disentanglement quality. Code is available at https://github.com/yue-zhongqi/diti.
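For readers unfamiliar with the forward process this abstract builds on, here is a minimal NumPy sketch of standard DDPM noising in closed form, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. The linear schedule (beta from 1e-4 to 0.02 over T = 1000 steps) is the common DDPM default and an assumption here; the abstract does not state which schedule the pre-trained DM uses. The sketch only shows why early time-steps keep most of the signal while late ones are dominated by noise; it does not implement the paper's time-step-specific features.

```python
import numpy as np


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an (assumed) linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
x0 = rng.standard_normal((3, 8, 8))        # stand-in for a small image
x_early = q_sample(x0, 10, alpha_bar, rng)  # early step: mostly signal
x_late = q_sample(x0, 900, alpha_bar, rng)  # late step: mostly noise
```

Because $\bar\alpha_t$ decreases monotonically, the signal-to-noise ratio $\bar\alpha_t / (1-\bar\alpha_t)$ falls with $t$, which is the property the abstract ties to losing fine-grained attributes first and coarse-grained ones later.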
Submitted 21 January, 2024;
originally announced January 2024.
-
Transversally exponentially stable Euclidean space extension technique for discrete time systems
Authors:
Soham Shanbhag,
Dong Eui Chang
Abstract:
We propose a modification technique for discrete time systems that yields exponentially fast convergence to compact sets. The extension technique allows tools defined on Euclidean spaces to be applied to systems evolving on manifolds, by modifying the dynamics of the system such that the manifold becomes an attractor set. We show the stability properties of this technique using a simulation of the rigid body rotation system on the unit sphere $S^3$. We also show the improvement this technique affords to a Luenberger-like observer designed for the rigid body rotation system on $S^3$.
Submitted 20 January, 2024;
originally announced January 2024.
-
Machine learning based state observer for discrete time systems evolving on Lie groups
Authors:
Soham Shanbhag,
Dong Eui Chang
Abstract:
In this paper, a machine learning based observer for systems evolving on manifolds is designed such that the state of the observer is restricted to the Lie group on which the system evolves. Conventional machine learning based observers for systems evolving on Lie groups involve designing charts for the Lie group, training a machine learning based observer for each chart, and switching between the trained models based on the state of the system. We propose a novel deep learning based technique whose predictions are restricted to a measure 0 subset of Euclidean space without using charts. Using this network, we design an observer whose state is restricted to the Lie group and which predicts the state using only one trained algorithm. The deep learning network predicts an "error term" on the Lie algebra of the Lie group, applies the map from the Lie algebra to the group, and uses the group action and the present state to estimate the state at the next epoch. Being purely data driven, this model does not require a model of the system. The proposed algorithm provides a novel framework for constraining the output of machine learning networks to a measure 0 subset of a Euclidean space without chart-specific training and without switching. We show the validity of this method using Monte Carlo simulations of the rigid body rotation and translation system.
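A minimal NumPy sketch of the update structure described above, under several assumptions: the group is the rotation group SO(3) (rather than the rotation-and-translation group used in the paper's simulations), the map from the Lie algebra to the group is taken to be the exponential map (Rodrigues' formula), and the group action is right multiplication. `predict_error_term` is a hypothetical placeholder for the trained network, not the authors' model. The point is only that the product of a group element with the exponential of any Lie-algebra element stays on the group by construction.

```python
import numpy as np


def hat(w):
    """Map a vector in R^3 to the corresponding skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def exp_so3(w):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        # Taylor expansions of sin(t)/t and (1 - cos(t))/t^2 near zero.
        a, b = 1.0 - theta**2 / 6.0, 0.5 - theta**2 / 24.0
    else:
        a, b = np.sin(theta) / theta, (1.0 - np.cos(theta)) / theta**2
    return np.eye(3) + a * K + b * (K @ K)


def predict_error_term(R, measurement):
    """Hypothetical stand-in for the trained network: returns a 3-vector in so(3)."""
    return 0.1 * np.tanh(measurement - R[:, 0])


def observer_step(R_hat, measurement):
    xi = predict_error_term(R_hat, measurement)  # Lie-algebra element (as a 3-vector)
    return R_hat @ exp_so3(xi)                   # group action keeps the estimate on SO(3)


R = np.eye(3)
m = np.array([0.0, 1.0, 0.0])
for _ in range(50):
    R = observer_step(R, m)
```

Whatever the placeholder outputs, the estimate remains orthogonal with determinant 1 up to floating-point error, so no chart or projection step is needed in this sketch.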
Submitted 20 January, 2024;
originally announced January 2024.
-
Angular velocity and linear acceleration measurement bias estimators for the rigid body system with global exponential convergence
Authors:
Soham Shanbhag,
Dong Eui Chang
Abstract:
Rigid body systems typically rely on measurements of the body's pose from onboard cameras or LiDAR systems, of linear acceleration from an accelerometer, and of angular velocity from an IMU. However, the linear acceleration and angular velocity measurements are usually corrupted by an unknown constant or slowly varying bias. We propose a measurement bias estimator for such systems under the assumption that the angular velocity is bounded. We also provide continuous estimates of the state of the system, i.e., the pose, linear velocity, and position of the body. These estimates converge globally exponentially to the state of the rigid body system. We propose two bias estimators designed with the estimate of the pose in the ambient Euclidean space of the Special Euclidean group and show global exponential convergence of the proposed observers to the state of the system. The first observer assumes knowledge of bounds on the angular velocity, while the second uses a Riccati observer to overcome this limitation. We show the convergence with an example of a rigid body rotation and translation system on the special Euclidean group, and show that the observer is able to estimate the bias using data collected from an Intel RealSense camera.
Submitted 20 January, 2024;
originally announced January 2024.
-
Globally exponentially convergent observer for systems evolving on matrix Lie groups
Authors:
Soham Shanbhag,
Dong Eui Chang
Abstract:
We propose a globally exponentially convergent observer for dynamical systems evolving on matrix Lie groups whose velocity is bounded by an unknown bound. We design the observer in the ambient Euclidean space and show exponential convergence of the observer to the state of the system. We demonstrate the convergence with an example of a rigid body rotation and translation system on the special Euclidean group, and compare the proposed observer with an existing observer from the literature.
Submitted 20 January, 2024;
originally announced January 2024.
-
Near-resonant light scattering by an atom in a state-dependent trap
Authors:
Teresa D. Karanikolaou,
Robert J. Bettles,
Darrick E. Chang
Abstract:
The optical properties of a fixed atom are well known and extensively investigated. For example, the extraordinarily large cross section that a single atom presents to a resonant photon is essential for quantum optical applications. Mechanical effects associated with light scattering are also well studied, forming the basis of laser cooling and trapping, for example. Despite this, one fundamental problem has surprisingly not been extensively studied, yet is relevant to a number of emerging quantum optics experiments. In these experiments, the ground state of the atom experiences a tight optical trap formed by far-off-resonant light, to facilitate efficient interactions with near-resonant light. However, the excited state might experience a different potential, or even be anti-trapped. Here, we systematically analyze the effects of unequal trapping on near-resonant atom-light interactions. In particular, we identify regimes where such trapping can lead to significant excess heating, and a reduction of total and elastic scattering cross sections associated with a decreased atom-photon interaction efficiency. Understanding these effects can be valuable for optimizing quantum optics platforms where efficient on-resonance atom-light interactions are desired but equal trapping is not feasible.
Submitted 12 January, 2024;
originally announced January 2024.
-
Far from equilibrium field theory for strongly coupled light and matter: dynamics of frustrated multi-mode cavity QED
Authors:
Hossein Hosseinabadi,
Darrick E. Chang,
Jamir Marino
Abstract:
Light-matter interfaces have now entered a new stage marked by the ability to engineer quantum correlated states under driven-dissipative conditions. To propel this new generation of experiments, we are confronted with the need to model non-unitary many-body dynamics in strongly coupled regimes, by transcending traditional approaches in quantum optics. In this work, we contribute to this program by adapting a functional integral technique, conventionally employed in high-energy physics, in order to obtain non-equilibrium dynamics for interacting light-matter systems. Our approach is grounded in constructing 'two-particle irreducible' (2PI) effective actions, which provide a non-perturbative and conserving framework for describing quantum evolution at a polynomial cost in time. We apply our method to complement the analysis of spin glass formation in the context of frustrated multi-mode cavity quantum electrodynamics, initiated in our accompanying work [H. Hosseinabadi, D. Chang, J. Marino, arXiv:2311.05682]. Finally, we outline the capability of the technique to describe other near-term platforms in many-body quantum optics, and its potential to make predictions for this new class of experiments.
Submitted 6 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
Authors:
Siyuan Liang,
Mingli Zhu,
Aishan Liu,
Baoyuan Wu,
Xiaochun Cao,
Ee-Chien Chang
Abstract:
Studying backdoor attacks is valuable for model copyright protection and for enhancing defenses. While existing backdoor attacks have successfully infected multimodal contrastive learning (MCL) models such as CLIP, they can be easily countered by specialized backdoor defenses for MCL models. This paper reveals a threat in this practical scenario: backdoor attacks can remain effective even after defenses are applied. We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses. To achieve this, we draw motivation from Bayes' rule and propose a dual-embedding guided framework for backdoor attacks. Specifically, we ensure that visual trigger patterns approximate the textual target semantics in the embedding space, making it challenging to detect the subtle parameter variations induced by backdoor learning on such natural trigger patterns. Additionally, we optimize the visual trigger patterns to align the poisoned samples with target vision features in order to hinder backdoor unlearning through clean fine-tuning. Extensive experiments demonstrate that our attack significantly outperforms state-of-the-art baselines (+45.3% ASR) in the presence of SoTA backdoor defenses, rendering these mitigation and detection strategies virtually ineffective. Furthermore, our approach remains effective in more demanding scenarios such as downstream tasks. We believe that this paper raises awareness of the potential threats associated with the practical application of multimodal contrastive learning and encourages the development of more robust defense mechanisms.
Submitted 4 March, 2024; v1 submitted 19 November, 2023;
originally announced November 2023.
-
Improving Adversarial Transferability by Stable Diffusion
Authors:
Jiayang Liu,
Siyu Zhu,
Siyuan Liang,
Jie Zhang,
Han Fang,
Weiming Zhang,
Ee-Chien Chang
Abstract:
Deep neural networks (DNNs) are susceptible to adversarial examples, which introduce imperceptible perturbations to benign samples, deceiving DNN predictions. While some attack methods excel in the white-box setting, they often struggle in the black-box scenario, particularly against models fortified with defense mechanisms. Various techniques have emerged to enhance the transferability of adversarial attacks for the black-box scenario. Among these, input transformation-based attacks have demonstrated their effectiveness. In this paper, we explore the potential of leveraging data generated by Stable Diffusion to boost adversarial transferability. This approach draws inspiration from recent research that harnessed synthetic data generated by Stable Diffusion to enhance model generalization. In particular, previous work has highlighted the correlation between the presence of both real and synthetic data and improved model generalization. Building upon this insight, we introduce a novel attack method called Stable Diffusion Attack Method (SDAM), which incorporates samples generated by Stable Diffusion to augment input images. Furthermore, we propose a fast variant of SDAM to reduce computational overhead while preserving high adversarial transferability. Our extensive experimental results demonstrate that our method outperforms state-of-the-art baselines by a substantial margin. Moreover, our approach is compatible with existing transfer-based attacks to further enhance adversarial transferability.
Submitted 18 November, 2023;
originally announced November 2023.
-
Quantum-to-classical crossover in the spin glass dynamics of cavity QED simulators
Authors:
Hossein Hosseinabadi,
Darrick E. Chang,
Jamir Marino
Abstract:
By solving the quench dynamics of a frustrated many-body spin-boson problem, we investigate the role of spin size on the dynamical formation of spin glass order. In particular, we observe that quantum and classical spin glasses exhibit markedly different evolution. The former displays a quick relaxation of magnetization together with an exponential dependence of the spin glass order parameter on spin size, while the latter has long-lasting prethermal magnetization and a spin glass order parameter independent of spin size. The quantum-to-classical crossover is sharp and occurs for relatively small spins, highlighting the fragility of the quantum regime. Furthermore, we show that spin glass order is resonantly enhanced when the frequency of the bosonic mediators of the interactions approaches the value of the transverse field. Our predictions are relevant for all spin glass systems with $SU(2)$ degrees of freedom away from equilibrium, and can be examined in recently developed multi-mode cavity QED experiments.
Submitted 6 June, 2024; v1 submitted 9 November, 2023;
originally announced November 2023.
-
On The Open Prompt Challenge In Conditional Audio Generation
Authors:
Ernie Chang,
Sidd Srinivasan,
Mahi Luthra,
Pin-Jie Lin,
Varun Nagaraja,
Forrest Iandola,
Zechun Liu,
Zhaoheng Ni,
Changsheng Zhao,
Yangyang Shi,
Vikas Chandra
Abstract:
Text-to-audio generation (TTA) produces audio from a text description, learning from pairs of audio samples and hand-annotated text. However, commercializing audio generation is challenging because user-input prompts are often under-specified compared to the text descriptions used to train TTA models. In this work, we treat TTA models as a "black box" and address the user prompt challenge with two key insights: (1) user prompts are generally under-specified, leading to a large alignment gap between user prompts and training prompts; (2) there is a distribution of audio descriptions for which TTA models generate higher quality audio, which we refer to as "audionese". To this end, we rewrite prompts with instruction-tuned models and propose using text-audio alignment as a feedback signal via margin ranking learning to improve audio quality. On both objective and subjective human evaluations, we observed marked improvements in both text-audio alignment and music audio quality.
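Margin ranking learning with alignment feedback can be sketched as follows. This is an illustration under stated assumptions, not the authors' training setup: each candidate prompt rewrite is assumed to have a model score (for example, from a rewriter or reranker) and a text-audio alignment score, and within each pair the candidate with higher alignment is treated as preferred. The loss has the same form as PyTorch's `nn.MarginRankingLoss` with target y = 1; the function names and numbers are hypothetical.

```python
import numpy as np


def ranking_pairs(scores_a, scores_b, align_a, align_b):
    """Order each pair so the candidate with higher text-audio alignment comes first."""
    a_wins = np.asarray(align_a) >= np.asarray(align_b)
    s_pos = np.where(a_wins, scores_a, scores_b)
    s_neg = np.where(a_wins, scores_b, scores_a)
    return s_pos, s_neg


def margin_ranking_loss(s_pos, s_neg, margin=0.1):
    """Mean of max(0, margin - (s_pos - s_neg)); zero once s_pos beats s_neg by the margin."""
    diff = np.asarray(s_pos, dtype=float) - np.asarray(s_neg, dtype=float)
    return float(np.mean(np.maximum(0.0, margin - diff)))


# Hypothetical model scores and alignment scores for two pairs of candidate rewrites.
s_pos, s_neg = ranking_pairs(scores_a=[0.9, 0.35], scores_b=[0.2, 0.3],
                             align_a=[0.7, 0.4], align_b=[0.5, 0.6])
loss = margin_ranking_loss(s_pos, s_neg)  # only the second pair violates the margin
```

Here the first pair already ranks the better-aligned rewrite higher by more than the margin and contributes nothing, while the second pair ranks it lower and is penalized, so gradients would push the scorer toward the alignment ordering.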
Submitted 1 November, 2023;
originally announced November 2023.
-
In-Context Prompt Editing For Conditional Audio Generation
Authors:
Ernie Chang,
Pin-Jie Lin,
Yang Li,
Sidd Srinivasan,
Gael Le Lan,
David Kant,
Yangyang Shi,
Forrest Iandola,
Vikas Chandra
Abstract:
Distributional shift is a central challenge in the deployment of machine learning models, as they can be ill-equipped for real-world data. This is particularly evident in text-to-audio generation, where the encoded representations are easily undermined by unseen prompts, degrading the generated audio: the limited set of text-audio pairs is inadequate for conditional audio generation in the wild, where user prompts are under-specified. In particular, we observe a consistent degradation in the quality of audio generated from user prompts, as opposed to training set prompts. To this end, we present a retrieval-based in-context prompt editing framework that leverages the training captions as demonstrative exemplars to revise the user prompts. We show that the framework enhances audio quality across the set of collected user prompts, which were edited with reference to the training captions as exemplars.
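The retrieval step of such a framework can be sketched in a few lines. The abstract does not say how captions are retrieved, so this assumes a simple bag-of-words cosine similarity as a stand-in for whatever retriever the authors use; the prompt template, the function names, and the example captions are hypothetical, and the call to an instruction-following model that would perform the actual edit is omitted.

```python
import math
from collections import Counter


def cosine(a, b):
    """Bag-of-words cosine similarity between two strings (whitespace tokens)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_edit_prompt(user_prompt, train_captions, k=2):
    """Retrieve the k most similar training captions and format them as exemplars."""
    ranked = sorted(train_captions, key=lambda c: cosine(user_prompt, c), reverse=True)
    lines = ["Rewrite the prompt in the style of these captions:"]
    lines += [f"- {c}" for c in ranked[:k]]
    lines.append(f"Prompt: {user_prompt}")
    lines.append("Rewritten:")
    return "\n".join(lines)


captions = ["a dog barks loudly in a park", "rain falls on a tin roof",
            "a dog whines and barks behind a door"]
prompt = build_edit_prompt("dog barking", captions, k=2)
```

The resulting text would then be sent to an instruction-tuned model, so the edited prompt borrows the vocabulary and level of detail of captions the TTA model was trained on.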
Submitted 1 November, 2023;
originally announced November 2023.
-
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images
Authors:
Seongsu Bae,
Daeun Kyung,
Jaehee Ryu,
Eunbyeol Cho,
Gyubok Lee,
Sunjun Kweon,
Jungwoo Oh,
Lei Ji,
Eric I-Chao Chang,
Tackeun Kim,
Edward Choi
Abstract:
Electronic Health Records (EHRs) contain patients' medical histories in various multi-modal formats, yet the potential for joint reasoning across imaging and table modalities remains underexplored in current EHR Question Answering (QA) systems. In this paper, we introduce EHRXQA, a novel multi-modal question answering dataset combining structured EHRs and chest X-ray images. To develop our dataset, we first construct two uni-modal resources: 1) the MIMIC-CXR-VQA dataset, our newly created medical visual question answering (VQA) benchmark, specifically designed to augment the imaging modality in EHR QA, and 2) EHRSQL (MIMIC-IV), a refashioned version of a previously established table-based EHR QA dataset. By integrating these two uni-modal resources, we construct a multi-modal EHR QA dataset that requires both uni-modal and cross-modal reasoning. To address the unique challenges of multi-modal questions within EHRs, we propose a NeuralSQL-based strategy equipped with an external VQA API. This pioneering endeavor enhances engagement with multi-modal EHR sources, and we believe that our dataset can catalyze advances in real-world medical scenarios such as clinical decision-making and research. EHRXQA is available at https://github.com/baeseongsu/ehrxqa.
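To illustrate what equipping SQL with an external VQA API can look like, here is a sketch using Python's built-in `sqlite3` module, which lets a Python function be registered and called inside a query. The table names, the `func_vqa` function, and its toy lookup are hypothetical stand-ins; the actual NeuralSQL syntax and the VQA model used in EHRXQA may differ.

```python
import sqlite3


def func_vqa(question, image_id):
    """Hypothetical stand-in for an external VQA model; returns 1 (yes) or 0 (no)."""
    findings = {"img_1": {"pleural effusion"}, "img_2": set()}  # toy per-image labels
    return int(any(f in question for f in findings.get(image_id, set())))


conn = sqlite3.connect(":memory:")
conn.create_function("func_vqa", 2, func_vqa)  # expose the Python function to SQL
conn.executescript("""
CREATE TABLE admissions (subject_id INTEGER, hadm_id INTEGER);
CREATE TABLE cxr (subject_id INTEGER, image_id TEXT);
INSERT INTO admissions VALUES (10, 100), (11, 101);
INSERT INTO cxr VALUES (10, 'img_1'), (11, 'img_2');
""")

# Structured filtering (the join) and image reasoning (the VQA call) in one query.
rows = conn.execute("""
SELECT a.subject_id
FROM admissions a JOIN cxr c ON a.subject_id = c.subject_id
WHERE func_vqa('is there pleural effusion?', c.image_id) = 1
""").fetchall()
```

The design point is that the SQL engine handles the table-side reasoning while each image-level question is delegated to the model, so a single query can combine both modalities.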
Submitted 25 December, 2023; v1 submitted 28 October, 2023;
originally announced October 2023.
-
MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation
Authors:
Zexue He,
Yu Wang,
An Yan,
Yao Liu,
Eric Y. Chang,
Amilcare Gentili,
Julian McAuley,
Chun-Nan Hsu
Abstract:
Curated datasets for healthcare are often limited due to the need for human annotations from experts. In this paper, we present MedEval, a multi-level, multi-task, and multi-domain medical benchmark to facilitate the development of language models for healthcare. MedEval is comprehensive: it consists of data from several healthcare systems and spans 35 human body regions from 8 examination modalities. With 22,779 collected sentences and 21,228 reports, we provide expert annotations at multiple levels, enabling fine-grained use of the data and supporting a wide range of tasks. Moreover, we systematically evaluated 10 generic and domain-specific language models under zero-shot and fine-tuning settings, ranging from domain-adapted baselines in healthcare to general-purpose state-of-the-art large language models (e.g., ChatGPT). Our evaluations reveal varying effectiveness of the two categories of language models across different tasks, highlighting the importance of instruction tuning for few-shot usage of large language models. Our investigation paves the way toward benchmarking language models for healthcare and provides valuable insights into the strengths and limitations of adopting large language models in medical domains, informing their practical applications and future advancements.
Submitted 14 November, 2023; v1 submitted 21 October, 2023;
originally announced October 2023.
-
Do self-supervised speech and language models extract similar representations as human brain?
Authors:
Peili Chen,
Linyang He,
Li Fu,
Lu Fan,
Edward F. Chang,
Yuanning Li
Abstract:
Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception. However, given their distinct training modalities, it remains unclear whether they correlate with the same neural aspects. We directly address this question by evaluating the brain prediction performance of two representative SSL models, Wav2Vec2.0 and GPT-2, designed for speech and language tasks. Our findings reveal that both models accurately predict speech responses in the auditory cortex, with a significant correlation between their brain predictions. Notably, shared speech contextual information between Wav2Vec2.0 and GPT-2 accounts for the majority of explained variance in brain activity, surpassing static semantic and lower-level acoustic-phonetic information. These results underscore the convergence of speech contextual representations in SSL models and their alignment with the neural network underlying speech perception, offering valuable insights into both SSL models and the neural basis of speech and language processing.
Submitted 31 January, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Neural2Speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction
Authors:
Jiawei Li,
Chunxu Guo,
Li Fu,
Lu Fan,
Edward F. Chang,
Yuanning Li
Abstract:
Reconstructing natural speech from neural activity is vital for enabling direct communication via brain-computer interfaces. Previous efforts have explored the conversion of neural recordings into speech using complex deep neural network (DNN) models trained on extensive neural recording data, which is resource-intensive under regular clinical constraints. However, achieving satisfactory performance in reconstructing speech from limited-scale neural recordings has been challenging, mainly due to the complexity of speech representations and the neural data constraints. To overcome these challenges, we propose a novel transfer learning framework for neural-driven speech reconstruction, called Neural2Speech, which consists of two distinct training phases. First, a speech autoencoder is pre-trained on readily available speech corpora to decode speech waveforms from the encoded speech representations. Second, a lightweight adaptor is trained on the small-scale neural recordings to align the neural activity and the speech representation for decoding. Remarkably, our proposed Neural2Speech demonstrates the feasibility of neural-driven speech reconstruction even with only 20 minutes of intracranial data, which significantly outperforms existing baseline methods in terms of speech fidelity and intelligibility.
Submitted 31 January, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
An Investigation of Multi-feature Extraction and Super-resolution with Fast Microphone Arrays
Authors:
Eric T. Chang,
Runsheng Wang,
Peter Ballentine,
Jingxi Xu,
Trey Smith,
Brian Coltin,
Ioannis Kymissis,
Matei Ciocarlie
Abstract:
In this work, we use MEMS microphones as vibration sensors to simultaneously classify texture and estimate contact position and velocity. Vibration sensors are an important facet of both human and robotic tactile sensing, providing fast detection of contact and onset of slip. Microphones are an attractive option for implementing vibration sensing as they offer a fast response and can be sampled quickly, are affordable, and occupy a very small footprint. Our prototype sensor uses only a sparse array (8-9 mm spacing) of distributed MEMS microphones (<$1, 3.76 x 2.95 x 1.10 mm) embedded under an elastomer. We use transformer-based architectures for data analysis, taking advantage of the microphones' high sampling rate to run our models on time-series data as opposed to individual snapshots. This approach allows us to obtain 77.3% average accuracy on 4-class texture classification (84.2% when excluding the slowest drag velocity), 1.8 mm mean error on contact localization, and 5.6 mm/s mean error on contact velocity. We show that the learned texture and localization models are robust to varying velocity and generalize to unseen velocities. We also report that our sensor provides fast contact detection, an important advantage of fast transducers. This investigation illustrates the capabilities one can achieve with a MEMS microphone array alone, leaving valuable sensor real estate available for integration with complementary tactile sensing modalities.
Submitted 7 March, 2024; v1 submitted 29 September, 2023;
originally announced October 2023.
-
Mostree : Malicious Secure Private Decision Tree Evaluation with Sublinear Communication
Authors:
Jianli Bai,
Xiangfu Song,
Xiaowu Zhang,
Qifan Wang,
Shujie Cui,
Ee-Chien Chang,
Giovanni Russello
Abstract:
A private decision tree evaluation (PDTE) protocol allows a feature vector owner (FO) to classify its data using a tree model from a model owner (MO) and only reveals an inference result to the FO. This paper proposes Mostree, a PDTE protocol secure in the presence of malicious parties with sublinear communication. We design Mostree in the three-party honest-majority setting, where an (untrusted) computing party (CP) assists the FO and MO in the secure computation. We propose two low-communication oblivious selection (OS) protocols by exploiting nice properties of three-party replicated secret sharing (RSS) and distributed point function. Mostree combines OS protocols with a tree encoding method and three-party secure computation to achieve sublinear communication. We observe that most of the protocol components already maintain privacy even in the presence of a malicious adversary, and what remains to achieve is correctness. To ensure correctness, we propose a set of lightweight consistency checks and seamlessly integrate them into Mostree. As a result, Mostree achieves sublinear communication and malicious security simultaneously. We implement Mostree and compare it with the state-of-the-art. Experimental results demonstrate that Mostree is efficient and comparable to semi-honest PDTE schemes with sublinear communication. For instance, when evaluated on the MNIST dataset in a LAN setting, Mostree achieves an evaluation using approximately 768 ms with communication of around 168 KB.
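The abstract builds Mostree on three-party replicated secret sharing (RSS). As a minimal sketch of that primitive only (not Mostree's oblivious selection protocols, tree encoding, or consistency checks), the Python below splits a value into three additive shares over an assumed ring Z_{2^32}, gives each party two of the three shares, and shows that any two parties can reconstruct while additions and multiplication by public constants stay local. The function names and the ring size are hypothetical choices for illustration.

```python
import secrets

# Assumed ring Z_{2^32}; the ring actually used by Mostree is not stated in the abstract.
MOD = 2 ** 32


def share(x: int) -> list[tuple[int, int]]:
    """Split x into additive shares s0 + s1 + s2 = x (mod MOD).

    Party i receives the pair (s_i, s_{i+1}), so every share is held by two
    parties and any single party's pair is uniformly random on its own.
    """
    s0 = secrets.randbelow(MOD)
    s1 = secrets.randbelow(MOD)
    s2 = (x - s0 - s1) % MOD
    s = [s0, s1, s2]
    return [(s[i], s[(i + 1) % 3]) for i in range(3)]


def reconstruct(party_i: tuple[int, int], party_next: tuple[int, int]) -> int:
    """Reconstruct from party i and party i+1 (mod 3), who jointly hold all three shares."""
    return (party_i[0] + party_i[1] + party_next[1]) % MOD


def add(xs: list[tuple[int, int]], ys: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Adding two shared values needs no communication: each party adds its pairs locally."""
    return [((a0 + b0) % MOD, (a1 + b1) % MOD) for (a0, a1), (b0, b1) in zip(xs, ys)]


def mul_public(c: int, xs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Multiplying a shared value by a public constant is also local."""
    return [((c * a0) % MOD, (c * a1) % MOD) for a0, a1 in xs]


x_shares, y_shares = share(7), share(35)
z_shares = add(x_shares, y_shares)
print(reconstruct(z_shares[2], z_shares[0]))  # prints 42
```

Multiplying two shared values is not shown: in RSS it requires each party to compute cross-terms and re-share them, which is one round of communication.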
Submitted 29 September, 2023;
originally announced September 2023.
-
FoleyGen: Visually-Guided Audio Generation
Authors:
Xinhao Mei,
Varun Nagaraja,
Gael Le Lan,
Zhaoheng Ni,
Ernie Chang,
Yangyang Shi,
Vikas Chandra
Abstract:
Recent advancements in audio generation have been spurred by the evolution of large-scale deep learning models and expansive datasets. However, the task of video-to-audio (V2A) generation continues to be a challenge, principally because of the intricate relationship between the high-dimensional visual and auditory data, and the challenges associated with temporal synchronization. In this study, we introduce FoleyGen, an open-domain V2A generation system built on a language modeling paradigm. FoleyGen leverages an off-the-shelf neural audio codec for bidirectional conversion between waveforms and discrete tokens. The generation of audio tokens is facilitated by a single Transformer model, which is conditioned on visual features extracted from a visual encoder. A prevalent problem in V2A generation is the misalignment of generated audio with the visible actions in the video. To address this, we explore three novel visual attention mechanisms. We further undertake an exhaustive evaluation of multiple visual encoders, each pretrained on either single-modal or multi-modal tasks. The experimental results on the VGGSound dataset show that our proposed FoleyGen outperforms previous systems across all objective metrics and human evaluations.
Submitted 19 September, 2023;
originally announced September 2023.
-
A solid-state platform for cooperative quantum dynamics driven by correlated emission
Authors:
Xin Li,
Jamir Marino,
Darrick E. Chang,
Benedetta Flebus
Abstract:
While traditionally regarded as an obstacle to quantum coherence, recent breakthroughs in quantum optics have shown that the dissipative interaction of a qubit with its environment can be leveraged to protect quantum states and synthesize many-body entanglement. Inspired by this progress, here we set the stage for the as yet uncharted exploration of analogous cooperative phenomena in hybrid solid-state platforms. We develop a comprehensive formalism for the quantum many-body dynamics of an ensemble of solid-state spin defects interacting dissipatively with the magnetic field fluctuations of a common solid-state reservoir. Our framework applies to any solid-state reservoir whose fluctuating spin, pseudospin, or charge degrees of freedom generate magnetic fields. To understand whether correlations induced by dissipative processes can play a relevant role in a realistic experimental setup, we apply our model to a qubit array interacting via the spin fluctuations of a ferromagnetic bath. Our results show that the low-temperature collective relaxation rates of the qubit ensemble can display clear signatures of super- and subradiance, i.e., forms of cooperative dynamics traditionally achieved in atomic ensembles. We find that the solid-state analog of these cooperative phenomena is robust against spatial disorder in the qubit ensemble and thermal fluctuations of the magnetic reservoir, providing a route for their feasibility in near-term experiments. Our work lays the foundation for a multi-qubit approach to quantum sensing of solid-state systems and the direct generation of many-body entanglement in spin-defect ensembles. Furthermore, we discuss how the tunability of solid-state reservoirs opens up novel pathways for exploring cooperative phenomena in regimes beyond the reach of conventional quantum optics setups.
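For readers new to super- and subradiance, the textbook two-emitter case from quantum optics shows where collective decay rates come from. This is a generic illustration, not the paper's formalism for spin defects coupled to a magnetic reservoir: it assumes two identical qubits with individual decay rate $\Gamma$ and a real, symmetric correlated decay rate $\Gamma_{12}$ induced by the shared reservoir, with $|\Gamma_{12}| \le \Gamma$.

```latex
\begin{aligned}
\mathcal{D}[\rho] &= \sum_{i,j=1}^{2} \Gamma_{ij}\left(\sigma_j^{-}\rho\,\sigma_i^{+}
  - \tfrac{1}{2}\left\{\sigma_i^{+}\sigma_j^{-},\,\rho\right\}\right),
\qquad \Gamma_{11} = \Gamma_{22} = \Gamma,\quad \Gamma_{12} = \Gamma_{21}, \\[4pt]
% Eigenvectors of the 2x2 rate matrix are (1, \pm 1)/\sqrt{2}, giving in the single-excitation manifold:
|S\rangle &= \tfrac{1}{\sqrt{2}}\left(|eg\rangle + |ge\rangle\right)
  \quad\text{decays at}\quad \Gamma_{+} = \Gamma + \Gamma_{12}, \\
|A\rangle &= \tfrac{1}{\sqrt{2}}\left(|eg\rangle - |ge\rangle\right)
  \quad\text{decays at}\quad \Gamma_{-} = \Gamma - \Gamma_{12}.
\end{aligned}
```

For $\Gamma_{12} > 0$ the symmetric state is superradiant and the antisymmetric state subradiant; for $\Gamma_{12} < 0$ the roles swap, and uncorrelated noise ($\Gamma_{12} = 0$) leaves both decaying independently at $\Gamma$.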
Submitted 3 June, 2024; v1 submitted 16 September, 2023;
originally announced September 2023.
-
Stack-and-Delay: a new codebook pattern for music generation
Authors:
Gael Le Lan,
Varun Nagaraja,
Ernie Chang,
David Kant,
Zhaoheng Ni,
Yangyang Shi,
Forrest Iandola,
Vikas Chandra
Abstract:
In language-modeling-based music generation, a generated waveform is represented by a sequence of hierarchical token stacks that can be decoded either in an auto-regressive manner or in parallel, depending on the codebook patterns. In particular, flattening the codebooks represents the highest quality decoding strategy, while being notoriously slow. To this end, we propose a novel stack-and-delay style of decoding strategy to improve upon the flat pattern decoding, with generation four times faster than vanilla flat decoding. This brings the inference time close to that of the delay decoding strategy, and allows for faster inference on GPU for small batch sizes. For the same inference efficiency budget as the delay pattern, we show that the proposed approach performs better in objective evaluations, almost closing the gap with the flat pattern in terms of quality. The results are corroborated by subjective evaluations which show that samples generated by the new model are slightly more often preferred to samples generated by the competing model given the same text prompts.
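To see why flat decoding is slow, the Python sketch below enumerates which (frame, codebook) tokens are predicted at each autoregressive step for T frames and K codebooks. The "delay" layout shown is the MusicGen-style pattern in which codebook k lags by k frames; that layout and the function names are assumptions for illustration, and the paper's stack-and-delay pattern itself is not reproduced here.

```python
Position = tuple[int, int]  # (frame index t, codebook index k)


def flat_pattern(T: int, K: int) -> list[list[Position]]:
    """Flattened codebooks: one token per step, frame by frame, so T * K steps."""
    return [[(t, k)] for t in range(T) for k in range(K)]


def delay_pattern(T: int, K: int) -> list[list[Position]]:
    """Delay pattern: at step s, codebook k predicts frame s - k, so up to K
    tokens are emitted per step and decoding takes T + K - 1 steps."""
    steps = []
    for s in range(T + K - 1):
        steps.append([(s - k, k) for k in range(K) if 0 <= s - k < T])
    return steps


T, K = 500, 4  # hypothetical: 500 frames, 4 codebooks
print(len(flat_pattern(T, K)))   # 2000 steps
print(len(delay_pattern(T, K)))  # 503 steps
print(delay_pattern(3, 2))       # [[(0, 0)], [(1, 0), (0, 1)], [(2, 0), (1, 1)], [(2, 1)]]
```

Both layouts cover every (frame, codebook) token exactly once; they differ only in how many sequential steps that takes. Step count is a proxy for latency, since per-step cost also depends on sequence length and batch size.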
Submitted 15 September, 2023;
originally announced September 2023.
-
Enhance audio generation controllability through representation similarity regularization
Authors:
Yangyang Shi,
Gael Le Lan,
Varun Nagaraja,
Zhaoheng Ni,
Xinhao Mei,
Ernie Chang,
Forrest Iandola,
Yang Liu,
Vikas Chandra
Abstract:
This paper presents an innovative approach to enhance control over audio generation by emphasizing the alignment between audio and text representations during model training. In the context of language model-based audio generation, the model leverages input from both textual and audio token representations to predict subsequent audio tokens. However, the current configuration lacks explicit regularization to ensure the alignment between the chosen text representation and the language model's predictions. Our proposal involves the incorporation of audio and text representation regularization, particularly during the classifier-free guidance (CFG) phase, where the text condition is excluded from cross attention during language model training. The aim of this proposed representation regularization is to minimize discrepancies in audio and text similarity compared to other samples within the same training batch. Experimental results on both music and audio generation tasks demonstrate that our proposed methods lead to improvements in objective metrics for both audio and music generation, as well as an enhancement in the human perception for audio generation.
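The abstract does not give the regularizer's exact form. One reading of "minimize discrepancies in audio and text similarity compared to other samples within the same training batch" is to make the pairwise audio-to-audio similarity matrix of a batch match the text-to-text similarity matrix. The pure-Python sketch below computes that mean squared discrepancy; cosine similarity, the squared-error penalty, and summarizing each sample by one pooled embedding vector are all assumptions, and in real training the same computation would be written in an autodiff framework so it can be backpropagated.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embs: list[list[float]]) -> list[list[float]]:
    """Pairwise cosine similarities among the samples of one batch."""
    return [[cosine(a, b) for b in embs] for a in embs]


def similarity_regularizer(audio_embs: list[list[float]], text_embs: list[list[float]]) -> float:
    """Mean squared difference between the batch's audio-audio and text-text
    similarity matrices (hypothetical formulation)."""
    sim_audio = similarity_matrix(audio_embs)
    sim_text = similarity_matrix(text_embs)
    n = len(sim_audio)
    total = sum((sim_audio[i][j] - sim_text[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)


# Audio embeddings that separate two samples whose text embeddings are identical
# yield a non-zero penalty.
print(similarity_regularizer([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]))  # 0.5
```

Because only the n × n similarity matrices are compared, the audio and text embeddings need not share a dimension under this formulation.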
Submitted 15 September, 2023;
originally announced September 2023.
-
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
Authors:
Yang Li,
Liangzhen Lai,
Yuan Shangguan,
Forrest N. Iandola,
Zhaoheng Ni,
Ernie Chang,
Yangyang Shi,
Vikas Chandra
Abstract:
Transformer-based models excel in speech recognition. Existing efforts to optimize Transformer inference, typically for long-context applications, center on simplifying attention score calculations. However, streaming speech recognition models usually process a limited number of tokens each time, making attention score calculation less of a bottleneck. Instead, the bottleneck lies in the linear projection layers of multi-head attention and feedforward networks, constituting a substantial portion of the model size and contributing significantly to computation, memory, and power usage.
To address this bottleneck, we propose folding attention, a technique targeting these linear layers, significantly reducing model size and improving memory and power efficiency. Experiments on on-device Transformer-based streaming speech recognition models show that folding attention reduces model size (and corresponding memory consumption) by up to 24% and power consumption by up to 23%, all without compromising model accuracy or computation overhead.
Submitted 18 January, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Historia: Refuting Callback Reachability with Message-History Logics (Extended Version)
Authors:
Shawn Meier,
Sergio Mover,
Gowtham Kaki,
Bor-Yuh Evan Chang
Abstract:
This paper determines if a callback can be called by an event-driven framework in an unexpected state. Event-driven programming frameworks are pervasive for creating user-interactive apps on just about every modern platform. Control flow between callbacks is determined by the framework and largely opaque to the programmer. This opacity of the callback control flow causes difficulty not only for the programmer but also for those developing static analyses. Previous static analysis techniques address this opacity either by assuming an arbitrary framework implementation or by attempting to eagerly specify all possible callback control flow, but this is either too coarse or too burdensome and tricky to get right. Instead, we present a middle way where the callback control flow can be gradually refined in a targeted manner to prove assertions of interest. The key insight behind this middle way is to reason about the history of method invocations at the boundary between app and framework code, enabling a decoupling of the specification of callback control flow from the analysis of app code. We call the sequence of such boundary-method invocations message histories and develop message-history logics to do this reasoning. In particular, we define the notion of an application-only transition system with boundary transitions, a message-history program logic for programs with such transitions, and a temporal specification logic for capturing callback control flow in a targeted and compositional manner. Then, to utilize the logics in a goal-directed verifier, we define a way to combine, after the fact, an assertion about message histories with a specification of callback control flow. We implemented a prototype message-history-based verifier called Historia that enables proving the absence of multi-callback bug patterns in real-world open-source Android apps.
Submitted 11 September, 2023; v1 submitted 8 September, 2023;
originally announced September 2023.
-
Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions
Authors:
Wesley Tann,
Yuancheng Liu,
Jun Heng Sim,
Choon Meng Seah,
Ee-Chien Chang
Abstract:
The assessment of cybersecurity Capture-The-Flag (CTF) exercises involves participants finding text strings or "flags" by exploiting system vulnerabilities. Large Language Models (LLMs) are natural-language models trained on vast amounts of words to understand and generate text; they can perform well on many CTF challenges. Such LLMs are freely available to students. In the context of CTF exercises in the classroom, this raises concerns about academic integrity. Educators must understand LLMs' capabilities to modify their teaching to accommodate generative AI assistance. This research investigates the effectiveness of LLMs, particularly in the realm of CTF challenges and questions. Here we evaluate three popular LLMs: OpenAI ChatGPT, Google Bard, and Microsoft Bing. First, we assess the LLMs' question-answering performance on five Cisco certifications with varying difficulty levels. Next, we qualitatively study the LLMs' abilities in solving CTF challenges to understand their limitations. We report on the experience of using the LLMs for seven test cases in all five types of CTF challenges. In addition, we demonstrate how jailbreak prompts can bypass and break LLMs' ethical safeguards. The paper concludes by discussing LLMs' impact on CTF exercises and its implications.
Submitted 20 August, 2023;
originally announced August 2023.
-
Neural Latent Aligner: Cross-trial Alignment for Learning Representations of Complex, Naturalistic Neural Data
Authors:
Cheol Jun Cho,
Edward F. Chang,
Gopala K. Anumanchipalli
Abstract:
Understanding the neural implementation of complex human behaviors is one of the major goals in neuroscience. To this end, it is crucial to find a true representation of the neural data, which is challenging due to the high complexity of behaviors and the low signal-to-noise ratio (SNR) of the signals. Here, we propose a novel unsupervised learning framework, Neural Latent Aligner (NLA), to find well-constrained, behaviorally relevant neural representations of complex behaviors. The key idea is to align representations across repeated trials to learn cross-trial consistent information. Furthermore, we propose a novel, fully differentiable time warping model (TWM) to resolve the temporal misalignment of trials. When applied to intracranial electrocorticography (ECoG) of natural speaking, our model learns better representations for decoding behaviors than the baseline models, especially in lower dimensional space. The TWM is empirically validated by measuring behavioral coherence between aligned trials. The proposed framework learns more cross-trial consistent representations than the baselines, and when visualized, the manifold reveals shared neural trajectories across trials.
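The abstract does not describe the differentiable time warping model (TWM) in enough detail to reproduce it. As background on what resolving temporal misalignment between trials means, the Python sketch below implements classic dynamic time warping (DTW), a different and non-differentiable technique: it finds the monotone alignment between two sequences that minimizes total pointwise distance. Its hard `min` makes the cost only piecewise differentiable; relaxations such as soft-DTW replace it with a smoothed minimum, though the abstract does not say how TWM achieves differentiability.

```python
def dtw_distance(x: list[float], y: list[float]) -> float:
    """Classic dynamic time warping cost between two 1-D sequences.

    D[i][j] is the minimum cumulative |x - y| cost of aligning x[:i] with y[:j]
    using monotone steps (advance x, advance y, or advance both).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# Two hypothetical "trials" of the same trajectory, the second starting one sample later:
trial_a = [0.0, 1.0, 2.0, 1.0, 0.0]
trial_b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
print(dtw_distance(trial_a, trial_b))  # 0.0
```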
Submitted 11 August, 2023;
originally announced August 2023.
-
Upper bounds on the $2$-colorability threshold of random $d$-regular $k$-uniform hypergraphs for $k\geq 3$
Authors:
Evan Chang,
Neel Kolhe,
Youngtak Sohn
Abstract:
For a large class of random constraint satisfaction problems (CSP), deep but non-rigorous theory from statistical physics predicts the location of the sharp satisfiability transition. The works of Ding, Sly, Sun (2014, 2016) and Coja-Oghlan, Panagiotou (2014) established the satisfiability threshold for random regular $k$-NAE-SAT, random $k$-SAT, and random regular $k$-SAT for large enough $k\geq k_0$ where $k_0$ is a large non-explicit constant. Establishing the same for small values of $k\geq 3$ remains an important open problem in the study of random CSPs.
In this work, we study two closely related models of random CSPs, namely the $2$-coloring on random $d$-regular $k$-uniform hypergraphs and the random $d$-regular $k$-NAE-SAT model. For every $k\geq 3$, we prove that there is an explicit $d_{\ast}(k)$ which gives a satisfiability upper bound for both of the models. Our upper bound $d_{\ast}(k)$ for $k\geq 3$ matches the prediction from statistical physics for the hypergraph $2$-coloring by Dall'Asta, Ramezanpour, Zecchina (2008), thus conjectured to be sharp. Moreover, $d_{\ast}(k)$ coincides with the satisfiability threshold of random regular $k$-NAE-SAT for large enough $k\geq k_0$ by Ding, Sly, Sun (2014).
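Both models in the abstract share one constraint: a hyperedge (or NAE clause) is satisfied when its vertices are not all assigned the same value. The brute-force Python check below makes that definition concrete for tiny hypergraphs. It is an illustration only; the paper's results concern random $d$-regular $k$-uniform hypergraphs as the number of vertices grows, where exhaustive search is infeasible. The example is the Fano plane, a 3-regular 3-uniform hypergraph on 7 vertices that is a classic instance with no proper 2-coloring; the function names are hypothetical.

```python
from itertools import product

Hyperedge = tuple[int, ...]


def is_regular_uniform(num_vertices: int, edges: list[Hyperedge], d: int, k: int) -> bool:
    """Check that every hyperedge has k vertices and every vertex lies in d hyperedges."""
    degree = [0] * num_vertices
    for e in edges:
        if len(e) != k:
            return False
        for v in e:
            degree[v] += 1
    return all(deg == d for deg in degree)


def is_proper_2_coloring(edges: list[Hyperedge], coloring: tuple[int, ...]) -> bool:
    """A 2-coloring is proper (NAE-satisfying) if no hyperedge is monochromatic."""
    return all(len({coloring[v] for v in e}) == 2 for e in edges)


def is_2_colorable(num_vertices: int, edges: list[Hyperedge]) -> bool:
    """Exhaustive search over all 2**num_vertices colorings (tiny instances only)."""
    return any(is_proper_2_coloring(edges, c) for c in product((0, 1), repeat=num_vertices))


fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
print(is_regular_uniform(7, fano, d=3, k=3))  # True
print(is_2_colorable(7, fano))                # False
```

NAE-SAT additionally allows negated literals; hypergraph 2-coloring is the special case in which no literal is negated.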
Submitted 3 August, 2023;
originally announced August 2023.
-
KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization
Authors:
Gangwoo Kim,
Hajung Kim,
Lei Ji,
Seongsu Bae,
Chanhwi Kim,
Mujeen Sung,
Hyunjae Kim,
Kun Yan,
Eric Chang,
Jaewoo Kang
Abstract:
In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is initially pre-trained on various multimodal datasets within the general domain before being transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema. It enables the model to effectively learn the required knowledge and skills from limited resources in the domain. Demonstrating superior performance on the benchmark datasets provided by the BioNLP shared task, our model benefits from its training across multiple tasks and domains. With subtle techniques including ensemble and factual calibration, our system achieves first place on the RadSum23 leaderboard for the hidden test set.
Submitted 10 July, 2023;
originally announced July 2023.
-
Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin
Authors:
Pin-Jie Lin,
Muhammed Saeed,
Ernie Chang,
Merel Scholman
Abstract:
Developing effective spoken language processing systems for low-resource languages poses several challenges due to the lack of parallel data and limited resources for fine-tuning models. In this work, we aim to improve both text classification and translation of Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus, and we further propose a framework of cross-lingual adaptive training that includes both continual and task adaptive training so as to adapt a base pre-trained model to low-resource languages. Our studies show that English pre-trained language models serve as a stronger prior than multilingual language models on English-Pidgin tasks, with up to 2.38 BLEU improvements, and demonstrate that augmenting orthographic data and using task adaptive training with back-translation can have a significant impact on model performance.
Submitted 1 July, 2023;
originally announced July 2023.
-
Revisiting Sample Size Determination in Natural Language Understanding
Authors:
Ernie Chang,
Muhammad Hassan Rashid,
Pin-Jie Lin,
Changsheng Zhao,
Vera Demberg,
Yangyang Shi,
Vikas Chandra
Abstract:
Knowing exactly how many data points need to be labeled to achieve a certain model performance is a hugely beneficial step towards reducing the overall budgets for annotation. It pertains to both active learning and traditional data annotation, and is particularly beneficial for low resource scenarios. Nevertheless, it remains a largely under-explored area of research in NLP. We therefore explored various techniques for estimating the training sample size necessary to achieve a targeted performance value. We derived a simple yet effective approach to predict the maximum achievable model performance from a small number of training samples, which serves as an early indicator of data quality and required sample size during data annotation. We performed ablation studies on four language understanding tasks, and showed that the proposed approach allows us to forecast model performance within a small margin of mean absolute error (~0.9%) with only 10% of the data.
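The abstract does not name the functional form used to extrapolate performance. A common choice in learning-curve work is an inverse power law, $f(n) = a - b\,n^{-c}$, whose asymptote $a$ plays the role of the maximum achievable performance. The Python sketch below fits that form under that assumption, using a grid over $c$ and ordinary least squares for $a$ and $b$ (chosen only to avoid third-party dependencies). The function names, grid range, and synthetic data are hypothetical, and this is not presented as the paper's method.

```python
def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_inverse_power_law(sizes: list[int], scores: list[float]) -> tuple[float, float, float]:
    """Fit score(n) = a - b * n**(-c); for each c on a grid, a and b are linear."""
    best = None
    for c in [i / 100 for i in range(5, 201)]:  # hypothetical grid: c in [0.05, 2.00]
        xs = [n ** -c for n in sizes]
        a, slope = fit_linear(xs, scores)
        sse = sum((a + slope * x - y) ** 2 for x, y in zip(xs, scores))
        if best is None or sse < best[0]:
            best = (sse, a, -slope, c)
    _, a, b, c = best
    return a, b, c


def predict(n: int, a: float, b: float, c: float) -> float:
    """Extrapolated score at sample size n; a is the predicted ceiling."""
    return a - b * n ** -c


def mean_absolute_error(pred: list[float], true: list[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


sizes = [50, 100, 200, 400]  # hypothetical small labeled subsets
scores = [0.9 - 0.5 * n ** -0.5 for n in sizes]  # synthetic, noise-free accuracies
a, b, c = fit_inverse_power_law(sizes, scores)
print(round(a, 3), round(c, 2))  # 0.9 0.5
```

The noise-free synthetic example only checks that the fit recovers known parameters; it says nothing about how well such a curve forecasts real NLU tasks.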
Submitted 1 July, 2023;
originally announced July 2023.