Search | arXiv e-print repository

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.01288 [pdf, other]

Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided

Authors: Hongli Zhan, Allen Zheng, Yoon Kyung Lee, Jina Suh, Junyi Jessy Li, Desmond C. Ong

Abstract: Large language models (LLMs) have offered new opportunities for emotional support, and recent work has shown that they can produce empathic responses to people in distress. However, long-term mental well-being requires emotional self-regulation, where a one-time empathic response falls short. This work takes a first step by engaging with cognitive reappraisals, a strategy from psychology practitioners that uses language to targetedly change negative appraisals that an individual makes of the situation; such appraisals are known to sit at the root of human emotional experience. We hypothesize that psychologically grounded principles could enable such advanced psychology capabilities in LLMs, and design RESORT which consists of a series of reappraisal constitutions across multiple dimensions that can be used as LLM instructions. We conduct a first-of-its-kind expert evaluation (by clinical psychologists with M.S. or Ph.D. degrees) of an LLM's zero-shot ability to generate cognitive reappraisal responses to medium-length social media messages asking for support. This fine-grained evaluation showed that even LLMs at the 7B scale guided by RESORT are capable of generating empathic responses that can help users reappraise their situations.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.01039 [pdf, other]

A Survey on Hypergraph Neural Networks: An In-Depth and Step-By-Step Guide

Authors: Sunwoo Kim, Soo Yong Lee, Yue Gao, Alessia Antelmi, Mirko Polato, Kijung Shin

Abstract: Higher-order interactions (HOIs) are ubiquitous in real-world complex systems and applications, and thus investigation of deep learning for HOIs has become a valuable agenda for the data mining and machine learning communities. As networks of HOIs are expressed mathematically as hypergraphs, hypergraph neural networks (HNNs) have emerged as a powerful tool for representation learning on hypergraphs. Given the emerging trend, we present the first survey dedicated to HNNs, with an in-depth and step-by-step guide. Broadly, the present survey overviews HNN architectures, training strategies, and applications. First, we break existing HNNs down into four design components: (i) input features, (ii) input structures, (iii) message-passing schemes, and (iv) training strategies. Second, we examine how HNNs address and learn HOIs with each of their components. Third, we overview the recent applications of HNNs in recommendation, biological and medical science, time series analysis, and computer vision. Lastly, we conclude with a discussion on limitations and future directions.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00638 [pdf, other]

HypeBoy: Generative Self-Supervised Representation Learning on Hypergraphs

Authors: Sunwoo Kim, Shinhwan Kang, Fanchen Bu, Soo Yong Lee, Jaemin Yoo, Kijung Shin

Abstract: Hypergraphs are marked by complex topology, expressing higher-order interactions among multiple nodes with hyperedges, and better capturing the topology is essential for effective representation learning. Recent advances in generative self-supervised learning (SSL) suggest that hypergraph neural networks learned from generative self supervision have the potential to effectively encode the complex hypergraph topology. Designing a generative SSL strategy for hypergraphs, however, is not straightforward. Questions remain with regard to its generative SSL task, connection to downstream tasks, and empirical properties of learned representations. In light of the promises and challenges, we propose a novel generative SSL strategy for hypergraphs. We first formulate a generative SSL task on hypergraphs, hyperedge filling, and highlight its theoretical connection to node classification. Based on the generative SSL task, we propose a hypergraph SSL method, HypeBoy. HypeBoy learns effective general-purpose hypergraph representations, outperforming 16 baseline methods across 11 benchmark datasets.

Submitted 31 March, 2024; originally announced April 2024.

Comments: Published as a conference paper at ICLR 2024

arXiv:2404.00300 [pdf, other]

Enhancing Empathy in Virtual Reality: An Embodied Approach to Mindset Modulation

Authors: Seoyeon Bae, Yoon Kyung Lee, Jungcheol Lee, Jaeheon Kim, Haeseong Jeon, Seung-Hwan Lim, Byung-Cheol Kim, Sowon Hahn

Abstract: A growth mindset has shown promising outcomes for increasing empathy ability. However, stimulating a growth mindset in VR-based empathy interventions is under-explored. In the present study, we implemented prosocial VR content, Our Neighbor Hero, focusing on embodying a virtual character to modulate players' mindsets. The virtual body served as a stepping stone, enabling players to identify with the character and cultivate a growth mindset as they followed mission instructions. We considered several implementation factors to assist players in positioning within the VR experience, including positive feedback, content difficulty, background lighting, and multimodal feedback. We conducted an experiment to investigate the intervention's effectiveness in increasing empathy. Our findings revealed that the VR content and mindset training encouraged participants to improve their growth mindsets and empathic motives. This VR content was developed for college students to enhance their empathy and teamwork skills. It has the potential to improve collaboration in organizational and community environments.

Submitted 30 March, 2024; originally announced April 2024.

Comments: 9 pages, 2 figures, 1 table

arXiv:2404.00154 [pdf, other]

Sampling error mitigation through spectrum smoothing in ensemble data assimilation

Authors: Bosu Choi, Yoonsang Lee

Abstract: In data assimilation, an ensemble provides a nonintrusive way to evolve a probability density described by a nonlinear prediction model. Although a large ensemble size is required for statistical accuracy, the ensemble size is typically limited to a small number due to the computational cost of running the prediction model, which leads to a sampling error. Several methods, such as localization, exist to mitigate the sampling error, often requiring problem-dependent fine-tuning and design. This work introduces another sampling error mitigation method using a smoothness constraint in the Fourier space. In particular, this work smoothes out the spectrum of the system to increase the stability and accuracy even under a small ensemble size. The efficacy of the new idea is validated through a suite of stringent test problems, including Lorenz 96 and Kuramoto-Sivashinsky turbulence models.

Submitted 29 March, 2024; originally announced April 2024.

arXiv:2404.00060 [pdf, other]

Temporal Graph Networks for Graph Anomaly Detection in Financial Networks

Authors: Yejin Kim, Youngbin Lee, Minyoung Choe, Sungju Oh, Yongjae Lee

Abstract: This paper explores the utilization of Temporal Graph Networks (TGN) for financial anomaly detection, a pressing need in the era of fintech and digitized financial transactions. We present a comprehensive framework that leverages TGN, capable of capturing dynamic changes in edges within financial networks, for fraud detection. Our study compares TGN's performance against static Graph Neural Network (GNN) baselines, as well as cutting-edge hypergraph neural network baselines, using the DGraph dataset for a realistic financial context. Our results demonstrate that TGN significantly outperforms other models in terms of AUC metrics. This superior performance underlines TGN's potential as an effective tool for detecting financial fraud, showcasing its ability to adapt to the dynamic and complex nature of modern financial systems. We also experimented with various graph embedding modules within the TGN framework and compared the effectiveness of each module. In conclusion, we demonstrated that, even with variations within TGN, it is possible to achieve good performance in the anomaly detection task.

Submitted 27 March, 2024; originally announced April 2024.

Comments: Presented at the AAAI 2024 Workshop on AI in Finance for Social Impact (https://sites.google.com/view/aifin-aaai2024)

arXiv:2403.19146 [pdf, ps, other]

Improving the Bit Complexity of Communication for Distributed Convex Optimization

Authors: Mehrdad Ghadiri, Yin Tat Lee, Swati Padmanabhan, William Swartworth, David Woodruff, Guanghao Ye

Abstract: We consider the communication complexity of some fundamental convex optimization problems in the point-to-point (coordinator) and blackboard communication models. We strengthen known bounds for approximately solving linear regression, $p$-norm regression (for $1\leq p\leq 2$), linear programming, minimizing the sum of finitely many convex nonsmooth functions with varying supports, and low rank approximation; for a number of these fundamental problems our bounds are nearly optimal, as proven by our lower bounds. Among our techniques, we use the notion of block leverage scores, which have been relatively unexplored in this context, as well as dropping all but the "middle" bits in Richardson-style algorithms. We also introduce a new communication problem for accurately approximating inner products and establish a lower bound using the spherical Radon transform. Our lower bound can be used to show the first separation of linear programming and linear systems in the distributed model when the number of constraints is polynomial, addressing an open question in prior work.

Submitted 28 March, 2024; originally announced March 2024.

Comments: To appear in STOC '24. Abstract shortened to meet the arXiv limits. Comments welcome!

arXiv:2403.18881 [pdf]

Transmission IR Microscopy for the Quantitation of Biomolecular Mass In Live Cells

Authors: Yow-Ren Chang, Seong-Min Kim, Young Jong Lee

Abstract: Absolute quantity imaging of biomolecules on a single cell level is critical for measurement assurance in biosciences and bioindustries. While infrared (IR) transmission microscopy is a powerful label-free imaging modality capable of chemical quantification, its applicability to hydrated biological samples remains challenging due to the strong water absorption. We overcome this challenge by applying a solvent absorption compensation (SAC) technique to a home-built quantum cascade laser IR microscope. SAC-IR microscopy improves the chemical sensitivity considerably by adjusting the incident light intensity to pre-compensate the IR absorption by water while retaining the full dynamic range. We demonstrate the label-free chemical imaging of key biomolecules of a cell, such as protein, fatty acid, and nucleic acid, with sub-cellular spatial resolution. By imaging live fibroblast cells over twelve hours, we monitor the mass change of the three molecular species of single cells at various phases, including cell division. While the current live-cell imaging demonstration involved three wavenumbers, more wavenumber images could measure more biomolecules in live cells with higher accuracy. As a label-free method to measure absolute quantities of various molecules in a cell, SAC-IR microscopy can potentially become a standard chemical characterization tool for live cells in biology, medicine, and biotechnology.

Submitted 27 March, 2024; originally announced March 2024.

Comments: Body: 19 pages, 5 figures. Supplemental: 11 pages, 6 figures

arXiv:2403.18771 [pdf, other]

CheckEval: Robust Evaluation Framework using Large Language Model via Checklist

Authors: Yukyung Lee, Joonghoon Kim, Jaehee Kim, Hyowon Cho, Pilsung Kang

Abstract: We introduce CheckEval, a novel evaluation framework using Large Language Models, addressing the challenges of ambiguity and inconsistency in current evaluation methods. CheckEval addresses these challenges by dividing evaluation criteria into detailed sub-aspects and constructing a checklist of Boolean questions for each, simplifying the evaluation. This approach not only renders the process more interpretable but also significantly enhances the robustness and reliability of results by focusing on specific evaluation dimensions. Validated through a focused case study using the SummEval benchmark, CheckEval indicates a strong correlation with human judgments. Furthermore, it demonstrates a highly consistent Inter-Annotator Agreement. These findings highlight the effectiveness of CheckEval for objective, flexible, and precise evaluations. By offering a customizable and interactive framework, CheckEval sets a new standard for the use of LLMs in evaluation, responding to the evolving needs of the field and establishing a clear method for future LLM-based evaluation.

Submitted 27 March, 2024; originally announced March 2024.

Comments: HEAL at CHI 2024
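
The CheckEval abstract above describes splitting each evaluation criterion into sub-aspects and asking a Boolean question for each. A minimal sketch of that data layout follows. The class name `ChecklistItem`, the function `checklist_score`, the sample questions, and the choice to aggregate as the fraction of "yes" answers are all assumptions made for illustration; the listing does not state how the paper aggregates answers, and in the paper the answers would come from an LLM rather than being supplied by hand.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    """One Boolean question tied to a sub-aspect of a criterion (hypothetical layout)."""
    criterion: str
    sub_aspect: str
    question: str


def checklist_score(answers: list[bool]) -> float:
    """Fraction of checklist questions answered 'yes'.

    This aggregation rule is an assumption for illustration only.
    """
    if not answers:
        raise ValueError("at least one answer is required")
    return sum(answers) / len(answers)


# Hypothetical checklist for a summary's "coherence" criterion.
checklist = [
    ChecklistItem("coherence", "ordering", "Are the sentences in a logical order?"),
    ChecklistItem("coherence", "transitions", "Do adjacent sentences connect clearly?"),
    ChecklistItem("coherence", "focus", "Does the summary stay on a single topic?"),
    ChecklistItem("coherence", "redundancy", "Is the summary free of repeated content?"),
]

# Hand-written answers standing in for judge output, one per checklist item.
answers = [True, True, False, True]
score = checklist_score(answers)
```

Keeping each question attached to its sub-aspect makes it possible to report which dimension a low score came from, which is the interpretability benefit the abstract points to.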

arXiv:2403.18305 [pdf, other]

A Recommender System for NFT Collectibles with Item Feature

Authors: Minjoo Choi, Seonmi Kim, Yejin Kim, Youngbin Lee, Joohwan Hong, Yongjae Lee

Abstract: Recommender systems have been actively studied and applied in various domains to deal with information overload. Although there are numerous studies on recommender systems for movies, music, and e-commerce, comparatively less attention has been paid to the recommender system for NFTs despite the continuous growth of the NFT market. This paper presents a recommender system for NFTs that utilizes a variety of data sources, from NFT transaction records to external item features, to generate precise recommendations that cater to individual preferences. We develop a data-efficient graph-based recommender system to efficiently capture the complex relationship between each item and users and generate node (item) embeddings which incorporate both node feature information and graph structure. Furthermore, we exploit inputs beyond user-item interactions, such as image features, text features, and price features. Numerical experiments verify that the performance of the graph-based recommender system improves significantly after utilizing all types of item features as side information, thereby outperforming all other baselines.

Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: Presented at the AAAI 2023 Bridge on AI for Financial Services (https://sites.google.com/view/aaai-ai-fin/home)

arXiv:2403.18177 [pdf, ps, other]

Growth rate of liquidity provider's wealth in G3Ms

Authors: Cheuk Yin Lee, Shen-Ning Tung, Tai-Ho Wang

Abstract: Geometric mean market makers (G3Ms), such as Uniswap and Balancer, represent a widely used class of automated market makers (AMMs). These G3Ms are characterized by the following rule: the reserves of the AMM must maintain the same (weighted) geometric mean before and after each trade. This paper investigates the effects of trading fees on liquidity providers' (LP) profitability in a G3M, as well as the adverse selection faced by LPs due to arbitrage activities involving a reference market. Our work expands the model described in previous studies for G3Ms, integrating transaction fees and continuous-time arbitrage into the analysis. Within this context, we analyze G3M dynamics, characterized by stochastic storage processes, and calculate the growth rate of LP wealth. In particular, our results align with and extend the results concerning the constant product market maker, commonly referred to as Uniswap v2.

Submitted 26 March, 2024; originally announced March 2024.

Comments: 27 pages

MSC Class: 91G15
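
The G3M rule quoted in the abstract above (reserves keep the same weighted geometric mean before and after each trade) can be illustrated with a short fee-free sketch. The function names `weighted_geometric_mean` and `swap_out`, the two-asset pool, and the omission of trading fees are assumptions for illustration; the paper itself studies the case with fees and arbitrage, which this sketch does not model.

```python
import math


def weighted_geometric_mean(reserves, weights):
    """Product of reserve_k ** weight_k over all assets in the pool."""
    return math.prod(r ** w for r, w in zip(reserves, weights))


def swap_out(reserves, weights, i, j, amount_in):
    """Amount of asset j paid out for depositing amount_in of asset i (no fee).

    Solves (x + dx)^w_i * (y - dy)^w_j = x^w_i * y^w_j for dy,
    which gives dy = y * (1 - (x / (x + dx)) ** (w_i / w_j)).
    """
    x, y = reserves[i], reserves[j]
    ratio = weights[i] / weights[j]
    return y * (1.0 - (x / (x + amount_in)) ** ratio)


# Hypothetical equal-weight pool; with weights (0.5, 0.5) this is the
# constant product case.
reserves = [100.0, 100.0]
weights = [0.5, 0.5]

before = weighted_geometric_mean(reserves, weights)
paid_out = swap_out(reserves, weights, i=0, j=1, amount_in=10.0)
after = weighted_geometric_mean(
    [reserves[0] + 10.0, reserves[1] - paid_out], weights
)
```

In this fee-free setting `before` and `after` agree up to floating-point error, which is exactly the invariant the abstract states.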

arXiv:2403.18148 [pdf, other]

Large Language Models Produce Responses Perceived to be Empathic

Authors: Yoon Kyung Lee, Jina Suh, Hongli Zhan, Junyi Jessy Li, Desmond C. Ong

Abstract: Large Language Models (LLMs) have demonstrated surprising performance on many tasks, including writing supportive messages that display empathy. Here, we had these models generate empathic messages in response to posts describing common life experiences, such as workplace situations, parenting, relationships, and other anxiety- and anger-eliciting situations. Across two studies (N=192, 202), we showed human raters a variety of responses written by several models (GPT4 Turbo, Llama2, and Mistral), and had people rate these responses on how empathic they seemed to be. We found that LLM-generated responses were consistently rated as more empathic than human-written responses. Linguistic analyses also show that these models write in distinct, predictable "styles", in terms of their use of punctuation, emojis, and certain words. These results highlight the potential of using LLMs to enhance human peer support in contexts where empathy is important.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.17938 [pdf, other]

Circuit-centric Genetic Algorithm (CGA) for Analog and Radio-Frequency Circuit Optimization

Authors: Mingi Kwon, Yeonjun Lee, Ickhyun Song

Abstract: This paper presents an automated method for optimizing parameters in analog/high-frequency circuits, aiming to maximize performance parameters of a radio-frequency (RF) receiver. The design target includes a reduction of power consumption and noise figure and an increase in conversion gain. This study investigates the use of an artificial algorithm for the optimization of a receiver, illustrating how to fulfill the performance parameters with diverse circuit parameters. To overcome issues observed in the traditional Genetic Algorithm (GA), the concept of the Circuit-centric Genetic Algorithm (CGA) is proposed as a viable approach. The new method adopts an inference process that is simpler and computationally more efficient than the existing deep learning models. In addition, CGA offers significant advantages over manual design of finding optimal points and the conventional GA, mitigating the designer's workload while searching for superior optimum points.

Submitted 18 November, 2023; originally announced March 2024.

Comments: 15 pages, 6 figures, submission to Circuits, Systems and Signal Processing

arXiv:2403.17069 [pdf, other]

Tensor network formulation of symmetry protected topological phases in mixed states

Authors: Hanyu Xue, Jong Yeon Lee, Yimu Bao

Abstract: We define and classify symmetry-protected topological (SPT) phases in mixed states based on the tensor network formulation of the density matrix. In one dimension, we introduce strong injective matrix product density operators (MPDO), which describe a broad class of short-range correlated mixed states, including the locally decohered SPT states. We map strong injective MPDO to a pure state in the doubled Hilbert space and define the SPT phases according to the cohomology class of the symmetry group in the doubled state. Although the doubled state exhibits an enlarged symmetry, the possible SPT phases are also constrained by the Hermiticity and the semi-positivity of the density matrix. We here obtain a complete classification of SPT phases with a direct product of strong $G$ and weak $K$ unitary symmetry given by the cohomology group $H^2(G, \text{U}(1))\oplus H^1(K, H^1(G, \text{U}(1)))$. The SPT phases in our definition are preserved under symmetric local circuits consisting of non-degenerate channels. This motivates an alternative definition of SPT phases according to the equivalence class of mixed states under a "one-way" connection using symmetric non-degenerate channels. In locally purifiable MPDO with strong symmetry, we prove that this alternative definition reproduces the cohomology classification. We further extend our results to two-dimensional mixed states described by strong semi-injective tensor network density operators and classify the possible SPT phases.

Submitted 15 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: Appendix D is fixed

arXiv:2403.16837 [pdf, other]

doi 10.1103/PhysRevD.109.082002

Impact of noise transients on gravitational-wave burst detection efficiency of the BayesWave pipeline with multi-detector networks

Authors: Yi Shuen C. Lee, Margaret Millhouse, Andrew Melatos

Abstract: Detection confidence of the source-agnostic gravitational-wave burst search pipeline BayesWave is quantified by the log signal-versus-glitch Bayes factor, $\ln\mathcal{B}_{\mathcal{S},\mathcal{G}}$. A recent study shows that $\ln\mathcal{B}_{\mathcal{S},\mathcal{G}}$ increases with the number of detectors. However, the increasing frequency of non-Gaussian noise transients (glitches) in expanded detector networks is not accounted for in the study. Glitches can mimic or mask burst signals resulting in false alarm detections, consequently reducing detection confidence. This paper presents an empirical study on the impact of false alarms on the overall performance of BayesWave, with expanded detector networks. The noise background of BayesWave for the Hanford-Livingston (HL, two-detector) and Hanford-Livingston-Virgo (HLV, three-detector) networks is measured using a set of non-astrophysical background triggers from the first half of Advanced LIGO and Advanced Virgo's Third Observing Run (O3a). Efficiency curves are constructed by combining $\ln\mathcal{B}_{\mathcal{S},\mathcal{G}}$ of simulated binary black hole signals with the background measurements, to characterize BayesWave's detection efficiency as a function of the per-trigger false alarm probability. The HL and HLV network efficiency curves are shown to be similar. A separate analysis finds that the detection significance of O3 gravitational-wave candidates as measured by BayesWave is also comparable for the HL and HLV networks. Consistent results from the two independent analyses suggest that the overall burst detection performance of BayesWave does not improve with the addition of Virgo at O3a sensitivity, because the increased false alarm probability offsets the advantage of higher $\ln\mathcal{B}_{\mathcal{S},\mathcal{G}}$.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 16 pages, 8 figures

arXiv:2403.16066 [pdf, other]

A Temporal Graph Network Framework for Dynamic Recommendation

Authors: Yejin Kim, Youngbin Lee, Vincent Yuan, Annika Lee, Yongjae Lee

Abstract: Recommender systems, crucial for user engagement on platforms like e-commerce and streaming services, often lag behind users' evolving preferences due to static data reliance. After Temporal Graph Networks (TGNs) were proposed, various studies have shown that TGN can significantly improve situations where the features of nodes and edges dynamically change over time. However, despite its promising capabilities, it has not been directly applied in recommender systems to date. Our study bridges this gap by directly implementing Temporal Graph Networks (TGN) in recommender systems, a first in this field. Using real-world datasets and a range of graph and history embedding methods, we show TGN's adaptability, confirming its effectiveness in dynamic recommendation scenarios.

Submitted 24 March, 2024; originally announced March 2024.

Comments: Presented at the AAAI 2024 Workshop on Recommendation Ecosystems: Modeling, Optimization and Incentive Design

arXiv:2403.15902 [pdf, other]

Utilizing Motion Matching with Deep Reinforcement Learning for Target Location Tasks

Authors: Jeongmin Lee, Taesoo Kwon, Hyunju Shin, Yoonsang Lee

Abstract: We present an approach using deep reinforcement learning (DRL) to directly generate motion matching queries for long-term tasks, particularly targeting the reaching of specific locations. By integrating motion matching and DRL, our method demonstrates the rapid learning of policies for target location tasks within minutes on a standard desktop, employing a simple reward design. Additionally, we propose a unique hit reward and obstacle curriculum scheme to enhance policy learning in environments with moving obstacles.

Submitted 23 March, 2024; originally announced March 2024.

Comments: Eurographics 2024 Short Papers

arXiv:2403.15388 [pdf, other]

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Authors: Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan

Abstract: Large Multimodal Models (LMMs) have shown significant visual reasoning capabilities by connecting a visual encoder and a large language model. LMMs typically take in a fixed and large number of visual tokens, such as the penultimate layer features in the CLIP visual encoder, as the prefix content. Recent LMMs incorporate more complex visual inputs, such as high-resolution images and videos, which further increases the number of visual tokens significantly. However, due to the inherent design of the Transformer architecture, the computational costs of these models tend to increase quadratically with the number of input tokens. To tackle this problem, we explore a token reduction mechanism that identifies significant spatial redundancy among visual tokens. In response, we propose PruMerge, a novel adaptive visual token reduction strategy that significantly reduces the number of visual tokens without compromising the performance of LMMs. Specifically, to measure the importance of each token, we exploit the sparsity observed in the visual encoder, characterized by the sparse distribution of attention scores between the class token and visual tokens. This sparsity enables us to dynamically select the most crucial visual tokens to retain. Subsequently, we cluster the selected (unpruned) tokens based on their key similarity and merge them with the unpruned tokens, effectively supplementing and enhancing their informational content. Empirically, when applied to LLaVA-1.5, our approach can compress the visual tokens by 14 times on average, and achieve comparable performance across diverse visual question-answering and reasoning tasks. Code and checkpoints are at https://llava-prumerge.github.io/.

Submitted 22 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: Project page: https://llava-prumerge.github.io/

arXiv:2403.14963 [pdf, other]

Enabling Physical Localization of Uncooperative Cellular Devices

Authors: Taekkyung Oh, Sangwook Bae, Junho Ahn, Yonghwa Lee, Dinh-Tuan Hoang, Min Suk Kang, Nils Ole Tippenhauer, Yongdae Kim

Abstract: In cellular networks, it can become necessary for authorities to physically locate user devices for tracking criminals or illegal devices. While cellular operators can provide authorities with cell information the device is camping on, fine-grained localization is still required. Therefore, the authorized agents trace the device by monitoring its uplink signals. However, tracking the uplink signal source without its cooperation is challenging even for operators and authorities. Particularly, three challenges remain for fine-grained localization: i) localization works only if devices generate enough uplink traffic reliably over time, ii) the target device might generate its uplink traffic with significantly low power, and iii) cellular repeaters may add too much noise to true uplink signals. While these challenges present practical hurdles for localization, they have been overlooked in prior works. In this work, we investigate the impact of these real-world challenges on cellular localization and propose an Uncooperative Multiangulation Attack (UMA) that addresses these challenges. UMA can 1) force a target device to transmit traffic continuously, 2) boost the target's signal strength to the maximum, and 3) uniquely distinguish traffic from the target and the repeaters. Notably, the UMA technique works without privilege on cellular operators or user devices, which makes it operate on any LTE network. Our evaluations show that UMA effectively resolves the challenges in real-world environments when devices are not cooperative for localization. Our approach exploits the current cellular design vulnerabilities, which we have responsibly disclosed to GSMA.

Submitted 25 March, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.14353 [pdf, other]

DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics

Authors: Yoonsung Kim, Changhun Oh, **woo Hwang, Wonung Kim, Seongryong Oh, Yubin Lee, Hardik Sharma, Amir Yazdanbakhsh, Jongse Park

Abstract: Deep neural network (DNN) video analytics is crucial for autonomous systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and security robots. However, real-world deployment faces challenges due to their limited computational resources and battery power. To tackle these challenges, continuous learning exploits a lightweight "student" model at deployment (inference), leverages a larger "teacher" model for labeling sampled data (labeling), and continuously retrains the student model to adapt to changing scenarios (retraining). This paper highlights the limitations in state-of-the-art continuous learning systems: (1) they focus on computations for retraining, while overlooking the compute needs for inference and labeling, (2) they rely on power-hungry GPUs, unsuitable for battery-operated autonomous systems, and (3) they are located on a remote centralized server, intended for multi-tenant scenarios, again unsuitable for autonomous systems due to privacy, network availability, and latency concerns. We propose a hardware-algorithm co-designed solution for continuous learning, DaCapo, that enables autonomous systems to perform concurrent executions of inference, labeling, and training in a performant and energy-efficient manner. DaCapo comprises (1) a spatially-partitionable and precision-flexible accelerator enabling parallel execution of kernels on sub-accelerators at their respective precisions, and (2) a spatiotemporal resource allocation algorithm that strategically navigates the resource-accuracy tradeoff space, facilitating optimal decisions for resource allocation to achieve maximal accuracy. Our evaluation shows that DaCapo achieves 6.5% and 5.5% higher accuracy than the state-of-the-art GPU-based continuous learning systems Ekya and EOMU, respectively, while consuming 254x less power.

Submitted 28 April, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.13589 [pdf, other]

ReGround: Improving Textual and Spatial Grounding at No Cost

Authors: Yuseung Lee, Minhyuk Sung

Abstract: When an image generation process is guided by both a text prompt and spatial cues, such as a set of bounding boxes, do these elements work in harmony, or does one dominate the other? Our analysis of a pretrained image diffusion model that integrates gated self-attention into the U-Net reveals that spatial grounding often outweighs textual grounding due to the sequential flow from gated self-attention to cross-attention. We demonstrate that such bias can be significantly mitigated without sacrificing accuracy in either grounding by simply rewiring the network architecture, changing from sequential to parallel for gated self-attention and cross-attention. This surprisingly simple yet effective solution does not require any fine-tuning of the network but significantly reduces the trade-off between the two groundings. Our experiments demonstrate significant improvements from the original GLIGEN to the rewired version in the trade-off between textual grounding and spatial grounding.

Submitted 30 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: Project page: https://re-ground.github.io/

arXiv:2403.12945 [pdf, other]

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park , et al. (74 additional authors not shown)

Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Project website: https://droid-dataset.github.io/

arXiv:2403.11385 [pdf, other]

Stochastic approach for elliptic problems in perforated domains

Authors: Jihun Han, Yoonsang Lee

Abstract: A wide range of applications in science and engineering involve a PDE model in a domain with perforations, such as perforated metals or air filters. Solving such perforated domain problems suffers from computational challenges related to resolving the scale imposed by the geometries of perforations. We propose a neural network-based mesh-free approach for perforated domain problems. The method is robust and efficient in capturing various configuration scales, including the averaged macroscopic behavior of the solution that involves a multiscale nature induced by small perforations. The new approach incorporates the derivative-free loss method that uses a stochastic representation or the Feynman-Kac formulation. In particular, we implement the Neumann boundary condition for the derivative-free loss method to handle the interface between the domain and perforations. A suite of stringent numerical tests is provided to support the proposed method's efficacy in handling various perforation scales.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 18 pages, 6 figures

MSC Class: 65N99; 65C05; 68T07

arXiv:2403.10882 [pdf, other]

Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean

Authors: ChangSu Choi, Yongbin Jeong, Seoyoon Park, InHo Won, HyeonSeok Lim, SangMin Kim, Yejee Kang, Chanhyuk Yoon, Jaewan Park, Yiseul Lee, Hye** Lee, Younggyun Hahm, Hansaem Kim, KyungTae Lim

Abstract: Large language models (LLMs) use pretraining to predict the subsequent word; however, their expansion requires significant computing resources. Numerous big tech companies and research institutes have developed multilingual LLMs (MLLMs) to meet current demands, overlooking less-resourced languages (LRLs). This study proposed three strategies to enhance the performance of LRLs based on the publicly available MLLMs. First, the MLLM vocabularies of LRLs were expanded to enhance expressiveness. Second, bilingual data were used for pretraining to align the high- and less-resourced languages. Third, a high-quality small-scale instruction dataset was constructed and instruction-tuning was performed to augment the LRL. The experiments employed the Llama2 model and Korean was used as the LRL, which was quantitatively evaluated against other developed LLMs across eight tasks. Furthermore, a qualitative assessment was performed based on human evaluation and GPT4. Experimental results showed that our proposed Bllossom model exhibited superior performance in qualitative analyses compared to previously proposed Korean monolingual models.

Submitted 21 March, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.10576 [pdf, other]

Ignore Me But Don't Replace Me: Utilizing Non-Linguistic Elements for Pretraining on the Cybersecurity Domain

Authors: Eugene Jang, Jian Cui, Dayeon Yim, Young** **, **-Woo Chung, Seungwon Shin, Yongjae Lee

Abstract: Cybersecurity information is often technically complex and relayed through unstructured text, making automation of cyber threat intelligence highly challenging. For such text domains that involve high levels of expertise, pretraining on in-domain corpora has been a popular method for language models to obtain domain expertise. However, cybersecurity texts often contain non-linguistic elements (such as URLs and hash values) that could be unsuitable for the established pretraining methodologies. Previous work in other domains has removed or filtered such text as noise, but the effectiveness of these methods has not been investigated, especially in the cybersecurity domain. We propose different pretraining methodologies and evaluate their effectiveness through downstream tasks and probing tasks. Our proposed strategy (selective MLM and jointly training NLE token classification) outperforms the commonly taken approach of replacing non-linguistic elements (NLEs). We use our domain-customized methodology to train CyBERTuned, a cybersecurity domain language model that outperforms other cybersecurity PLMs on most tasks.

Submitted 2 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: To appear in NAACL Findings 2024

ACM Class: I.2.7

arXiv:2403.10506 [pdf, other]

HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation

Authors: Carmelo Sferrazza, Dun-Ming Huang, Xingyu Lin, Youngwoon Lee, Pieter Abbeel

Abstract: Humanoid robots hold great promise in assisting humans in diverse environments and tasks, due to their flexibility and adaptability leveraging human-like morphology. However, research in humanoid robots is often bottlenecked by the costly and fragile hardware setups. To accelerate algorithmic research in humanoid robots, we present a high-dimensional, simulated robot learning benchmark, HumanoidBench, featuring a humanoid robot equipped with dexterous hands and a variety of challenging whole-body manipulation and locomotion tasks. Our findings reveal that state-of-the-art reinforcement learning algorithms struggle with most tasks, whereas a hierarchical learning approach achieves superior performance when supported by robust low-level policies, such as walking or reaching. With HumanoidBench, we provide the robotics community with a platform to identify the challenges arising when solving diverse tasks with humanoid robots, facilitating prompt verification of algorithms and ideas. The open-source code is available at https://humanoid-bench.github.io.

Submitted 18 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.09168 [pdf, other]

VIVID: Human-AI Collaborative Authoring of Vicarious Dialogues from Lecture Videos

Authors: Seulgi Choi, Hyewon Lee, Yoonjoo Lee, Juho Kim

Abstract: The lengthy monologue-style online lectures cause learners to lose engagement easily. Designing lectures in a "vicarious dialogue" format can foster learners' cognitive activities more than monologue-style. However, designing online lectures in a dialogue style catered to the diverse needs of learners is laborious for instructors. We conducted a design workshop with eight educational experts and seven instructors to present key guidelines and the potential use of large language models (LLM) to transform a monologue lecture script into pedagogically meaningful dialogue. Applying these design guidelines, we created VIVID which allows instructors to collaborate with LLMs to design, evaluate, and modify pedagogical dialogues. In a within-subjects study with instructors (N=12), we show that VIVID helped instructors select and revise dialogues efficiently, thereby supporting the authoring of quality dialogues. Our findings demonstrate the potential of LLMs to assist instructors with creating high-quality educational dialogues across various learning stages.

Submitted 10 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.08500 [pdf]

Highly confined epsilon-near-zero- and surface-phonon polaritons in SrTiO3 membranes

Authors: Ruijuan Xu, Iris Crassee, Hans A. Bechtel, Yixi Zhou, Adrien Bercher, Lukas Korosec, Carl Willem Rischau, Jérémie Teyssier, Kevin J. Crust, Yonghun Lee, Stephanie N. Gilbert Corder, Jiarui Li, Jennifer A. Dionne, Harold Y. Hwang, Alexey B. Kuzmenko, Yin Liu

Abstract: Recent theoretical studies have suggested that transition metal perovskite oxide membranes can enable surface phonon polaritons in the infrared range with low loss and much stronger subwavelength confinement than bulk crystals. Such modes, however, have not been experimentally observed so far. Here, using a combination of far-field Fourier-transform infrared (FTIR) spectroscopy and near-field synchrotron infrared nanospectroscopy (SINS) imaging, we study the phonon-polaritons in a 100 nm thick freestanding crystalline membrane of SrTiO3 transferred on metallic and dielectric substrates. We observe a symmetric-antisymmetric mode splitting giving rise to epsilon-near-zero and Berreman modes as well as highly confined (by a factor of 10) propagating phonon polaritons, both of which result from the deep-subwavelength thickness of the membranes. Theoretical modeling based on the analytical finite-dipole model and numerical finite-difference methods fully corroborates the experimental results. Our work reveals the potential of oxide membranes as a promising platform for infrared photonics and polaritonics.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08272 [pdf, other]

RECIPE4U: Student-ChatGPT Interaction Dataset in EFL Writing Education

Authors: Jieun Han, Haneul Yoo, Junho Myung, Minsun Kim, Tak Yeon Lee, So-Yeon Ahn, Alice Oh

Abstract: The integration of generative AI in education is expanding, yet empirical analyses of large-scale and real-world interactions between students and AI systems still remain limited. Addressing this gap, we present RECIPE4U (RECIPE for University), a dataset sourced from a semester-long experiment with 212 college students in English as a Foreign Language (EFL) writing courses. During the study, students engaged in dialogues with ChatGPT to revise their essays. RECIPE4U includes comprehensive records of these interactions, including conversation logs, students' intent, students' self-rated satisfaction, and students' essay edit histories. In particular, we annotate the students' utterances in RECIPE4U with 13 intention labels based on our coding schemes. We establish baseline results for two subtasks in task-oriented dialogue systems within educational contexts: intent detection and satisfaction estimation. As a foundational step, we explore student-ChatGPT interaction patterns through RECIPE4U and analyze them by focusing on students' dialogue, essay data statistics, and students' essay edits. We further illustrate potential applications of the RECIPE4U dataset for enhancing the incorporation of LLMs in educational frameworks. RECIPE4U is publicly available at https://zeunie.github.io/RECIPE4U/.

Submitted 13 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2309.13243

arXiv:2403.08058 [pdf, other]

CHAI: Clustered Head Attention for Efficient LLM Inference

Authors: Saurabh Agarwal, Bilge Acun, Basil Hosmer, Mostafa Elhoushi, Ye** Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu

Abstract: Large Language Models (LLMs) with hundreds of billions of parameters have transformed the field of machine learning. However, serving these models at inference time is both compute and memory intensive, where a single request can require multiple GPUs and tens of Gigabytes of memory. Multi-Head Attention is one of the key components of LLMs, which can account for over 50% of an LLM's memory and compute requirements. We observe that there is a high amount of redundancy across heads in which tokens they pay attention to. Based on this insight, we propose Clustered Head Attention (CHAI). CHAI combines heads with a high amount of correlation for self-attention at runtime, thus reducing both memory and compute. In our experiments, we show that CHAI is able to reduce the memory requirements for storing K,V cache by up to 21.4% and inference time latency by up to 1.73x without any fine-tuning required. CHAI achieves this with a maximum 3.2% deviation in accuracy across 3 different models (i.e. OPT-66B, LLAMA-7B, LLAMA-33B) and 5 different evaluation datasets.

Submitted 27 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07598 [pdf, other]

Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference

Authors: Changmin Jeon, Seonjun Kim, Juheon Yi, Youngki Lee

Abstract: In this paper, we present Mondrian, an edge system that enables high-performance object detection on high-resolution video streams. Many lightweight models and system optimization techniques have been proposed for resource-constrained devices, but they do not fully utilize the potential of the accelerators over dynamic, high-resolution videos. To enable such capability, we devise a novel Compressive Packed Inference to minimize per-pixel processing costs by selectively determining the necessary pixels to process and combining them to maximize processing parallelism. In particular, our system quickly extracts ROIs and dynamically shrinks them, reflecting the effect of the fast-changing characteristics of objects and scenes. It then intelligently combines such scaled ROIs into large canvases to maximize the utilization of inference accelerators such as GPU. Evaluation across various datasets, models, and devices shows Mondrian outperforms state-of-the-art baselines (e.g., input rescaling, ROI extractions, ROI extractions+batching) by 15.0-19.7% higher accuracy, leading to 6.65$\times$ higher throughput than frame-wise inference for processing various 1080p video streams. We will release the code after the paper review.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.06009 [pdf, other]

Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Authors: Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, Bishwaranjan Bhattacharjee, Djallel Bouneffouf, Subhajit Chaudhury, Pin-Yu Chen, Lamogha Chiazor, Elizabeth M. Daly, Kirushikesh DB, Rogério Abreu de Paula, Pierre Dognin, Eitan Farchi, Soumya Ghosh, Michael Hind, Raya Horesh, George Kour, Ja Young Lee, Nishtha Madaan, Sameep Mehta, Erik Miehling, Keerthiram Murugesan, Manish Nagireddy , et al. (13 additional authors not shown)

Abstract: Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we present our ongoing efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms. In addition to the detectors themselves, we discuss a wide range of uses for these detector models - from acting as guardrails to enabling effective AI governance. We also deep dive into inherent challenges in their development and discuss future work aimed at making the detectors more reliable and broadening their scope.

Submitted 13 June, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.04613 [pdf, other]

Simultaneous Conformal Prediction of Missing Outcomes with Propensity Score $ε$-Discretization

Authors: Yonghoon Lee, Edgar Dobriban, Eric Tchetgen Tchetgen

Abstract: We study the problem of simultaneous predictive inference on multiple outcomes missing at random. We consider a suite of possible simultaneous coverage properties, conditionally on the missingness pattern and on the -- possibly discretized/binned -- feature values. For data with discrete feature distributions, we develop a procedure which attains feature- and missingness-conditional coverage; and further improve it via pooling its results after partitioning the unobserved outcomes. To handle general continuous feature distributions, we introduce methods based on discretized feature values. To mitigate the issue that feature-discretized data may fail to remain missing at random, we propose propensity score $ε$-discretization. This approach is inspired by the balancing property of the propensity score, namely that the missing data mechanism is independent of the outcome conditional on the propensity [Rosenbaum and Rubin (1983)]. We show that the resulting pro-CP method achieves propensity score discretized feature- and missingness-conditional coverage, when the propensity score is known exactly or is estimated sufficiently accurately. Furthermore, we consider a stronger inferential target, the squared-coverage guarantee, which penalizes the spread of the coverage proportion. We propose methods -- termed pro-CP2 -- to achieve it with similar conditional properties as we have shown for usual coverage. A key novel technical contribution in our results is that propensity score discretization leads to a notion of approximate balancing, which we formalize and characterize precisely. In extensive empirical experiments on simulated data and on a job search intervention dataset, we illustrate that our procedures provide informative prediction sets with valid conditional coverage.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.04191 [pdf, other]

doi 10.1007/JHEP06(2024)166

Probing the mixing between sterile and tau neutrinos in the SHiP experiment

Authors: Ki-Young Choi, Sung Hyun Kim, Yeong Gyun Kim, Kang Young Lee, Kyong Sei Lee, Byung Do Park, Jong Yoon Sohn, Seong Moon Yoo, Chun Sil Yoon

Abstract: We study the expected sensitivity to the mixing between sterile and tau neutrinos directly from the tau neutrino disappearance in the high-energy fixed target experiment. Here, the beam energy is large enough to produce tau neutrinos at the target with large luminosity. During their propagation to the detector, tau neutrinos may oscillate into sterile neutrinos. By examining the energy spectrum of the observed tau neutrino events, we can probe the mixing between sterile and tau neutrinos directly. In this paper, we consider Scattering and Neutrino Detector (SND) at SHiP experiment as a showcase, which uses 400 GeV protons from SPS at CERN, and expect to observe 7,300 tau and anti-tau neutrinos from the $2\times 10^{20}$ POT for 5 years operation. Assuming the uncertainty of 10\%, we find the sensitivity $|U_{τ4}|^2 \sim 0.08$\, (90\% CL) for $Δm_{41}^2 \sim 500\ \mathrm{eV}^2$ with 10\% background to the signal. We also consider a far SND at the end of the SHiP Hidden Sector Decay Spectrometer (HSDS), in which case the sensitivity would be enhanced to $|U_{τ4}|^2 \sim 0.02$. Away from this mass, the sensitivity becomes lower than $|U_{τ4}|^2 \sim 0.15$ for $Δm_{41}^2 \lesssim 100\ \mathrm{eV}^2$ or $Δm_{41}^2\gtrsim 10^4 \mathrm{eV}^2$.

Submitted 26 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: 14 pages, 8 figures

Journal ref: J. High Energ. Phys. 2024, 166 (2024)

arXiv:2403.03742 [pdf, other]

Mitigating Ageism through Virtual Reality: Intergenerational Collaborative Escape Room Design

Authors: Ruotong Zou, Shuyu Yin, Tianqi Song, Peinuan Qin, Yi-Chieh Lee

Abstract: As virtual reality (VR) becomes more popular for intergenerational collaboration, there is still a significant gap in research regarding understanding the potential for reducing ageism. Our study aims to address this gap by analyzing ageism levels before and after VR escape room collaborative experiences. We recruited 28 participants to collaborate with an older player in a challenging VR escape room game. To ensure consistent and reliable performance data of older players, our experimenters simulated older participants following specific guidelines. After completing the game, we found a significant reduction in ageism among younger participants. Furthermore, we introduce a new game mechanism that encourages intergenerational collaboration. Our research highlights the potential of VR collaborative games as a practical tool for mitigating ageism. It provides valuable insights for designing immersive VR experiences that foster enhanced intergenerational collaboration.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.03392 [pdf, other]

Pulse shape discrimination in an organic scintillation phoswich detector using machine learning techniques

Authors: Yu** Lee, **young Kim, Byoung-cheol Koh, Young Soo Yoon, Chang Hyon Ha

Abstract: We developed machine learning algorithms for distinguishing scintillation signals from a plastic-liquid coupled detector known as a phoswich. The challenge lies in discriminating signals from organic scintillators with similar shapes and short decay times. Using a single-readout phoswich detector, we successfully identified $γ$ radiation signals from two scintillating components. Our Boosted Decision Tree algorithm demonstrated a maximum discrimination power of 3.02 $\pm$ 0.85 standard deviation in the 950 keV region, providing an efficient solution for self-shielding and enhancing radiation detection capabilities.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 11 pages, 7 figures

arXiv:2403.03004 [pdf, other]

Ultralight vector dark matter search using data from the KAGRA O3GK run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi , et al. (1778 additional authors not shown)

Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures

Report number: LIGO-P2300250

arXiv:2403.02939 [pdf, other]

doi 10.1145/3613904.3642196

PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers

Authors: Yoonjoo Lee, Hyeonsu B. Kang, Matt Latzke, Juho Kim, Jonathan Bragg, Joseph Chee Chang, Pao Siangliulue

Abstract: With the rapid growth of scholarly archives, researchers subscribe to "paper alert" systems that periodically provide them with recommendations of recently published papers that are similar to previously collected papers. However, researchers sometimes struggle to make sense of nuanced connections between recommended papers and their own research context, as existing systems only present paper titles and abstracts. To help researchers spot these connections, we present PaperWeaver, an enriched paper alerts system that provides contextualized text descriptions of recommended papers based on user-collected papers. PaperWeaver employs a computational method based on Large Language Models (LLMs) to infer users' research interests from their collected papers, extract context-specific aspects of papers, and compare recommended and collected papers on these aspects. Our user study (N=15) showed that participants using PaperWeaver were able to better understand the relevance of recommended papers and triage them more confidently when compared to a baseline that presented the related work sections from recommended papers.

Submitted 9 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: Accepted to CHI 2024

arXiv:2403.02892 [pdf, other]

Enhancing Long-Term Person Re-Identification Using Global, Local Body Part, and Head Streams

Authors: Duy Tran Thanh, Yee** Lee, Byeongkeun Kang

Abstract: This work addresses the task of long-term person re-identification. Typically, person re-identification assumes that people do not change their clothes, which limits its applications to short-term scenarios. To overcome this limitation, we investigate long-term person re-identification, which considers both clothes-changing and clothes-consistent scenarios. In this paper, we propose a novel framework that effectively learns and utilizes both global and local information. The proposed framework consists of three streams: global, local body part, and head streams. The global and head streams encode identity-relevant information from an entire image and a cropped image of the head region, respectively. Both streams encode the most distinct, less distinct, and average features using the combinations of adversarial erasing, max pooling, and average pooling. The local body part stream extracts identity-related information for each body part, allowing it to be compared with the same body part from another image. Since body part annotations are not available in re-identification datasets, pseudo-labels are generated using clustering. These labels are then utilized to train a body part segmentation head in the local body part stream. The proposed framework is trained by backpropagating the weighted summation of the identity classification loss, the pair-based loss, and the pseudo body part segmentation loss. To demonstrate the effectiveness of the proposed method, we conducted experiments on three publicly available datasets (Celeb-reID, PRCC, and VC-Clothes).
The experimental results demonstrate that the proposed method outperforms the previous state-of-the-art method.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 16 pages

Journal ref: Neurocomputing, 2024

arXiv:2403.02870 [pdf, other]

Precise Extraction of Deep Learning Models via Side-Channel Attacks on Edge/Endpoint Devices

Authors: Younghan Lee, Sohee Jun, Yungi Cho, Woorim Han, Hyungon Moon, Yunheung Paek

Abstract: With growing popularity, deep learning (DL) models are becoming larger-scale, and only the companies with vast training datasets and immense computing power can manage their business serving such large models. Most of those DL models are proprietary to the companies who thus strive to keep their private models safe from the model extraction attack (MEA), whose aim is to steal the model by training surrogate models. Nowadays, companies are inclined to offload the models from central servers to edge/endpoint devices. As revealed in the latest studies, adversaries exploit this opportunity as new attack vectors to launch side-channel attack (SCA) on the device running victim model and obtain various pieces of the model information, such as the model architecture (MA) and image dimension (ID). Our work provides a comprehensive understanding of such a relationship for the first time and would benefit future MEA studies in both offensive and defensive sides in that they may learn which pieces of information exposed by SCA are more important than the others. Our analysis additionally reveals that by grasping the victim model information from SCA, MEA can get highly effective and successful even without any prior knowledge of the model. Finally, to evince the practicality of our analysis results, we empirically apply SCA, and subsequently, carry out MEA under realistic threat assumptions. The results show up to 5.8 times better performance than when the adversary has no model information about the victim model.

Submitted 5 March, 2024; originally announced March 2024.

Comments: Accepted by 27th European Symposium on Research in Computer Security (ESORICS 2022)

arXiv:2403.02846 [pdf, other]

FLGuard: Byzantine-Robust Federated Learning via Ensemble of Contrastive Models

Authors: Younghan Lee, Yungi Cho, Woorim Han, Ho Bae, Yunheung Paek

Abstract: Federated Learning (FL) thrives in training a global model with numerous clients by only sharing the parameters of their local models trained with their private training datasets. Therefore, without revealing the private dataset, the clients can obtain a deep learning (DL) model with high performance. However, recent research proposed poisoning attacks that cause a catastrophic loss in the accuracy of the global model when adversaries, posed as benign clients, are present in a group of clients. Therefore, recent studies suggested byzantine-robust FL methods that allow the server to train an accurate global model even with the adversaries present in the system. However, many existing methods require the knowledge of the number of malicious clients or the auxiliary (clean) dataset or the effectiveness reportedly decreased hugely when the private dataset was non-independently and identically distributed (non-IID). In this work, we propose FLGuard, a novel byzantine-robust FL method that detects malicious clients and discards malicious local updates by utilizing the contrastive learning technique, which showed a tremendous improvement as a self-supervised learning method. With contrastive models, we design FLGuard as an ensemble scheme to maximize the defensive capability. We evaluate FLGuard extensively under various poisoning attacks and compare the accuracy of the global model with existing byzantine-robust FL methods. FLGuard outperforms the state-of-the-art defense methods in most cases and shows drastic improvement, especially in non-IID settings.
https://github.com/201younghanlee/FLGuard

Submitted 5 March, 2024; originally announced March 2024.

Comments: Accepted by 28th European Symposium on Research in Computer Security (ESORICS 2023)

arXiv:2403.02752 [pdf, other]

HINTs: Sensemaking on large collections of documents with Hypergraph visualization and INTelligent agents

Authors: Sam Yu-Te Lee, Kwan-Liu Ma

Abstract: Sensemaking on a large collection of documents (corpus) is a challenging task often found in fields such as market research, legal studies, intelligence analysis, political science, computational linguistics, etc. Previous works approach this problem either from a topic- or entity-based perspective, but they lack interpretability and trust due to poor model alignment. In this paper, we present HINTs, a visual analytics approach that combines topic- and entity-based techniques seamlessly and integrates Large Language Models (LLMs) as both a general NLP task solver and an intelligent agent. By leveraging the extraction capability of LLMs in the data preparation stage, we model the corpus as a hypergraph that matches the user's mental model when making sense of the corpus. The constructed hypergraph is hierarchically organized with an agglomerative clustering algorithm by combining semantic and connectivity similarity. The system further integrates an LLM-based intelligent chatbot agent in the interface to facilitate sensemaking. To demonstrate the generalizability and effectiveness of the HINTs system, we present two case studies on different domains and a comparative user study. We report our insights on the behavior patterns and challenges when intelligent agents are used to facilitate sensemaking. We find that while intelligent agents can address many challenges in sensemaking, the visual hints that visualizations provide are necessary to address the new problems brought by intelligent agents.
We discuss limitations and future work for combining interactive visualization and LLMs more profoundly to better support corpus analysis.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.02734 [pdf, ps, other]

doi 10.1016/j.apsusc.2024.159801

Strain tunable electronic ground states in two-dimensional iridate thin films

Authors: Donghan Kim, Byungmin Sohn, Yeonjae Lee, Jeongkeun Song, Mi Kyung Kim, Minjae Kim, Tae Won Noh, Changyoung Kim

Abstract: Quantum phases of matter such as superconducting, ferromagnetic and Wigner crystal states are often driven by the two-dimensionality (2D) of correlated systems. Meanwhile, spin-orbit coupling (SOC) is a fundamental element leading to nontrivial topology which gives rise to quantum phenomena such as the large anomalous Hall effect and nontrivial superconductivity. However, the search for controllable platforms with both 2D and SOC has been relatively overlooked so far. Here, we control and study the electronic ground states of iridate ultrathin films having both 2D and SOC by angle-resolved photoemission spectroscopy (ARPES) and dynamical mean field theory (DMFT) calculations. The metallicity of SrIrO$_3$ ultrathin films is controlled down to a monolayer by dimensional and strain manipulation. Our results suggest that the iridate ultrathin films can be a controllable 2D SOC platform exhibiting a variety of phenomena for future functional devices.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 7 pages, 4 figures

arXiv:2403.02638 [pdf, other]

Real-time portable muography with Hankuk Atmospheric-muon Wide Landscaping : HAWL

Authors: J. Seo, N. Carlin, D. F. F. S. Cavalcante, J. S. Chung, L. E. Franca, C. Ha, J. Kim, J. Y. Kim, H. Kimku, B. C. Koh, Y. J. Lee, B. B. Manzato, S. W. Oh, R. L. C. Pitta, S. J. Won

Abstract: Cosmic ray muons prove valuable across various fields, from particle physics experiments to non-invasive tomography, thanks to their high flux and exceptional penetrating capability. Utilizing a scintillator detector, one can effectively study the topography of mountains situated above tunnels and underground spaces. The Hankuk Atmospheric-muon Wide Landscaping (HAWL) project successfully charts the mountainous region of eastern Korea by measuring cosmic ray muons with a detector in motion. The real-time muon flux measurement shows a tunnel length accuracy of 6.0 %, with a detectable overburden range spanning from 8 to 400 meter-water-equivalent depth. This is the first real-time portable muon tomography.

Submitted 28 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: 10 pages, 12 figures

arXiv:2403.01858 [pdf, other]

An Improved Traditional Chinese Evaluation Suite for Foundation Model

Authors: Zhi-Rui Tam, Ya-Ting Pai, Yen-Wei Lee, Sega Cheng, Hong-Han Shuai

Abstract: We present TMMLU+, a comprehensive dataset designed for the Traditional Chinese massive multitask language understanding dataset. TMMLU+ is a multiple-choice question-answering dataset with 66 subjects from elementary to professional level. Compared to its predecessor, TMMLU, TMMLU+ is six times larger and boasts a more balanced subject distribution. We included benchmark results in TMMLU+ from closed-source models and 24 open-weight Chinese large language models of parameters ranging from 1.8B to 72B. Our findings reveal that Traditional Chinese models still trail behind their Simplified Chinese counterparts. Additionally, current large language models have yet to outperform human performance in average scores. We publicly release our dataset and the corresponding benchmark source code.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01749 [pdf, other]

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Authors: Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

Abstract: Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to privacy concerns. Generating synthetic replicas of private text data with a formal privacy guarantee, i.e., differential privacy (DP), offers a promising and scalable solution. However, existing methods necessitate DP finetuning of large language models (LLMs) on private data to generate DP synthetic data. This approach is not viable for proprietary LLMs (e.g., GPT-3.5) and also demands considerable computational resources for open-source LLMs. Lin et al. (2024) recently introduced the Private Evolution (PE) algorithm to generate DP synthetic images with only API access to diffusion models. In this work, we propose an augmented PE algorithm, named Aug-PE, that applies to the complex setting of text. We use API access to an LLM and generate DP synthetic text without any model training. We conduct comprehensive experiments on three benchmark datasets. Our results demonstrate that Aug-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines. This underscores the feasibility of relying solely on API access of LLMs to produce high-quality DP synthetic texts, thereby facilitating more accessible routes to privacy-preserving LLM applications. Our code and data are available at https://github.com/AI-secure/aug-pe.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01479 [pdf, other]

Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation

Authors: Heegon **, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee

Abstract: The advent of scalable deep models and large datasets has improved the performance of Neural Machine Translation. Knowledge Distillation (KD) enhances efficiency by transferring knowledge from a teacher model to a more compact student model. However, KD approaches to Transformer architecture often rely on heuristics, particularly when deciding which teacher layers to distill from. In this paper, we introduce the 'Align-to-Distill' (A2D) strategy, designed to address the feature mapping problem by adaptively aligning student attention heads with their teacher counterparts during training. The Attention Alignment Module in A2D performs a dense head-by-head comparison between student and teacher attention heads across layers, turning the combinatorial mapping heuristics into a learning problem. Our experiments show the efficacy of A2D, demonstrating gains of up to +3.61 and +0.63 BLEU points for WMT-2022 De->Dsb and WMT-2014 En->De, respectively, compared to Transformer baselines.

Submitted 25 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: Accepted to LREC-COLING 2024

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2403.00827 [pdf, other]

Self-Refinement of Language Models from External Proxy Metrics Feedback

Authors: Keshav Ramji, Young-Suk Lee, Ramón Fernandez Astudillo, Md Arafat Sultan, Tahira Naseem, Asim Munawar, Radu Florian, Salim Roukos

Abstract: It is often desirable for Large Language Models (LLMs) to capture multiple objectives when providing a response. In document-grounded response generation, for example, agent responses are expected to be relevant to a user's query while also being grounded in a given document. In this paper, we introduce Proxy Metric-based Self-Refinement (ProMiSe), which enables an LLM to refine its own initial response along key dimensions of quality guided by external metrics feedback, yielding an overall better final response. ProMiSe leverages feedback on response quality through principle-specific proxy metrics, and iteratively refines its response one principle at a time. We apply ProMiSe to open source language models Flan-T5-XXL and Llama-2-13B-Chat, to evaluate its performance on document-grounded question answering datasets, MultiDoc2Dial and QuAC, demonstrating that self-refinement improves response quality. We further show that fine-tuning Llama-2-13B-Chat on the synthetic dialogue data generated by ProMiSe yields significant performance improvements over the zero-shot baseline as well as a supervised fine-tuned model on human annotated data.

Submitted 27 February, 2024; originally announced March 2024.

arXiv:2403.00334 [pdf, other]

NOVA: A visual interface for assessing polarizing media coverage

Authors: Keshav Dasu, Sam Yu-Te Lee, Ying-Cheng Chen, Kwan-Liu Ma

Abstract: Within the United States, the majority of the populace receives their news online. U.S mainstream media outlets both generate and influence the news consumed by U.S citizens. Many of these citizens have their personal beliefs about these outlets and question the fairness of their reporting. We offer an interactive visualization system for the public to assess their perception of the mainstream media's coverage of a topic against the data. Our system combines belief elicitation techniques and narrative structure designs, emphasizing transparency and user-friendliness to facilitate users' self-assessment on personal beliefs. We gathered $\sim${25k} articles from the span of 2020-2022 from six mainstream media outlets as a testbed. To evaluate our system, we present usage scenarios alongside a user study with a qualitative analysis of user exploration strategies for personal belief assessment. We report our observations from this study and discuss future work and challenges of developing tools for the public to assess media outlet coverage and belief updating on provocative topics.

Submitted 1 March, 2024; originally announced March 2024.

Showing 101–150 of 3,305 results for author: Lee, Y