Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection

Authors: Choonghyun Park, Hyuhng Joon Kim, Junyeob Kim, Youna Kim, Taeuk Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-goo Lee, Kang Min Yoo

Abstract: AI Generated Text (AIGT) detectors are developed with texts from humans and LLMs of common tasks. Despite the diversity of plausible prompt choices, these datasets are generally constructed with a limited number of prompts. The lack of prompt variation can introduce prompt-specific shortcut features that exist in data collected with the chosen prompt, but do not generalize to others. In this paper, we analyze the impact of such shortcuts in AIGT detection. We propose Feedback-based Adversarial Instruction List Optimization (FAILOpt), an attack that searches for instructions deceptive to AIGT detectors exploiting prompt-specific shortcuts. FAILOpt effectively drops the detection performance of the target detector, comparable to other attacks based on adversarial in-context examples. We also utilize our method to enhance the robustness of the detector by mitigating the shortcuts. Based on the findings, we further train the classifier with the dataset augmented by FAILOpt prompt. The augmented classifier exhibits improvements across generation models, tasks, and attacks. Our code will be available at https://github.com/zxcvvxcz/FAILOpt.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 19 pages, 3 figures, 13 tables, under review

arXiv:2404.11972 [pdf, other]

Aligning Language Models to Explicitly Handle Ambiguity

Authors: Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim

Abstract: In interactions between users and language model agents, user utterances frequently exhibit ellipsis (omission of words or phrases) or imprecision (lack of exactness) to prioritize efficiency. This can lead to varying interpretations of the same input based on different assumptions or background knowledge. It is thus crucial for agents to adeptly handle the inherent ambiguity in queries to ensure reliability. However, even state-of-the-art large language models (LLMs) still face challenges in such scenarios, primarily due to the following hurdles: (1) LLMs are not explicitly trained to deal with ambiguous utterances; (2) the degree of ambiguity perceived by the LLMs may vary depending on the possessed knowledge. To address these issues, we propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns LLMs to manage ambiguous queries by leveraging their own assessment of ambiguity (i.e., perceived ambiguity). Experimental results on question-answering datasets demonstrate that APA empowers LLMs to explicitly detect and manage ambiguous queries while retaining the ability to answer clear questions. Furthermore, our finding proves that APA excels beyond training with gold-standard labels, especially in out-of-distribution scenarios.

Submitted 16 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2403.19254 [pdf, other]

Imperceptible Protection against Style Imitation from Diffusion Models

Authors: Namhyuk Ahn, Wonhyuk Ahn, KiYoon Yoo, Daesik Kim, Seung-Hun Nam

Abstract: Recent progress in diffusion models has profoundly enhanced the fidelity of image generation. However, this has raised concerns about copyright infringements. While prior methods have introduced adversarial perturbations to prevent style imitation, most are accompanied by the degradation of artworks' visual quality. Recognizing the importance of maintaining this, we develop a visually improved protection method that preserves its protection capability. To this end, we create a perceptual map to identify areas most sensitive to human eyes. We then adjust the protection intensity guided by an instance-aware refinement. We also integrate a perceptual constraints bank to further improve the imperceptibility. Results show that our method substantially elevates the quality of the protected image without compromising on protection efficacy.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2402.11548 [pdf, other]

KMMLU: Measuring Massive Multitask Language Understanding in Korean

Authors: Guijin Son, Hanwool Lee, Sungdong Kim, Seungone Kim, Niklas Muennighoff, Taekyoon Choi, Cheonbok Park, Kang Min Yoo, Stella Biderman

Abstract: We propose KMMLU, a new Korean benchmark with 35,030 expert-level multiple-choice questions across 45 subjects ranging from humanities to STEM. While prior Korean benchmarks are translated from existing English benchmarks, KMMLU is collected from original Korean exams, capturing linguistic and cultural aspects of the Korean language. We test 27 public and proprietary LLMs and observe the best public model to score 50.5%, leaving significant room for improvement. This model was primarily trained for English and Chinese, not Korean. Current LLMs tailored to Korean, such as Polyglot-Ko, perform far worse. Surprisingly, even the most capable proprietary LLMs, e.g., GPT-4 and HyperCLOVA X, do not exceed 60%. This suggests that further work is needed to improve LLMs for Korean, and we believe KMMLU offers the appropriate tool to track this progress. We make our dataset publicly available on the Hugging Face Hub and integrate the benchmark into EleutherAI's Language Model Evaluation Harness.

Submitted 6 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

Comments: Under Review

arXiv:2402.11253 [pdf, other]

Aligning Large Language Models by On-Policy Self-Judgment

Authors: Sangkyu Lee, Sungdong Kim, Ashkan Yousefpour, Minjoon Seo, Kang Min Yoo, Youngjae Yu

Abstract: Existing approaches for aligning large language models with human preferences face a trade-off that requires a separate reward model (RM) for on-policy learning. In this paper, we present a novel alignment framework, SELF-JUDGE, that (1) does on-policy learning and (2) is parameter efficient, as it does not require an additional RM for evaluating the samples for on-policy learning. To this end, we propose Judge-augmented Supervised Fine-Tuning (JSFT) to train a single model to act as both a policy and a judge. Specifically, we view the pairwise judgment task, choosing the better response from a response pair, as a special case of the instruction-following task. The resulting model can judge preferences of on-the-fly responses from current policy initialized from itself. Experimental results show the efficacy of SELF-JUDGE, outperforming baselines in preference benchmarks. We also show that rejection sampling by itself can improve performance further without an additional evaluator.

Submitted 25 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

Comments: Published as a main conference paper at ACL 2024

arXiv:2402.08093 [pdf, other]

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Authors: Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski, Alexis Moinet, Sri Karlapati, Ewa Muszyńska, Haohan Guo, Bartosz Putrycz, Soledad López Gambino, Kayeon Yoo, Elena Sokolova, Thomas Drugman

Abstract: We introduce a text-to-speech (TTS) model called BASE TTS, which stands for $\textbf{B}$ig $\textbf{A}$daptive $\textbf{S}$treamable TTS with $\textbf{E}$mergent abilities. BASE TTS is the largest TTS model to-date, trained on 100K hours of public domain speech data, achieving a new state-of-the-art in speech naturalness. It deploys a 1-billion-parameter autoregressive Transformer that converts raw texts into discrete codes ("speechcodes") followed by a convolution-based decoder which converts these speechcodes into waveforms in an incremental, streamable manner. Further, our speechcodes are built using a novel speech tokenization technique that features speaker ID disentanglement and compression with byte-pair encoding. Echoing the widely-reported "emergent abilities" of large language models when trained on increasing volume of data, we show that BASE TTS variants built with 10K+ hours and 500M+ parameters begin to demonstrate natural prosody on textually complex sentences. We design and share a specialized dataset to measure these emergent abilities for text-to-speech. We showcase state-of-the-art naturalness of BASE TTS by evaluating against baselines that include publicly available large-scale text-to-speech systems: YourTTS, Bark and TortoiseTTS. Audio samples generated by the model can be heard at https://amazon-ltts-paper.com/.

Submitted 15 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: v1.1 (fixed typos)

arXiv:2402.05706 [pdf, other]

Unified Speech-Text Pretraining for Spoken Dialog Modeling

Authors: Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Sungroh Yoon, Kang Min Yoo

Abstract: While recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech, an LLM-based strategy for modeling spoken dialogs remains elusive and calls for further investigation. This work proposes an extensive speech-text LLM framework, named the Unified Spoken Dialog Model (USDM), to generate coherent spoken responses with organic prosodic features relevant to the given input speech without relying on automatic speech recognition (ASR) or text-to-speech (TTS) solutions. Our approach employs a multi-step speech-text inference scheme that leverages chain-of-reasoning capabilities exhibited by the underlying LLM. We also propose a generalized speech-text pretraining scheme that helps with capturing cross-modal semantics. Automatic and human evaluations show that the proposed approach is effective in generating natural-sounding spoken responses, outperforming both prior and cascaded baselines. Detailed comparative studies reveal that, despite the cascaded approach being stronger in individual components, the joint speech-text modeling improves robustness against recognition errors and speech quality. Demo is available at https://unifiedsdm.github.io.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2312.05141 [pdf, other]

Open Domain Generalization with a Single Network by Regularization Exploiting Pre-trained Features

Authors: Inseop Chung, KiYoon Yoo, Nojun Kwak

Abstract: Open Domain Generalization (ODG) is a challenging task as it not only deals with distribution shifts but also category shifts between the source and target datasets. To handle this task, the model has to learn a generalizable representation that can be applied to unseen domains while also identifying unknown classes that were not present during training. Previous work has used multiple source-specific networks, which involve a high computation cost. Therefore, this paper proposes a method that can handle ODG using only a single network. The proposed method utilizes a head that is pre-trained by linear-probing and employs two regularization terms, each targeting the regularization of feature extractor and the classification head, respectively. The two regularization terms fully utilize the pre-trained features and collaborate to modify the head of the model without excessively altering the feature extractor. This ensures a smoother softmax output and prevents the model from being biased towards the source domains. The proposed method shows improved adaptability to unseen domains and increased capability to detect unseen classes as well. Extensive experiments show that our method achieves competitive performance in several benchmarks. We also justify our method with careful analysis of the effect on the logits, features, and the head.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2311.08766 [pdf]

Dirac Bilayer Metasurfaces as an Inverse Gires-Tournois Etalon

Authors: Ki Young Lee, Kwang Wook Yoo, Francesco Monticone, Jae Woong Yoon

Abstract: Efficient transmissive pure-phase resonances are highly desirable for optical modulation and wavefront engineering. Here, we propose a novel principle to realize a pure-phase resonance in an extremely broad transmission band, as opposed to previous approaches restricted to operating in reflection mode or over a narrow spectral band. We show that a glide-symmetric bilayer metasurface mathematically mimicking a two-dimensional Dirac semimetal induces unidirectional guided-mode excitation and perfect leakage-radiation blazing at the transmission channel. These effects create a peculiar resonant-scattering configuration, similar to the classical reflective Gires-Tournois etalon, but in transmission, providing full $2\pi$ phase modulation with constant transmittance near 100%. Most importantly, this effect persists over an extremely wide band, associated with topological effects. Hence, our proposed approach produces a spectrally and parametrically robust pure-phase resonance effect in transmission, which is highly beneficial for practical applications.

Submitted 15 November, 2023; originally announced November 2023.

Comments: 11 pages, 5 figures

arXiv:2311.07820 [pdf, other]

On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model

Authors: Nohil Park, Joonsuk Park, Kang Min Yoo, Sungroh Yoon

Abstract: An exciting advancement in the field of multilingual models is the emergence of autoregressive models with zero- and few-shot capabilities, a phenomenon widely reported in large-scale language models. To further improve model adaptation to cross-lingual tasks, another trend is to further fine-tune the language models with either full fine-tuning or parameter-efficient tuning. However, the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models has yet to be studied. Specifically, we lack an understanding of the role of linguistic distributions in multilingual models in the effectiveness of token-based prompt tuning. To address this question, we conduct experiments comparing prompt tuning and fine-tuning on the decoder-based multilingual model, XGLM, with four cross-lingual tasks (XNLI, PAWS-X, POS, NER). According to our study, prompt tuning achieves on par or better performance over fine-tuning across all languages while updating at most 0.13% of the model parameters. Moreover, we empirically show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning. Our further analysis shows that the phenomenon is related to the tokenization scheme of the multilingual model.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2310.14849 [pdf, other]

Universal Domain Adaptation for Robust Handling of Distributional Shifts in NLP

Authors: Hyuhng Joon Kim, Hyunsoo Cho, Sang-Woo Lee, Junyeob Kim, Choonghyun Park, Sang-goo Lee, Kang Min Yoo, Taeuk Kim

Abstract: When deploying machine learning systems to the wild, it is highly desirable for them to effectively leverage prior knowledge to the unfamiliar domain while also firing alarms to anomalous inputs. In order to address these requirements, Universal Domain Adaptation (UniDA) has emerged as a novel research area in computer vision, focusing on achieving both adaptation ability and robustness (i.e., the ability to detect out-of-distribution samples). While UniDA has led to significant progress in computer vision, its application on language input still needs to be explored despite its feasibility. In this paper, we propose a comprehensive benchmark for natural language that offers thorough viewpoints of the model's generalizability and robustness. Our benchmark encompasses multiple datasets with varying difficulty levels and characteristics, including temporal shifts and diverse domains. On top of our testbed, we validate existing UniDA methods from computer vision and state-of-the-art domain adaptation techniques from NLP literature, yielding valuable findings: We observe that UniDA methods originally designed for image input can be effectively transferred to the natural language domain while also underscoring the effect of adaptation difficulty in determining the model's performance.

Submitted 23 October, 2023; originally announced October 2023.

Comments: Findings of EMNLP 2023

arXiv:2310.09518 [pdf, other]

Instruction Tuning with Human Curriculum

Authors: Bruce W. Lee, Hyunsoo Cho, Kang Min Yoo

Abstract: In this work, we (1) introduce Curriculum Instruction Tuning, (2) explore the potential advantages of employing diverse curriculum strategies, and (3) delineate a synthetic instruction-response generation framework that complements our theoretical approach. Distinct from the existing instruction tuning dataset, our generation pipeline is systematically structured to emulate the sequential and orderly characteristic of human learning. Additionally, we describe a methodology for generating instruction-response datasets that extensively span the various stages of human education, from middle school through the graduate level, utilizing educational subject catalogs. Before training, we meticulously organize the instruction data to ensure that questions escalate in difficulty regarding (A) the subject matter and (B) the intricacy of the instructions. The findings of our study reveal that substantial improvements in performance can be achieved through the mere application of curriculum ordering to instruction data (achieving gains of +4.76 on TruthfulQA, +2.98 on MMLU, +2.8 on OpenbookQA, and +1.28 on ARC-hard) compared to random shuffling. This enhancement is achieved without incurring additional computational expenses. Through comprehensive experimentation, we observe that the advantages of our proposed method are consistently evident across nine benchmarks.

Submitted 16 June, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

Comments: NAACL 2024

arXiv:2308.00221 [pdf, other]

Advancing Beyond Identification: Multi-bit Watermark for Large Language Models

Authors: KiYoon Yoo, Wonhyuk Ahn, Nojun Kwak

Abstract: We show the viability of tackling misuses of large language models beyond the identification of machine-generated text. While existing zero-bit watermark methods focus on detection only, some malicious misuses demand tracing the adversary user for counteracting them. To address this, we propose Multi-bit Watermark via Position Allocation, embedding traceable multi-bit information during language model generation. Through allocating tokens onto different parts of the messages, we embed longer messages in high corruption settings without added latency. By independently embedding sub-units of messages, the proposed method outperforms the existing works in terms of robustness and latency. Leveraging the benefits of zero-bit watermarking, our method enables robust extraction of the watermark without any model access, embedding and extraction of long messages ($\geq$ 32-bit) without finetuning, and maintaining text quality, while allowing zero-bit detection all at the same time. Code is released here: https://github.com/bangawayoo/mb-lm-watermarking

Submitted 19 March, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

Comments: NAACL 2024 main. 9 pages and appendix

arXiv:2306.15801 [pdf, other]

doi 10.1140/epjc/s10052-023-12137-y

Production of antihydrogen atoms by 6 keV antiprotons through a positronium cloud

Authors: P. Adrich, P. Blumer, G. Caratsch, M. Chung, P. Cladé, P. Comini, P. Crivelli, O. Dalkarov, P. Debu, A. Douillet, D. Drapier, P. Froelich, N. Garroum, S. Guellati-Khelifa, J. Guyomard, P-A. Hervieux, L. Hilico, P. Indelicato, S. Jonsell, J-P. Karr, B. Kim, S. Kim, E-S. Kim, Y. J. Ko, T. Kosinski, et al. (39 additional authors not shown)

Abstract: We report on the first production of an antihydrogen beam by charge exchange of 6.1 keV antiprotons with a cloud of positronium in the GBAR experiment at CERN. The antiproton beam was delivered by the AD/ELENA facility. The positronium target was produced from a positron beam itself obtained from an electron linear accelerator. We observe an excess over background indicating antihydrogen production with a significance of 3-4 standard deviations.

Submitted 3 July, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Journal ref: European Physical Journal C 83, 1004 (2023)

arXiv:2305.14152 [pdf, other]

Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization

Authors: Jeonghoon Kim, Jung Hyun Lee, Sungdong Kim, Joonsuk Park, Kang Min Yoo, Se Jung Kwon, Dongsoo Lee

Abstract: Large language models (LLMs) face challenges in fine-tuning and deployment due to their high memory demands and computational costs. While parameter-efficient fine-tuning (PEFT) methods aim to reduce the memory usage of the optimizer state during fine-tuning, the inherent size of pre-trained LLM weights continues to be a pressing concern. Even though quantization techniques are widely proposed to ease memory demands and accelerate LLM inference, most of these techniques are geared towards the deployment phase. To bridge this gap, this paper presents Parameter-Efficient and Quantization-aware Adaptation (PEQA) - a simple yet effective method that combines the advantages of PEFT with quantized LLMs. By updating solely the quantization scales, PEQA can be directly applied to quantized LLMs, ensuring seamless task transitions. Parallel to existing PEFT methods, PEQA significantly reduces the memory overhead associated with the optimizer state. Furthermore, it leverages the advantages of quantization to substantially reduce model sizes. Even after fine-tuning, the quantization structure of a PEQA-tuned LLM remains intact, allowing for accelerated inference on the deployment stage. We employ PEQA-tuning for task-specific adaptation on LLMs with up to 65 billion parameters. To assess the logical reasoning and language comprehension of PEQA-tuned LLMs, we fine-tune low-bit quantized LLMs using an instruction dataset. Our results show that even when LLMs are quantized to below 4-bit precision, their capabilities in language modeling, few-shot in-context learning, and comprehension can be resiliently restored to (or even improved over) their full-precision original performances with PEQA.

Submitted 28 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Published at NeurIPS 2023. Camera-ready version

arXiv:2305.13735 [pdf, other]

Aligning Large Language Models through Synthetic Feedback

Authors: Sungdong Kim, Sanghwan Bae, Jamin Shin, Soyoung Kang, Donghyun Kwak, Kang Min Yoo, Minjoon Seo

Abstract: Aligning large language models (LLMs) to human values has become increasingly important as it enables sophisticated steering of LLMs. However, it requires significant human demonstrations and feedback or distillation from proprietary LLMs such as ChatGPT. In this work, we propose a novel alignment learning framework with synthetic feedback not dependent on extensive human annotations and proprietary LLMs. First, we perform reward modeling (RM) with synthetic feedback by contrasting responses from vanilla LLMs with various sizes and prompts. Then, we use the RM to simulate high-quality demonstrations to train a supervised policy and further optimize the model with reinforcement learning. Our resulting model, Aligned Language Model with Synthetic Training dataset (ALMoST), outperforms recent open-sourced models, which are trained on the outputs of InstructGPT or human-annotated demonstrations, in alignment benchmarks. In human evaluation, our model is preferred to Alpaca and Dolly-v2, 55.0% and 58.5% of the time, respectively. Further analyses demonstrate the efficacy and importance of synthetic feedback in our framework. The code is available at https://github.com/naver-ai/almost

Submitted 20 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted to EMNLP 2023 main conference

arXiv:2305.01904 [pdf, other]

Robust Multi-bit Natural Language Watermarking through Invariant Features

Authors: KiYoon Yoo, Wonhyuk Ahn, Jiho Jang, Nojun Kwak

Abstract: Recent years have witnessed a proliferation of valuable original natural language contents found in subscription-based media outlets, web novel platforms, and outputs of large language models. However, these contents are susceptible to illegal piracy and potential misuse without proper security measures. This calls for a secure watermarking system to guarantee copyright protection through leakage tracing or ownership identification. To effectively combat piracy and protect copyrights, a multi-bit watermarking framework should be able to embed adequate bits of information and extract the watermarks in a robust manner despite possible corruption. In this work, we explore ways to advance both payload and robustness by following a well-known proposition from image watermarking and identify features in natural language that are invariant to minor corruption. Through a systematic analysis of the possible sources of errors, we further propose a corruption-resistant infill model. Our full method improves upon the previous work on robustness by +16.8% point on average on four datasets, three corruption types, and two corruption ratios. Code available at https://github.com/bangawayoo/nlp-watermarking.

Submitted 9 June, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

Comments: ACL 2023 long

arXiv:2305.00215 [pdf]

Observation of linear magnetoelectric effect in a Dirac magnon antiferromagnet Cu$_3$TeO$_6$

Authors: Aga Shahee, Kyongjun Yoo, B. Koteswararao, N. V. Ter-Oganessian, Kee Hoon Kim

Abstract: Cu$_3$TeO$_6$, a three-dimensional antiferromagnet forming a unique spin-web lattice of spin-1/2 Cu$^{2+}$ ions below the Neel temperature T$_N$ = 62 K, has recently been found to exhibit topological Dirac or nodal magnon dispersion. In this study, we report the discovery of the linear magnetoelectric (ME) effects in Cu$_3$TeO$_6$ below T$_N$. Our pyroelectric current measurements at a constant magnetic field (H) reveal a linear increase of electric polarization (P) with H for both P // H and P $\perp$ H configurations; a maximum P // [110] = 20 $μ$C/m$^2$ is obtained at $\mu_0$H // [1-10] = 14 T, corresponding to a linear ME coefficient 1.8 ps/m. Magnetic point group analysis and Monte-Carlo simulations confirm that finite linear ME coefficients are allowed in the off-diagonal and diagonal ME tensor components, consistent with the magnetic point group of $\bar{3}'$. As the parity-time symmetry can be broken in the presence of H or electric field E in the linear ME materials, we envisage that Cu$_3$TeO$_6$ should exhibit a H- or E-induced transformation in the topological magnon dispersion from a Dirac point/nodal line type into two Weyl point types.

Submitted 29 April, 2023; originally announced May 2023.

arXiv:2301.11660 [pdf, other]

Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning

Authors: Hyunsoo Cho, Choonghyun Park, Junyeop Kim, Hyuhng Joon Kim, Kang Min Yoo, Sang-goo Lee

Abstract: As the size of the pre-trained language model (PLM) continues to increase, numerous parameter-efficient transfer learning methods have been proposed recently to compensate for the tremendous cost of fine-tuning. Despite the impressive results achieved by large pre-trained language models (PLMs) and various parameter-efficient transfer learning (PETL) methods on sundry benchmarks, it remains unclear if they can handle inputs that have been distributionally shifted effectively. In this study, we systematically explore how the ability to detect out-of-distribution (OOD) changes as the size of the PLM grows or the transfer methods are altered. Specifically, we evaluated various PETL techniques, including fine-tuning, Adapter, LoRA, and prefix-tuning, on three different intention classification tasks, each utilizing various language models with different scales.

Submitted 13 June, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

Comments: *SEM 2023

arXiv:2212.10938 [pdf, other]

Critic-Guided Decoding for Controlled Text Generation

Authors: Minbeom Kim, Hwanhee Lee, Kang Min Yoo, Joonsuk Park, Hwaran Lee, Kyomin Jung

Abstract: Steering language generation towards objectives or away from undesired content has been a long-standing goal in utilizing language models (LM). Recent work has demonstrated reinforcement learning and weighted decoding as effective approaches to achieve a higher level of language control and quality with pros and cons. In this work, we propose a novel critic decoding method for controlled language generation (CriticControl) that combines the strengths of reinforcement learning and weighted decoding. Specifically, we adopt the actor-critic framework to train an LM-steering critic from non-differentiable reward models. And similar to weighted decoding, our method freezes the language model and manipulates the output token distribution using called critic, improving training efficiency and stability. Evaluation of our method on three controlled generation tasks, namely topic control, sentiment control, and detoxification, shows that our approach generates more coherent and well-controlled texts than previous methods. In addition, CriticControl demonstrates superior generalization ability in zero-shot settings. Human evaluation studies also corroborate our findings.

Submitted 21 December, 2022; originally announced December 2022.

Comments: 11 pages, 6 figures

arXiv:2212.10873 [pdf, other]

Prompt-Augmented Linear Probing: Scaling beyond the Limit of Few-shot In-Context Learners

Authors: Hyunsoo Cho, Hyuhng Joon Kim, Junyeob Kim, Sang-Woo Lee, Sang-goo Lee, Kang Min Yoo, Taeuk Kim

Abstract: Through in-context learning (ICL), large-scale language models are effective few-shot learners without additional model fine-tuning. However, the ICL performance does not scale well with the number of available training samples as it is limited by the inherent input length constraint of the underlying language model. Meanwhile, many studies have revealed that language models are also powerful feature extractors, allowing them to be utilized in a black-box manner and enabling the linear probing paradigm, where lightweight discriminators are trained on top of the pre-extracted input representations. This paper proposes prompt-augmented linear probing (PALP), a hybrid of linear probing and ICL, which leverages the best of both worlds. PALP inherits the scalability of linear probing and the capability of enforcing language models to derive more meaningful representations via tailoring input into a more conceivable form. Throughout in-depth investigations on various datasets, we verified that PALP significantly enhances the input representations closing the gap between ICL in the data-hungry scenario and fine-tuning in the data-abundant scenario with little training overhead, potentially making PALP a strong alternative in a black-box scenario.

Submitted 13 June, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

Comments: AAAI 2023

arXiv:2210.11034 [pdf, other]

Enhancing Out-of-Distribution Detection in Natural Language Understanding via Implicit Layer Ensemble

Authors: Hyunsoo Cho, Choonghyun Park, Jaewook Kang, Kang Min Yoo, Taeuk Kim, Sang-goo Lee

Abstract: Out-of-distribution (OOD) detection aims to discern outliers from the intended data distribution, which is crucial to maintaining high reliability and a good user experience. Most recent studies in OOD detection utilize the information from a single representation that resides in the penultimate layer to determine whether the input is anomalous or not. Although such a method is straightforward, the potential of diverse information in the intermediate layers is overlooked. In this paper, we propose a novel framework based on contrastive learning that encourages intermediate features to learn layer-specialized representations and assembles them implicitly into a single representation to absorb rich information in the pre-trained language model. Extensive experiments in various intent classification and OOD datasets demonstrate that our approach is significantly more effective than other works.

Submitted 20 October, 2022; originally announced October 2022.

Comments: EMNLP Findings 2022

arXiv:2210.06870 [pdf]

doi 10.1007/s40042-022-00471-5

Direct visualization and control of SrOx segregation on semiconducting Nb doped SrTiO3 (100) surface

Authors: Hyang Keun Yoo, Daniel Schwarz, Soren Ulstrup, Woojin Kim, Chris Jozwiak, Aaron Bostwick, Tae Won Noh, Eli Rotenberg, Young Jun Chang

Abstract: We investigated how SrOx segregates on a Nb doped SrTiO3 (100) surface by in air annealing. Using atomic force and photoemission electron microscopes, we can directly visualize the morphology and the electronic phase changes with SrOx segregation. SrOx islands less than 2 micron meter in size and 1-5 unit cells thick nucleate first and grow in a labyrinth domain pattern. After prolonged annealing, SrOx forms a ~10 nm thick film. We show that the domain pattern can be controlled by introducing a surface miscut angle of SrTiO3. Additionally, the segregated SrOx has a lower work function, compared to that of SrTiO3. These results suggest that the control and tunability of SrOx segregation is applicable to the design of a new functional electronic devices in the semiconducting SrTiO3 based heterostructure.

Submitted 13 October, 2022; originally announced October 2022.

Journal ref: Journal of the Korean Physical Society 22, 4715 (2022)

arXiv:2210.03858 [pdf, other]

AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models

Authors: Se Jung Kwon, Jeonghoon Kim, Jeongin Bae, Kang Min Yoo, Jin-Hwa Kim, Baeseong Park, Byeongwook Kim, Jung-Woo Ha, Nako Sung, Dongsoo Lee

Abstract: There are growing interests in adapting large-scale language models using parameter-efficient fine-tuning methods. However, accelerating the model itself and achieving better inference efficiency through model compression has not been thoroughly explored yet. Model compression could provide the benefits of reducing memory footprints, enabling low-precision computations, and ultimately achieving cost-effective inference. To combine parameter-efficient adaptation and model compression, we propose AlphaTuning consisting of post-training quantization of the pre-trained language model and fine-tuning only some parts of quantized parameters for a target task. Specifically, AlphaTuning works by employing binary-coding quantization, which factorizes the full-precision parameters into binary parameters and a separate set of scaling factors. During the adaptation phase, the binary values are frozen for all tasks, while the scaling factors are fine-tuned for the downstream task. We demonstrate that AlphaTuning, when applied to GPT-2 and OPT, performs competitively with full fine-tuning on a variety of downstream tasks while achieving >10x compression ratio under 4-bit quantization and >1,000x reduction in the number of trainable parameters.

Submitted 7 October, 2022; originally announced October 2022.

Comments: Findings of EMNLP 2022

arXiv:2209.01765 [pdf, other]

Continuous Decomposition of Granularity for Neural Paraphrase Generation

Authors: Xiaodong Gu, Zhaowei Zhang, Sang-Woo Lee, Kang Min Yoo, Jung-Woo Ha

Abstract: While Transformers have had significant success in paragraph generation, they treat sentences as linear sequences of tokens and often neglect their hierarchical information. Prior work has shown that decomposing the levels of granularity (e.g., word, phrase, or sentence) for input tokens has produced substantial improvements, suggesting the possibility of enhancing Transformers via more fine-grained modeling of granularity. In this work, we propose a continuous decomposition of granularity for neural paraphrase generation (C-DNPG). In order to efficiently incorporate granularity into sentence encoding, C-DNPG introduces a granularity-aware attention (GA-Attention) mechanism which extends the multi-head self-attention with: 1) a granularity head that automatically infers the hierarchical structure of a sentence by neurally estimating the granularity level of each input token; and 2) two novel attention masks, namely, granularity resonance and granularity scope, to efficiently encode granularity into attention. Experiments on two benchmarks, including Quora question pairs and Twitter URLs have shown that C-DNPG outperforms baseline models by a remarkable margin and achieves state-of-the-art results in terms of many metrics. Qualitative analysis reveals that C-DNPG indeed captures fine-grained levels of granularity with effectiveness.

Submitted 16 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

Comments: Accepted to be published in COLING 2022

arXiv:2207.07754 [pdf]

Lab-on-a-Chip Optical Biosensor Platform: Micro Ring Resonator Integrated with Near-Infrared Fourier Transform Spectrometer

Authors: Kyoung Min Yoo, May Hlaing, Sourabh Jain, James Fan, Yue An, Ray T. Chen

Abstract: A micro-ring-resonator (MRR) optical biosensor based on the evanescent field sensing mechanism has been extensively studied due to its high sensitivity and compact device size. However, a suitable on-chip integrated spectrometer device has to be demonstrated for the lab-on-a-chip applications, which can read the resonance wavelength shift from MRR biosensors based on minuscule changes in refractive index. In this paper, we demonstrated the design and experimental results of the near-infrared lab-on-a-chip optical biosensor platform that monolithically integrates the MRR and the on-chip spectrometer on the silicon-on-insulator (SOI) wafer, which can eliminate the external optical spectrum analyzer for scanning the wavelength spectrum. The symmetric add-drop MRR biosensor is designed to have a free spectral range (FSR) of ~19 nm, and a bulk sensitivity of ~73 nm/RIU; then the drop-port output resonance peaks are reconstructed from the integrated spatial-heterodyne Fourier transform spectrometer (SHFTS) with the spectral resolution of ~3.1 nm and bandwidth of ~50 nm, which results in the limit of detection of 0.042 RIU. The MRR output spectrum with air- and water-claddings are measured and reconstructed from the MRR-SHFTS integrated device experimentally to validate the wavelength shifting measurement.

Submitted 15 July, 2022; originally announced July 2022.

Comments: 23 pages, 9 figures including supplementary

arXiv:2206.08082 [pdf, other]

Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator

Authors: Hyuhng Joon Kim, Hyunsoo Cho, Junyeob Kim, Taeuk Kim, Kang Min Yoo, Sang-goo Lee

Abstract: Large-scale pre-trained language models (PLMs) are well-known for being capable of solving a task simply by conditioning a few input-label pairs dubbed demonstrations on a prompt without being explicitly tuned for the desired downstream task. Such a process (i.e., in-context learning), however, naturally leads to high reliance on the demonstrations which are usually selected from external datasets. In this paper, we propose self-generated in-context learning (SG-ICL), which generates demonstrations for in-context learning from PLM itself to minimize the reliance on the external demonstration. We conduct experiments on four different text classification tasks and show SG-ICL significantly outperforms zero-shot learning and is generally worth approximately 0.6 gold training samples. Moreover, our generated demonstrations show more consistent performance with low variance compared to randomly selected demonstrations from the training dataset.

Submitted 16 June, 2022; originally announced June 2022.

Comments: NAACL 2022 Workshop on Large-scale Pre-trained Language Models

arXiv:2205.13445 [pdf, other]

Mutual Information Divergence: A Unified Metric for Multimodal Generative Models

Authors: Jin-Hwa Kim, Yunji Kim, Jiyoung Lee, Kang Min Yoo, Sang-Woo Lee

Abstract: Text-to-image generation and image captioning have recently emerged as a new experimental paradigm to assess machine intelligence. They predict continuous quantity accompanied by their sampling techniques in the generation, making evaluation complicated and intractable to get marginal distributions. Based on a recent trend that multimodal generative evaluations exploit a vision-and-language pre-trained model, we propose the negative Gaussian cross-mutual information using the CLIP features as a unified metric, coined by Mutual Information Divergence (MID). To validate, we extensively compare it with competing metrics using carefully-generated or human-annotated judgments in text-to-image generation and image captioning tasks. The proposed MID significantly outperforms the competitive methods by having consistency across benchmarks, sample parsimony, and robustness toward the exploited CLIP model. We look forward to seeing the underrepresented implications of the Gaussian cross-mutual information in multimodal representation learning and the future works based on this novel proposition.

Submitted 25 May, 2022; originally announced May 2022.

arXiv:2205.12685 [pdf, other]

Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations

Authors: Kang Min Yoo, Junyeob Kim, Hyuhng Joon Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-Woo Lee, Sang-goo Lee, Taeuk Kim

Abstract: Despite recent explosion of interests in in-context learning, the underlying mechanism and the precise impact of the quality of demonstrations remain elusive. Intuitively, ground-truth labels should have as much impact in in-context learning (ICL) as supervised learning, but recent work reported that the input-label correspondence is significantly less important than previously thought. Intrigued by this counter-intuitive observation, we re-examine the importance of ground-truth labels in in-context learning. With the introduction of two novel metrics, namely Label-Correctness Sensitivity and Ground-truth Label Effect Ratio (GLER), we were able to conduct quantifiable analysis on the impact of ground-truth label demonstrations. Through extensive analyses, we find that the correct input-label mappings can have varying impacts on the downstream in-context learning performances, depending on the experimental configuration. Through additional studies, we identify key components, such as the verbosity of prompt templates and the language model size, as the controlling factor to achieve more noise-resilient ICL.

Submitted 24 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: Accepted to EMNLP Long. Kang Min Yoo and Junyeob Kim contributed equally. Kang Min Yoo and Taeuk Kim are the corresponding authors

arXiv:2205.12609 [pdf, other]

Generating Information-Seeking Conversations from Unlabeled Documents

Authors: Gangwoo Kim, Sungdong Kim, Kang Min Yoo, Jaewoo Kang

Abstract: In this paper, we introduce a novel framework, SIMSEEK, (Simulating information-Seeking conversation from unlabeled documents), and compare its two variants. In our baseline SIMSEEK-SYM, a questioner generates follow-up questions upon the predetermined answer by an answerer. On the contrary, SIMSEEK-ASYM first generates the question and then finds its corresponding answer under the conversational context. Our experiments show that they can synthesize effective training resources for CQA and conversational search tasks. As a result, conversations from SIMSEEK-ASYM not only make more improvements in our experiments but also are favorably reviewed in a human evaluation. We finally release a large-scale resource of synthetic conversations, WIKI-SIMSEEK, containing 2 million CQA pairs built upon Wikipedia documents. With the dataset, our CQA model achieves state-of-the-art performance on a recent CQA benchmark, QuAC.

Submitted 24 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: Accepted to EMNLP 2022 main conference

arXiv:2205.04530 [pdf, other]

doi 10.1016/j.nima.2022.167263

Positron accumulation in the GBAR experiment

Authors: P. Blumer, M. Charlton, M. Chung, P. Clade, P. Comini, P. Crivelli, O. Dalkarov, P. Debu, L. Dodd, A. Douillet, S. Guellati, P. -A Hervieux, L. Hilico, P. Indelicato, G. Janka, S. Jonsell, J. -P. Karr, B. H. Kim, E. S. Kim, S. K. Kim, Y. Ko, T. Kosinski, N. Kuroda, B. M. Latacz, B. Lee, et al. (45 additional authors not shown)

Abstract: We present a description of the GBAR positron (e$^+$) trapping apparatus, which consists of a three stage Buffer Gas Trap (BGT) followed by a High Field Penning Trap (HFT), and discuss its performance. The overall goal of the GBAR experiment is to measure the acceleration of the neutral antihydrogen ($\bar{\mathrm{H}}$) atom in the terrestrial gravitational field by neutralising a positive antihydrogen ion ($\bar{\mathrm{H}}^+$), which has been cooled to a low temperature, and observing the subsequent $\bar{\mathrm{H}}$ annihilation following free fall. To produce one $\bar{\mathrm{H}}^+$ ion, about 10^10 positrons, efficiently converted into positronium (Ps), together with about 10^7 antiprotons ($\bar{\mathrm{p}}$), are required. The positrons, produced from an electron linac-based system, are accumulated first in the BGT whereafter they are stacked in the ultra-high vacuum HFT, where we have been able to trap 1.4(2) x 10^9 positrons in 1100 seconds.

Submitted 9 May, 2022; originally announced May 2022.

Journal ref: Nuclear Instruments and Methods in Physics Research Section A, Volume 1040, 2022, 167263

arXiv:2205.02035 [pdf, other]

Masked Summarization to Generate Factually Inconsistent Summaries for Improved Factual Consistency Checking

Authors: Hwanhee Lee, Kang Min Yoo, Joonsuk Park, Hwaran Lee, Kyomin Jung

Abstract: Despite the recent advances in abstractive summarization systems, it is still difficult to determine whether a generated summary is factually consistent with the source text. To this end, the latest approach is to train a factual consistency classifier on factually consistent and inconsistent summaries. Luckily, the former is readily available as reference summaries in existing summarization datasets. However, generating the latter remains a challenge, as they need to be factually inconsistent, yet closely relevant to the source text to be effective. In this paper, we propose to generate factually inconsistent summaries using source texts and reference summaries with key information masked. Experiments on seven benchmark datasets demonstrate that factual consistency classifiers trained on summaries generated using our method generally outperform existing models and show a competitive correlation with human judgments. We also analyze the characteristics of the summaries generated using our method. We will release the pre-trained model and the code at https://github.com/hwanheelee1993/MFMA.

Submitted 4 May, 2022; originally announced May 2022.

Comments: NAACL 2022 Findings

arXiv:2204.14017 [pdf, other]

Backdoor Attacks in Federated Learning by Rare Embeddings and Gradient Ensembling

Authors: KiYoon Yoo, Nojun Kwak

Abstract: Recent advances in federated learning have demonstrated its promising capability to learn on decentralized datasets. However, a considerable amount of work has raised concerns due to the potential risks of adversaries participating in the framework to poison the global model for an adversarial purpose. This paper investigates the feasibility of model poisoning for backdoor attacks through rare word embeddings of NLP models. In text classification, less than 1% of adversary clients suffices to manipulate the model output without any drop in the performance on clean sentences. For a less complex dataset, a mere 0.1% of adversary clients is enough to poison the global model effectively. We also propose a technique specialized in the federated learning scheme called Gradient Ensemble, which enhances the backdoor performance in all our experimental settings.

Submitted 23 October, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

Comments: Accepted to EMNLP 2022, 9 pages and Appendix

arXiv:2203.01677 [pdf, other]

Detection of Word Adversarial Examples in Text Classification: Benchmark and Baseline via Robust Density Estimation

Authors: KiYoon Yoo, Jangho Kim, Jiho Jang, Nojun Kwak

Abstract: Word-level adversarial attacks have shown success in NLP models, drastically decreasing the performance of transformer-based models in recent years. As a countermeasure, adversarial defense has been explored, but relatively few efforts have been made to detect adversarial examples. However, detecting adversarial examples may be crucial for automated tasks (e.g. review sentiment analysis) that wish to amass information about a certain population and additionally be a step towards a robust defense system. To this end, we release a dataset for four popular attack methods on four datasets and four models to encourage further research in this field. Along with it, we propose a competitive baseline based on density estimation that has the highest AUC on 29 out of 30 dataset-attack-model combinations. Source code is available in https://github.com/anoymous92874838/text-adv-detection.

Submitted 3 March, 2022; originally announced March 2022.

Comments: Findings of ACL 2022

arXiv:2112.07027 [pdf]

Dual-Polarization Bandwidth-Bridged On-Chip Bandpass Sampling Fourier Transform Spectrometer from Visible to Near-Infrared

Authors: Kyoung Min Yoo, Ray T. Chen

Abstract: The on-chip broadband optical spectrometers which cover the entire tissue transparency window (λ=650-1050 nm) with high resolution are highly demanded for the miniaturized bio-sensing and bio-imaging applications. Here, we propose a novel type of spatial heterodyne Fourier transform spectrometer (SHFTS) integrated with a sub-wavelength grating coupler (SWGC) for the dual-polarization bandpass sampling on the Si3N4 platform. Through tuning the coupling angles with different polarization, we experimentally demonstrated the unprecedented broadband spectrum retrieval results with the overall bandwidth coverage of 400 nm, bridging the wavelengths from 650 nm to 1050 nm, with the resolution of 2-5 nm. By applying the bandpass sampling theorem, we circumvented the intrinsic trade-off limitation between the bandwidth and resolution of SHFTS without having an outrageous number of Mach-Zehnder interferometer (MZI) arrays or adding additional active components. The bandpass sampling SHFTS is designed to have linearly unbalanced 32 MZIs with the maximum optical path length difference of 93 μm within an overall footprint size of 4.7 mm x 0.65 mm, and the coupling angles of SWGC are varied from 0° to 32° to cover the entire tissue transparency window.

Submitted 13 December, 2021; originally announced December 2021.

Comments: 48 Pages, 6 figures, 14 supportive figures

arXiv:2111.12958 [pdf, other]

Self-Distilled Self-Supervised Representation Learning

Authors: Jiho Jang, Seonhoon Kim, Kiyoon Yoo, Chaerin Kong, Jangho Kim, Nojun Kwak

Abstract: State-of-the-art frameworks in self-supervised learning have recently shown that fully utilizing transformer-based models can lead to performance boost compared to conventional CNN models. Striving to maximize the mutual information of two views of an image, existing works apply a contrastive loss to the final representations. Motivated by self-distillation in the supervised regime, we further exploit this by allowing the intermediate representations to learn from the final layer via the contrastive loss. Through self-distillation, the intermediate layers are better suited for instance discrimination, making the performance of an early-exited sub-network not much degraded from that of the full network. This renders the pretext task easier also for the final layer, leading to better representations. Our method, Self-Distilled Self-Supervised Learning (SDSSL), outperforms competitive baselines (SimCLR, BYOL and MoCo v3) using ViT on various tasks and datasets. In the linear evaluation and k-NN protocol, SDSSL not only leads to superior performance in the final layers, but also in most of the lower layers. Furthermore, qualitative and quantitative analyses show how representations are formed more effectively along the transformer layers. Code is available at https://github.com/hagiss/SDSSL.

Submitted 23 November, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

Comments: WACV 23, 11 pages

arXiv:2111.02643 [pdf, other]

Response Generation with Context-Aware Prompt Learning

Authors: Xiaodong Gu, Kang Min Yoo, Sang-Woo Lee

Abstract: Pre-trained language models (PLM) have marked a huge leap in neural dialogue modeling. While PLMs are pre-trained on large-scale text corpora, they are usually fine-tuned on scarce dialogue data with specific domain knowledge and dialogue styles. However, tailoring the language models while fully utilizing prior knowledge in large pre-trained models remains a challenge. In this paper, we present a novel approach for pre-trained dialogue modeling that casts the dialogue generation problem as a prompt-learning task. Instead of fine-tuning on limited dialogue data, our approach, DialogPrompt, learns continuous prompt embeddings optimized for dialogue contexts, which appropriately elicit knowledge from the large pre-trained model. To encourage the model to better utilize the prompt embeddings, the prompt encoders are designed to be dynamically generated based on the dialogue context. Experiments on popular conversation datasets show that our approach significantly outperforms the fine-tuning baseline and the generic prompt-learning methods. Furthermore, human evaluations strongly support the superiority of DialogPrompt in regard to response generation quality.

Submitted 13 December, 2021; v1 submitted 4 November, 2021; originally announced November 2021.

arXiv:2110.03461 [pdf, other]

Self-Evolutionary Optimization for Pareto Front Learning

Authors: Simyung Chang, KiYoon Yoo, Jiho Jang, Nojun Kwak

Abstract: Multi-task learning (MTL), which aims to improve performance by learning multiple tasks simultaneously, inherently presents an optimization challenge due to multiple objectives. Hence, multi-objective optimization (MOO) approaches have been proposed for multitasking problems. Recent MOO methods approximate multiple optimal solutions (Pareto front) with a single unified model, which is collectively referred to as Pareto front learning (PFL). In this paper, we show that PFL can be re-formulated into another MOO problem with multiple objectives, each of which corresponds to different preference weights for the tasks. We leverage an evolutionary algorithm (EA) to propose a method for PFL called self-evolutionary optimization (SEO) by directly maximizing the hypervolume. By using SEO, the neural network learns to approximate the Pareto front conditioned on multiple hyper-parameters that drastically affect the hypervolume. Then, by generating a population of approximations simply by inferencing the network, the hyper-parameters of the network can be optimized by EA. Utilizing SEO for PFL, we also introduce self-evolutionary Pareto networks (SEPNet), enabling the unified model to approximate the entire Pareto front set that maximizes the hypervolume. Extensive experimental results confirm that SEPNet can find a better Pareto front than the current state-of-the-art methods while minimizing the increase in model size and training cost.

Submitted 7 October, 2021; originally announced October 2021.

Comments: 16 pages

arXiv:2109.07953 [pdf, other]

Efficient Attribute Injection for Pretrained Language Models

Authors: Reinald Kim Amplayo, Kang Min Yoo, Sang-Woo Lee

Abstract: Metadata attributes (e.g., user and product IDs from reviews) can be incorporated as additional inputs to neural-based NLP models, by modifying the architecture of the models, in order to improve their performance. Recent models however rely on pretrained language models (PLMs), where previously used techniques for attribute injection are either nontrivial or ineffective. In this paper, we propose a lightweight and memory-efficient method to inject attributes to PLMs. We extend adapters, i.e. tiny plug-in feed-forward modules, to include attributes both independently of or jointly with the text. To limit the increase of parameters especially when the attribute vocabulary is large, we use low-rank approximations and hypercomplex multiplications, significantly decreasing the total parameters. We also introduce training mechanisms to handle domains in which attributes can be multi-labeled or sparse. Extensive experiments and analyses on eight datasets from different domains show that our method outperforms previous attribute injection methods and achieves state-of-the-art performance on various datasets.

Submitted 16 September, 2021; originally announced September 2021.

arXiv:2109.04660 [pdf, other]

Dynamic Collective Intelligence Learning: Finding Efficient Sparse Model via Refined Gradients for Pruned Weights

Authors: Jangho Kim, Jayeon Yoo, Yeji Song, KiYoon Yoo, Nojun Kwak

Abstract: With the growth of deep neural networks (DNN), the number of DNN parameters has drastically increased. This makes DNN models hard to be deployed on resource-limited embedded systems. To alleviate this problem, dynamic pruning methods have emerged, which try to find diverse sparsity patterns during training by utilizing Straight-Through-Estimator (STE) to approximate gradients of pruned weights. STE can help the pruned weights revive in the process of finding dynamic sparsity patterns. However, using these coarse gradients causes training instability and performance degradation owing to the unreliable gradient signal of the STE approximation. In this work, to tackle this issue, we introduce refined gradients to update the pruned weights by forming dual forwarding paths from two sets (pruned and unpruned) of weights. We propose a novel Dynamic Collective Intelligence Learning (DCIL) which makes use of the learning synergy between the collective intelligence of both weight sets. We verify the usefulness of the refined gradients by showing enhancements in the training stability and the model performance on the CIFAR and ImageNet datasets. DCIL outperforms various previously proposed pruning schemes including other dynamic pruning methods with enhanced stability during training.

Submitted 31 July, 2023; v1 submitted 10 September, 2021; originally announced September 2021.

Comments: Accepted to ACM MM 2023, code is in https://github.com/Jangho-Kim/DCIL-pytorch

arXiv:2109.04650 [pdf, other]

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

Authors: Boseop Kim, HyoungSeok Kim, Sang-Woo Lee, Gichang Lee, Donghyun Kwak, Dong Hyeon Jeon, Sunghyun Park, Sungju Kim, Seonhoon Kim, Dongpil Seo, Heungsub Lee, Minyoung Jeong, Sungjae Lee, Minsub Kim, Suk Hyun Ko, Seokhun Kim, Taeyong Park, Jinuk Kim, Soyoung Kang, Na-Hyeon Ryu, Kang Min Yoo, Minsuk Chang, Soobin Suh, Sookyo In, Jinseong Park, et al. (12 additional authors not shown)

Abstract: GPT-3 shows remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billion scale data. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performances of different sized models, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, a Korean variant of 82B GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization, HyperCLOVA with our training configuration shows state-of-the-art in-context zero-shot and few-shot learning performances on various downstream tasks in Korean. Also, we show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. Then we discuss the possibility of materializing the No Code AI paradigm by providing AI prototyping capabilities to non-experts of ML by introducing HyperCLOVA studio, an interactive prompt engineering interface. Lastly, we demonstrate the potential of our methods with three successful in-house applications.

Submitted 28 November, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

Comments: Accepted to EMNLP2021 as a long paper. Fixed some typos

arXiv:2106.07345 [pdf, other]

Self-Guided Contrastive Learning for BERT Sentence Representations

Authors: Taeuk Kim, Kang Min Yoo, Sang-goo Lee

Abstract: Although BERT and its variants have reshaped the NLP landscape, it still remains unclear how best to derive sentence embeddings from such pre-trained Transformers. In this work, we propose a contrastive learning method that utilizes self-guidance for improving the quality of BERT sentence representations. Our method fine-tunes BERT in a self-supervised fashion, does not rely on data augmentation, and enables the usual [CLS] token embeddings to function as sentence vectors. Moreover, we redesign the contrastive learning objective (NT-Xent) and apply it to sentence representation learning. We demonstrate with extensive experiments that our approach is more effective than competitive baselines on diverse sentence-related tasks. We also show it is efficient at inference and robust to domain shifts.

Submitted 3 June, 2021; originally announced June 2021.

Comments: ACL 2021

arXiv:2105.06061 [pdf]

doi 10.1016/j.cap.2021.04.027

In situ investigation of conducting interface formation in LaAlO3/SrTiO3 heterostructure

Authors: Hyang Keun Yoo, Luca Moreschini, Aaron Bostwick, Andrew L. Walter, Tae Won Noh, Eli Rotenberg, Young Jun Chang

Abstract: The high-mobility conducting interface (CI) between LaAlO_{3} (LAO) and SrTiO_{3} (STO) has revealed many fascinating phenomena, including exotic magnetism and superconductivity. But, the formation mechanism of the CI has not been conclusively explained. Here, using in situ angle-resolved photoemission spectroscopy, we elucidated the mechanisms for the CI formation. In as-grown samples, we observed a built-in potential (V_{bi}) proportional to the polar LAO thickness starting from the first unit cell (UC) with CI formation appearing above 3 UCs. However, we found that the V_{bi} is removed by synchrotron ultraviolet (UV)-irradiation; the built-in potential is recovered by oxygen gas (O_{2}(g))-exposure. Furthermore, after UV-irradiation, the CI appears even below 3 UCs of LAO. Our results demonstrate not only the V_{bi}-driven CI formation in as-grown LAO/STO, but also a new route to control of the interface state by UV lithographic patterning or other surface modification.

Submitted 12 May, 2021; originally announced May 2021.

Comments: 18 pages, 4 figures

arXiv:2105.06059 [pdf]

doi 10.1016/j.cap.2020.08.019

Enhanced tunability of two-dimensional electron gas on SrTiO3 through heterostructuring

Authors: Hyang Keun Yoo, Luca Moreschini, Andrew L. Walter, Aaron Bostwick, Karsten Horn, Eli Rotenberg, Young Jun Chang

Abstract: Two-dimensional electron gases (2DEGs) on the SrTiO3 (STO) surface or in STO-based heterostructures have exhibited many intriguing phenomena, which are strongly dependent on the 2DEG-carrier density. We report that the tunability of the 2DEG-carrier density is significantly enhanced by adding a monolayer LaTiO3 (LTO) onto the STO. Ultraviolet (UV) irradiation induced maximum carrier density of the 2DEG in LTO/STO is increased by a factor of ~4, compared to that of the bare STO. By oxygen gas exposure, it becomes 10 times smaller than that of the bare STO. This enhanced tunability is attributed to the drastic surface property change of a polar LTO layer by UV irradiation and O2 exposure. This indicates that the 2DEG controllability in LTO/STO is more reliable than that on the bare STO driven by defects, such as oxygen vacancies.

Submitted 12 May, 2021; originally announced May 2021.

Comments: 19 pages, 4 figures

Journal ref: Current Applied Physics 20, 1268 (2020)

arXiv:2104.08826 [pdf, other]

GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation

Authors: Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, Woomyeong Park

Abstract: Large-scale language models such as GPT-3 are excellent few-shot learners, allowing them to be controlled via natural text prompts. Recent studies report that prompt-based direct classification eliminates the need for fine-tuning but lacks data and inference scalability. This paper proposes a novel data augmentation technique that leverages large-scale language models to generate realistic text samples from a mixture of real samples. We also propose utilizing soft-labels predicted by the language models, effectively distilling knowledge from the large-scale language models and creating textual perturbations simultaneously. We perform data augmentation experiments on diverse classification tasks and show that our method hugely outperforms existing text augmentation methods. Ablation studies and a qualitative analysis provide more insights into our approach.

Submitted 18 November, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

Comments: Accepted to EMNLP2021 Findings; 11 pages, 7 tables, 2 figures

arXiv:2104.07541 [pdf, other]

Reward Optimization for Neural Machine Translation with Learned Metrics

Authors: Raphael Shu, Kang Min Yoo, Jung-Woo Ha

Abstract: Neural machine translation (NMT) models are conventionally trained with token-level negative log-likelihood (NLL), which does not guarantee that the generated translations will be optimized for a selected sequence-level evaluation metric. Multiple approaches are proposed to train NMT with BLEU as the reward, in order to directly improve the metric. However, it was reported that the gain in BLEU does not translate to real quality improvement, limiting the application in industry. Recently, it became clear to the community that BLEU has a low correlation with human judgment when dealing with state-of-the-art models. This leads to the emerging of model-based evaluation metrics. These new metrics are shown to have a much higher human correlation. In this paper, we investigate whether it is beneficial to optimize NMT models with the state-of-the-art model-based metric, BLEURT. We propose a contrastive-margin loss for fast and stable reward optimization suitable for large NMT models. In experiments, we perform automatic and human evaluations to compare models trained with smoothed BLEU and BLEURT to the baseline models. Results show that the reward optimization with BLEURT is able to increase the metric scores by a large margin, in contrast to limited gain when training with smoothed BLEU. The human evaluation shows that models trained with BLEURT improve adequacy and coverage of translations. Code is available via https://github.com/naver-ai/MetricMT.

Submitted 15 April, 2021; originally announced April 2021.

arXiv:2012.01775 [pdf, other]

DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances

Authors: Xiaodong Gu, Kang Min Yoo, Jung-Woo Ha

Abstract: Recent advances in pre-trained language models have significantly improved neural response generation. However, existing methods usually view the dialogue context as a linear sequence of tokens and learn to generate the next word through token-level self-attention. Such token-level encoding hinders the exploration of discourse-level coherence among utterances. This paper presents DialogBERT, a novel conversational response generation model that enhances previous PLM-based dialogue models. DialogBERT employs a hierarchical Transformer architecture. To efficiently capture the discourse-level coherence among utterances, we propose two training objectives, including masked utterance regression and distributed utterance order ranking in analogy to the original BERT training. Experiments on three multi-turn conversation datasets show that our approach remarkably outperforms the baselines, such as BART and DialoGPT, in terms of quantitative evaluation. The human evaluation suggests that DialogBERT generates more coherent, informative, and human-like responses than the baselines with significant margins.

Submitted 13 December, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: Published as a conference paper at AAAI 2021

arXiv:2010.10338 [pdf, other]

Edge Bias in Federated Learning and its Solution by Buffered Knowledge Distillation

Authors: Sangho Lee, Kiyoon Yoo, Nojun Kwak

Abstract: Federated learning (FL), which utilizes communication between the server (core) and local devices (edges) to indirectly learn from more data, is an emerging field in deep learning research. Recently, Knowledge Distillation-based FL methods with notable performance and high applicability have been suggested. In this paper, we choose knowledge distillation-based FL method as our baseline and tackle a challenging problem that ensues from using these methods. Especially, we focus on the problem incurred in the server model that tries to mimic different datasets, each of which is unique to an individual edge device. We dub the problem 'edge bias', which occurs when multiple teacher models trained on different datasets are used individually to distill knowledge. We introduce this nuisance that occurs in certain scenarios of FL, and to alleviate it, we propose a simple yet effective distillation scheme named 'buffered distillation'. In addition, we also experimentally show that this scheme is effective in mitigating the straggler problem caused by delayed edges.

Submitted 9 February, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: 10 pages

arXiv:2010.06690 [pdf, other]

doi 10.1063/5.0037452

Disease transmission through expiratory aerosols on an urban bus

Authors: Zhihang Zhang, Taehoon Han, Kwang Hee Yoo, Jesse Capecelatro, Andre Boehman, Kevin Maki

Abstract: Airborne respiratory diseases such as SARS-CoV-2 (COVID-19) pose significant challenges for public transportation. Several recent outbreaks of SARS-CoV-2 indicate the high risk of transmission among passengers on public buses if special precautions are not taken. This study presents a combined experimental and numerical analysis to identify transmission mechanisms on an urban bus and assess strategies to reduce risk. The effects of the ventilation and air-conditioning systems, opening windows and doors, and wearing masks are analyzed. Specific attention is made to the transport of sub-micron and micron-size particles relevant to typical respiratory droplets. High-resolution instrumentation was used to measure size distribution and aerosol response time on a University of Michigan campus bus under these different conditions. Computational fluid dynamics was employed to measure the airflow within the bus and evaluate risk. A risk metric was adopted based on the number of particles exposed to susceptible passengers. The flow that carries these aerosols is predominantly controlled by the ventilation system, which acts to uniformly distribute the aerosol concentration throughout the bus while simultaneously diluting it with fresh air. The opening of doors and windows was found to reduce the concentration by approximately one half, albeit its benefit does not uniformly impact all passengers on the bus due to recirculation of airflow caused by entrainment through windows. Finally, it was found that well fitted surgical masks, when worn by both infected and susceptible passengers, can nearly eliminate the transmission of the disease.

Submitted 15 November, 2020; v1 submitted 13 October, 2020; originally announced October 2020.

Comments: 22 pages, 15 figures

Showing 1–50 of 90 results for author: Yoo, K