-
Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning
Authors:
Janghoon Han,
Changho Lee,
Joongbo Shin,
Stanley Jungkyu Choi,
Honglak Lee,
Kynghoon Bae
Abstract:
Instruction tuning has emerged as a powerful technique, significantly boosting zero-shot performance on unseen tasks. While recent work has explored cross-lingual generalization by applying instruction tuning to multilingual models, previous studies have primarily focused on English, with limited exploration of non-English tasks. For an in-depth exploration of cross-lingual generalization in instruction tuning, we perform instruction tuning individually on two distinct language meta-datasets. Subsequently, we assess performance on unseen tasks in a language different from the one used for training. To facilitate this investigation, we introduce a novel non-English meta-dataset named "KORANI" (Korean Natural Instruction), comprising 51 Korean benchmarks. Moreover, we design cross-lingual templates to mitigate discrepancies in the language and instruction format of templates between training and inference in the cross-lingual setting. Our experiments reveal consistent improvements through cross-lingual generalization in both English and Korean, outperforming the baseline by average scores of 20.7% and 13.6%, respectively. Remarkably, these enhancements are comparable to those achieved by monolingual instruction tuning and even surpass them on some tasks. The results underscore the significance of acquiring relevant data across languages over linguistic congruence with unseen tasks during instruction tuning.
Submitted 13 June, 2024;
originally announced June 2024.
-
Instruction Matters, a Simple yet Effective Task Selection Approach in Instruction Tuning for Specific Tasks
Authors:
Changho Lee,
Janghoon Han,
Seonghyeon Ye,
Stanley Jungkyu Choi,
Honglak Lee,
Kyunghoon Bae
Abstract:
Instruction tuning has been shown not only to enhance zero-shot generalization across various tasks but also to improve performance on specific tasks. A crucial aspect of instruction tuning for a particular task is the strategic selection of related tasks that offer meaningful supervision, thereby enhancing efficiency and preventing performance degradation from irrelevant tasks. Our research reveals that leveraging instruction information alone enables the identification of pertinent tasks for instruction tuning. This approach is notably simpler than traditional methods, which require complex measurements of pairwise transferability between tasks or the creation of data samples for the target task. Furthermore, by additionally learning the unique instructional template style of the meta-dataset, we observe an improvement in task selection accuracy, which contributes to enhanced overall performance. Experimental results demonstrate that training on a small set of tasks, chosen solely based on the instructions, leads to substantial performance improvements on benchmarks such as P3, Big-Bench, NIV2, and Big-Bench Hard. Significantly, these improvements exceed those achieved by prior task selection methods, highlighting the efficacy of our approach.
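The core idea, ranking candidate training tasks by how similar their instructions are to the target task's instruction, can be sketched as follows. The abstract does not specify the similarity model (it mentions learning the meta-dataset's template style), so this sketch swaps in a plain bag-of-words cosine similarity as a hypothetical stand-in; `bow_vector`, `cosine`, and `select_tasks` are illustrative names, not the authors' code.

```python
import math
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Hypothetical stand-in embedding: lower-cased bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_tasks(target_instruction: str, candidates: dict[str, str], k: int) -> list[str]:
    """Rank candidate tasks by instruction similarity alone and keep the top k.

    No target-task data samples or pairwise transfer measurements are used:
    only the instruction strings are compared.
    """
    target = bow_vector(target_instruction)
    scored = [(cosine(target, bow_vector(instr)), name) for name, instr in candidates.items()]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]
```

For instance, with candidate instructions for a movie-review sentiment task, an NLI task, and a summarization task, `select_tasks("Classify the sentiment of the tweet as positive or negative.", candidates, k=1)` picks the sentiment task. A learned embedding model would replace `bow_vector` in practice.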
Submitted 25 April, 2024;
originally announced April 2024.
-
Toward TransfORmers: Revolutionizing the Solution of Mixed Integer Programs with Transformers
Authors:
Joshua F. Cooper,
Seung ** Choi,
I. Esra Buyuktahtakin
Abstract:
In this study, we introduce an innovative deep learning framework that employs a transformer model to address the challenges of mixed-integer programs, focusing specifically on the Capacitated Lot Sizing Problem (CLSP). To our knowledge, our approach is the first to use transformers to predict the binary variables of a mixed-integer programming (MIP) problem. Specifically, it harnesses the encoder-decoder transformer's ability to process sequential data, making it well suited to predicting the binary variables that indicate production setup decisions in each period of the CLSP. This problem is inherently dynamic and requires sequential decision making under constraints. We present an efficient algorithm in which CLSP solutions are learned by a transformer neural network. The proposed post-processed transformer algorithm surpasses the state-of-the-art solver CPLEX and a Long Short-Term Memory (LSTM) network in solution time, optimality gap, and percent infeasibility over the 240K benchmark CLSP instances tested. Once the ML model is trained, running inference on it reduces the MIP to a linear program (LP). This turns the ML-based algorithm, combined with an LP solver, into a polynomial-time approximation algorithm for a well-known NP-hard problem, with almost perfect solution quality.
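To illustrate why fixing the predicted binaries leaves only a continuous problem, here is a minimal sketch for a heavily simplified single-item CLSP. It assumes production in period t is bounded by `capacity[t] * setup[t]`, no backlogging, a constant unit production cost, and nonnegative holding costs; under those assumptions, producing each unit as late as possible is optimal, so a backward pass solves the residual problem. This is not the authors' post-processing, and a general multi-item CLSP would need a real LP solver; `plan_with_fixed_setups` is a hypothetical name.

```python
def plan_with_fixed_setups(demand, capacity, setup):
    """Return a production plan for fixed setup decisions, or None if infeasible.

    Simplified single-item CLSP (assumption): x[t] <= capacity[t] * setup[t],
    inventory must stay nonnegative, unit production cost is constant and
    holding costs are nonnegative, so latest-possible production is optimal.
    """
    n = len(demand)
    if not (len(capacity) == n and len(setup) == n):
        raise ValueError("demand, capacity and setup must have equal length")
    production = [0] * n
    outstanding = 0  # demand from periods >= t that is not yet covered
    for t in range(n - 1, -1, -1):
        outstanding += demand[t]
        # Production in period t can serve period t and any later period.
        production[t] = min(capacity[t] * setup[t], outstanding)
        outstanding -= production[t]
    if outstanding > 1e-9:
        return None  # the predicted setups cannot meet demand
    return production
```

With `demand=[2, 3, 4]`, `capacity=[5, 5, 5]`, and predicted setups `[1, 0, 1]`, the plan is `[5, 0, 4]`; with setups `[0, 1, 1]`, period 0 has no capacity and the function returns `None`, the kind of infeasibility a post-processing step would need to repair.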
Submitted 24 May, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Lightweight feature encoder for wake-up word detection based on self-supervised speech representation
Authors:
Hyungjun Lim,
Younggwan Kim,
Kiho Yeom,
Eunjoo Seo,
Hoodong Lee,
Stanley Jungkyu Choi,
Honglak Lee
Abstract:
Self-supervised learning methods that provide generalized speech representations have recently received increasing attention. Wav2vec 2.0 is the best-known example, showing remarkable performance on numerous downstream speech processing tasks. Despite its success, it is challenging to use directly for wake-up word detection on mobile devices because of its high computational cost. In this work, we propose LiteFEW, a lightweight feature encoder for wake-up word detection that preserves the inherent ability of wav2vec 2.0 at minimal scale. In this method, the knowledge of the pre-trained wav2vec 2.0 is compressed by an auto-encoder-based dimensionality reduction technique and distilled into LiteFEW. Experimental results on the open-source "Hey Snips" dataset show that the proposed method, applied to various model structures, significantly improves performance, achieving over 20% relative improvement with only 64k parameters.
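The compress-then-distill idea can be sketched as a loss computation: the encoder half of an auto-encoder maps a large teacher feature to a low-dimensional target, and the lightweight student is trained to match it. The abstract does not give the architecture or loss, so this sketch assumes a linear encoder and a mean-squared-error feature-matching loss; `matvec`, `mse`, and `distillation_loss` are hypothetical helpers, not LiteFEW's code.

```python
def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix, given as nested lists, by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher_feature, encoder_weights, student_feature):
    """Compress the teacher feature with a frozen (assumed linear) encoder,
    then measure how closely the student's output matches that target."""
    compressed_target = matvec(encoder_weights, teacher_feature)
    return mse(student_feature, compressed_target)
```

For example, a 4-dimensional teacher feature `[1, 2, 3, 4]` compressed by `[[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]]` gives the target `[1.5, 3.5]`; a student output of `[1.5, 2.5]` yields a loss of `0.5`. In a real setup, the auto-encoder would first be trained to reconstruct wav2vec 2.0 features, and the loss would be averaged over frames and minimized by gradient descent.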
Submitted 13 March, 2023;
originally announced March 2023.
-
External Knowledge Selection with Weighted Negative Sampling in Knowledge-grounded Task-oriented Dialogue Systems
Authors:
Janghoon Han,
Joongbo Shin,
Hosung Song,
Hyunjik Jo,
Gyeonghun Kim,
Yireun Kim,
Stanley Jungkyu Choi
Abstract:
Constructing a robust dialogue system for spoken conversations is more challenging than for written conversations. In this respect, DSTC10-Track2-Task2 was proposed, which aims to build a task-oriented dialogue (TOD) system that incorporates unstructured external knowledge in spoken conversations, extending DSTC9-Track1. This paper introduces our system, which comprises four advanced methods: data construction, weighted negative sampling, post-training, and style transfer. We first automatically construct a large training dataset because DSTC10-Track2 does not release an official training set. For the knowledge selection task, we propose weighted negative sampling to train the model in a more fine-grained manner. We also employ post-training and style transfer for the response generation task to generate appropriate responses in a style similar to that of the target responses. In the experiments, we investigate the effects of weighted negative sampling, post-training, and style transfer. Our model ranked 7th out of 16 teams in the objective evaluation and 6th in the human evaluation.
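The abstract does not spell out the weighting scheme, so the sketch below shows only the generic mechanism of weighted negative sampling: drawing distinct negative knowledge snippets with probability proportional to per-candidate weights. The weights (for example, larger for snippets about the same entity as the gold snippet) are a hypothetical input, and `sample_negatives` is an illustrative name, not the authors' implementation.

```python
import random


def sample_negatives(candidates, weights, k, rng=None):
    """Draw k distinct negatives, each draw proportional to the remaining weights.

    Weights must be nonnegative with a positive total at every draw; a
    candidate with weight 0 is never selected.
    """
    if len(candidates) != len(weights):
        raise ValueError("candidates and weights must have equal length")
    if k > len(candidates):
        raise ValueError("k exceeds the number of candidates")
    rng = rng or random.Random()
    pool = list(zip(candidates, weights))
    chosen = []
    for _ in range(k):
        # Sample an index from the remaining pool, then remove it (no replacement).
        idx = rng.choices(range(len(pool)), weights=[w for _, w in pool], k=1)[0]
        chosen.append(pool.pop(idx)[0])
    return chosen
```

Sampling without replacement keeps a training example from seeing the same negative twice; the choice of weights is where a fine-grained scheme would encode which negatives are hardest.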
Submitted 6 September, 2022;
originally announced September 2022.
-
Towards Continual Knowledge Learning of Language Models
Authors:
Joel Jang,
Seonghyeon Ye,
Sohee Yang,
Joongbo Shin,
Janghoon Han,
Gyeonghun Kim,
Stanley Jungkyu Choi,
Minjoon Seo
Abstract:
Large Language Models (LMs) are known to encode world knowledge in their parameters as they are pretrained on vast web corpora, and this knowledge is often utilized for knowledge-dependent downstream tasks such as question answering, fact-checking, and open dialogue. In real-world scenarios, the world knowledge stored in LMs can quickly become outdated as the world changes, but it is non-trivial to avoid catastrophic forgetting and reliably acquire new knowledge while preserving invariant knowledge. To push the community towards better maintenance of ever-changing LMs, we formulate a new continual learning (CL) problem called Continual Knowledge Learning (CKL). We construct a new benchmark and metric to quantify the retention of time-invariant world knowledge, the update of outdated knowledge, and the acquisition of new knowledge. We adopt applicable recent methods from the literature to create several strong baselines. Through extensive experiments, we find that CKL exhibits unique challenges that are not addressed in previous CL setups, where parameter expansion is necessary to reliably retain and learn knowledge simultaneously. By highlighting the critical causes of knowledge forgetting, we show that CKL is a challenging and important problem that helps us better understand and train ever-changing LMs. The benchmark datasets, evaluation script, and baseline code to reproduce our results are available at https://github.com/joeljang/continual-knowledge-learning.
Submitted 24 May, 2022; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Strong near-field light-matter interaction in plasmon-resonant tip-enhanced Raman scattering in indium nitride
Authors:
Emanuele Poliani,
Daniel Seidlitz,
Maximilian Ries,
Soo J. Choi,
Jim S. Speck,
Axel Hoffmann,
Markus R. Wagner
Abstract:
We report a detailed study of the strong near-field Raman scattering enhancement that takes place in tip-enhanced Raman scattering (TERS) in indium nitride. In addition to the well-known first-order optical phonons of indium nitride, near-field Raman modes, not detectable in the far field, appear when the plasmonic probe approaches. The frequencies of these modes coincide with calculated energies of second-order combinational modes consisting of optical zone-center phonons and acoustic phonons at the edge of the Brillouin zone. The appearance of strong combinational modes suggests that TERS in indium nitride represents a special case of Raman scattering in which a resonance condition on the nanometer scale is achieved between the localized surface plasmons (LSPs) and surface plasmon polaritons (SPPs) of the probe and the surface charge oscillation of the material. We suggest that the surface charge accumulation (SCA) in InN, which can render the surface a degenerate semiconductor, is the dominant reason for the unusually large enhancement of the TERS signal compared with other inorganic semiconductors. Thus, the plasmon-resonant TERS (PR-TERS) process in InN makes this technique an excellent tool for defect characterization of indium-rich semiconductor heterostructures and nanostructures with high carrier concentrations.
Submitted 30 October, 2020; v1 submitted 9 September, 2019;
originally announced September 2019.
-
Do Hospital Data Breaches Reduce Patient Care Quality?
Authors:
Sung J. Choi,
M. Eric Johnson
Abstract:
Objective: To estimate the relationship between hospital data breaches and hospital quality outcomes.
Materials and Methods: Hospital data breaches reported to the U.S. Department of Health and Human Services breach portal and the Privacy Rights Clearinghouse database were merged with Medicare Hospital Compare data to assemble a panel of non-federal acute-care inpatient hospitals for the years 2011 to 2015. The study panel included 2,619 hospitals. Changes in the 30-day acute myocardial infarction (AMI) mortality rate following a hospital data breach were estimated using a multivariate regression model based on a difference-in-differences approach.
Results: A data breach was associated with a 0.338 [95% CI, 0.101-0.576] percentage point increase in the 30-day AMI mortality rate in the year following the breach and a 0.446 [95% CI, 0.164-0.729] percentage point increase two years after the breach. For comparison, the median 30-day AMI mortality rate has been decreasing by about 0.4 percentage points annually since 2011 due to progress in care. The magnitude of the breach impact on hospitals' AMI mortality rates was comparable to a year's worth of historical progress in reducing AMI mortality rates.
Conclusion: Hospital data breaches significantly increased the 30-day mortality rate for AMI. Data breaches may disrupt care processes that rely on health information technology. The financial costs of repairing a breach may also divert resources away from patient care. Thus, breached hospitals should carefully focus investments on security procedures, processes, and health information technology that jointly lead to better data security and improved patient outcomes.
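The difference-in-differences logic behind the Methods can be illustrated with the classic 2x2 estimator: the change in the breached (treated) group minus the change in the non-breached (control) group. The study itself used a multivariate regression on a hospital panel with covariates; this sketch shows only the unadjusted version, and `diff_in_diff` is a hypothetical helper.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Unadjusted 2x2 difference-in-differences estimate from outcome lists.

    Returns (change in treated group) - (change in control group), which is
    the effect attributable to treatment under the parallel-trends assumption.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change
```

With toy mortality rates (illustrative, not study data) of `[14.0, 15.0]` before and `[14.2, 15.0]` after for breached hospitals, and `[14.0, 15.0]` before and `[13.6, 14.6]` after for controls, the estimate is 0.5 percentage points: the breached group improved by 0.4 points less than the secular decline. The abstract's comparison is similar arithmetic: 0.338 / 0.4 is about 0.85 and 0.446 / 0.4 is about 1.1 years of the historical annual decline.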
Submitted 3 April, 2019;
originally announced April 2019.
-
Extrofitting: Enriching Word Representation and its Vector Space with Semantic Lexicons
Authors:
Hwiyeol Jo,
Stanley Jungkyu Choi
Abstract:
We propose a post-processing method, which we call extrofitting, for enriching not only word representations but also their vector space using semantic lexicons. The method consists of three steps: (i) expanding one or more dimensions on all word vectors and filling them with each vector's representative value; (ii) transferring semantic knowledge by averaging the representative values of synonyms and filling them into the expanded dimension(s), so that these two steps bring the representations of synonyms close together; and (iii) projecting the vector space using Linear Discriminant Analysis, which eliminates the expanded dimension(s) with semantic knowledge. In experiments with GloVe, we find that our method outperforms Faruqui's retrofitting on some word similarity tasks. We also report further analysis of our method with respect to word vector dimensionality, vocabulary size, and other well-known pretrained word vectors (e.g., Word2Vec, FastText).
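Steps (i) and (ii) can be sketched in a few lines. This sketch assumes a single expanded dimension and takes a word's "representative value" to be the mean of its vector's components, an assumption the abstract does not confirm. Step (iii), the LDA projection, is omitted; a library implementation such as scikit-learn's `LinearDiscriminantAnalysis` would be one option. `expand_and_share` is a hypothetical name.

```python
from statistics import mean


def expand_and_share(vectors, synonym_groups):
    """Steps (i)-(ii) of extrofitting, sketched under stated assumptions.

    vectors: dict mapping word -> list[float]
    synonym_groups: list of word lists taken from a semantic lexicon
    Assumption: a word's representative value is the mean of its vector.
    If a word appears in several groups, the last group's average wins.
    """
    # (i) Append one extra dimension holding each word's representative value
    # (new lists are built, so the input vectors are left unchanged).
    expanded = {word: vec + [mean(vec)] for word, vec in vectors.items()}
    # (ii) Overwrite that dimension with the synonym group's average value.
    for group in synonym_groups:
        present = [word for word in group if word in expanded]
        if len(present) < 2:
            continue
        shared = mean(expanded[word][-1] for word in present)
        for word in present:
            expanded[word][-1] = shared
    return expanded
```

For example, with `happy = [1.0, 3.0]`, `glad = [3.0, 5.0]`, and the synonym group `["happy", "glad"]`, both words receive `3.0` in the new dimension (the average of their representative values 2.0 and 4.0), which pulls them together before the projection step.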
Submitted 3 June, 2018; v1 submitted 21 April, 2018;
originally announced April 2018.