Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference

Authors: Jihwan Bang, Juntae Lee, Kyuhong Shim, Seunghan Yang, Simyung Chang

Abstract: The customization of large language models (LLMs) for user-specified tasks is becoming increasingly important. However, maintaining all the customized LLMs on cloud servers incurs substantial memory and computational overheads, and uploading user data can also lead to privacy concerns. On-device LLMs can offer a promising solution by mitigating these issues. Yet, the performance of on-device LLMs is inherently constrained by the limitations of small-scale models. To overcome these restrictions, we first propose Crayon, a novel approach for on-device LLM customization. Crayon begins by constructing a pool of diverse base adapters, and then we instantly blend them into a customized adapter without extra training. In addition, we develop a device-server hybrid inference strategy, which deftly allocates more demanding queries or non-customized tasks to a larger, more capable LLM on a server. This ensures optimal performance without sacrificing the benefits of on-device customization. We carefully craft a novel benchmark from multiple question-answer datasets, and show the efficacy of our method in LLM customization.

Submitted 11 June, 2024; originally announced June 2024.

Comments: ACL 2024 Main

arXiv:2403.05814 [pdf, other]

MP2D: An Automated Topic Shift Dialogue Generation Framework Leveraging Knowledge Graphs

Authors: Yerin Hwang, Yongil Kim, Yunah Jang, Jeesoo Bang, Hyunkyung Bae, Kyomin Jung

Abstract: Despite advancements in on-topic dialogue systems, effectively managing topic shifts within dialogues remains a persistent challenge, largely attributed to the limited availability of training datasets. To address this issue, we propose Multi-Passage to Dialogue (MP2D), a data generation framework that automatically creates conversational question-answering datasets with natural topic transitions. By leveraging the relationships between entities in a knowledge graph, MP2D maps the flow of topics within a dialogue, effectively mirroring the dynamics of human conversation. It retrieves relevant passages corresponding to the topics and transforms them into dialogues through the passage-to-dialogue method. Through quantitative and qualitative experiments, we demonstrate MP2D's efficacy in generating dialogue with natural topic shifts. Furthermore, this study introduces a novel benchmark for topic shift dialogues, TS-WikiDialog. Utilizing the dataset, we demonstrate that even Large Language Models (LLMs) struggle to handle topic shifts in dialogue effectively, and we showcase the performance improvements of models trained on datasets generated by MP2D across diverse topic shift dialogue tasks.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 20 pages

arXiv:2312.12391 [pdf, other]

vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training

Authors: Jehyeon Bang, Yujeong Choi, Myeongwoo Kim, Yongdeok Kim, Minsoo Rhu

Abstract: As large language models (LLMs) become widespread in various application domains, a critical challenge the AI community is facing is how to train these large AI models in a cost-effective manner. Existing LLM training plans typically employ a heuristic-based parallel training strategy that rests on empirical observations rather than on a thorough examination of the search space of LLM parallelization. This limitation causes existing systems to leave significant performance on the table, wasting millions of dollars' worth of training cost. This paper presents our profiling-driven simulator called vTrain, providing AI practitioners a fast yet accurate software framework to determine an efficient and cost-effective LLM training system configuration. We demonstrate vTrain's practicality through several case studies, e.g., effectively evaluating optimal training parallelization strategies that balance training time and its associated training cost, efficient multi-tenant GPU cluster schedulers targeting multiple LLM training jobs, and determining a compute-optimal LLM model architecture given a fixed compute budget.

Submitted 27 November, 2023; originally announced December 2023.

arXiv:2312.08677 [pdf, other]

Adaptive Shortcut Debiasing for Online Continual Learning

Authors: Doyoung Kim, Dongmin Park, Yooju Shin, Jihwan Bang, Hwanjun Song, Jae-Gil Lee

Abstract: We propose a novel framework, DropTop, that suppresses the shortcut bias in online continual learning (OCL) while being adaptive to the varying degree of the shortcut bias incurred by a continuously changing environment. By the observed high-attention property of the shortcut bias, highly-activated features are considered candidates for debiasing. More importantly, resolving the limitation of the online environment where prior knowledge and auxiliary data are not ready, two novel techniques -- feature map fusion and adaptive intensity shifting -- enable us to automatically determine the appropriate level and proportion of the candidate shortcut features to be dropped. Extensive experiments on five benchmark datasets demonstrate that, when combined with various OCL algorithms, DropTop increases the average accuracy by up to 10.4% and decreases the forgetting by up to 63.2%.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2311.12048 [pdf, other]

One Size Fits All for Semantic Shifts: Adaptive Prompt Tuning for Continual Learning

Authors: Doyoung Kim, Susik Yoon, Dongmin Park, Youngjun Lee, Hwanjun Song, Jihwan Bang, Jae-Gil Lee

Abstract: In real-world continual learning scenarios, tasks often exhibit intricate and unpredictable semantic shifts, posing challenges for fixed prompt management strategies. We identify the inadequacy of universal and specific prompting in handling these dynamic shifts. Universal prompting is ineffective for tasks with abrupt semantic changes, while specific prompting struggles with overfitting under mild semantic shifts. To overcome these limitations, we propose an adaptive prompting approach that tailors minimal yet sufficient prompts based on the task semantics. Our methodology, SemPrompt, incorporates a two-level semantic grouping process: macroscopic semantic assignment and microscopic semantic refinement. This process ensures optimal prompt utilization for varying task semantics, improving the efficiency and effectiveness of learning in real-world CL settings. Our experimental results demonstrate that SemPrompt consistently outperforms existing methods in adapting to diverse semantic shifts in tasks.

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2311.11178 [pdf, other]

Active Prompt Learning in Vision Language Models

Authors: Jihwan Bang, Sumyeong Ahn, Jae-Gil Lee

Abstract: Pre-trained Vision Language Models (VLMs) have demonstrated notable progress in various zero-shot tasks, such as classification and retrieval. Despite their performance, because improving performance on new tasks requires task-specific knowledge, their adaptation is essential. While labels are needed for the adaptation, acquiring them is typically expensive. To overcome this challenge, active learning, a method of achieving high performance by obtaining labels for a small number of samples from experts, has been studied. Active learning primarily focuses on selecting unlabeled samples for labeling and leveraging them to train models. In this study, we pose the question, "how can the pre-trained VLMs be adapted under the active learning framework?" In response to this inquiry, we observe that (1) simply applying a conventional active learning framework to pre-trained VLMs may even degrade performance compared to random selection because of the class imbalance in labeling candidates, and (2) the knowledge of VLMs can provide hints for achieving the balance before labeling. Based on these observations, we devise a novel active learning framework for VLMs, denoted as PCB. To assess the effectiveness of our approach, we conduct experiments on seven different real-world datasets, and the results demonstrate that PCB surpasses conventional active learning and random sampling methods. Code will be available at https://github.com/kaist-dmlab/pcb.

Submitted 21 March, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

Comments: accepted at CVPR 2024

arXiv:2311.07589 [pdf, other]

Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources

Authors: Yerin Hwang, Yongil Kim, Hyunkyung Bae, Jeesoo Bang, Hwanhee Lee, Kyomin Jung

Abstract: To address the data scarcity issue in Conversational question answering (ConvQA), a dialog inpainting method, which utilizes documents to generate ConvQA datasets, has been proposed. However, the original dialog inpainting model is trained solely on the dialog reconstruction task, resulting in the generation of questions with low contextual relevance due to insufficient learning of question-answer alignment. To overcome this limitation, we propose a novel framework called Dialogizer, which has the capability to automatically generate ConvQA datasets with high contextual relevance from textual sources. The framework incorporates two training tasks: question-answer matching (QAM) and topic-aware dialog generation (TDG). Moreover, re-ranking is conducted during the inference phase based on the contextual relevance of the generated questions. Using our framework, we produce four ConvQA datasets by utilizing documents from multiple domains as the primary source. Through automatic evaluation using diverse metrics, as well as human evaluation, we validate that our proposed framework exhibits the ability to generate datasets of higher quality compared to the baseline dialog inpainting model.

Submitted 9 November, 2023; originally announced November 2023.

Comments: Accepted to EMNLP 2023 main conference

arXiv:2308.15053 [pdf, other]

Adapting Text-based Dialogue State Tracker for Spoken Dialogues

Authors: Jaeseok Yoon, Seunghyun Hwang, Ran Han, Jeonguk Bang, Kee-Eung Kim

Abstract: Although there have been remarkable advances in dialogue systems through the dialogue systems technology competition (DSTC), building a robust task-oriented dialogue system with a speech interface remains one of the key challenges. Most of the progress has been made for text-based dialogue systems since there are abundant datasets with written corpora while those with spoken dialogues are very scarce. However, as can be seen from voice assistant systems such as Siri and Alexa, it is of practical importance to transfer the success to spoken dialogues. In this paper, we describe our engineering effort in building a highly successful model that participated in the speech-aware dialogue systems technology challenge track in DSTC11. Our model consists of three major modules: (1) automatic speech recognition error correction to bridge the gap between the spoken and the text utterances, (2) text-based dialogue system (D3ST) for estimating the slots and values using slot descriptions, and (3) post-processing for correcting errors in the estimated slot values. Our experiments show that it is important to use an explicit automatic speech recognition error correction module, post-processing, and data augmentation to adapt a text-based dialogue state tracker for spoken dialogue corpora.

Submitted 9 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

Comments: 8 pages, 5 figures, In Proceedings of The Eleventh Dialog System Technology Challenge, Association for Computational Linguistics

arXiv:2303.14386 [pdf, other]

Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection

Authors: Hwanjun Song, Jihwan Bang

Abstract: Prompt-OVD is an efficient and effective framework for open-vocabulary object detection that utilizes class embeddings from CLIP as prompts, guiding the Transformer decoder to detect objects in both base and novel classes. Additionally, our novel RoI-based masked attention and RoI pruning techniques help leverage the zero-shot classification ability of the Vision Transformer-based CLIP, resulting in improved detection performance at minimal computational cost. Our experiments on the OV-COCO and OV-LVIS datasets demonstrate that Prompt-OVD achieves an impressive 21.2 times faster inference speed than the first end-to-end open-vocabulary detection method (OV-DETR), while also achieving higher APs than four two-stage-based methods operating within similar inference time ranges. Code will be made available soon.

Submitted 25 March, 2023; originally announced March 2023.

Comments: version 1

arXiv:2212.11151 [pdf, other]

Template-Based Conjecturing for Automated Induction in Isabelle/HOL

Authors: Yutaka Nagashima, Zijin Xu, Ningli Wang, Daniel Sebastian Goc, James Bang

Abstract: Proof by induction plays a central role in formal verification. However, its automation remains a formidable challenge in Computer Science. To solve inductive problems, human engineers often have to provide auxiliary lemmas manually. We automate this laborious process with template-based conjecturing, a novel approach to generate auxiliary lemmas and use them to prove final goals. Our evaluation shows that our working prototype, TBC, achieved a 40 percentage point improvement in success rates for problems at the intermediate difficulty level.

Submitted 19 January, 2023; v1 submitted 20 November, 2022; originally announced December 2022.

Comments: To appear at Fundamentals of Software engineering 2023 (http://fsen.ir/2023/)

arXiv:2210.07805 [pdf, other]

Meta-Query-Net: Resolving Purity-Informativeness Dilemma in Open-set Active Learning

Authors: Dongmin Park, Yooju Shin, Jihwan Bang, Youngjun Lee, Hwanjun Song, Jae-Gil Lee

Abstract: Unlabeled data examples awaiting annotations inevitably contain open-set noise. A few active learning studies have attempted to deal with this open-set noise for sample selection by filtering out the noisy examples. However, because focusing on the purity of examples in a query set leads to overlooking the informativeness of the examples, the best balancing of purity and informativeness remains an important question. In this paper, to solve this purity-informativeness dilemma in open-set active learning, we propose a novel Meta-Query-Net (MQ-Net) that adaptively finds the best balancing between the two factors. Specifically, by leveraging the multi-round property of active learning, we train MQ-Net using a query set without an additional validation set. Furthermore, a clear dominance relationship between unlabeled examples is effectively captured by MQ-Net through a novel skyline regularization. Extensive experiments on multiple open-set active learning scenarios demonstrate that the proposed MQ-Net achieves a 20.14% improvement in terms of accuracy, compared with the state-of-the-art methods.

Submitted 11 January, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: published in NeurIPS 2022

arXiv:2207.03858 [pdf, other]

DSTEA: Improving Dialogue State Tracking via Entity Adaptive Pre-training

Authors: Yukyung Lee, Takyoung Kim, Hoonsang Yoon, Pilsung Kang, Junseong Bang, Misuk Kim

Abstract: Dialogue State Tracking (DST) is critical for comprehensively interpreting user and system utterances, thereby forming the cornerstone of efficient dialogue systems. Although past research efforts have focused on enhancing DST performance through alterations to the model structure or integrating additional features like graph relations, they often require additional pre-training with external dialogue corpora. In this study, we propose DSTEA, improving Dialogue State Tracking via Entity Adaptive pre-training, which can enhance the encoder by intensively training key entities in dialogue utterances. DSTEA identifies these pivotal entities from input dialogues utilizing four different methods: ontology information, named-entity recognition, spaCy, and the flair library. Subsequently, it employs selective knowledge masking to train the model effectively. Remarkably, DSTEA only requires pre-training without the direct infusion of extra knowledge into the DST model. This approach resulted in substantial performance improvements of four robust DST models on MultiWOZ 2.0, 2.1, and 2.2, with joint goal accuracy witnessing an increase of up to 2.69% (from 52.41% to 55.10%). Further validation of DSTEA's efficacy was provided through comparative experiments considering various entity types and different entity adaptive pre-training configurations such as masking strategy and masking rate.

Submitted 23 July, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

Journal ref: KnowledgeNLP@KDD2023

arXiv:2203.15355 [pdf, other]

Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries

Authors: Jihwan Bang, Hyunseo Koh, Seulki Park, Hwanjun Song, Jung-Woo Ha, Jonghyun Choi

Abstract: Learning under a continuously changing data distribution with incorrect labels is a desirable yet challenging real-world problem. A large body of continual learning (CL) methods, however, assumes data streams with clean labels, and online learning scenarios under noisy data streams are still underexplored. We consider a more practical CL task setup of online learning from a blurry data stream with corrupted labels, where existing CL methods struggle. To address the task, we first argue the importance of both diversity and purity of examples in the episodic memory of continual learning models. To balance diversity and purity in the episodic memory, we propose a novel strategy to manage and use the memory by a unified approach of label-noise-aware diverse sampling and robust learning with semi-supervised learning. Our empirical validations on four real-world or synthetic noise datasets (CIFAR10 and 100, mini-WebVision, and Food-101N) show that our method significantly outperforms prior art in this realistic and challenging continual learning scenario. Code and data splits are available at https://github.com/clovaai/puridiver.

Submitted 30 March, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: Accepted paper at CVPR 2022

arXiv:2112.07208 [pdf]

Interpretable Convolutional Neural Networks for Subject-Independent Motor Imagery Classification

Authors: Ji-Seon Bang, Seong-Whan Lee

Abstract: Deep learning frameworks have become increasingly popular in brain computer interface (BCI) study thanks to their outstanding performance. However, in terms of the classification model alone, they are treated as a black box, as they do not provide any information on what led them to reach a particular decision. In other words, we cannot be sure whether the high performance arose from neuro-physiological factors or simply from noise. Because of this disadvantage, it is difficult to ensure reliability commensurate with their high performance. In this study, we propose an explainable deep learning model for BCI. Specifically, we aim to classify EEG signals obtained from the motor-imagery (MI) task. In addition, we applied layer-wise relevance propagation (LRP) to the model to interpret why the model produced a certain classification output. We visualized the heatmap that indicates the output of the LRP in the form of a topography to verify neuro-physiological factors. Furthermore, we classified EEG in a subject-independent manner to learn robust and generalized EEG features by avoiding subject dependency. The methodology also provides the advantage of avoiding the expense of building training data for each subject. With our proposed model, we obtained generalized heatmap patterns for all subjects. As a result, we can conclude that our proposed model provides neuro-physiologically reliable interpretation.

Submitted 14 December, 2021; originally announced December 2021.

Comments: Submitted to IEEE 10th International Winter Conference on Brain-Computer Interface (BCI 2022)

arXiv:2109.01212 [pdf, other]

A Reliable, Self-Adaptive Face Identification Framework via Lyapunov Optimization

Authors: Dohyeon Kim, Joongheon Kim, Jae young Bang

Abstract: Realtime face identification (FID) from a video feed is highly computation-intensive, and may exhaust computation resources if performed on a device with a limited amount of resources (e.g., a mobile device). In general, FID performs better when images are sampled at a higher rate, minimizing false negatives. However, performing it at an overwhelmingly high rate exposes the system to the risk of a queue overflow that hampers the system's reliability. This paper proposes a novel, queue-aware FID framework that adapts the sampling rate to maximize the FID performance while avoiding a queue overflow by implementing the Lyapunov optimization. A preliminary evaluation via a trace-based simulation confirms the effectiveness of the framework.

Submitted 2 September, 2021; originally announced September 2021.

Comments: This paper was presented at ACM Symposium on Operating Systems Principles (SOSP) Workshop on AI Systems (AISys), Shanghai, China, October 2017

arXiv:2108.12637 [pdf, other]

Oh My Mistake!: Toward Realistic Dialogue State Tracking including Turnback Utterances

Authors: Takyoung Kim, Yukyung Lee, Hoonsang Yoon, Pilsung Kang, Junseong Bang, Misuk Kim

Abstract: The primary purpose of dialogue state tracking (DST), a critical component of an end-to-end conversational system, is to build a model that responds well to real-world situations. Although we often change our minds from time to time during ordinary conversations, current benchmark datasets do not adequately reflect such occurrences and instead consist of over-simplified conversations, in which no one changes their mind during a conversation. The main question inspiring the present study is: "Are current benchmark datasets sufficiently diverse to handle casual conversations in which one changes their mind after a certain topic is over?" We found that the answer is "No" because DST models cannot refer to previous user preferences when template-based turnback utterances are injected into the dataset. Even in the simplest mind-changing (turnback) scenario, the performance of DST models significantly degenerated. However, we found that this performance degeneration can be recovered when the turnback scenarios are explicitly designed in the training set, implying that the problem is not with the DST models but rather with the construction of the benchmark dataset.

Submitted 12 October, 2022; v1 submitted 28 August, 2021; originally announced August 2021.

Comments: SereTOD Workshop at EMNLP 2022

arXiv:2107.07062 [pdf]

Motor Imagery Classification based on CNN-GRU Network with Spatio-Temporal Feature Representation

Authors: Ji-Seon Bang, Seong-Whan Lee

Abstract: Recently, various deep neural networks have been applied to classify electroencephalogram (EEG) signals. EEG is a brain signal that can be acquired in a non-invasive way and has a high temporal resolution. It can be used to decode the intention of users. As the EEG signal has a high-dimensional feature space, appropriate feature extraction methods are needed to improve classification performance. In this study, we obtained a spatio-temporal feature representation and classified it with the combined convolutional neural networks (CNN)-gated recurrent unit (GRU) model. To this end, we obtained covariance matrices in each different temporal band and then concatenated them on the temporal axis to obtain a final spatio-temporal feature representation. In the classification model, CNN is responsible for spatial feature extraction and GRU is responsible for temporal feature extraction. Classification performance was improved by distinguishing spatial data processing and temporal data processing. The average accuracy of the proposed model was 77.70% for the BCI competition IV_2a data set. The proposed method outperformed all the baseline methods it was compared against.

Submitted 14 July, 2021; originally announced July 2021.

Comments: Submitted to IAPR 6th Asian Conference on Pattern Recognition (ACPR 2021)

arXiv:2104.06142 [pdf, other]

doi 10.1145/3514221.3526181

Zeus: Efficiently Localizing Actions in Videos using Reinforcement Learning

Authors: Pramod Chunduri, Jaeho Bang, Yao Lu, Joy Arulraj

Abstract: Detection and localization of actions in videos is an important problem in practice. State-of-the-art video analytics systems are unable to efficiently and effectively answer such action queries because actions often involve a complex interaction between objects and are spread across a sequence of frames; detecting and localizing them requires computationally expensive deep neural networks. It is also important to consider the entire sequence of frames to answer the query effectively. In this paper, we present ZEUS, a video analytics system tailored for answering action queries. We present a novel technique for efficiently answering these queries using deep reinforcement learning. ZEUS trains a reinforcement learning agent that learns to adaptively modify the input video segments that are subsequently sent to an action classification network. The agent alters the input segments along three dimensions - sampling rate, segment length, and resolution. To meet the user-specified accuracy target, ZEUS's query optimizer trains the agent based on an accuracy-aware, aggregate reward function. Evaluation on three diverse video datasets shows that ZEUS outperforms state-of-the-art frame- and window-based filtering techniques by up to 22.1x and 4.7x, respectively. It also consistently meets the user-specified accuracy target across all queries.

Submitted 27 September, 2022; v1 submitted 6 April, 2021; originally announced April 2021.

Journal ref: In Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22). Philadelphia, PA, USA, 545-558

arXiv:2104.01671 [pdf, other]

EKO: Adaptive Sampling of Compressed Video Data

Authors: Jaeho Bang, Pramod Chunduri, Joy Arulraj

Abstract: Researchers have presented systems for efficiently analysing video data at scale using sampling algorithms. While these systems effectively leverage the temporal redundancy present in videos, they suffer from three limitations. First, they use traditional video storage formats that are tailored for human consumption. Second, they load and decode the entire compressed video in memory before applying the sampling algorithm. Third, the sampling algorithms often require labeled training data obtained using a specific deep learning model. These limitations lead to lower accuracy, higher query execution time, and larger memory footprint. In this paper, we present EKO, a storage engine for efficiently managing video data. EKO relies on two optimizations. First, it uses a novel unsupervised, adaptive sampling algorithm for identifying the key frames in a given video. Second, it stores the identified key frames in a compressed representation that is optimized for machine consumption. We show that EKO improves F1-score by up to 9% compared to the next best performing state-of-the-art unsupervised sampling algorithm by selecting more representative frames. It reduces query execution time by 3X and memory footprint by 10X in comparison to a widely-used, traditional video storage format.

Submitted 4 April, 2021; originally announced April 2021.

arXiv:2103.17230 [pdf, other]

Rainbow Memory: Continual Learning with a Memory of Diverse Samples

Authors: Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha, Jonghyun Choi

Abstract: Continual learning is a realistic learning scenario for AI models. The prevalent scenario of continual learning, however, assumes disjoint sets of classes as tasks and is less realistic and rather artificial. Instead, we focus on a 'blurry' task boundary, where tasks share classes, which is more realistic and practical. To address such a task, we argue for the importance of sample diversity in an episodic memory. To enhance the sample diversity in the memory, we propose a novel memory management strategy based on per-sample classification uncertainty and data augmentation, named Rainbow Memory (RM). With extensive empirical validations on MNIST, CIFAR10, CIFAR100, and ImageNet datasets, we show that the proposed method significantly improves the accuracy in blurry continual learning setups, outperforming the state of the art by large margins despite its simplicity. Code and data splits will be available at https://github.com/clovaai/rainbow-memory.

Submitted 31 March, 2021; originally announced March 2021.

Comments: Accepted paper at CVPR 2021

arXiv:2006.11021 [pdf, other]

Boosting Active Learning for Speech Recognition with Noisy Pseudo-labeled Samples

Authors: Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha

Abstract: The cost of annotating transcriptions for large speech corpora is a bottleneck to fully exploiting the potential capacity of deep neural network-based automatic speech recognition models. In this paper, we present a new training pipeline that boosts the conventional active learning approach, targeting label-efficient learning to resolve this problem. Existing active learning methods only focus on selecting a set of informative samples under a labeling budget. Going one step further, we suggest that the training efficiency can be further improved by utilizing the unlabeled samples, exceeding the labeling budget, by introducing a carefully configured unsupervised loss that effectively complements the supervised loss. We propose a new unsupervised loss based on consistency regularization, and we configure appropriate augmentation techniques for utterances to adopt consistency regularization in the automatic speech recognition task. From the qualitative and quantitative experiments on the real-world dataset and under real-usage scenarios, we show that the proposed training pipeline can boost the efficacy of active learning approaches, thus successfully reducing a sustainable amount of human labeling cost.

Submitted 5 November, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

Comments: 8 pages, 4 figures, 2 tables

arXiv:1911.09099 [pdf, other]

SINet: Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder

Authors: Hyojin Park, Lars Lowe Sjösund, YoungJoon Yoo, Nicolas Monet, Jihwan Bang, Nojun Kwak

Abstract: Designing a lightweight and robust portrait segmentation algorithm is an important task for a wide range of face applications. However, the problem has been considered as a subset of the object segmentation problem and less handled in the semantic segmentation field. Obviously, portrait segmentation has its unique requirements. First, because the portrait segmentation is performed in the middle of a whole process of many real-world applications, it requires extremely lightweight models. Second, there have not been any public datasets in this domain that contain a sufficient number of images with unbiased statistics. To solve the first problem, we introduce the new extremely lightweight portrait segmentation model SINet, containing an information blocking decoder and spatial squeeze modules. The information blocking decoder uses confidence estimates to recover local spatial information without spoiling global consistency. The spatial squeeze module uses multiple receptive fields to cope with various sizes of consistency in the image. To tackle the second problem, we propose a simple method to create additional portrait segmentation data which can improve accuracy on the EG1800 dataset. In our qualitative and quantitative analysis on the EG1800 dataset, we show that our method outperforms various existing lightweight segmentation models. Our method reduces the number of parameters from 2.1M to 86.9K (around 95.9% reduction), while maintaining the accuracy within a 1% margin from the state-of-the-art portrait segmentation method. We also show our model is successfully executed on a real mobile device with 100.6 FPS. In addition, we demonstrate that our method can be used for general semantic segmentation on the Cityscapes dataset. The code and dataset are available at https://github.com/HYOJINPARK/ExtPortraitSeg .

Submitted 9 February, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: https://github.com/HYOJINPARK/ExtPortraitSeg. arXiv admin note: text overlap with arXiv:1908.03093

arXiv:1908.03093 [pdf, other]

ExtremeC3Net: Extreme Lightweight Portrait Segmentation Networks using Advanced C3-modules

Authors: Hyojin Park, Lars Lowe Sjösund, YoungJoon Yoo, Jihwan Bang, Nojun Kwak

Abstract: Designing a lightweight and robust portrait segmentation algorithm is an important task for a wide range of face applications. However, the problem has been considered as a subset of the object segmentation problem. Obviously, portrait segmentation has its unique requirements. First, because the portrait segmentation is performed in the middle of a whole process of many real-world applications, it requires extremely lightweight models. Second, there have not been any public datasets in this domain that contain a sufficient number of images with unbiased statistics. To solve the problems, we introduce a new extremely lightweight portrait segmentation model consisting of a two-branched architecture based on the concentrated-comprehensive convolutions block. Our method reduces the number of parameters from 2.1M to 37.7K (around 98.2% reduction), while maintaining the accuracy within a 1% margin from the state-of-the-art portrait segmentation method. In our qualitative and quantitative analysis on the EG1800 dataset, we show that our method outperforms various existing lightweight segmentation models. Second, we propose a simple method to create additional portrait segmentation data which can improve accuracy on the EG1800 dataset. Also, we analyze the bias in public datasets by additionally annotating race, gender, and age on our own. The augmented dataset, the additional annotations and code are available at https://github.com/HYOJINPARK/ExtPortraitSeg .

Submitted 9 December, 2019; v1 submitted 8 August, 2019; originally announced August 2019.

Comments: https://github.com/HYOJINPARK/ExtPortraitSeg

arXiv:1603.08604 [pdf, other]

Classification-based Financial Markets Prediction using Deep Neural Networks

Authors: Matthew Dixon, Diego Klabjan, ** Hoon Bang

Abstract: Deep neural networks (DNNs) are powerful types of artificial neural networks (ANNs) that use several hidden layers. They have recently gained considerable attention in the speech transcription and image recognition community (Krizhevsky et al., 2012) for their superior predictive properties, including robustness to overfitting. However, their application to algorithmic trading has not been previously researched, partly because of their computational complexity. This paper describes the application of DNNs to predicting financial market movement directions. In particular, we describe the configuration and training approach and then demonstrate their application to backtesting a simple trading strategy over 43 different Commodity and FX future mid-prices at 5-minute intervals. All results in this paper are generated using a C++ implementation on the Intel Xeon Phi co-processor, which is 11.4x faster than the serial version, and a Python strategy backtesting environment, both of which are available as open source code written by the authors.

Submitted 13 June, 2017; v1 submitted 28 March, 2016; originally announced March 2016.

arXiv:0911.0971 [pdf, ps, other]

Multicell Zero-Forcing and User Scheduling on the Downlink of a Linear Cell Array

Authors: H. J. Bang, D. Gesbert

Abstract: Coordinated base station (BS) transmission has attracted much interest for its potential to increase the capacity of wireless networks. Yet at the same time, the achievable sum-rate with single-cell processing (SCP) scales optimally with the number of users under Rayleigh fading conditions. One may therefore ask if the value of BS coordination is limited in the many-user regime from a sum-rate perspective. With this in mind we consider multicell zero-forcing beamforming (ZFBF) on the downlink of a linear cell-array. We first identify the beamforming weights and the optimal scheduling policy under a per-base power constraint. We then compare the number of users m and n required per-cell to achieve the same mean SINR, after optimal scheduling, with SCP and ZFBF respectively. Specifically, we show that the ratio m/n grows logarithmically with n. Finally, we demonstrate that the gain in sum-rate between ZFBF and SCP is significant for all practical values of number of users.

Submitted 6 November, 2009; v1 submitted 5 November, 2009; originally announced November 2009.

Comments: 15 pages, 3 figures

Showing 1–25 of 25 results for author: Bang, J