Search | arXiv e-print repository

arXiv:2407.00573 [pdf, other]

A Simple Representation of Tree Covering Utilizing Balanced Parentheses and Efficient Implementation of Average-Case Optimal RMQs

Authors: Kou Hamada, Sankardeep Chakraborty, Seungbum Jo, Takuto Koriyama, Kunihiko Sadakane, Srinivasa Rao Satti

Abstract: Tree covering is a technique for decomposing a tree into smaller-sized trees with desirable properties, and has been employed in various succinct data structures. However, significant hurdles stand in the way of a practical implementation of tree covering: a lot of pointers are used to maintain the tree-covering hierarchy and many indices for tree navigational queries consume theoretically negligi… ▽ More Tree covering is a technique for decomposing a tree into smaller-sized trees with desirable properties, and has been employed in various succinct data structures. However, significant hurdles stand in the way of a practical implementation of tree covering: a lot of pointers are used to maintain the tree-covering hierarchy and many indices for tree navigational queries consume theoretically negligible yet practically vast space. To tackle these problems, we propose a simple representation of tree covering using a balanced parenthesis representation. The key to the proposal is the observation that every micro tree splits into at most two intervals on the BP representation. Utilizing the representation, we propose several data structures that represent a tree and its tree cover, which consequently allow micro tree compression with arbitrary coding and efficient tree navigational queries. We also applied our data structure to average-case optimal RMQ by Munro et al.~[ESA 2021] and implemented the RMQ data structure. Our RMQ data structures spend less than $2n$ bits and process queries in a practical time on several settings of the performance evaluation, reducing the gap between theoretical space complexity and actual space consumption. We also implement tree navigational operations while using the same amount of space as the RMQ data structures. We believe the representation can be widely utilized for designing practically memory-efficient data structures based on tree covering. △ Less

Submitted 29 June, 2024; originally announced July 2024.

Comments: To appear in ESA 2024

arXiv:2406.06134 [pdf, other]

DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection

Authors: Donggeun Ko, Sangwoo Jo, Dongjun Lee, Namjun Park, Jaekwang Kim

Abstract: Dataset bias is a significant challenge in machine learning, where specific attributes, such as texture or color of the images are unintentionally learned resulting in detrimental performance. To address this, previous efforts have focused on debiasing models either by develo** novel debiasing algorithms or by generating synthetic data to mitigate the prevalent dataset biases. However, generativ… ▽ More Dataset bias is a significant challenge in machine learning, where specific attributes, such as texture or color of the images are unintentionally learned resulting in detrimental performance. To address this, previous efforts have focused on debiasing models either by develo** novel debiasing algorithms or by generating synthetic data to mitigate the prevalent dataset biases. However, generative approaches to date have largely relied on using bias-specific samples from the dataset, which are typically too scarce. In this work, we propose, DiffInject, a straightforward yet powerful method to augment synthetic bias-conflict samples using a pretrained diffusion model. This approach significantly advances the use of diffusion models for debiasing purposes by manipulating the latent space. Our framework does not require any explicit knowledge of the bias types or labelling, making it a fully unsupervised setting for debiasing. Our methodology demonstrates substantial result in effectively reducing dataset bias. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 10 pages (including supplementary), 3 figures, SynData4CV@CVPR 24 (Workshop)

arXiv:2406.05761 [pdf, other]

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Gui** Son, Ye** Cho, Sheikh Shafayat, **heon Baek, Sue Hyun Park, Hyeonbin Hwang, **kyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang , et al. (7 additional authors not shown)

Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec… ▽ More As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on specific capabilities such as instruction following, leading to coverage bias. To overcome these limitations, we introduce the BiGGen Bench, a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks. A key feature of the BiGGen Bench is its use of instance-specific evaluation criteria, closely mirroring the nuanced discernment of human evaluation. We apply this benchmark to assess 103 frontier LMs using five evaluator LMs. Our code, data, and evaluation results are all publicly available at https://github.com/prometheus-eval/prometheus-eval/tree/main/BiGGen-Bench. △ Less

Submitted 9 June, 2024; originally announced June 2024.

Comments: Work in Progress

arXiv:2405.20649 [pdf, other]

Reward-based Input Construction for Cross-document Relation Extraction

Authors: Byeonghu Na, Suhyeon Jo, Yeongmin Kim, Il-Chul Moon

Abstract: Relation extraction (RE) is a fundamental task in natural language processing, aiming to identify relations between target entities in text. While many RE methods are designed for a single sentence or document, cross-document RE has emerged to address relations across multiple long documents. Given the nature of long documents in cross-document RE, extracting document embeddings is challenging due… ▽ More Relation extraction (RE) is a fundamental task in natural language processing, aiming to identify relations between target entities in text. While many RE methods are designed for a single sentence or document, cross-document RE has emerged to address relations across multiple long documents. Given the nature of long documents in cross-document RE, extracting document embeddings is challenging due to the length constraints of pre-trained language models. Therefore, we propose REward-based Input Construction (REIC), the first learning-based sentence selector for cross-document RE. REIC extracts sentences based on relational evidence, enabling the RE module to effectively infer relations. Since supervision of evidence sentences is generally unavailable, we train REIC using reinforcement learning with RE prediction scores as rewards. Experimental results demonstrate the superiority of our method over heuristic methods for different RE structures and backbones in cross-document RE. Our code is publicly available at https://github.com/aailabkaist/REIC. △ Less

Submitted 31 May, 2024; originally announced May 2024.

Comments: Accepted at ACL 2024 main conference

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seong** Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in develo** their sovereign LLMs. △ Less

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.00384 [pdf, other]

TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias

Authors: Sanghyun Jo, Soohyun Ryu, Sungyub Kim, Eunho Yang, Kyungsu Kim

Abstract: We identify a critical bias in contemporary CLIP-based models, which we denote as single tag bias. This bias manifests as a disproportionate focus on a singular tag (word) while neglecting other pertinent tags, stemming from CLIP's text embeddings that prioritize one specific tag in image-text relationships. When deconstructing text into individual tags, only one tag tends to have high relevancy w… ▽ More We identify a critical bias in contemporary CLIP-based models, which we denote as single tag bias. This bias manifests as a disproportionate focus on a singular tag (word) while neglecting other pertinent tags, stemming from CLIP's text embeddings that prioritize one specific tag in image-text relationships. When deconstructing text into individual tags, only one tag tends to have high relevancy with CLIP's image embedding, leading to biased tag relevancy. In this paper, we introduce a novel two-step fine-tuning approach, Text-Tag Self-Distillation (TTD), to address this challenge. TTD first extracts image-relevant tags from text based on their similarity to the nearest pixels then employs a self-distillation strategy to align combined masks with the text-derived mask. This approach ensures the unbiased image-text alignment of the CLIP-based models using only image-text pairs without necessitating additional supervision. Our technique demonstrates model-agnostic improvements in multi-tag classification and segmentation tasks, surpassing competing methods that rely on external resources. The code is available at https://github.com/shjo-april/TTD. △ Less

Submitted 20 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

arXiv:2404.00380 [pdf, other]

DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation

Authors: Sanghyun Jo, Fei Pan, In-Jae Yu, Kyungsu Kim

Abstract: Weakly-supervised semantic segmentation (WSS) ensures high-quality segmentation with limited data and excels when employed as input seed masks for large-scale vision models such as Segment Anything. However, WSS faces challenges related to minor classes since those are overlooked in images with adjacent multiple classes, a limitation originating from the overfitting of traditional expansion method… ▽ More Weakly-supervised semantic segmentation (WSS) ensures high-quality segmentation with limited data and excels when employed as input seed masks for large-scale vision models such as Segment Anything. However, WSS faces challenges related to minor classes since those are overlooked in images with adjacent multiple classes, a limitation originating from the overfitting of traditional expansion methods like Random Walk. We first address this by employing unsupervised and weakly-supervised feature maps instead of conventional methodologies, allowing for hierarchical mask enhancement. This method distinctly categorizes higher-level classes and subsequently separates their associated lower-level classes, ensuring all classes are correctly restored in the mask without losing minor ones. Our approach, validated through extensive experimentation, significantly improves WSS across five benchmarks (VOC: 79.8\%, COCO: 53.9\%, Context: 49.0\%, ADE: 32.9\%, Stuff: 37.4\%), reducing the gap with fully supervised methods by over 84\% on the VOC validation set. Code is available at https://github.com/shjo-april/DHR. △ Less

Submitted 19 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

arXiv:2403.13835 [pdf, other]

SMART: Automatically Scaling Down Language Models with Accuracy Guarantees for Reduced Processing Fees

Authors: Saehan Jo, Immanuel Trummer

Abstract: The advancement of Large Language Models (LLMs) has significantly boosted performance in natural language processing (NLP) tasks. However, the deployment of high-performance LLMs incurs substantial costs, primarily due to the increased number of parameters aimed at enhancing model performance. This has made the use of state-of-the-art LLMs more expensive for end-users. AI service providers, such a… ▽ More The advancement of Large Language Models (LLMs) has significantly boosted performance in natural language processing (NLP) tasks. However, the deployment of high-performance LLMs incurs substantial costs, primarily due to the increased number of parameters aimed at enhancing model performance. This has made the use of state-of-the-art LLMs more expensive for end-users. AI service providers, such as OpenAI and Anthropic, often offer multiple versions of LLMs with varying prices and performance. However, end-users still face challenges in choosing the appropriate LLM for their tasks that balance result quality with cost. We introduce SMART, Scaling Models Adaptively for Reduced Token Fees, a novel LLM framework designed to minimize the inference costs of NLP tasks while ensuring sufficient result quality. It enables users to specify an accuracy constraint in terms of the equivalence of outputs to those of the most powerful LLM. SMART then generates results that deviate from the outputs of this LLM only with a probability below a user-defined threshold. SMART employs a profiling phase that evaluates the performance of multiple LLMs to identify those that meet the user-defined accuracy level. SMART optimizes the tradeoff between profiling overheads and the anticipated cost savings resulting from profiling. Moreover, our approach significantly reduces inference costs by strategically leveraging a mix of LLMs. Our experiments on three real-world datasets show that, based on OpenAI models, SMART achieves significant cost savings, up to 25.6x in comparison to GPT-4. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.09024 [pdf, other]

Semiparametric Token-Sequence Co-Supervision

Authors: Hyunji Lee, Doyoung Kim, Jihoon Jun, Sejune Joo, Joel Jang, Kyoung-Woon On, Minjoon Seo

Abstract: In this work, we introduce a semiparametric token-sequence co-supervision training method. It trains a language model by simultaneously leveraging supervision from the traditional next token prediction loss which is calculated over the parametric token embedding space and the next sequence prediction loss which is calculated over the nonparametric sequence embedding space. The nonparametric sequen… ▽ More In this work, we introduce a semiparametric token-sequence co-supervision training method. It trains a language model by simultaneously leveraging supervision from the traditional next token prediction loss which is calculated over the parametric token embedding space and the next sequence prediction loss which is calculated over the nonparametric sequence embedding space. The nonparametric sequence embedding space is constructed by a separate language model tasked to condense an input text into a single representative embedding. Our experiments demonstrate that a model trained via both supervisions consistently surpasses models trained via each supervision independently. Analysis suggests that this co-supervision encourages a broader generalization capability across the model. Especially, the robustness of parametric token space which is established during the pretraining step tends to effectively enhance the stability of nonparametric sequence embedding space, a new space established by another language model. △ Less

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2402.15162 [pdf, other]

Entity-level Factual Adaptiveness of Fine-tuning based Abstractive Summarization Models

Authors: Jongyoon Song, Nohil Park, Bongkyu Hwang, Jaewoong Yun, Seongho Joe, Youngjune L. Gwon, Sungroh Yoon

Abstract: Abstractive summarization models often generate factually inconsistent content particularly when the parametric knowledge of the model conflicts with the knowledge in the input document. In this paper, we analyze the robustness of fine-tuning based summarization models to the knowledge conflict, which we call factual adaptiveness. We utilize pre-trained language models to construct evaluation sets… ▽ More Abstractive summarization models often generate factually inconsistent content particularly when the parametric knowledge of the model conflicts with the knowledge in the input document. In this paper, we analyze the robustness of fine-tuning based summarization models to the knowledge conflict, which we call factual adaptiveness. We utilize pre-trained language models to construct evaluation sets and find that factual adaptiveness is not strongly correlated with factual consistency on original datasets. Furthermore, we introduce a controllable counterfactual data augmentation method where the degree of knowledge conflict within the augmented data can be adjustable. Our experimental results on two pre-trained language models (PEGASUS and BART) and two fine-tuning datasets (XSum and CNN/DailyMail) demonstrate that our method enhances factual adaptiveness while achieving factual consistency on original datasets on par with the contrastive learning baseline. △ Less

Submitted 23 February, 2024; originally announced February 2024.

Comments: EACL 2024

arXiv:2402.09450 [pdf, other]

Guiding Masked Representation Learning to Capture Spatio-Temporal Relationship of Electrocardiogram

Authors: Yeongyeon Na, Minje Park, Yunwon Tae, Sunghoon Joo

Abstract: Electrocardiograms (ECG) are widely employed as a diagnostic tool for monitoring electrical signals originating from a heart. Recent machine learning research efforts have focused on the application of screening various diseases using ECG signals. However, adapting to the application of screening disease is challenging in that labeled ECG data are limited. Achieving general representation through… ▽ More Electrocardiograms (ECG) are widely employed as a diagnostic tool for monitoring electrical signals originating from a heart. Recent machine learning research efforts have focused on the application of screening various diseases using ECG signals. However, adapting to the application of screening disease is challenging in that labeled ECG data are limited. Achieving general representation through self-supervised learning (SSL) is a well-known approach to overcome the scarcity of labeled data; however, a naive application of SSL to ECG data, without considering the spatial-temporal relationships inherent in ECG signals, may yield suboptimal results. In this paper, we introduce ST-MEM (Spatio-Temporal Masked Electrocardiogram Modeling), designed to learn spatio-temporal features by reconstructing masked 12-lead ECG data. ST-MEM outperforms other SSL baseline methods in various experimental settings for arrhythmia classification tasks. Moreover, we demonstrate that ST-MEM is adaptable to various lead combinations. Through quantitative and qualitative analysis, we show a spatio-temporal relationship within ECG data. Our code is available at https://github.com/bakqui/ST-MEM. △ Less

Submitted 19 March, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: ICLR 2024. The first three authors contribute equally

arXiv:2402.08359 [pdf, other]

Learning to Produce Semi-dense Correspondences for Visual Localization

Authors: Khang Truong Giang, Soohwan Song, Sungho Jo

Abstract: This study addresses the challenge of performing visual localization in demanding conditions such as night-time scenarios, adverse weather, and seasonal changes. While many prior studies have focused on improving image-matching performance to facilitate reliable dense keypoint matching between images, existing methods often heavily rely on predefined feature points on a reconstructed 3D model. Con… ▽ More This study addresses the challenge of performing visual localization in demanding conditions such as night-time scenarios, adverse weather, and seasonal changes. While many prior studies have focused on improving image-matching performance to facilitate reliable dense keypoint matching between images, existing methods often heavily rely on predefined feature points on a reconstructed 3D model. Consequently, they tend to overlook unobserved keypoints during the matching process. Therefore, dense keypoint matches are not fully exploited, leading to a notable reduction in accuracy, particularly in noisy scenes. To tackle this issue, we propose a novel localization method that extracts reliable semi-dense 2D-3D matching points based on dense keypoint matches. This approach involves regressing semi-dense 2D keypoints into 3D scene coordinates using a point inference network. The network utilizes both geometric and visual cues to effectively infer 3D coordinates for unobserved keypoints from the observed ones. The abundance of matching information significantly enhances the accuracy of camera pose estimation, even in scenarios involving noisy or sparse 3D models. Comprehensive evaluations demonstrate that the proposed method outperforms other methods in challenging scenes and achieves competitive results in large-scale visual localization benchmarks. The code will be available. △ Less

Submitted 20 March, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

Comments: Accepted at CVPR 2024

arXiv:2311.09069 [pdf, other]

How Well Do Large Language Models Truly Ground?

Authors: Hyunji Lee, Sejune Joo, Chaeeun Kim, Joel Jang, Doyoung Kim, Kyoung-Woon On, Minjoon Seo

Abstract: To reduce issues like hallucinations and lack of control in Large Language Models (LLMs), a common method is to generate responses by grounding on external contexts given as input, known as knowledge-augmented models. However, previous research often narrowly defines "grounding" as just having the correct answer, which does not ensure the reliability of the entire response. To overcome this, we pr… ▽ More To reduce issues like hallucinations and lack of control in Large Language Models (LLMs), a common method is to generate responses by grounding on external contexts given as input, known as knowledge-augmented models. However, previous research often narrowly defines "grounding" as just having the correct answer, which does not ensure the reliability of the entire response. To overcome this, we propose a stricter definition of grounding: a model is truly grounded if it (1) fully utilizes the necessary knowledge from the provided context, and (2) stays within the limits of that knowledge. We introduce a new dataset and a grounding metric to evaluate model capability under the definition. We perform experiments across 25 LLMs of different sizes and training methods and provide insights into factors that influence grounding performance. Our findings contribute to a better understanding of how to improve grounding capabilities and suggest an area of improvement toward more reliable and controllable LLM applications. △ Less

Submitted 29 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: published at NAACL 2022

arXiv:2311.02839 [pdf, ps, other]

Cell-Probe Lower Bound for Accessible Interval Graphs

Authors: Sankardeep Chakraborty, Christian Engels, Seungbum Jo, Mingmou Liu

Abstract: We spot a hole in the area of succinct data structures for graph classes from a universe of size at most $n^n$. Very often, the input graph is labeled by the user in an arbitrary and easy-to-use way, and the data structure for the graph relabels the input graph in some way. For any access, the user needs to store these labels or compute the new labels in an online manner. This might require more b… ▽ More We spot a hole in the area of succinct data structures for graph classes from a universe of size at most $n^n$. Very often, the input graph is labeled by the user in an arbitrary and easy-to-use way, and the data structure for the graph relabels the input graph in some way. For any access, the user needs to store these labels or compute the new labels in an online manner. This might require more bits than the information-theoretic minimum of the original graph class, hence, defeating the purpose of succinctness. Given this, the data structure designer must allow the user to access the data structure with the original labels, i.e., relabeling is not allowed. We call such a graph data structure ``accessible''. In this paper, we study the complexity of such accessible data structures for interval graphs, a graph class with information-theoretic minimum less than $n\log n$ bits. - We formalize the concept of "accessibility" (which was implicitly assumed), and propose the "universal interval representation", for interval graphs. - Any data structure for interval graphs in universal interval representation, which supports both adjacency and degree query simultaneously with time cost $t_1$ and $t_2$ respectively, must consume at least $\log_2(n!)+n/(\log n)^{O(t_1+t_2)}$ bits of space. This is also the first lower bound for graph classes with information-theoretic minimum less than $n\log_2n$ bits. - We provide efficient succinct data structures for interval graphs in universal interval representation supporting adjacency query and degree query individually in constant time and space costs. Therefore, two upper bounds together with the lower bound show that the two elementary queries for interval graphs are incompatible with each other in the context of succinct data structure. To the best of our knowledge, this is the first proof of such incompatibility phenomenon. △ Less

Submitted 5 November, 2023; originally announced November 2023.

MSC Class: 68P05; 68P30; 68Q17 ACM Class: E.1; F.1.3; F.2.3

arXiv:2311.02427 [pdf, other]

Succinct Data Structure for Graphs with $d$-Dimensional $t$-Representation

Authors: Girish Balakrishnan, Sankardeep Chakraborty, Seungbum Jo, N S Narayanaswamy, Kunihiko Sadakane

Abstract: Erdős and West (Discrete Mathematics'85) considered the class of $n$ vertex intersection graphs which have a {\em $d$-dimensional} {\em $t$-representation}, that is, each vertex of a graph in the class has an associated set consisting of at most $t$ $d$-dimensional axis-parallel boxes. In particular, for a graph $G$ and for each $d \geq 1$, they consider $i_d(G)$ to be the minimum $t$ for which… ▽ More Erdős and West (Discrete Mathematics'85) considered the class of $n$ vertex intersection graphs which have a {\em $d$-dimensional} {\em $t$-representation}, that is, each vertex of a graph in the class has an associated set consisting of at most $t$ $d$-dimensional axis-parallel boxes. In particular, for a graph $G$ and for each $d \geq 1$, they consider $i_d(G)$ to be the minimum $t$ for which $G$ has such a representation. For fixed $t$ and $d$, they consider the class of $n$ vertex labeled graphs for which $i_d(G) \leq t$, and prove an upper bound of $(2nt+\frac{1}{2})d \log n - (n - \frac{1}{2})d \log(4πt)$ on the logarithm of size of the class. In this work, for fixed $t$ and $d$ we consider the class of $n$ vertex unlabeled graphs which have a {\em $d$-dimensional $t$-representation}, denoted by $\mathcal{G}_{t,d}$. We address the problem of designing a succinct data structure for the class $\mathcal{G}_{t,d}$ in an attempt to generalize the relatively recent results on succinct data structures for interval graphs (Algorithmica'21). To this end, for each $n$ such that $td^2$ is in $o(n / \log n)$, we first prove a lower bound of $(2dt-1)n \log n - O(ndt \log \log n)$-bits on the size of any data structure for encoding an arbitrary graph that belongs to $\mathcal{G}_{t,d}$. We then present a $((2dt-1)n \log n + dt\log t + o(ndt \log n))$-bit data structure for $\mathcal{G}_{t,d}$ that supports navigational queries efficiently. Contrasting this data structure with our lower bound argument, we show that for each fixed $t$ and $d$, and for all $n \geq 0$ when $td^2$ is in $o(n/\log n)$ our data structure for $\mathcal{G}_{t,d}$ is succinct. As a byproduct, we also obtain succinct data structures for graphs of bounded boxicity (denoted by $d$ and $t = 1$) and graphs of bounded interval number (denoted by $t$ and $d=1$) when $td^2$ is in $o(n/\log n)$. △ Less

Submitted 6 February, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

Comments: 21 pages, 5 figures

arXiv:2310.14663 [pdf, other]

DPP-TTS: Diversifying prosodic features of speech via determinantal point processes

Authors: Seongho Joo, Hyukhun Koh, Kyomin Jung

Abstract: With the rapid advancement in deep generative models, recent neural Text-To-Speech(TTS) models have succeeded in synthesizing human-like speech. There have been some efforts to generate speech with various prosody beyond monotonous prosody patterns. However, previous works have several limitations. First, typical TTS models depend on the scaled sampling temperature for boosting the diversity of pr… ▽ More With the rapid advancement in deep generative models, recent neural Text-To-Speech(TTS) models have succeeded in synthesizing human-like speech. There have been some efforts to generate speech with various prosody beyond monotonous prosody patterns. However, previous works have several limitations. First, typical TTS models depend on the scaled sampling temperature for boosting the diversity of prosody. Speech samples generated at high sampling temperatures often lack perceptual prosodic diversity, which can adversely affect the naturalness of the speech. Second, the diversity among samples is neglected since the sampling procedure often focuses on a single speech sample rather than multiple ones. In this paper, we propose DPP-TTS: a text-to-speech model based on Determinantal Point Processes (DPPs) with a prosody diversifying module. Our TTS model is capable of generating speech samples that simultaneously consider perceptual diversity in each sample and among multiple samples. We demonstrate that DPP-TTS generates speech samples with more diversified prosody than baselines in the side-by-side comparison test considering the naturalness of speech at the same time. △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: EMNLP 2023

arXiv:2308.03251 [pdf, ps, other]

Joint Precoding and Fronthaul Compression for Cell-Free MIMO Downlink With Radio Stripes

Authors: Sangwon Jo, Hoon Lee, Seok-Hwan Park

Abstract: A sequential fronthaul network, referred to as radio stripes, is a promising fronthaul topology of cell-free MIMO systems. In this setup, a single cable suffices to connect access points (APs) to a central processor (CP). Thus, radio stripes are more effective than conventional star fronthaul topology which requires dedicated cables for each of APs. Most of works on radio stripes focused on the up… ▽ More A sequential fronthaul network, referred to as radio stripes, is a promising fronthaul topology of cell-free MIMO systems. In this setup, a single cable suffices to connect access points (APs) to a central processor (CP). Thus, radio stripes are more effective than conventional star fronthaul topology which requires dedicated cables for each of APs. Most of works on radio stripes focused on the uplink communication or downlink energy transfer. This work tackles the design of the downlink data transmission for the first time. The CP sends compressed information of linearly precoded signals to the APs on fronthaul. Due to the serial transfer on radio stripes, each AP has an access to all the compressed blocks which pass through it. Thus, an advanced compression technique, called Wyner-Ziv (WZ) compression, can be applied in which each AP decompresses all the received blocks to exploit them for the reconstruction of its desired precoded signal as side information. The problem of maximizing the sum-rate is tackled under the standard point-to-point (P2P) and WZ compression strategies. Numerical results validate the performance gains of the proposed scheme. △ Less

Submitted 6 August, 2023; originally announced August 2023.

Comments: To be presented at IEEE Globecom 2023, Kuala Lumpur, Malaysia, Dec. 2023

arXiv:2307.05916 [pdf, other]

SwiFT: Swin 4D fMRI Transformer

Authors: Peter Yongho Kim, Junbeom Kwon, Sunghwan Joo, Sangyoon Bae, Donggyu Lee, Yoonho Jung, Shinjae Yoo, Jiook Cha, Taesup Moon

Abstract: Modeling spatiotemporal brain dynamics from high-dimensional data, such as functional Magnetic Resonance Imaging (fMRI), is a formidable task in neuroscience. Existing approaches for fMRI analysis utilize hand-crafted features, but the process of feature extraction risks losing essential information in fMRI scans. To address this challenge, we present SwiFT (Swin 4D fMRI Transformer), a Swin Trans… ▽ More Modeling spatiotemporal brain dynamics from high-dimensional data, such as functional Magnetic Resonance Imaging (fMRI), is a formidable task in neuroscience. Existing approaches for fMRI analysis utilize hand-crafted features, but the process of feature extraction risks losing essential information in fMRI scans. To address this challenge, we present SwiFT (Swin 4D fMRI Transformer), a Swin Transformer architecture that can learn brain dynamics directly from fMRI volumes in a memory and computation-efficient manner. SwiFT achieves this by implementing a 4D window multi-head self-attention mechanism and absolute positional embeddings. We evaluate SwiFT using multiple large-scale resting-state fMRI datasets, including the Human Connectome Project (HCP), Adolescent Brain Cognitive Development (ABCD), and UK Biobank (UKB) datasets, to predict sex, age, and cognitive intelligence. Our experimental outcomes reveal that SwiFT consistently outperforms recent state-of-the-art models. Furthermore, by leveraging its end-to-end learning capability, we show that contrastive loss-based self-supervised pre-training of SwiFT can enhance performance on downstream tasks. Additionally, we employ an explainable AI method to identify the brain regions associated with sex classification. To our knowledge, SwiFT is the first Swin Transformer architecture to process dimensional spatiotemporal brain functional data in an end-to-end fashion. Our work holds substantial potential in facilitating scalable learning of functional brain imaging in neuroscience research by reducing the hurdles associated with applying Transformer models to high-dimensional fMRI. △ Less

Submitted 31 October, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: NeurIPS 2023

arXiv:2307.00485 [pdf, other]

TopicFM+: Boosting Accuracy and Efficiency of Topic-Assisted Feature Matching

Authors: Khang Truong Giang, Soohwan Song, Sungho Jo

Abstract: This study tackles the challenge of image matching in difficult scenarios, such as scenes with significant variations or limited texture, with a strong emphasis on computational efficiency. Previous studies have attempted to address this challenge by encoding global scene contexts using Transformers. However, these approaches suffer from high computational costs and may not capture sufficient high… ▽ More This study tackles the challenge of image matching in difficult scenarios, such as scenes with significant variations or limited texture, with a strong emphasis on computational efficiency. Previous studies have attempted to address this challenge by encoding global scene contexts using Transformers. However, these approaches suffer from high computational costs and may not capture sufficient high-level contextual information, such as structural shapes or semantic instances. Consequently, the encoded features may lack discriminative power in challenging scenes. To overcome these limitations, we propose a novel image-matching method that leverages a topic-modeling strategy to capture high-level contexts in images. Our method represents each image as a multinomial distribution over topics, where each topic represents a latent semantic instance. By incorporating these topics, we can effectively capture comprehensive context information and obtain discriminative and high-quality features. Additionally, our method effectively matches features within corresponding semantic regions by estimating the covisible topics. To enhance the efficiency of feature matching, we have designed a network with a pooling-and-merging attention module. This module reduces computation by employing attention only on fixed-sized topics and small-sized features. Through extensive experiments, we have demonstrated the superiority of our method in challenging scenarios. Specifically, our method significantly reduces computational costs while maintaining higher image-matching accuracy compared to state-of-the-art methods. The code will be updated soon at https://github.com/TruongKhang/TopicFM △ Less

Submitted 2 July, 2023; originally announced July 2023.

Comments: Paper extension of TopicFM (arXiv:2207.00328)

arXiv:2306.00227 [pdf]

From Human-Centered to Social-Centered Artificial Intelligence: Assessing ChatGPT's Impact through Disruptive Events

Authors: Skyler Wang, Ned Cooper, Margaret Eby, Eun Seo Jo

Abstract: Large language models (LLMs) and dialogue agents have existed for years, but the release of recent GPT models has been a watershed moment for artificial intelligence (AI) research and society at large. Immediately recognized for its generative capabilities and versatility, ChatGPT's impressive proficiency across technical and creative domains led to its widespread adoption. While society grapples… ▽ More Large language models (LLMs) and dialogue agents have existed for years, but the release of recent GPT models has been a watershed moment for artificial intelligence (AI) research and society at large. Immediately recognized for its generative capabilities and versatility, ChatGPT's impressive proficiency across technical and creative domains led to its widespread adoption. While society grapples with the emerging cultural impacts of ChatGPT, critiques of ChatGPT's impact within the machine learning community have coalesced around its performance or other conventional Responsible AI evaluations relating to bias, toxicity, and 'hallucination.' We argue that these latter critiques draw heavily on a particular conceptualization of the 'human-centered' framework, which tends to cast atomized individuals as the key recipients of both the benefits and detriments of technology. In this article, we direct attention to another dimension of LLMs and dialogue agents' impact: their effect on social groups, institutions, and accompanying norms and practices. By illustrating ChatGPT's social impact through three disruptive events, we challenge individualistic approaches in AI development and contribute to ongoing debates around the ethical and responsible implementation of AI systems. We hope this effort will call attention to more comprehensive and longitudinal evaluation tools and compel technologists to go beyond human-centered thinking and ground their efforts through social-centered AI. △ Less

Submitted 31 May, 2023; originally announced June 2023.

arXiv:2305.14045 [pdf, other]

The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning

Authors: Seungone Kim, Se June Joo, Doyoung Kim, Joel Jang, Seonghyeon Ye, Jamin Shin, Minjoon Seo

Abstract: Language models (LMs) with less than 100B parameters are known to perform poorly on chain-of-thought (CoT) reasoning in contrast to large LMs when solving unseen tasks. In this work, we aim to equip smaller LMs with the step-by-step reasoning capability by instruction tuning with CoT rationales. In order to achieve this goal, we first introduce a new instruction-tuning dataset called the CoT Colle… ▽ More Language models (LMs) with less than 100B parameters are known to perform poorly on chain-of-thought (CoT) reasoning in contrast to large LMs when solving unseen tasks. In this work, we aim to equip smaller LMs with the step-by-step reasoning capability by instruction tuning with CoT rationales. In order to achieve this goal, we first introduce a new instruction-tuning dataset called the CoT Collection, which augments the existing Flan Collection (including only 9 CoT tasks) with additional 1.84 million rationales across 1,060 tasks. We show that CoT fine-tuning Flan-T5 (3B & 11B) with CoT Collection enables smaller LMs to have better CoT capabilities on unseen tasks. On the BIG-Bench-Hard (BBH) benchmark, we report an average improvement of +4.34% (Flan-T5 3B) and +2.60% (Flan-T5 11B), in terms of zero-shot task accuracy. Furthermore, we show that instruction tuning with CoT Collection allows LMs to possess stronger few-shot learning capabilities on 4 domain-specific tasks, resulting in an improvement of +2.24% (Flan-T5 3B) and +2.37% (Flan-T5 11B), even outperforming ChatGPT utilizing demonstrations until the max length by a +13.98% margin. Our code, the CoT Collection data, and model checkpoints are publicly available. △ Less

Submitted 14 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: EMNLP 2023 (Main Conference)

arXiv:2305.13046 [pdf, other]

doi 10.1609/aaai.v37i7.25984

POEM: Polarization of Embeddings for Domain-Invariant Representations

Authors: Sang-Yeong Jo, Sung Whan Yoon

Abstract: Handling out-of-distribution samples is a long-lasting challenge for deep visual models. In particular, domain generalization (DG) is one of the most relevant tasks that aims to train a model with a generalization capability on novel domains. Most existing DG approaches share the same philosophy to minimize the discrepancy between domains by finding the domain-invariant representations. On the con… ▽ More Handling out-of-distribution samples is a long-lasting challenge for deep visual models. In particular, domain generalization (DG) is one of the most relevant tasks that aims to train a model with a generalization capability on novel domains. Most existing DG approaches share the same philosophy to minimize the discrepancy between domains by finding the domain-invariant representations. On the contrary, our proposed method called POEM acquires a strong DG capability by learning domain-invariant and domain-specific representations and polarizing them. Specifically, POEM cotrains category-classifying and domain-classifying embeddings while regularizing them to be orthogonal via minimizing the cosine-similarity between their features, i.e., the polarization of embeddings. The clear separation of embeddings suppresses domain-specific features in the domain-invariant embeddings. The concept of POEM shows a unique direction to enhance the domain robustness of representations that brings considerable and consistent performance gains when combined with existing DG methods. Extensive simulation results in popular DG benchmarks with the PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet datasets show that POEM indeed facilitates the category-classifying embedding to be more domain-invariant. △ Less

Submitted 22 May, 2023; originally announced May 2023.

Comments: In Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI) 2023, Washington D.C. USA

arXiv:2304.11095 [pdf, other]

Is Cross-modal Information Retrieval Possible without Training?

Authors: Hyun** Choi, Hyunjae Lee, Seongho Joe, Youngjune L. Gwon

Abstract: Encoded representations from a pretrained deep learning model (e.g., BERT text embeddings, penultimate CNN layer activations of an image) convey a rich set of features beneficial for information retrieval. Embeddings for a particular modality of data occupy a high-dimensional space of its own, but it can be semantically aligned to another by a simple map** without training a deep neural net. In… ▽ More Encoded representations from a pretrained deep learning model (e.g., BERT text embeddings, penultimate CNN layer activations of an image) convey a rich set of features beneficial for information retrieval. Embeddings for a particular modality of data occupy a high-dimensional space of its own, but it can be semantically aligned to another by a simple map** without training a deep neural net. In this paper, we take a simple map** computed from the least squares and singular value decomposition (SVD) for a solution to the Procrustes problem to serve a means to cross-modal information retrieval. That is, given information in one modality such as text, the map** helps us locate a semantically equivalent data item in another modality such as image. Using off-the-shelf pretrained deep learning models, we have experimented the aforementioned simple cross-modal map**s in tasks of text-to-image and image-to-text retrieval. Despite simplicity, our map**s perform reasonably well reaching the highest accuracy of 77% on recall@10, which is comparable to those requiring costly neural net training and fine-tuning. We have improved the simple map**s by contrastive learning on the pretrained models. Contrastive learning can be thought as properly biasing the pretrained encoders to enhance the cross-modal map** quality. We have further improved the performance by multilayer perceptron with gating (gMLP), a simple neural architecture. △ Less

Submitted 19 April, 2023; originally announced April 2023.

Journal ref: Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, Proceedings, Part II

arXiv:2304.09913 [pdf, other]

MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation

Authors: Sanghyun Jo, In-Jae Yu, Kyungsu Kim

Abstract: Weakly-supervised semantic segmentation aims to reduce labeling costs by training semantic segmentation models using weak supervision, such as image-level class labels. However, most approaches struggle to produce accurate localization maps and suffer from false predictions in class-related backgrounds (i.e., biased objects), such as detecting a railroad with the train class. Recent methods that r… ▽ More Weakly-supervised semantic segmentation aims to reduce labeling costs by training semantic segmentation models using weak supervision, such as image-level class labels. However, most approaches struggle to produce accurate localization maps and suffer from false predictions in class-related backgrounds (i.e., biased objects), such as detecting a railroad with the train class. Recent methods that remove biased objects require additional supervision for manually identifying biased objects for each problematic class and collecting their datasets by reviewing predictions, limiting their applicability to the real-world dataset with multiple labels and complex relationships for biasing. Following the first observation that biased features can be separated and eliminated by matching biased objects with backgrounds in the same dataset, we propose a fully-automatic/model-agnostic biased removal framework called MARS (Model-Agnostic biased object Removal without additional Supervision), which utilizes semantically consistent features of an unsupervised technique to eliminate biased objects in pseudo labels. Surprisingly, we show that MARS achieves new state-of-the-art results on two popular benchmarks, PASCAL VOC 2012 (val: 77.7%, test: 77.2%) and MS COCO 2014 (val: 49.4%), by consistently improving the performance of various WSSS models by at least 30% without additional supervision. △ Less

Submitted 19 April, 2023; originally announced April 2023.

arXiv:2304.09374 [pdf, other]

doi 10.1109/ICPR56361.2022.9956208

Shuffle & Divide: Contrastive Learning for Long Text

Authors: Joonseok Lee, Seongho Joe, Kyoungwon Park, Bogun Kim, Hoyoung Kang, Jaeseon Park, Youngjune Gwon

Abstract: We propose a self-supervised learning method for long text documents based on contrastive learning. A key to our method is Shuffle and Divide (SaD), a simple text augmentation algorithm that sets up a pretext task required for contrastive updates to BERT-based document embedding. SaD splits a document into two sub-documents containing randomly shuffled words in the entire documents. The sub-docume… ▽ More We propose a self-supervised learning method for long text documents based on contrastive learning. A key to our method is Shuffle and Divide (SaD), a simple text augmentation algorithm that sets up a pretext task required for contrastive updates to BERT-based document embedding. SaD splits a document into two sub-documents containing randomly shuffled words in the entire documents. The sub-documents are considered positive examples, leaving all other documents in the corpus as negatives. After SaD, we repeat the contrastive update and clustering phases until convergence. It is naturally a time-consuming, cumbersome task to label text documents, and our method can help alleviate human efforts, which are most expensive resources in AI. We have empirically evaluated our method by performing unsupervised text classification on the 20 Newsgroups, Reuters-21578, BBC, and BBCSport datasets. In particular, our method pushes the current state-of-the-art, SS-SB-MT, on 20 Newsgroups by 20.94% in accuracy. We also achieve the state-of-the-art performance on Reuters-21578 and exceptionally-high accuracy performances (over 95%) for unsupervised classification on the BBC and BBCSport datasets. △ Less

Submitted 18 April, 2023; originally announced April 2023.

Comments: Accepted at ICPR 2022

Journal ref: 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 2022, pp. 2935-2941

arXiv:2304.09369 [pdf, other]

doi 10.1109/ICPR56361.2022.9956198

ContraCluster: Learning to Classify without Labels by Contrastive Self-Supervision and Prototype-Based Semi-Supervision

Authors: Seongho Joe, Byoungjip Kim, Hoyoung Kang, Kyoungwon Park, Bogun Kim, Jaeseon Park, Joonseok Lee, Youngjune Gwon

Abstract: The recent advances in representation learning inspire us to take on the challenging problem of unsupervised image classification tasks in a principled way. We propose ContraCluster, an unsupervised image classification method that combines clustering with the power of contrastive self-supervised learning. ContraCluster consists of three stages: (1) contrastive self-supervised pre-training (CPT),… ▽ More The recent advances in representation learning inspire us to take on the challenging problem of unsupervised image classification tasks in a principled way. We propose ContraCluster, an unsupervised image classification method that combines clustering with the power of contrastive self-supervised learning. ContraCluster consists of three stages: (1) contrastive self-supervised pre-training (CPT), (2) contrastive prototype sampling (CPS), and (3) prototype-based semi-supervised fine-tuning (PB-SFT). CPS can select highly accurate, categorically prototypical images in an embedding space learned by contrastive learning. We use sampled prototypes as noisy labeled data to perform semi-supervised fine-tuning (PB-SFT), leveraging small prototypes and large unlabeled data to further enhance the accuracy. We demonstrate empirically that ContraCluster achieves new state-of-the-art results for standard benchmark datasets including CIFAR-10, STL-10, and ImageNet-10. For example, ContraCluster achieves about 90.8% accuracy for CIFAR-10, which outperforms DAC (52.2%), IIC (61.7%), and SCAN (87.6%) by a large margin. Without any labels, ContraCluster can achieve a 90.8% accuracy that is comparable to 95.8% by the best supervised counterpart. △ Less

Submitted 18 April, 2023; originally announced April 2023.

Comments: Accepted at ICPR 2022

Journal ref: 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 2022, pp. 4685-4692

arXiv:2303.08329 [pdf, other]

Cross-speaker Emotion Transfer by Manipulating Speech Style Latents

Authors: Suhee Jo, Younggun Lee, Yookyung Shin, Yeongtae Hwang, Taesu Kim

Abstract: In recent years, emotional text-to-speech has shown considerable progress. However, it requires a large amount of labeled data, which is not easily accessible. Even if it is possible to acquire an emotional speech dataset, there is still a limitation in controlling emotion intensity. In this work, we propose a novel method for cross-speaker emotion transfer and manipulation using vector arithmetic… ▽ More In recent years, emotional text-to-speech has shown considerable progress. However, it requires a large amount of labeled data, which is not easily accessible. Even if it is possible to acquire an emotional speech dataset, there is still a limitation in controlling emotion intensity. In this work, we propose a novel method for cross-speaker emotion transfer and manipulation using vector arithmetic in latent style space. By leveraging only a few labeled samples, we generate emotional speech from reading-style speech without losing the speaker identity. Furthermore, emotion strength is readily controllable using a scalar value, providing an intuitive way for users to manipulate speech. Experimental results show the proposed method affords superior performance in terms of expressiveness, naturalness, and controllability, preserving speaker identity. △ Less

Submitted 14 March, 2023; originally announced March 2023.

Comments: accepted to ICASSP 2023

arXiv:2303.03628 [pdf, other]

CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification

Authors: Seungone Kim, Se June Joo, Yul Jang, Hyungjoo Chae, **young Yeo

Abstract: Chain-of-thought (CoT) prompting enables large language models (LLMs) to solve complex reasoning tasks by generating an explanation before the final prediction. Despite it's promising ability, a critical downside of CoT prompting is that the performance is greatly affected by the factuality of the generated explanation. To improve the correctness of the explanations, fine-tuning language models wi… ▽ More Chain-of-thought (CoT) prompting enables large language models (LLMs) to solve complex reasoning tasks by generating an explanation before the final prediction. Despite it's promising ability, a critical downside of CoT prompting is that the performance is greatly affected by the factuality of the generated explanation. To improve the correctness of the explanations, fine-tuning language models with explanation data is needed. However, there exists only a few datasets that can be used for such approaches, and no data collection tool for building them. Thus, we introduce CoTEVer, a tool-kit for annotating the factual correctness of generated explanations and collecting revision data of wrong explanations. Furthermore, we suggest several use cases where the data collected with CoTEVer can be utilized for enhancing the faithfulness of explanations. Our toolkit is publicly available at https://github.com/SeungoneKim/CoTEVer. △ Less

Submitted 6 March, 2023; originally announced March 2023.

Comments: Accepted at EACL 2023 Demo

arXiv:2302.00319 [pdf, other]

Development of deep biological ages aware of morbidity and mortality based on unsupervised and semi-supervised deep learning approaches

Authors: Seong-Eun Moon, Ji Won Yoon, Shinyoung Joo, Yoohyung Kim, Jae Hyun Bae, Seokho Yoon, Haanju Yoo, Young Min Cho

Abstract: Background: While deep learning technology, which has the capability of obtaining latent representations based on large-scale data, can be a potential solution for the discovery of a novel aging biomarker, existing deep learning methods for biological age estimation usually depend on chronological ages and lack of consideration of mortality and morbidity that are the most significant outcomes of a… ▽ More Background: While deep learning technology, which has the capability of obtaining latent representations based on large-scale data, can be a potential solution for the discovery of a novel aging biomarker, existing deep learning methods for biological age estimation usually depend on chronological ages and lack of consideration of mortality and morbidity that are the most significant outcomes of aging. Methods: This paper proposes a novel deep learning model to learn latent representations of biological aging in regard to subjects' morbidity and mortality. The model utilizes health check-up data in addition to morbidity and mortality information to learn the complex relationships between aging and measured clinical attributes. Findings: The proposed model is evaluated on a large dataset of general populations compared with KDM and other learning-based models. Results demonstrate that biological ages obtained by the proposed model have superior discriminability of subjects' morbidity and mortality. △ Less

Submitted 1 February, 2023; originally announced February 2023.

arXiv:2211.15900 [pdf, other]

Towards More Robust Interpretation via Local Gradient Alignment

Authors: Sunghwan Joo, Seokhyeon Jeong, Juyeon Heo, Adrian Weller, Taesup Moon

Abstract: Neural network interpretation methods, particularly feature attribution methods, are known to be fragile with respect to adversarial input perturbations. To address this, several methods for enhancing the local smoothness of the gradient while training have been proposed for attaining \textit{robust} feature attributions. However, the lack of considering the normalization of the attributions, whic… ▽ More Neural network interpretation methods, particularly feature attribution methods, are known to be fragile with respect to adversarial input perturbations. To address this, several methods for enhancing the local smoothness of the gradient while training have been proposed for attaining \textit{robust} feature attributions. However, the lack of considering the normalization of the attributions, which is essential in their visualizations, has been an obstacle to understanding and improving the robustness of feature attribution methods. In this paper, we provide new insights by taking such normalization into account. First, we show that for every non-negative homogeneous neural network, a naive $\ell_2$-robust criterion for gradients is \textit{not} normalization invariant, which means that two functions with the same normalized gradient can have different values. Second, we formulate a normalization invariant cosine distance-based criterion and derive its upper bound, which gives insight for why simply minimizing the Hessian norm at the input, as has been done in previous work, is not sufficient for attaining robust feature attribution. Finally, we propose to combine both $\ell_2$ and cosine distance-based criteria as regularization terms to leverage the advantages of both in aligning the local gradient. As a result, we experimentally show that models trained with our method produce much more robust interpretations on CIFAR-10 and ImageNet-100 without significantly hurting the accuracy, compared to the recent baselines. To the best of our knowledge, this is the first work to verify the robustness of interpretation on a larger-scale dataset beyond CIFAR-10, thanks to the computational efficiency of our method. △ Less

Submitted 7 December, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: 22 pages (9 pages in paper, 13 pages in Appendix), 9 figures, 6 tables Accepted in AAAI 23 (Association for the Advancement of Artificial Intelligence)

arXiv:2209.11055 [pdf, other]

Efficient Few-Shot Learning Without Prompts

Authors: Lewis Tunstall, Nils Reimers, Unso Eun Seo Jo, Luke Bates, Daniel Korat, Moshe Wasserblat, Oren Pereg

Abstract: Recent few-shot methods, such as parameter-efficient fine-tuning (PEFT) and pattern exploiting training (PET), have achieved impressive results in label-scarce settings. However, they are difficult to employ since they are subject to high variability from manually crafted prompts, and typically require billion-parameter language models to achieve high accuracy. To address these shortcomings, we pr… ▽ More Recent few-shot methods, such as parameter-efficient fine-tuning (PEFT) and pattern exploiting training (PET), have achieved impressive results in label-scarce settings. However, they are difficult to employ since they are subject to high variability from manually crafted prompts, and typically require billion-parameter language models to achieve high accuracy. To address these shortcomings, we propose SetFit (Sentence Transformer Fine-tuning), an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers (ST). SetFit works by first fine-tuning a pretrained ST on a small number of text pairs, in a contrastive Siamese manner. The resulting model is then used to generate rich text embeddings, which are used to train a classification head. This simple framework requires no prompts or verbalizers, and achieves high accuracy with orders of magnitude less parameters than existing techniques. Our experiments show that SetFit obtains comparable results with PEFT and PET techniques, while being an order of magnitude faster to train. We also show that SetFit can be applied in multilingual settings by simply switching the ST body. Our code is available at https://github.com/huggingface/setfit and our datasets at https://huggingface.co/setfit . △ Less

Submitted 22 September, 2022; originally announced September 2022.

arXiv:2209.00930 [pdf, other]

Mind the Gap! Injecting Commonsense Knowledge for Abstractive Dialogue Summarization

Authors: Seungone Kim, Se June Joo, Hyungjoo Chae, Chaehyeong Kim, Seung-won Hwang, **young Yeo

Abstract: In this paper, we propose to leverage the unique characteristics of dialogues sharing commonsense knowledge across participants, to resolve the difficulties in summarizing them. We present SICK, a framework that uses commonsense inferences as additional context. Compared to previous work that solely relies on the input dialogue, SICK uses an external knowledge model to generate a rich set of commo… ▽ More In this paper, we propose to leverage the unique characteristics of dialogues sharing commonsense knowledge across participants, to resolve the difficulties in summarizing them. We present SICK, a framework that uses commonsense inferences as additional context. Compared to previous work that solely relies on the input dialogue, SICK uses an external knowledge model to generate a rich set of commonsense inferences and selects the most probable one with a similarity-based selection method. Built upon SICK, SICK++ utilizes commonsense as supervision, where the task of generating commonsense inferences is added upon summarizing the dialogue in a multi-task learning setting. Experimental results show that with injected commonsense knowledge, our framework generates more informative and consistent summaries than existing methods. △ Less

Submitted 2 September, 2022; originally announced September 2022.

Comments: Accepted at COLING 2022

arXiv:2209.00278 [pdf, other]

doi 10.21437/Interspeech.2021-1270

Enhancing Semantic Understanding with Self-supervised Methods for Abstractive Dialogue Summarization

Authors: Hyunjae Lee, Jaewoong Yun, Hyun** Choi, Seongho Joe, Youngjune L. Gwon

Abstract: Contextualized word embeddings can lead to state-of-the-art performances in natural language understanding. Recently, a pre-trained deep contextualized text encoder such as BERT has shown its potential in improving natural language tasks including abstractive summarization. Existing approaches in dialogue summarization focus on incorporating a large language model into summarization task trained o… ▽ More Contextualized word embeddings can lead to state-of-the-art performances in natural language understanding. Recently, a pre-trained deep contextualized text encoder such as BERT has shown its potential in improving natural language tasks including abstractive summarization. Existing approaches in dialogue summarization focus on incorporating a large language model into summarization task trained on large-scale corpora consisting of news articles rather than dialogues of multiple speakers. In this paper, we introduce self-supervised methods to compensate shortcomings to train a dialogue summarization model. Our principle is to detect incoherent information flows using pretext dialogue text to enhance BERT's ability to contextualize the dialogue text representations. We build and fine-tune an abstractive dialogue summarization model on a shared encoder-decoder architecture using the enhanced BERT. We empirically evaluate our abstractive dialogue summarizer with the SAMSum corpus, a recently introduced dataset with abstractive dialogue summaries. All of our methods have contributed improvements to abstractive summary measured in ROUGE scores. Through an extensive ablation study, we also present a sensitivity analysis to critical model hyperparameters, probabilities of switching utterances and masking interlocutors. △ Less

Submitted 1 September, 2022; originally announced September 2022.

Comments: 5 pages, 3 figures, INTERSPEECH 2021

Journal ref: Proc. Interspeech 2021, 796-800 (2021)

arXiv:2209.00158 [pdf, other]

Space-efficient data structure for next/previous larger/smaller value queries

Authors: Seungbum Jo, Geunho Kim

Abstract: Given an array of size $n$ from a total order, we consider the problem of constructing a data structure that supports various queries (range minimum/maximum queries with their variants and next/previous larger/smaller queries) efficiently. In the encoding model (i.e., the queries can be answered without the input array), we propose a $(3.701n + o(n))$-bit data structure, which supports all these q… ▽ More Given an array of size $n$ from a total order, we consider the problem of constructing a data structure that supports various queries (range minimum/maximum queries with their variants and next/previous larger/smaller queries) efficiently. In the encoding model (i.e., the queries can be answered without the input array), we propose a $(3.701n + o(n))$-bit data structure, which supports all these queries in $O(\log^{(\ell)}n)$ time, for any positive integer $\ell$ (here, $\log^{(1)} n = \log n$, and for $\ell > 1$, $\log^{(\ell)} n = \log ({\log^{(\ell-1)}} n)$). The space of our data structure matches the current best upper bound of Tsur (Inf. Process. Lett., 2019), which does not support the queries efficiently. Also, we show that at least $3.16n-Θ(\log n)$ bits are necessary for answering all the queries. Our result is obtained by generalizing Gawrychowski and Nicholson's $(3n - Θ(\log n))$-bit lower bound (ICALP 15) for answering range minimum and maximum queries on a permutation of size $n$. △ Less

Submitted 31 August, 2022; originally announced September 2022.

arXiv:2207.06000 [pdf, other]

Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS

Authors: Yookyung Shin, Younggun Lee, Suhee Jo, Yeongtae Hwang, Taesu Kim

Abstract: Expressive text-to-speech has shown improved performance in recent years. However, the style control of synthetic speech is often restricted to discrete emotion categories and requires training data recorded by the target speaker in the target style. In many practical situations, users may not have reference speech recorded in target emotion but still be interested in controlling speech style just… ▽ More Expressive text-to-speech has shown improved performance in recent years. However, the style control of synthetic speech is often restricted to discrete emotion categories and requires training data recorded by the target speaker in the target style. In many practical situations, users may not have reference speech recorded in target emotion but still be interested in controlling speech style just by ty** text description of desired emotional style. In this paper, we propose a text-based interface for emotional style control and cross-speaker style transfer in multi-speaker TTS. We propose the bi-modal style encoder which models the semantic relationship between text description embedding and speech style embedding with a pretrained language model. To further improve cross-speaker style transfer on disjoint, multi-style datasets, we propose the novel style loss. The experimental results show that our model can generate high-quality expressive speech even in unseen style. △ Less

Submitted 13 July, 2022; originally announced July 2022.

Comments: Accepted to Interspeech 2022

arXiv:2207.00328 [pdf, other]

doi 10.1609/aaai.v37i2.25341

TopicFM: Robust and Interpretable Topic-Assisted Feature Matching

Authors: Khang Truong Giang, Soohwan Song, Sungho Jo

Abstract: This study addresses an image-matching problem in challenging cases, such as large scene variations or textureless scenes. To gain robustness to such situations, most previous studies have attempted to encode the global contexts of a scene via graph neural networks or transformers. However, these contexts do not explicitly represent high-level contextual information, such as structural shapes or s… ▽ More This study addresses an image-matching problem in challenging cases, such as large scene variations or textureless scenes. To gain robustness to such situations, most previous studies have attempted to encode the global contexts of a scene via graph neural networks or transformers. However, these contexts do not explicitly represent high-level contextual information, such as structural shapes or semantic instances; therefore, the encoded features are still not sufficiently discriminative in challenging scenes. We propose a novel image-matching method that applies a topic-modeling strategy to encode high-level contexts in images. The proposed method trains latent semantic instances called topics. It explicitly models an image as a multinomial distribution of topics, and then performs probabilistic feature matching. This approach improves the robustness of matching by focusing on the same semantic areas between the images. In addition, the inferred topics provide interpretability for matching the results, making our method explainable. Extensive experiments on outdoor and indoor datasets show that our method outperforms other state-of-the-art methods, particularly in challenging cases. The code is available at https://github.com/TruongKhang/TopicFM. △ Less

Submitted 29 November, 2022; v1 submitted 1 July, 2022; originally announced July 2022.

Comments: Accepted at AAAI-23. This version includes both main text and supplementary materials

arXiv:2205.09185 [pdf, other]

doi 10.1016/j.nima.2022.167748

AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider

Authors: C. Fanelli, Z. Papandreou, K. Suresh, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann , et al. (258 additional authors not shown)

Abstract: The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to… ▽ More The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to leverage Artificial Intelligence (AI) already starting from the design and R&D phases. The EIC Comprehensive Chromodynamics Experiment (ECCE) is a consortium that proposed a detector design based on a 1.5T solenoid. The EIC detector proposal review concluded that the ECCE design will serve as the reference design for an EIC detector. Herein we describe a comprehensive optimization of the ECCE tracker using AI. The work required a complex parametrization of the simulated detector system. Our approach dealt with an optimization problem in a multidimensional design space driven by multiple objectives that encode the detector performance, while satisfying several mechanical constraints. We describe our strategy and show results obtained for the ECCE tracking system. The AI-assisted design is agnostic to the simulation framework and can be extended to other sub-detectors or to a system of sub-detectors to further optimize the performance of the EIC detector. △ Less

Submitted 19 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: 16 pages, 18 figures, 2 appendices, 3 tables

arXiv:2204.06754 [pdf, other]

RecurSeed and EdgePredictMix: Pseudo-Label Refinement Learning for Weakly Supervised Semantic Segmentation across Single- and Multi-Stage Frameworks

Authors: Sanghyun Jo, In-Jae Yu, Kyungsu Kim

Abstract: Although weakly supervised semantic segmentation using only image-level labels (WSSS-IL) is potentially useful, its low performance and implementation complexity still limit its application. The main causes are (a) non-detection and (b) false-detection phenomena: (a) The class activation maps refined from existing WSSS-IL methods still only represent partial regions for large-scale objects, and (b… ▽ More Although weakly supervised semantic segmentation using only image-level labels (WSSS-IL) is potentially useful, its low performance and implementation complexity still limit its application. The main causes are (a) non-detection and (b) false-detection phenomena: (a) The class activation maps refined from existing WSSS-IL methods still only represent partial regions for large-scale objects, and (b) for small-scale objects, over-activation causes them to deviate from the object edges. We propose RecurSeed, which alternately reduces non- and false detections through recursive iterations, thereby implicitly finding an optimal junction that minimizes both errors. We also propose a novel data augmentation (DA) approach called EdgePredictMix, which further expresses an object's edge by utilizing the probability difference information between adjacent pixels in combining the segmentation results, thereby compensating for the shortcomings when applying the existing DA methods to WSSS. We achieved new state-of-the-art performances on both the PASCAL VOC 2012 and MS COCO 2014 benchmarks (VOC val: 74.4%, COCO val: 46.4%). The code is available at https://github.com/shjo-april/RecurSeed_and_EdgePredictMix. △ Less

Submitted 15 December, 2023; v1 submitted 14 April, 2022; originally announced April 2022.

arXiv:2202.10967 [pdf, other]

Learning Cluster Patterns for Abstractive Summarization

Authors: Sung-Guk Jo, Jeong-Jae Kim, Byung-Won On

Abstract: Nowadays, pre-trained sequence-to-sequence models such as BERTSUM and BART have shown state-of-the-art results in abstractive summarization. In these models, during fine-tuning, the encoder transforms sentences to context vectors in the latent space and the decoder learns the summary generation task based on the context vectors. In our approach, we consider two clusters of salient and non-salient… ▽ More Nowadays, pre-trained sequence-to-sequence models such as BERTSUM and BART have shown state-of-the-art results in abstractive summarization. In these models, during fine-tuning, the encoder transforms sentences to context vectors in the latent space and the decoder learns the summary generation task based on the context vectors. In our approach, we consider two clusters of salient and non-salient context vectors, using which the decoder can attend more to salient context vectors for summary generation. For this, we propose a novel clustering transformer layer between the encoder and the decoder, which first generates two clusters of salient and non-salient vectors, and then normalizes and shrinks the clusters to make them apart in the latent space. Our experimental result shows that the proposed model outperforms the existing BART model by learning these distinct cluster patterns, improving up to 4% in ROUGE and 0.3% in BERTScore on average in CNN/DailyMail and XSUM data sets. △ Less

Submitted 22 February, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

Comments: 11 pages, 4 figures, 4 tables

arXiv:2202.07862 [pdf]

See further upon the giants: Quantifying intellectual lineage in science

Authors: Woo Seong Jo, Lu Liu, Dashun Wang

Abstract: Newton's centuries-old wisdom of standing on the shoulders of giants raises a crucial yet underexplored question: Out of all the prior works cited by a discovery, which one is its giant? Here, we develop a novel, discipline-independent method to identify the giant for any individual paper, allowing us to systematically examine the role and characteristics of giants in science. We find that across… ▽ More Newton's centuries-old wisdom of standing on the shoulders of giants raises a crucial yet underexplored question: Out of all the prior works cited by a discovery, which one is its giant? Here, we develop a novel, discipline-independent method to identify the giant for any individual paper, allowing us to systematically examine the role and characteristics of giants in science. We find that across disciplines, about 95% of papers stand on the shoulders of giants, yet the weight of scientific progress rests on relatively few shoulders. Defining a new measure of giant index, we find that, while papers with high citations are more likely to be giants, for papers with the same citations, their giant index sharply predicts a paper's future impact and prize-winning probabilities. Giants tend to originate from both small and large teams, being either highly disruptive or highly developmental. And papers that did not have a giant but later became a giant tend to be home-run papers that are highly disruptive to science. Given the crucial importance of citation-based measures in science, the developed concept of giants may offer a useful new dimension in assessing scientific impact that goes beyond sheer citation counts. △ Less

Submitted 14 March, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

arXiv:2112.05999 [pdf, other]

Curvature-guided dynamic scale networks for Multi-view Stereo

Authors: Khang Truong Giang, Soohwan Song, Sungho Jo

Abstract: Multi-view stereo (MVS) is a crucial task for precise 3D reconstruction. Most recent studies tried to improve the performance of matching cost volume in MVS by designing aggregated 3D cost volumes and their regularization. This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation in the other steps. In particular, we p… ▽ More Multi-view stereo (MVS) is a crucial task for precise 3D reconstruction. Most recent studies tried to improve the performance of matching cost volume in MVS by designing aggregated 3D cost volumes and their regularization. This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation in the other steps. In particular, we present a dynamic scale feature extraction network, namely, CDSFNet. It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel guided by the normal curvature of the image surface. As a result, CDFSNet can estimate the optimal patch scales to learn discriminative features for accurate matching computation between reference and source images. By combining the robust extracted features with an appropriate cost formulation strategy, our resulting MVS architecture can estimate depth maps more precisely. Extensive experiments showed that the proposed method outperforms other state-of-the-art methods on complex outdoor scenes. It significantly improves the completeness of reconstructed models. As a result, the method can process higher resolution inputs within faster run-time and lower memory than other MVS methods. Our source code is available at url{https://github.com/TruongKhang/cds-mvsnet}. △ Less

Submitted 9 March, 2022; v1 submitted 11 December, 2021; originally announced December 2021.

Comments: Accepted to ICLR 2022

arXiv:2111.11104 [pdf, other]

doi 10.1609/aaai.v37i11.26520

Hierarchical Text Classification As Sub-Hierarchy Sequence Generation

Authors: SangHun Im, Gibaeg Kim, Heung-Seon Oh, Seongung Jo, Donghwan Kim

Abstract: Hierarchical text classification (HTC) is essential for various real applications. However, HTC models are challenging to develop because they often require processing a large volume of documents and labels with hierarchical taxonomy. Recent HTC models based on deep learning have attempted to incorporate hierarchy information into a model structure. Consequently, these models are challenging to im… ▽ More Hierarchical text classification (HTC) is essential for various real applications. However, HTC models are challenging to develop because they often require processing a large volume of documents and labels with hierarchical taxonomy. Recent HTC models based on deep learning have attempted to incorporate hierarchy information into a model structure. Consequently, these models are challenging to implement when the model parameters increase for a large-scale hierarchy because the model structure depends on the hierarchy size. To solve this problem, we formulate HTC as a sub-hierarchy sequence generation to incorporate hierarchy information into a target label sequence instead of the model structure. Subsequently, we propose the Hierarchy DECoder (HiDEC), which decodes a text sequence into a sub-hierarchy sequence using recursive hierarchy decoding, classifying all parents at the same level into children at once. In addition, HiDEC is trained to use hierarchical path information from a root to each leaf in a sub-hierarchy composed of the labels of a target document via an attention mechanism and hierarchy-aware masking. HiDEC achieved state-of-the-art performance with significantly fewer model parameters than existing models on benchmark datasets, such as RCV1-v2, NYT, and EURLEX57K. △ Less

Submitted 6 November, 2023; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: 9 pages, 5 figures, Published at AAAI23

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 37(11), 12933-12941 (2023)

arXiv:2109.00911 [pdf, other]

BiHPF: Bilateral High-Pass Filters for Robust Deepfake Detection

Authors: Yonghyun Jeong, Doyeon Kim, Seungjai Min, Seongho Joe, Youngjune Gwon, Jongwon Choi

Abstract: The advancement in numerous generative models has a two-fold effect: a simple and easy generation of realistic synthesized images, but also an increased risk of malicious abuse of those images. Thus, it is important to develop a generalized detector for synthesized images of any GAN model or object category, including those unseen during the training phase. However, the conventional methods heavil… ▽ More The advancement in numerous generative models has a two-fold effect: a simple and easy generation of realistic synthesized images, but also an increased risk of malicious abuse of those images. Thus, it is important to develop a generalized detector for synthesized images of any GAN model or object category, including those unseen during the training phase. However, the conventional methods heavily depend on the training settings, which cause a dramatic decline in performance when tested with unknown domains. To resolve the issue and obtain a generalized detection ability, we propose Bilateral High-Pass Filters (BiHPF), which amplify the effect of the frequency-level artifacts that are known to be found in the synthesized images of generative models. Numerous experimental results validate that our method outperforms other state-of-the-art methods, even when tested with unseen domains. △ Less

Submitted 16 August, 2021; originally announced September 2021.

arXiv:2108.10776 [pdf, other]

Succinct Data Structures for Series-Parallel, Block-Cactus and 3-Leaf Power Graphs

Authors: Sankardeep Chakraborty, Seungbum Jo, Kunihiko Sadakane, Srinivasa Rao Satti

Abstract: We design succinct encodings of {\it series-parallel, block-cactus} and {\it 3-leaf power} graphs while supporting the basic navigational queries such as degree, adjacency and neighborhood {\it optimally} in the RAM model with logarithmic word size. One salient feature of our representation is that it can achieve optimal space even though the exact space lower bound for these graph classes is not… ▽ More We design succinct encodings of {\it series-parallel, block-cactus} and {\it 3-leaf power} graphs while supporting the basic navigational queries such as degree, adjacency and neighborhood {\it optimally} in the RAM model with logarithmic word size. One salient feature of our representation is that it can achieve optimal space even though the exact space lower bound for these graph classes is not known. For these graph classes, we provide succinct data structures with optimal query support for the first time in the literature. For series-parallel multigraphs, our work also extends the works of Uno et al. (Disc. Math. Alg. and Appl., 2013) and Blelloch and Farzan (CPM, 2010) to produce optimal bounds. △ Less

Submitted 26 August, 2021; v1 submitted 24 August, 2021; originally announced August 2021.

arXiv:2101.11363 [pdf, other]

KoreALBERT: Pretraining a Lite BERT Model for Korean Language Understanding

Authors: Hyunjae Lee, Jaewoong Yoon, Bonggyu Hwang, Seongho Joe, Seungjai Min, Youngjune Gwon

Abstract: A Lite BERT (ALBERT) has been introduced to scale up deep bidirectional representation learning for natural languages. Due to the lack of pretrained ALBERT models for Korean language, the best available practice is the multilingual model or resorting back to the any other BERT-based model. In this paper, we develop and pretrain KoreALBERT, a monolingual ALBERT model specifically for Korean languag… ▽ More A Lite BERT (ALBERT) has been introduced to scale up deep bidirectional representation learning for natural languages. Due to the lack of pretrained ALBERT models for Korean language, the best available practice is the multilingual model or resorting back to the any other BERT-based model. In this paper, we develop and pretrain KoreALBERT, a monolingual ALBERT model specifically for Korean language understanding. We introduce a new training objective, namely Word Order Prediction (WOP), and use alongside the existing MLM and SOP criteria to the same architecture and model parameters. Despite having significantly fewer model parameters (thus, quicker to train), our pretrained KoreALBERT outperforms its BERT counterpart on 6 different NLU tasks. Consistent with the empirical results in English by Lan et al., KoreALBERT seems to improve downstream task performance involving multi-sentence encoding for Korean language. The pretrained KoreALBERT is publicly available to encourage research and application development for Korean NLP. △ Less

Submitted 27 January, 2021; originally announced January 2021.

Comments: 7 pages, 1 figure, to be published in 25th International Conference on Pattern Recognition, ICPR 2020

arXiv:2101.11253 [pdf, other]

doi 10.1109/ICIP42928.2021.9506058

Puzzle-CAM: Improved localization via matching partial and full features

Authors: Sanghyun Jo, In-Jae Yu

Abstract: Weakly-supervised semantic segmentation (WSSS) is introduced to narrow the gap for semantic segmentation performance from pixel-level supervision to image-level supervision. Most advanced approaches are based on class activation maps (CAMs) to generate pseudo-labels to train the segmentation network. The main limitation of WSSS is that the process of generating pseudo-labels from CAMs that use an… ▽ More Weakly-supervised semantic segmentation (WSSS) is introduced to narrow the gap for semantic segmentation performance from pixel-level supervision to image-level supervision. Most advanced approaches are based on class activation maps (CAMs) to generate pseudo-labels to train the segmentation network. The main limitation of WSSS is that the process of generating pseudo-labels from CAMs that use an image classifier is mainly focused on the most discriminative parts of the objects. To address this issue, we propose Puzzle-CAM, a process that minimizes differences between the features from separate patches and the whole image. Our method consists of a puzzle module and two regularization terms to discover the most integrated region in an object. Puzzle-CAM can activate the overall region of an object using image-level supervision without requiring extra parameters. % In experiments, Puzzle-CAM outperformed previous state-of-the-art methods using the same labels for supervision on the PASCAL VOC 2012 test dataset. In experiments, Puzzle-CAM outperformed previous state-of-the-art methods using the same labels for supervision on the PASCAL VOC 2012 dataset. Code associated with our experiments is available at https://github.com/OFRIN/PuzzleCAM. △ Less

Submitted 23 September, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

Comments: Accepted to ICIP 2021

arXiv:2101.10649 [pdf, other]

Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks

Authors: Hyun** Choi, Judong Kim, Seongho Joe, Seungjai Min, Youngjune Gwon

Abstract: In zero-shot cross-lingual transfer, a supervised NLP task trained on a corpus in one language is directly applicable to another language without any additional training. A source of cross-lingual transfer can be as straightforward as lexical overlap between languages (e.g., use of the same scripts, shared subwords) that naturally forces text embeddings to occupy a similar representation space. Re… ▽ More In zero-shot cross-lingual transfer, a supervised NLP task trained on a corpus in one language is directly applicable to another language without any additional training. A source of cross-lingual transfer can be as straightforward as lexical overlap between languages (e.g., use of the same scripts, shared subwords) that naturally forces text embeddings to occupy a similar representation space. Recently introduced cross-lingual language model (XLM) pretraining brings out neural parameter sharing in Transformer-style networks as the most important factor for the transfer. In this paper, we aim to validate the hypothetically strong cross-lingual transfer properties induced by XLM pretraining. Particularly, we take XLM-RoBERTa (XLMR) in our experiments that extend semantic textual similarity (STS), SQuAD and KorQuAD for machine reading comprehension, sentiment analysis, and alignment of sentence embeddings under various cross-lingual settings. Our results indicate that the presence of cross-lingual transfer is most pronounced in STS, sentiment analysis the next, and MRC the last. That is, the complexity of a downstream task softens the degree of crosslingual transfer. All of our results are empirically observed and measured, and we make our code and data publicly available. △ Less

Submitted 26 January, 2021; originally announced January 2021.

Comments: 6 pages, 4 figures, to be published in 25th International Conference on Pattern Recognition, ICPR 2020

arXiv:2101.10642 [pdf, other]

Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks

Authors: Hyun** Choi, Judong Kim, Seongho Joe, Youngjune Gwon

Abstract: Contextualized representations from a pre-trained language model are central to achieve a high performance on downstream NLP task. The pre-trained BERT and A Lite BERT (ALBERT) models can be fine-tuned to give state-ofthe-art results in sentence-pair regressions such as semantic textual similarity (STS) and natural language inference (NLI). Although BERT-based models yield the [CLS] token vector a… ▽ More Contextualized representations from a pre-trained language model are central to achieve a high performance on downstream NLP task. The pre-trained BERT and A Lite BERT (ALBERT) models can be fine-tuned to give state-ofthe-art results in sentence-pair regressions such as semantic textual similarity (STS) and natural language inference (NLI). Although BERT-based models yield the [CLS] token vector as a reasonable sentence embedding, the search for an optimal sentence embedding scheme remains an active research area in computational linguistics. This paper explores on sentence embedding models for BERT and ALBERT. In particular, we take a modified BERT network with siamese and triplet network structures called Sentence-BERT (SBERT) and replace BERT with ALBERT to create Sentence-ALBERT (SALBERT). We also experiment with an outer CNN sentence-embedding network for SBERT and SALBERT. We evaluate performances of all sentence-embedding models considered using the STS and NLI datasets. The empirical results indicate that our CNN architecture improves ALBERT models substantially more than BERT models for STS benchmark. Despite significantly fewer model parameters, ALBERT sentence embedding is highly competitive to BERT in downstream NLP evaluations. △ Less

Submitted 26 January, 2021; originally announced January 2021.

Comments: 6 pages, 2 figures, to be published in 25th International Conference on Pattern Recognition, ICPR2020

arXiv:2101.06480 [pdf, other]

SelfMatch: Combining Contrastive Self-Supervision and Consistency for Semi-Supervised Learning

Authors: Byoungjip Kim, **ho Choo, Yeong-Dae Kwon, Seongho Joe, Seungjai Min, Youngjune Gwon

Abstract: This paper introduces SelfMatch, a semi-supervised learning method that combines the power of contrastive self-supervised learning and consistency regularization. SelfMatch consists of two stages: (1) self-supervised pre-training based on contrastive learning and (2) semi-supervised fine-tuning based on augmentation consistency regularization. We empirically demonstrate that SelfMatch achieves the… ▽ More This paper introduces SelfMatch, a semi-supervised learning method that combines the power of contrastive self-supervised learning and consistency regularization. SelfMatch consists of two stages: (1) self-supervised pre-training based on contrastive learning and (2) semi-supervised fine-tuning based on augmentation consistency regularization. We empirically demonstrate that SelfMatch achieves the state-of-the-art results on standard benchmark datasets such as CIFAR-10 and SVHN. For example, for CIFAR-10 with 40 labeled examples, SelfMatch achieves 93.19% accuracy that outperforms the strong previous methods such as MixMatch (52.46%), UDA (70.95%), ReMixMatch (80.9%), and FixMatch (86.19%). We note that SelfMatch can close the gap between supervised learning (95.87%) and semi-supervised learning (93.19%) by using only a few labels for each class. △ Less

Submitted 16 January, 2021; originally announced January 2021.

Comments: 4 pages, NeurIPS 2020 Workshop: Self-Supervised Learning - Theory and Practice

arXiv:2011.11855 [pdf]

A Robotic Dating Coaching System Leveraging Online Communities Posts

Authors: Sihyeon Jo, Donghwi Jung, Keonwoo Kim, Eun Gyo Joung, Giulia Nespoli, Seungryong Yoo, Minseob So, Seung-Woo Seo, Seong-Woo Kim

Abstract: Can a robot be a personal dating coach? Even with the increasing amount of conversational data on the internet, the implementation of conversational robots remains a challenge. In particular, a detailed and professional counseling log is expensive and not publicly accessible. In this paper, we develop a robot dating coaching system leveraging corpus from online communities. We examine people's per… ▽ More Can a robot be a personal dating coach? Even with the increasing amount of conversational data on the internet, the implementation of conversational robots remains a challenge. In particular, a detailed and professional counseling log is expensive and not publicly accessible. In this paper, we develop a robot dating coaching system leveraging corpus from online communities. We examine people's perceptions of the dating coaching robot with a dialogue module. 97 participants joined to have a conversation with the robot, and 30 of them evaluated the robot. The results indicate that participants thought the robot could become a dating coach while considering the robot is entertaining rather than helpful. △ Less

Submitted 23 November, 2020; originally announced November 2020.

Showing 1–50 of 73 results for author: Joo, S