Search | arXiv e-print repository

Foundation Models for Electrocardiograms

Authors: Junho Song, Jong-Hwan Jang, Byeong Tak Lee, DongGyun Hong, Joon-myoung Kwon, Yong-Yeon Jo

Abstract: Foundation models, enhanced by self-supervised learning (SSL) techniques, represent a cutting-edge frontier in biomedical signal analysis, particularly for electrocardiograms (ECGs), crucial for cardiac health monitoring and diagnosis. This study conducts a comprehensive analysis of foundation models for ECGs by employing and refining innovative SSL methodologies - namely, generative and contrastive learning - on a vast dataset of over 1.1 million ECG samples. By customizing these methods to align with the intricate characteristics of ECG signals, our research has successfully developed foundation models that significantly elevate the precision and reliability of cardiac diagnostics. These models are adept at representing the complex, subtle nuances of ECG data, thus markedly enhancing diagnostic capabilities. The results underscore the substantial potential of SSL-enhanced foundation models in clinical settings and pave the way for extensive future investigations into their scalable applications across a broader spectrum of medical diagnostics. This work sets a benchmark in the ECG field, demonstrating the profound impact of tailored, data-driven model training on the efficacy and accuracy of medical diagnostics.

Submitted 25 June, 2024; originally announced July 2024.

Comments: 27 pages

arXiv:2406.13144 [pdf, other]

DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents

Authors: Jiho Kim, Woosog Chay, Hyeonji Hwang, Daeun Kyung, Hyunseung Chung, Eunbyeol Cho, Yohan Jo, Edward Choi

Abstract: Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of conversational agents, making them applicable to various fields (e.g., education). Despite their progress, the evaluation of the agents often overlooks the complexities of real-world conversations, such as real-time interactions, multi-party dialogues, and extended contextual dependencies. To bridge this gap, we introduce DialSim, a real-time dialogue simulator. In this simulator, an agent is assigned the role of a character from popular TV shows, requiring it to respond to spontaneous questions using past dialogue information and to distinguish between known and unknown information. Key features of DialSim include evaluating the agent's ability to respond within a reasonable time limit, handling long-term multi-party dialogues, and managing adversarial settings (e.g., swap character names) to challenge the agent's reliance on pre-trained knowledge. We utilized this simulator to evaluate the latest conversational agents and analyze their limitations. Our experiments highlight both the strengths and weaknesses of these agents, providing valuable insights for future improvements in the field of conversational AI. DialSim is available at https://github.com/jiho283/Simulator.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.10996 [pdf, other]

THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

Authors: Seo Hyun Kim, Kai Tzu-iunn Ong, Taeyoon Kwon, Namyoung Kim, Keummin Ka, SeongHyeon Bae, Yohan Jo, Seung-won Hwang, Dongha Lee, Jinyoung Yeo

Abstract: Large language models (LLMs) are capable of processing lengthy dialogue histories during prolonged interaction with users without additional memory modules; however, their responses tend to overlook or incorrectly recall information from the past. In this paper, we revisit memory-augmented response generation in the era of LLMs. While prior work focuses on getting rid of outdated memories, we argue that such memories can provide contextual cues that help dialogue systems understand the development of past events and, therefore, benefit response generation. We present Theanine, a framework that augments LLMs' response generation with memory timelines -- series of memories that demonstrate the development and causality of relevant past events. Along with Theanine, we introduce TeaFarm, a counterfactual-driven question-answering pipeline addressing the limitation of G-Eval in long-term conversations. Supplementary videos of our methods and the TeaBag dataset for TeaFarm evaluation are in https://theanine-693b0.web.app/.

Submitted 16 June, 2024; originally announced June 2024.

Comments: Under Review

arXiv:2406.01020 [pdf, other]

CLIP-Guided Attribute Aware Pretraining for Generalizable Image Quality Assessment

Authors: Daekyu Kwon, Dongyoung Kim, Sehwan Ki, Younghyun Jo, Hyong-Euk Lee, Seon Joo Kim

Abstract: In no-reference image quality assessment (NR-IQA), the challenge of limited dataset sizes hampers the development of robust and generalizable models. Conventional methods address this issue by utilizing large datasets to extract rich representations for IQA. Also, some approaches propose vision language models (VLM) based IQA, but the domain gap between generic VLM and IQA constrains their scalability. In this work, we propose a novel pretraining framework that constructs a generalizable representation for IQA by selectively extracting quality-related knowledge from VLM and leveraging the scalability of large datasets. Specifically, we carefully select optimal text prompts for five representative image quality attributes and use VLM to generate pseudo-labels. Numerous attribute-aware pseudo-labels can be generated with large image datasets, allowing our IQA model to learn rich representations about image quality. Our approach achieves state-of-the-art performance on multiple IQA datasets and exhibits remarkable generalization capabilities. Leveraging these strengths, we propose several applications, such as evaluating image generation models and training image enhancement models, demonstrating our model's real-world applicability. We will make the code available for access.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.14082 [pdf, other]

Exclusively Penalized Q-learning for Offline Reinforcement Learning

Authors: Junghyuk Yeom, Yonghyeon Jo, Jungmo Kim, Sanghyeon Lee, Seungyul Han

Abstract: Constraint-based offline reinforcement learning (RL) involves policy constraints or imposing penalties on the value function to mitigate overestimation errors caused by distributional shift. This paper focuses on a limitation in existing offline RL methods with penalized value function, indicating the potential for underestimation bias due to unnecessary bias introduced in the value function. To address this concern, we propose Exclusively Penalized Q-learning (EPQ), which reduces estimation bias in the value function by selectively penalizing states that are prone to inducing estimation errors. Numerical results show that our method significantly reduces underestimation bias and improves performance in various offline control tasks compared to other offline RL methods.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 9 pages technical page followed by references and appendix

arXiv:2405.11162 [pdf, other]

LG AI Research & KAIST at EHRSQL 2024: Self-Training Large Language Models with Pseudo-Labeled Unanswerable Questions for a Reliable Text-to-SQL System on EHRs

Authors: Yongrae Jo, Seongyun Lee, Minju Seo, Sung Ju Hwang, Moontae Lee

Abstract: Text-to-SQL models are pivotal for making Electronic Health Records (EHRs) accessible to healthcare professionals without SQL knowledge. With the advancements in large language models, these systems have become more adept at translating complex questions into SQL queries. Nonetheless, the critical need for reliability in healthcare necessitates these models to accurately identify unanswerable questions or uncertain predictions, preventing misinformation. To address this problem, we present a self-training strategy using pseudo-labeled unanswerable questions to enhance the reliability of text-to-SQL models for EHRs. This approach includes a two-stage training process followed by a filtering method based on the token entropy and query execution. Our methodology's effectiveness is validated by our top performance in the EHRSQL 2024 shared task, showcasing the potential to improve healthcare decision-making through more reliable text-to-SQL systems.

Submitted 17 May, 2024; originally announced May 2024.

Comments: NAACL 2024 Clinical NLP Workshop
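
The abstract above mentions filtering based on token entropy. As a rough illustration only, the sketch below abstains when any decoding step has a high-entropy token distribution. The function names, the max-over-steps rule, and the threshold value are assumptions made for this example and are not taken from the paper.

```python
import math

def token_entropy(dist):
    """Shannon entropy (in nats) of one token probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def is_answerable(step_dists, threshold=1.0):
    """Hypothetical filter: keep a generated query only if every decoding
    step is confident, i.e. the largest per-step entropy stays at or
    below the (assumed) threshold."""
    return max(token_entropy(d) for d in step_dists) <= threshold

# Toy per-step distributions over a 4-token vocabulary (illustrative values).
confident = [[0.97, 0.01, 0.01, 0.01], [0.90, 0.05, 0.03, 0.02]]
uncertain = [[0.97, 0.01, 0.01, 0.01], [0.25, 0.25, 0.25, 0.25]]

print(is_answerable(confident))  # every step is low entropy
print(is_answerable(uncertain))  # second step is uniform, entropy = ln 4
```

A real system would presumably combine such a check with the query-execution filter the abstract also mentions; that part is not sketched here.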

arXiv:2405.09501 [pdf, ps, other]

The Eyring-Kramers Law for Extinction Time of Contact Process on Stars

Authors: Younghun Jo

Abstract: In this paper, we derive a precise estimate of the mean of the extinction time of the contact process with a fixed infection rate on a star graph with $N$ leaves. Specifically, we determine not only the exponential main factor but also the sharp sub-exponential prefactor of the asymptotic formula for the mean extinction time as $N\to\infty$. Previously, such detailed asymptotic information on the mean extinction time of the contact process was known only for complete graphs. To achieve these results, we first provide an accurate estimation of the quasi-stationary distribution on non-extinction of the contact process, utilizing special function theory and refined Laplace's method. Subsequently, we employ the recently developed potential theoretic approach to metastability of non-reversible Markov processes, enabling us to deduce these results. The integration of these methodologies represents a novel approach developed in this paper, which has not previously been used in the study of the contact process.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 32 pages, 2 figures

MSC Class: 60J28 (Primary) 60K35; 82C22 (Secondary)

arXiv:2404.09480 [pdf, other]

Mitigating Hallucination in Abstractive Summarization with Domain-Conditional Mutual Information

Authors: Kyubyung Chae, Jaepill Choi, Yohan Jo, Taesup Kim

Abstract: A primary challenge in abstractive summarization is hallucination -- the phenomenon where a model generates plausible text that is absent in the source text. We hypothesize that the domain (or topic) of the source text triggers the model to generate text that is highly probable in the domain, neglecting the details of the source text. To alleviate this model bias, we introduce a decoding strategy based on domain-conditional pointwise mutual information. This strategy adjusts the generation probability of each token by comparing it with the token's marginal probability within the domain of the source text. According to evaluation on the XSUM dataset, our method demonstrates improvement in terms of faithfulness and source relevance. The code is publicly available at https://github.com/qqplot/dcpmi.

Submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted by Findings of NAACL 2024
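
To make the idea of comparing a token's generation probability with its marginal probability within the domain more concrete, here is a minimal sketch of a pointwise-mutual-information style rescoring. The function name, the `weight` parameter, the fallback probability, and all numbers are assumptions for illustration; the authors' actual decoding strategy may differ (see their repository).

```python
import math

def pmi_rescore(source_probs, domain_probs, weight=1.0):
    """Score each candidate token by
    log p(token | source) - weight * log p(token | domain).
    Tokens that are likely only because of the domain are penalized."""
    scores = {}
    for token, p in source_probs.items():
        q = domain_probs.get(token, 1e-9)  # assumed floor for unseen tokens
        scores[token] = math.log(p) - weight * math.log(q)
    return scores

# Toy next-token distributions (illustrative values only).
source_probs = {"London": 0.40, "Leeds": 0.35, "the": 0.25}   # conditioned on the source text
domain_probs = {"London": 0.60, "Leeds": 0.05, "the": 0.35}   # conditioned on the domain alone

scores = pmi_rescore(source_probs, domain_probs)
plain_choice = max(source_probs, key=source_probs.get)
pmi_choice = max(scores, key=scores.get)
print(plain_choice, pmi_choice)
```

In this toy case, plain likelihood prefers the token that is generically frequent in the domain, while the rescored choice favors the token whose probability is driven by the source text.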

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.00102 [pdf, other]

Deeper, Sharper, Faster: Application of Efficient Transformer to Galaxy Image Restoration

Authors: Hyosun Park, Yongsik Jo, Seokun Kang, Taehwan Kim, M. James Jee

Abstract: The Transformer architecture has revolutionized the field of deep learning over the past several years in diverse areas, including natural language processing, code generation, image recognition, time series forecasting, etc. We propose to apply Zamir et al.'s efficient transformer to perform deconvolution and denoising to enhance astronomical images. We conducted experiments using pairs of high-quality images and their degraded versions, and our deep learning model demonstrates exceptional restoration of photometric, structural, and morphological information. When compared with the ground-truth JWST images, the enhanced versions of our HST-quality images reduce the scatter of isophotal photometry, Sersic index, and half-light radius by factors of 4.4, 3.6, and 4.7, respectively, with Pearson correlation coefficients approaching unity. The performance is observed to degrade when input images exhibit correlated noise, point-like sources, and artifacts. We anticipate that this deep learning model will prove valuable for a number of scientific applications, including precision photometry, morphological analysis, and shear calibration.

Submitted 29 May, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

Comments: 18 pages, 14 figures, 1 table, Resubmitted to ApJ after the first revision

arXiv:2403.08914 [pdf]

Robust Chemiresistive Behavior in Conductive Polymer/MOF Composites

Authors: Heejung Roh, Dong-Ha Kim, Yeongsu Cho, Young-Moo Jo, Jesús A. del Alamo, Heather J. Kulik, Mircea Dincă, Aristide Gumyusenge

Abstract: Metal-organic frameworks (MOFs) are promising materials for gas sensing but are often limited to single-use detection. We demonstrate a hybridization strategy synergistically deploying conductive MOFs (cMOFs) and conductive polymers (cPs) as two complementary mixed ionic-electronic conductors in high-performing stand-alone chemiresistors. Our work presents significant improvement in i) sensor recovery kinetics, ii) cycling stability, and iii) dynamic range at room temperature. We demonstrate the effect of hybridization across well-studied cMOFs based on 2,3,6,7,10,11-hexahydroxytriphenylene (HHTP) and 2,3,6,7,10,11-hexaiminotriphenylene (HITP) ligands with varied metal nodes (Co, Cu, Ni). We conduct a comprehensive mechanistic study to relate energy band alignments at the heterojunctions between the MOFs and the polymer with sensing thermodynamics and binding kinetics. Our findings reveal that hole enrichment of the cMOF component upon hybridization leads to selective enhancement in desorption kinetics, enabling significantly improved sensor recovery at room temperature, and thus long-term response retention. This mechanism was further supported by density functional theory calculations on sorbate-analyte interactions. We also find that alloying cPs and cMOFs enables facile thin film co-processing and device integration, potentially unlocking the use of these hybrid conductors in diverse electronic applications.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.04787 [pdf, other]

Ever-Evolving Memory by Blending and Refining the Past

Authors: Seo Hyun Kim, Keummin Ka, Yohan Jo, Seung-won Hwang, Dongha Lee, Jinyoung Yeo

Abstract: For a human-like chatbot, constructing a long-term memory is crucial. However, current large language models often lack this capability, leading to instances of missing important user information or redundantly asking for the same information, thereby diminishing conversation quality. To effectively construct memory, it is crucial to seamlessly connect past and present information, while also possessing the ability to forget obstructive information. To address these challenges, we propose CREEM, a novel memory system for long-term conversation. Improving upon existing approaches that construct memory based solely on current sessions, CREEM blends past memories during memory formation. Additionally, we introduce a refining process to handle redundant or outdated information. Unlike traditional paradigms, we view responding and memory construction as inseparable tasks. The blending process, which creates new memories, also serves as a reasoning step for response generation by informing the connection between past and present. Through evaluation, we demonstrate that CREEM enhances both memory and response qualities in multi-session personalized dialogues.

Submitted 7 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 17 pages, 4 figures, 7 tables

arXiv:2402.11827 [pdf, other]

Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search

Authors: Chanwoong Yoon, Gangwoo Kim, Byeongguk Jeon, Sungdong Kim, Yohan Jo, Jaewoo Kang

Abstract: Conversational search, unlike single-turn retrieval tasks, requires understanding the current question within a dialogue context. The common approach of rewrite-then-retrieve aims to decontextualize questions to be self-sufficient for off-the-shelf retrievers, but most existing methods produce sub-optimal query rewrites due to the limited ability to incorporate signals from the retrieval results. To overcome this limitation, we present a novel framework RetPO (Retriever's Preference Optimization), which is designed to optimize a language model (LM) for reformulating search queries in line with the preferences of the target retrieval systems. The process begins by prompting a large LM to produce various potential rewrites and then collects retrieval performance for these rewrites as the retrievers' preferences. Through the process, we construct a large-scale dataset called RF collection, containing Retrievers' Feedback on over 410K query rewrites across 12K conversations. Furthermore, we fine-tune a smaller LM using this dataset to align it with the retrievers' preferences as feedback. The resulting model achieves state-of-the-art performance on two recent conversational search benchmarks, significantly outperforming existing baselines, including GPT-3.5.

Submitted 18 February, 2024; originally announced February 2024.

Comments: 8 pages

arXiv:2401.06400 [pdf, other]

Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model

Authors: Taehee Kim, Yeongjae Cho, Heejun Shin, Yohan Jo, Dongmyung Shin

Abstract: Visual question answering (VQA) is a task where an image is given, and a series of questions are asked about the image. To build an efficient VQA algorithm, a large amount of QA data is required, which is very expensive to collect. Generating synthetic QA pairs based on templates is a practical way to obtain data. However, VQA models trained on those data do not perform well on complex, human-written questions. To address this issue, we propose a new method called chain of QA for human-written questions (CoQAH). CoQAH utilizes a sequence of QA interactions between a large language model and a VQA model trained on synthetic data to reason and derive logical answers for human-written questions. We tested the effectiveness of CoQAH on two types of human-written VQA datasets for 3D-rendered and chest X-ray images and found that it achieved state-of-the-art accuracy in both types of data. Notably, CoQAH outperformed general vision-language models, VQA models, and medical foundation models with no finetuning.

Submitted 16 January, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

arXiv:2312.13822 [pdf, other]

Universal Noise Annotation: Unveiling the Impact of Noisy annotation on Object Detection

Authors: Kwangrok Ryoo, Yeonsik Jo, Seungjun Lee, Mira Kim, Ahra Jo, Seung Hwan Kim, Seungryong Kim, Soonyoung Lee

Abstract: For object detection tasks with noisy labels, it is important to consider not only categorization noise, as in image classification, but also localization noise, missing annotations, and bogus bounding boxes. However, previous studies have only addressed certain types of noise (e.g., localization or categorization). In this paper, we propose Universal-Noise Annotation (UNA), a more practical setting that encompasses all types of noise that can occur in object detection, and analyze how UNA affects the performance of the detector. We analyze the development direction of previous detection algorithms and examine the factors that affect the robustness of detection model learning methods. We open-source the code for injecting UNA into a dataset, and all training logs and weights are also shared.

Submitted 21 December, 2023; originally announced December 2023.

Comments: appendix and code : https://github.com/Ryoo72/UNA

arXiv:2312.12661 [pdf, other]

Misalign, Contrast then Distill: Rethinking Misalignments in Language-Image Pretraining

Authors: Bumsoo Kim, Yeonsik Jo, Jinhyung Kim, Seung Hwan Kim

Abstract: Contrastive Language-Image Pretraining has emerged as a prominent approach for training vision and text encoders with uncurated image-text pairs from the web. To enhance data-efficiency, recent efforts have introduced additional supervision terms that involve random-augmented views of the image. However, since the image augmentation process is unaware of its text counterpart, this procedure could cause various degrees of image-text misalignments during training. Prior methods either disregarded this discrepancy or introduced external models to mitigate the impact of misalignments during training. In contrast, we propose a novel metric learning approach that capitalizes on these misalignments as an additional training source, which we term "Misalign, Contrast then Distill (MCD)". Unlike previous methods that treat augmented images and their text counterparts as simple positive pairs, MCD predicts the continuous scales of misalignment caused by the augmentation. Our extensive experimental results show that our proposed MCD achieves state-of-the-art transferability in multiple classification and retrieval downstream datasets.

Submitted 19 December, 2023; originally announced December 2023.

Comments: ICCV 2023

arXiv:2312.12659 [pdf, other]

Expediting Contrastive Language-Image Pretraining via Self-distilled Encoders

Authors: Bumsoo Kim, Jinhyung Kim, Yeonsik Jo, Seung Hwan Kim

Abstract: Recent advances in vision language pretraining (VLP) have been largely attributed to the large-scale data collected from the web. However, uncurated datasets contain weakly correlated image-text pairs, causing data inefficiency. To address the issue, knowledge distillation has been explored at the expense of extra image and text momentum encoders to generate teaching signals for misaligned image-text pairs. In this paper, our goal is to resolve the misalignment problem with an efficient distillation framework. To this end, we propose ECLIPSE: Expediting Contrastive Language-Image Pretraining with Self-distilled Encoders. ECLIPSE features a distinctive distillation architecture wherein a shared text encoder is utilized between an online image encoder and a momentum image encoder. This strategic design choice enables the distillation to operate within a unified projected space of text embedding, resulting in better performance. Based on the unified text embedding space, ECLIPSE compensates for the additional computational cost of the momentum image encoder by expediting the online image encoder. Through our extensive experiments, we validate that there is a sweet spot between expedition and distillation where the partial view from the expedited online image encoder interacts complementarily with the momentum teacher. As a result, ECLIPSE outperforms its counterparts while achieving substantial acceleration in inference speed.

Submitted 19 December, 2023; originally announced December 2023.

Comments: AAAI 2024

arXiv:2312.03465 [pdf, other]

True image construction in quantum-secured single-pixel imaging under spoofing attack

Authors: Jaesung Heo, Taek Jeong, Nam Hun Park, Yonggi Jo

Abstract: In this paper, we introduce a quantum-secured single-pixel imaging (QS-SPI) technique designed to withstand spoofing attacks, wherein adversaries attempt to deceive imaging systems with fake signals. Unlike previous quantum-secured protocols that impose a threshold error rate limiting their operation, even with the existence of true signals, our approach not only identifies spoofing attacks but also facilitates the reconstruction of a true image. Our method involves the analysis of a specific mode correlation of a photon-pair, which is independent of the mode used for image construction, to check security. Through this analysis, we can identify both the image region targeted by the attack and the type of spoofing attack, enabling reconstruction of the true image. A proof-of-principle demonstration employing polarization-correlation of a photon-pair is provided, showcasing successful image reconstruction even under the condition of spoofing signals 2000 times stronger than the true signals. We expect our approach to be applied to quantum-secured signal processing such as quantum target detection or ranging.

Submitted 4 July, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: 10 pages, 6 figures

arXiv:2312.03251 [pdf]

Electrically controlled interlayer trion fluid in electron-hole bilayers

Authors: Ruishi Qi, Qize Li, Zuocheng Zhang, Sudi Chen, Jingxu Xie, Yunbo Ou, Zhiyuan Cui, David D. Dai, Andrew Y. Joe, Takashi Taniguchi, Kenji Watanabe, Sefaattin Tongay, Alex Zettl, Liang Fu, Feng Wang

Abstract: The combination of repulsive and attractive Coulomb interactions in a quantum electron(e)-hole(h) fluid can give rise to novel correlated phases of multiparticle charge complexes such as excitons, trions and biexcitons. Here we report the first experimental realization of an electrically controlled interlayer trion fluid in two-dimensional van der Waals heterostructures. We demonstrate that in the strong coupling regime of electron-hole bilayers, electrons and holes in separate layers can spontaneously form three-particle trion bound states that resemble positronium ions in high energy physics. The interlayer trions can assume 1e-2h and 2e-1h configurations, where electrons and holes are confined in different transition metal dichalcogenide layers. We show that the two correlated holes in 1e-2h trions form a spin-singlet state with a spin gap of ~1 meV. By electrostatic gating, the equilibrium state of our system can be continuously tuned into an exciton fluid, a trion fluid, an exciton-trion mixture, a trion-charge mixture or an electron-hole plasma. Upon optical excitation, the system can host novel high-order multiparticle charge complexes including interlayer four-particle complexes (tetrons) and five-particle complexes (pentons). Our work demonstrates a unique platform to study novel correlated phases of tunable Bose-Fermi mixtures and opens up new opportunities to realize artificial ions/molecules in electronic devices.

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2312.03206 [pdf]

Seamless monolithic three-dimensional integration of single-crystalline films by growth

Authors: Ki Seok Kim, Seunghwan Seo, Junyoung Kwon, Doyoon Lee, Changhyun Kim, Jung-El Ryu, Jekyung Kim, Min-Kyu Song, Jun Min Suh, Hang-Gyo Jung, Youhwan Jo, Hogeun Ahn, Sangho Lee, Kyeongjae Cho, Jongwook Jeon, Minsu Seol, Jin-Hong Park, Sang Won Kim, Jeehwan Kim

Abstract: The demand for the three-dimensional (3D) integration of electronic components is on a steady rise. The through-silicon-via (TSV) technique emerges as the only viable method for integrating single-crystalline device components in a 3D format, despite encountering significant processing challenges. While monolithic 3D (M3D) integration schemes show promise, the seamless connection of single-crystalline semiconductors without intervening wafers has yet to be demonstrated. This challenge arises from the inherent difficulty of growing single crystals on amorphous or polycrystalline surfaces after the back-end-of-the-line process at low temperatures to preserve the underlying circuitry. Consequently, a practical growth-based solution for M3D of single crystals remains elusive. Here, we present a method for growing single-crystalline channel materials, specifically composed of transition metal dichalcogenides, on amorphous and polycrystalline surfaces at temperatures lower than 400 °C. Building on this developed technique, we demonstrate the seamless monolithic integration of vertical single-crystalline logic transistor arrays. This accomplishment leads to the development of unprecedented vertical CMOS arrays, thereby constructing vertical inverters. Ultimately, this achievement paves the way for M3D integration of various electronic and optoelectronic hardware in the form of single crystals.

Submitted 6 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.12941 [pdf]

Controlled Interlayer Exciton Ionization in an Electrostatic Trap in Atomically Thin Heterostructures

Authors: Andrew Y. Joe, Andrés M. Mier Valdivia, Luis A. Jauregui, Kateryna Pistunova, Dapeng Ding, You Zhou, Giovanni Scuri, Kristiaan De Greve, Andrey Sushko, Bumho Kim, Takashi Taniguchi, Kenji Watanabe, James C. Hone, Mikhail D. Lukin, Hongkun Park, Philip Kim

Abstract: Atomically thin semiconductor heterostructures provide a two-dimensional (2D) device platform for creating high densities of cold, controllable excitons. Interlayer excitons (IEs), bound electrons and holes localized to separate 2D quantum well layers, have permanent out-of-plane dipole moments and long lifetimes, allowing their spatial distribution to be tuned on demand. Here, we employ electrostatic gates to trap IEs and control their density. By electrically modulating the IE Stark shift, electron-hole pair concentrations above $2\times10^{12}$ cm$^{-2}$ can be achieved. At this high IE density, we observe an exponentially increasing linewidth broadening indicative of an IE ionization transition, independent of the trap depth. This runaway threshold remains constant at low temperatures, but increases above 20 K, consistent with the quantum dissociation of a degenerate IE gas. Our demonstration of the IE ionization in a tunable electrostatic trap represents an important step towards the realization of dipolar exciton condensates in solid-state optoelectronic devices.

Submitted 11 June, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: 14 pages, 4 main figures, 1 extended data figure

arXiv:2311.07362 [pdf, other]

Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision

Authors: Seongyun Lee, Sue Hyun Park, Yongrae Jo, Minjoon Seo

Abstract: Large multimodal models suffer from multimodal hallucination, where they provide incorrect responses misaligned with the given visual information. Recent works have conjectured that one of the reasons behind multimodal hallucination is the vision encoder failing to ground on the image properly. To mitigate this issue, we propose a novel approach that leverages self-feedback as visual cues. Building on this approach, we introduce Volcano, a multimodal self-feedback guided revision model. Volcano generates natural language feedback to its initial response based on the provided visual information and utilizes this feedback to self-revise its initial response. Volcano effectively reduces multimodal hallucination and achieves state-of-the-art on MMHal-Bench, POPE, and GAVIE. It also improves on general multimodal abilities and outperforms previous models on MM-Vet and MMBench. Through qualitative analysis, we show that Volcano's feedback is more properly grounded on the image than the initial response. This indicates that Volcano can provide itself with richer visual information through feedback generation, leading it to self-correct hallucinations. We publicly release our model, data, and code at https://github.com/kaistAI/Volcano

Submitted 2 April, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

arXiv:2310.20479 [pdf, other]

Multi-User MultiWOZ: Task-Oriented Dialogues among Multiple Users

Authors: Yohan Jo, Xinyan Zhao, Arijit Biswas, Nikoletta Basiou, Vincent Auvray, Nikolaos Malandrakis, Angeliki Metallinou, Alexandros Potamianos

Abstract: While most task-oriented dialogues assume conversations between the agent and one user at a time, dialogue systems are increasingly expected to communicate with multiple users simultaneously who make decisions collaboratively. To facilitate development of such systems, we release the Multi-User MultiWOZ dataset: task-oriented dialogues among two users and one agent. To collect this dataset, each user utterance from MultiWOZ 2.2 was replaced with a small chat between two users that is semantically and pragmatically consistent with the original user utterance, thus resulting in the same dialogue state and system response. These dialogues reflect interesting dynamics of collaborative decision-making in task-oriented scenarios, e.g., social chatter and deliberation. Supported by this data, we propose the novel task of multi-user contextual query rewriting: to rewrite a task-oriented chat between two users as a concise task-oriented query that retains only task-relevant information and that is directly consumable by the dialogue system. We demonstrate that in multi-user dialogues, using predicted rewrites substantially improves dialogue state tracking without modifying existing dialogue systems that are trained for single-user dialogues. Further, this method surpasses training a medium-sized model directly on multi-user dialogues and generalizes to unseen domains.

Submitted 31 October, 2023; originally announced October 2023.

Comments: To Appear in EMNLP-Findings 2023

arXiv:2310.17857 [pdf, other]

From Values to Opinions: Predicting Human Behaviors and Stances Using Value-Injected Large Language Models

Authors: Dongjun Kang, Joonsuk Park, Yohan Jo, JinYeong Bak

Abstract: Being able to predict people's opinions on issues and behaviors in realistic scenarios can be helpful in various domains, such as politics and marketing. However, conducting large-scale surveys like the European Social Survey to solicit people's opinions on individual issues can incur prohibitive costs. Leveraging prior research showing the influence of core human values on individual decisions and actions, we propose to use value-injected large language models (LLMs) to predict opinions and behaviors. To this end, we present Value Injection Method (VIM), a collection of two methods -- argument generation and question answering -- designed to inject targeted value distributions into LLMs via fine-tuning. We then conduct a series of experiments on four tasks to test the effectiveness of VIM and the possibility of using value-injected LLMs to predict opinions and behaviors of people. We find that LLMs value-injected with variations of VIM substantially outperform the baselines. Also, the results suggest that opinions and behaviors can be better predicted using value-injected LLMs than the baseline approaches.

Submitted 26 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 main paper accepted

arXiv:2310.14259 [pdf]

doi 10.1186/s40580-023-00360-y

Investigation of the mechanism of the anomalous Hall effects in Cr2Te3/(BiSb)2(TeSe)3 heterostructure

Authors: Seong Won Cho, In Hak Lee, Youngwoong Lee, Sangheon Kim, Yeong Gwang Khim, Seung-Young Park, Younghun Jo, Junwoo Choi, Seungwu Han, Young Jun Chang, Suyoun Lee

Abstract: The interplay between ferromagnetism and non-trivial topology has unveiled intriguing phases in the transport of charges and spins. For example, the so-called topological Hall effect (THE), featuring a hump structure in the curve of the Hall resistance (Rxy) vs. magnetic field (H), is consistently observed in heterostructures consisting of a ferromagnet (FM) and a topological insulator (TI). The origin of the hump structure is still debated between the topological Hall effect model and the multi-component anomalous Hall effect (AHE) model. In this work, we have investigated a heterostructure consisting of BixSb2-xTeySe3-y (BSTS) and Cr2Te3 (CT), which are a well-known TI and two-dimensional FM, respectively. By using the so-called minor-loop measurement, we have found that the hump structure observed in the CT/BSTS is more likely to originate from two AHE channels. Moreover, by analyzing the scaling behavior of the amplitude of each of the two AHEs with the longitudinal resistivities of CT and BSTS, we have found that one AHE is attributed to the extrinsic contribution of CT while the other is due to the intrinsic contribution of BSTS. This implies that the proximity-induced ferromagnetic layer inside BSTS serves as a source of the intrinsic AHE, resulting in the hump structure explained by the two-AHE model.

Submitted 22 October, 2023; originally announced October 2023.

Journal ref: Nano Convergence (2023) 10:11

arXiv:2310.14202 [pdf]

doi 10.1021/acsnano.2c04301

Controlling spin-orbit coupling to tailor type-II Dirac bands

Authors: Nguyen Huu Lam, Phuong Lien Nguyen, Byoung Ki Choi, Trinh Thi Ly, Ganbat Duvjir, Tae Gyu Rhee, Yong Jin Jo, Tae Heon Kim, Chris Jozwiak, Aaron Bostwick, Eli Rotenberg, Younghun Hwang, Young Jun Chang, Jaekwang Lee, Jungdae Kim

Abstract: NiTe2, a type-II Dirac semimetal with a strongly tilted Dirac band, has been explored extensively to understand its intriguing topological properties. Here, using density-functional theory (DFT) calculations, we report that the strength of spin-orbit coupling (SOC) in NiTe2 can be tuned by Se substitution. This results in negative shifts of the bulk Dirac point (BDP) while preserving the type-II Dirac band. Indeed, combined studies using scanning tunneling spectroscopy (STS) and angle-resolved photoemission spectroscopy (ARPES) confirm that the BDP in the NiTe2-xSex alloy moves from +0.1 eV (NiTe2) to -0.3 eV (NiTeSe) depending on the Se concentrations, indicating the effective tunability of type-II Dirac fermions. Our results demonstrate an approach to tailor the type-II Dirac band in NiTe2 by controlling the SOC strength via chalcogen substitution. This approach may be applicable to different types of topological materials.

Submitted 22 October, 2023; originally announced October 2023.

Comments: 25 pages, 4 figures

Journal ref: ACS Nano 16, 11227 (2022)

arXiv:2310.11220 [pdf, other]

KG-GPT: A General Framework for Reasoning on Knowledge Graphs Using Large Language Models

Authors: Jiho Kim, Yeonsu Kwon, Yohan Jo, Edward Choi

Abstract: While large language models (LLMs) have made considerable advancements in understanding and generating unstructured text, their application in structured data remains underexplored. Particularly, using LLMs for complex reasoning tasks on knowledge graphs (KGs) remains largely untouched. To address this, we propose KG-GPT, a multi-purpose framework leveraging LLMs for tasks employing KGs. KG-GPT comprises three steps: Sentence Segmentation, Graph Retrieval, and Inference, each aimed at partitioning sentences, retrieving relevant graph components, and deriving logical conclusions, respectively. We evaluate KG-GPT using KG-based fact verification and KGQA benchmarks, with the model showing competitive and robust performance, even outperforming several fully-supervised models. Our work, therefore, marks a significant step in unifying structured and unstructured data processing within the realm of LLMs.

Submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 Findings

arXiv:2310.04624 [pdf, other]

Transport Study of Charge Carrier Scattering in Monolayer WSe$_2$

Authors: Andrew Y. Joe, Kateryna Pistunova, Kristen Kaasbjerg, Ke Wang, Bumho Kim, Daniel A. Rhodes, Takashi Taniguchi, Kenji Watanabe, James Hone, Tony Low, Luis A. Jauregui, Philip Kim

Abstract: Employing flux-grown single crystal WSe$_2$, we report charge carrier scattering behaviors measured in $h$-BN encapsulated monolayer field effect transistors. We perform quantum transport measurements across various hole densities and temperatures and observe a non-monotonic change of transport mobility $μ$ as a function of hole density in the degenerately doped sample. This unusual behavior can be explained by the energy-dependent scattering amplitude of strong defects calculated using the T-matrix approximation. Utilizing the long mean free path ($>$500 nm), we demonstrate the high quality of our electronic devices by showing quantized conductance steps from an electrostatically-defined quantum point contact. Our results show the potential for creating ultra-high quality quantum optoelectronic devices based on atomically thin semiconductors.

Submitted 10 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: 6 pages, 4 figures

arXiv:2309.15357 [pdf]

Perfect Coulomb drag and exciton transport in an excitonic insulator

Authors: Ruishi Qi, Andrew Y. Joe, Zuocheng Zhang, Jingxu Xie, Qixin Feng, Zheyu Lu, Ziyu Wang, Takashi Taniguchi, Kenji Watanabe, Sefaattin Tongay, Feng Wang

Abstract: Strongly coupled two-dimensional electron-hole bilayers can give rise to novel quantum Bosonic states: electrons and holes in electrically isolated layers can pair into interlayer excitons, which can form a Bose-Einstein condensate below a critical temperature at zero magnetic field. This state is predicted to feature perfect Coulomb drag, where a current in one layer must be accompanied by an equal but opposite current in the other, and counterflow superconductivity, where the excitons form a superfluid with zero viscosity. Electron-hole bilayers in the strong coupling limit with an excitonic insulator ground state have been recently achieved in semiconducting transition metal dichalcogenide heterostructures, but direct electrical transport measurements remain challenging. Here we use a novel optical spectroscopy to probe the electrical transport of correlated electron-hole fluids in MoSe2/hBN/WSe2 heterostructures. We observe perfect Coulomb drag in the excitonic insulator phase up to a temperature as high as ~15 K. Strongly correlated electron and hole transport is also observed at unbalanced electron and hole densities, although the Coulomb drag is no longer perfect. Meanwhile, the counterflow resistance of interlayer excitons remains finite. These results indicate the formation of an exciton gas in the excitonic insulator which does not condense into a superfluid at low temperature. Our work also demonstrates that dynamic optical spectroscopy provides a powerful tool for probing novel exciton transport behavior and possible exciton superfluidity in correlated quantum electron-hole fluids.

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.06006 [pdf, ps, other]

SoccerNet 2023 Challenges Results

Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, et al. (77 additional authors not shown)

Abstract: The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to the soccer ball change of state, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task of (4) camera calibration, focusing on retrieving the intrinsic and extrinsic camera parameters from images. The third and last theme, player understanding, is composed of three low-level tasks related to extracting information about the players: (5) re-identification, focusing on retrieving the same players across multiple views, (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams, and (7) jersey number recognition, focusing on recognizing the jersey number of players from tracklets. Compared to the previous editions of the SoccerNet challenges, tasks (2-3-7) are novel, including new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches. More information on the tasks, challenges, and leaderboards is available at https://www.soccer-net.org. Baselines and development kits can be found at https://github.com/SoccerNet.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.00237 [pdf, other]

Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes

Authors: Sunjun Kweon, Junu Kim, Jiyoun Kim, Sujeong Im, Eunbyeol Cho, Seongsu Bae, Jungwoo Oh, Gyubok Lee, Jong Hak Moon, Seng Chan You, Seungjin Baek, Chang Hoon Han, Yoon Bin Jung, Yohan Jo, Edward Choi

Abstract: The development of large language models tailored for handling patients' clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations. To address these challenges, we first create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train our specialized clinical large language model, Asclepius. While Asclepius is trained on synthetic data, we assess its potential performance in real-world applications by evaluating it using real clinical notes. We benchmark Asclepius against several other large language models, including GPT-3.5-turbo and other open-source alternatives. To further validate our approach using synthetic notes, we also compare Asclepius with its variants trained on real clinical notes. Our findings convincingly demonstrate that synthetic clinical notes can serve as viable substitutes for real ones when constructing high-performing clinical language models. This conclusion is supported by detailed evaluations conducted by both GPT-4 and medical professionals. All resources including weights, codes, and data used in the development of Asclepius are made publicly accessible for future research. (https://github.com/starmpcc/Asclepius)

Submitted 13 June, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

Comments: ACL 2024 (Findings)

arXiv:2308.12492 [pdf, other]

Optimizing Neural Network Scale for ECG Classification

Authors: Byeong Tak Lee, Yong-Yeon Jo, Joon-Myoung Kwon

Abstract: We study scaling convolutional neural networks (CNNs), specifically targeting Residual neural networks (ResNet), for analyzing electrocardiograms (ECGs). Although ECG signals are time-series data, CNN-based models have been shown to outperform other neural networks with different architectures in ECG analysis. However, most previous studies in ECG analysis have overlooked the importance of network scaling optimization, which significantly improves performance. We explored and demonstrated an efficient approach to scale ResNet by examining the effects of crucial parameters, including layer depth, the number of channels, and the convolution kernel size. Through extensive experiments, we found that a shallower network, a larger number of channels, and smaller kernel sizes result in better performance for ECG classifications. The optimal network scale might differ depending on the target task, but our findings provide insight into obtaining more efficient and accurate models with fewer computing resources or less time. In practice, we demonstrate that a narrower search space based on our findings leads to higher performance.

Submitted 23 August, 2023; originally announced August 2023.

Comments: 30 pages

arXiv:2308.11272 [pdf, other]

FoX: Formation-aware exploration in multi-agent reinforcement learning

Authors: Yonghyeon Jo, Sunwoo Lee, Junghyuk Yeom, Seungyul Han

Abstract: Recently, deep multi-agent reinforcement learning (MARL) has gained significant popularity due to its success in various cooperative multi-agent tasks. However, exploration still remains a challenging problem in MARL due to the partial observability of the agents and the exploration space that can grow exponentially as the number of agents increases. Firstly, in order to address the scalability issue of the exploration space, we define a formation-based equivalence relation on the exploration space and aim to reduce the search space by exploring only meaningful states in different formations. Then, we propose a novel formation-aware exploration (FoX) framework that encourages partially observable agents to visit the states in diverse formations by guiding them to be well aware of their current formation solely based on their own observations. Numerical results show that the proposed FoX framework significantly outperforms the state-of-the-art MARL algorithms on Google Research Football (GRF) and sparse Starcraft II multi-agent challenge (SMAC) tasks.

Submitted 13 January, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: 8 pages main, 5 pages appendix with references, 10 figures; accepted by AAAI 2024

MSC Class: Machine Learning (ML) - ML: Reinforcement Learning; Secondary Subject Areas: Multiagent Systems (MAS) - MAS: Multiagent Learning

arXiv:2307.10928 [pdf, other]

FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

Authors: Seonghyeon Ye, Doyoung Kim, Sungdong Kim, Hyeonbin Hwang, Seungone Kim, Yongrae Jo, James Thorne, Juho Kim, Minjoon Seo

Abstract: Evaluation of Large Language Models (LLMs) is challenging because instruction-following necessitates alignment with human values and the required set of skills varies depending on the instruction. However, previous studies have mainly focused on coarse-grained evaluation (i.e. overall preference-based evaluation), which limits interpretability since it does not consider the nature of user instructions that require instance-wise skill composition. In this paper, we introduce FLASK (Fine-grained Language Model Evaluation based on Alignment Skill Sets), a fine-grained evaluation protocol for both human-based and model-based evaluation which decomposes coarse-level scoring to a skill set-level scoring for each instruction. We experimentally observe that the fine-graininess of evaluation is crucial for attaining a holistic view of model performance and increasing the reliability of the evaluation. Using FLASK, we compare multiple open-source and proprietary LLMs and observe a high correlation between model-based and human-based evaluations. We publicly release the evaluation data and code implementation at https://github.com/kaistAI/FLASK.

Submitted 14 April, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: ICLR 2024 Spotlight

arXiv:2307.02682 [pdf, other]

Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment

Authors: Yongrae Jo, Seongyun Lee, Aiden SJ Lee, Hyunji Lee, Hanseok Oh, Minjoon Seo

Abstract: Dense video captioning, a task of localizing meaningful moments and generating relevant captions for videos, often requires a large, expensive corpus of annotated video segments paired with text. In an effort to minimize the annotation cost, we propose ZeroTA, a novel method for dense video captioning in a zero-shot manner. Our method does not require any videos or annotations for training; instead, it localizes and describes events within each input video at test time by optimizing solely on the input. This is accomplished by introducing a soft moment mask that represents a temporal segment in the video and jointly optimizing it with the prefix parameters of a language model. This joint optimization aligns a frozen language generation model (i.e., GPT-2) with a frozen vision-language contrastive model (i.e., CLIP) by maximizing the matching score between the generated text and a moment within the video. We also introduce a pairwise temporal IoU loss to let a set of soft moment masks capture multiple distinct events within the video. Our method effectively discovers diverse significant events within the video, with the resulting captions appropriately describing these events. The empirical results demonstrate that ZeroTA surpasses zero-shot baselines and even outperforms the state-of-the-art few-shot method on the widely-used benchmark ActivityNet Captions. Moreover, our method shows greater robustness compared to supervised methods when evaluated in out-of-domain scenarios. This research provides insight into the potential of aligning widely-used models, such as language generation models and vision-language models, to unlock a new capability: understanding temporal aspects of videos.

Submitted 11 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

arXiv:2307.02085 [pdf, ps, other]

Finite period vectors and Gauss sums

Authors: Yeongseong Jo

Abstract: We study four sums including the Jacquet--Piatetski-Shapiro--Shalika, Flicker, Bump--Friedberg, and Jacquet--Shalika sums associated to irreducible cuspidal representations of general linear groups over finite fields. By computing explicitly, we relate Asai and Bump--Friedberg gamma factors over finite fields to those over nonarchimedean local fields through level zero supercuspidal representation. Via Deligne--Kazhdan close field theory, we prove that exterior square and Bump--Friedberg gamma factors agree with corresponding Artin gamma factors of their associated tamely ramified representations through local Langlands correspondence. We also deduce product formulae for Asai, Bump--Friedberg, and exterior square gamma factors in terms of Gauss sums. By combining these results, we examine Jacquet--Piatetski-Shapiro--Shalika, Flicker--Rallis, Jacquet--Shalika, and Friedberg--Jacquet periods and vectors and their connections to Rankin-Selberg, Asai, exterior square, and Bump-Friedberg gamma factors, respectively.

Submitted 7 April, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: Incorporate all referee's comments and in a response to referee's suggestion, the second half of arXiv:2209.12378 is included in this paper

arXiv:2306.13265 [pdf]

Thermodynamic behavior of correlated electron-hole fluids in van der Waals heterostructures

Authors: Ruishi Qi, Andrew Y. Joe, Zuocheng Zhang, Yongxin Zeng, Tiancheng Zheng, Qixin Feng, Emma Regan, Jingxu Xie, Zheyu Lu, Takashi Taniguchi, Kenji Watanabe, Sefaattin Tongay, Michael F. Crommie, Allan H. MacDonald, Feng Wang

Abstract: Coupled two-dimensional electron-hole bilayers provide a unique platform to study strongly correlated Bose-Fermi mixtures in condensed matter. Electrons and holes in spatially separated layers can bind to form interlayer excitons, composite Bosons expected to support high-temperature exciton superfluids. The interlayer excitons can also interact strongly with excess charge carriers when electron and hole densities are unequal. Here, we use optical spectroscopy to quantitatively probe the local thermodynamic properties of strongly correlated electron-hole fluids in MoSe2/hBN/WSe2 heterostructures. We observe a discontinuity in the electron and hole chemical potentials at matched electron and hole densities, a definitive signature of an excitonic insulator ground state. The excitonic insulator is stable up to a Mott density of ~$0.8\times {10}^{12} \mathrm{cm}^{-2}$ and has a thermal ionization temperature of ~70 K. The density dependence of the electron, hole, and exciton chemical potentials reveals strong correlation effects across the phase diagram. Compared with a non-interacting uniform charge distribution, the correlation effects lead to significant attractive exciton-exciton and exciton-charge interactions in the electron-hole fluid. Our work highlights the unique quantum behavior that can emerge in strongly correlated electron-hole systems.

Submitted 22 June, 2023; originally announced June 2023.

arXiv:2305.17900 [pdf, ps, other]

Continuous dependence of the Cauchy problem for the inhomogeneous biharmonic NLS equation in Sobolev spaces

Authors: JinMyong An, YuIl Jo, JinMyong Kim

Abstract: In this paper, we study the continuous dependence of the Cauchy problem for the inhomogeneous biharmonic nonlinear Schrödinger (IBNLS) equation \[iu_{t} +Δ^{2} u=λ|x|^{-b}|u|^σu,~u(0)=u_{0} \in H^{s} (\mathbb R^{d}),\] in the standard sense in $H^s$, i.e. in the sense that the local solution flow is continuous $H^s\to H^s$. Here $d\in \mathbb N$, $s>0$, $λ\in \mathbb R$ and $σ>0$. To arrive at this goal, we first obtain the estimates of the term $f(u)-f(v)$ in the fractional Sobolev spaces which generalize the similar results of An-Kim [5](2021) and Dinh [16](2018), where $f(u)$ is a nonlinear function that behaves like $λ|u|^σu$ with $λ\in \mathbb R$. These estimates are then applied to obtain the standard continuous dependence result for IBNLS equation with $0<s <\min \{2+\frac{d}{2},\frac{3}{2}d\}$, $0<b<\min\{4,d,\frac{3}{2}d-s,\frac{d}{2}+2-s\}$ and $0<σ< σ_{c}(s)$, where $σ_{c}(s)=\frac{8-2b}{d-2s}$ if $s<\frac{d}{2}$, and $σ_{c}(s)=\infty$ if $s\ge \frac{d}{2}$. Our continuous dependence result generalizes that of Liu-Zhang [27](2021) by extending the validity of $s$ and $b$.

Submitted 29 May, 2023; originally announced May 2023.

Comments: 25 pages. arXiv admin note: text overlap with arXiv:2206.06690

MSC Class: Primary 35Q55; Secondary 35B30
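The parameter ranges quoted in the abstract above can be checked numerically. The following is a minimal sketch that only transcribes the stated conditions on $s$, $b$ and $σ$; the function names `sigma_c` and `admissible` are hypothetical and do not come from the paper.

```python
import math


def sigma_c(s: float, d: int, b: float) -> float:
    """Critical exponent sigma_c(s) as stated in the abstract:
    (8 - 2b) / (d - 2s) if s < d/2, and infinity otherwise."""
    if s < d / 2:
        return (8 - 2 * b) / (d - 2 * s)
    return math.inf


def admissible(s: float, d: int, b: float, sigma: float) -> bool:
    """Check the three conditions quoted in the abstract (hypothetical helper)."""
    s_ok = 0 < s < min(2 + d / 2, 1.5 * d)
    b_ok = 0 < b < min(4, d, 1.5 * d - s, d / 2 + 2 - s)
    sigma_ok = 0 < sigma < sigma_c(s, d, b)
    return s_ok and b_ok and sigma_ok
```

For example, with $d=4$, $s=1$, $b=1$ the stated formula gives $σ_c(1) = (8-2)/(4-2) = 3$, so `admissible(1, 4, 1, 2)` holds while `admissible(1, 4, 1, 3)` does not.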

arXiv:2305.07288 [pdf, other]

Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table

Authors: Sunjun Kweon, Yeonsu Kwon, Seonhee Cho, Yohan Jo, Edward Choi

Abstract: Despite recent interest in open domain question answering (ODQA) over tables, many studies still rely on datasets that are not truly optimal for the task with respect to utilizing structural nature of table. These datasets assume answers reside as a single cell value and do not necessitate exploring over multiple cells such as aggregation, comparison, and sorting. Thus, we release Open-WikiTable, the first ODQA dataset that requires complex reasoning over tables. Open-WikiTable is built upon WikiSQL and WikiTableQuestions to be applicable in the open-domain setting. As each question is coupled with both textual answers and SQL queries, Open-WikiTable opens up a wide range of possibilities for future research, as both reader and parser methods can be applied. The dataset and code are publicly available.

Submitted 12 May, 2023; originally announced May 2023.

Comments: ACL 2023 (Findings)

arXiv:2305.06590 [pdf, other]

FactKG: Fact Verification via Reasoning on Knowledge Graphs

Authors: Jiho Kim, Sungjin Park, Yeonsu Kwon, Yohan Jo, James Thorne, Edward Choi

Abstract: In real world applications, knowledge graphs (KG) are widely used in various domains (e.g. medical applications and dialogue agents). However, for fact verification, KGs have not been adequately utilized as a knowledge source. KGs can be a valuable knowledge source in fact verification due to their reliability and broad applicability. A KG consists of nodes and edges which makes it clear how concepts are linked together, allowing machines to reason over chains of topics. However, there are many challenges in understanding how these machine-readable concepts map to information in text. To enable the community to better use KGs, we introduce a new dataset, FactKG: Fact Verification via Reasoning on Knowledge Graphs. It consists of 108k natural language claims with five types of reasoning: One-hop, Conjunction, Existence, Multi-hop, and Negation. Furthermore, FactKG contains various linguistic patterns, including colloquial style claims as well as written style claims to increase practicality. Lastly, we develop a baseline approach and analyze FactKG over these reasoning types. We believe FactKG can advance both reliability and practicality in KG-based fact verification.

Submitted 18 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: Accepted to ACL 2023

arXiv:2304.02096 [pdf, other]

The CAMELS project: Expanding the galaxy formation model space with new ASTRID and 28-parameter TNG and SIMBA suites

Authors: Yueying Ni, Shy Genel, Daniel Anglés-Alcázar, Francisco Villaescusa-Navarro, Yongseok Jo, Simeon Bird, Tiziana Di Matteo, Rupert Croft, Nianyi Chen, Natalí S. M. de Santi, Matthew Gebhardt, Helen Shao, Shivam Pandey, Lars Hernquist, Romeel Dave

Abstract: We present CAMELS-ASTRID, the third suite of hydrodynamical simulations in the Cosmology and Astrophysics with MachinE Learning (CAMELS) project, along with new simulation sets that extend the model parameter space based on the previous frameworks of CAMELS-TNG and CAMELS-SIMBA, to provide broader training sets and testing grounds for machine-learning algorithms designed for cosmological studies. CAMELS-ASTRID employs the galaxy formation model following the ASTRID simulation and contains 2,124 hydrodynamic simulation runs that vary 3 cosmological parameters ($Ω_m$, $σ_8$, $Ω_b$) and 4 parameters controlling stellar and AGN feedback. Compared to the existing TNG and SIMBA simulation suites in CAMELS, the fiducial model of ASTRID features the mildest AGN feedback and predicts the least baryonic effect on the matter power spectrum. The training set of ASTRID covers a broader variation in the galaxy populations and the baryonic impact on the matter power spectrum compared to its TNG and SIMBA counterparts, which can make machine-learning models trained on the ASTRID suite exhibit better extrapolation performance when tested on other hydrodynamic simulation sets. We also introduce extension simulation sets in CAMELS that widely explore 28 parameters in the TNG and SIMBA models, demonstrating the enormity of the overall galaxy formation model parameter space and the complex non-linear interplay between cosmology and astrophysical processes. With the new simulation suites, we show that building robust machine-learning models favors training and testing on the largest possible diversity of galaxy formation models. We also demonstrate that it is possible to train accurate neural networks to infer cosmological parameters using the high-dimensional TNG-SB28 simulation set.

Submitted 4 April, 2023; originally announced April 2023.

arXiv:2302.14260 [pdf, other]

A Closer Look at the Intervention Procedure of Concept Bottleneck Models

Authors: Sungbin Shin, Yohan Jo, Sungsoo Ahn, Namhoon Lee

Abstract: Concept bottleneck models (CBMs) are a class of interpretable neural network models that predict the target response of a given input based on its high-level concepts. Unlike the standard end-to-end models, CBMs enable domain experts to intervene on the predicted concepts and rectify any mistakes at test time, so that more accurate task predictions can be made at the end. While such intervenability provides a powerful avenue of control, many aspects of the intervention procedure remain rather unexplored. In this work, we develop various ways of selecting intervening concepts to improve the intervention effectiveness and conduct an array of in-depth analyses as to how they evolve under different circumstances. Specifically, we find that an informed intervention strategy can reduce the task error more than ten times compared to the current baseline under the same amount of intervention counts in realistic settings, and yet, this can vary quite significantly when taking into account different intervention granularity. We verify our findings through comprehensive evaluations, not only on the standard real datasets, but also on synthetic datasets that we generate based on a set of different causal graphs. We further discover some major pitfalls of the current practices which, without a proper addressing, raise concerns on reliability and fairness of the intervention procedure.

Submitted 2 July, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: ICML 2023

arXiv:2302.07498 [pdf, other]

doi 10.1103/PhysRevResearch.5.033010

Gaussian Quantum Illumination via Monotone Metrics

Authors: Dong Hwan Kim, Yonggi Jo, Duk Y. Kim, Taek Jeong, Jihwan Kim, Nam Hun Park, Zaeill Kim, Su-Yong Lee

Abstract: Quantum illumination is to discern the presence or absence of a low reflectivity target, where the error probability decays exponentially in the number of copies used. When the target reflectivity is small so that it is hard to distinguish target presence or absence, the exponential decay constant falls into a class of objects called monotone metrics. We evaluate monotone metrics restricted to Gaussian states in terms of first-order moments and covariance matrix. Under the assumption of a low reflectivity target, we explicitly derive analytic formulae for decay constant of an arbitrary Gaussian input state. Especially, in the limit of large background noise and low reflectivity, there is no need of symplectic diagonalization which usually complicates the computation of decay constants. First, we show that two-mode squeezed vacuum (TMSV) states are the optimal probe among pure Gaussian states with fixed signal mean photon number. Second, as an alternative to preparing TMSV states with high mean photon number, we show that preparing a TMSV state with low mean photon number and displacing the signal mode is a more experimentally feasible setup without degrading the performance that much. Third, we show that it is of utmost importance to prepare an efficient idler memory to beat coherent states and provide analytic bounds on the idler memory transmittivity in terms of signal power, background noise, and idler memory noise. Finally, we identify the region of physically possible correlations between the signal and idler modes that can beat coherent states.

Submitted 15 February, 2023; originally announced February 2023.

Comments: 16 pages, 6 figures

Journal ref: Phys. Rev. Research 5, 033010 (2023)

arXiv:2301.09047 [pdf]

doi 10.1038/s41567-022-01930-3

Observation of Kondo condensation in a degenerately doped silicon metal

Authors: H. Im, D. U. Lee, Y. Jo, J. Kim, Y. Chong, W. Song, H. Kim, E. K. Kim, S. -J. Sin, S. Moon, J. R. Prance, Yu. A. Pashkin, J. S. Tsai

Abstract: When a magnetic moment is embedded in a metal, it captures itinerant electrons to form the Kondo cloud1,2, which can spread out over a few micrometres3,4. For a metal with dense magnetic impurities such that Kondo clouds overlap with each other, correlated ground states are formed. When the impurities form a regular lattice, the result is a heavy fermion or anti-ferromagnetic order depending on the dominant interaction5,6. Even in the case of random impurities, overlapping Kondo clouds are expected to form a coherent ground state. Here, we examine this issue by performing electrical transport and high-precision tunnelling density-of-states (DOS) spectroscopy measurements in a highly P-doped crystalline silicon metal where disorder-induced localized magnetic moments exist7. We detect the Kondo effect in the resistivity of the Si metal below 2 K and an exotic pseudogap in the DOS with gap edge peaks at a Fermi energy below 100 mK. The DOS gap and peaks are tuned by applying an external magnetic field and transformed into a metallic Altshuler-Aronov gap8 in the paramagnetic disordered Fermi liquid (DFL) phase. We interpret this phenomenon as the Kondo condensation, the formation of a correlated ground state of overlapping Kondo clouds, and its transition to a DFL. The boundary between the Kondo condensation and DFL phases is identified by analysing distinct DOS spectra in the magnetic field-temperature plane. A detailed theoretical analysis using a holographic method9,10,11 reproduces the unusual DOS spectra, 1, supporting our scenario. Our work demonstrates the observation of the magnetic version of Bardeen-Cooper-Schrieffer (BCS) pair condensation and will be useful for understanding complex Kondo systems.

Submitted 21 January, 2023; originally announced January 2023.

Comments: 34 pages, 5+6 figures, accepted in Nature Physics

arXiv:2211.16461 [pdf, other]

doi 10.3847/1538-4357/aca8fe

Calibrating cosmological simulations with implicit likelihood inference using galaxy growth observables

Authors: Yongseok Jo, Shy Genel, Benjamin Wandelt, Rachel Somerville, Francisco Villaescusa-Navarro, Greg L. Bryan, Daniel Angles-Alcazar, Daniel Foreman-Mackey, Dylan Nelson, Ji-hoon Kim

Abstract: In a novel approach employing implicit likelihood inference (ILI), also known as likelihood-free inference, we calibrate the parameters of cosmological hydrodynamic simulations against observations, which has previously been unfeasible due to the high computational cost of these simulations. For computational efficiency, we train neural networks as emulators on ~1000 cosmological simulations from the CAMELS project to estimate simulated observables, taking as input the cosmological and astrophysical parameters, and use these emulators as surrogates to the cosmological simulations. Using the cosmic star formation rate density (SFRD) and, separately, stellar mass functions (SMFs) at different redshifts, we perform ILI on selected cosmological and astrophysical parameters (Omega_m, sigma_8, stellar wind feedback, and kinetic black hole feedback) and obtain full 6-dimensional posterior distributions. In the performance test, the ILI from the emulated SFRD (SMFs) can recover the target observables with a relative error of 0.17% (0.4%). We find that degeneracies exist between the parameters inferred from the emulated SFRD, confirmed with new full cosmological simulations. We also find that the SMFs can break the degeneracy in the SFRD, which indicates that the SMFs provide complementary constraints for the parameters. Further, we find that the parameter combination inferred from an observationally-inferred SFRD reproduces the target observed SFRD very well, whereas, in the case of the SMFs, the inferred and observed SMFs show significant discrepancies that indicate potential limitations of the current galaxy formation modeling and calibration framework, and/or systematic differences and inconsistencies between observations of the stellar mass function.

Submitted 29 November, 2022; originally announced November 2022.

Comments: This is the revised version from the reviewer's report (submitted to ApJ)

arXiv:2211.02033 [pdf]

doi 10.1002/aelm.202101051

Optically Induced Picosecond Lattice Compression in the Dielectric Component of a Strongly Coupled Ferroelectric/Dielectric Superlattice

Authors: Deepankar Sri Gyan, Hyeon Jun Lee, Youngjun Ahn, Jerome Carnis, Tae Yeon Kim, Sanjith Unithrattil, Jun Young Lee, Sae Hwan Chun, Sunam Kim, Intae Eom, Minseok Kim, Sang-Youn Park, Kyung Sook Kim, Ho Nyung Lee, Ji Young Jo, Paul G. Evans

Abstract: Above-bandgap femtosecond optical excitation of a ferroelectric/dielectric BaTiO3/CaTiO3 superlattice leads to structural responses that are a consequence of the screening of the strong electrostatic coupling between the component layers. Time-resolved x-ray free-electron laser diffraction shows that the structural response to optical excitation includes a net lattice expansion of the superlattice consistent with depolarization-field screening driven by the photoexcited charge carriers. The depolarization-field-screening-driven expansion is separate from a photoacoustic pulse launched from the bottom electrode on which the superlattice was epitaxially grown. The distribution of diffracted intensity of superlattice x-ray reflections indicates that the depolarization-field-screening-induced strain includes a photoinduced expansion in the ferroelectric BaTiO3 and a contraction in CaTiO3. The magnitude of expansion in BaTiO3 layers is larger than the contraction in CaTiO3. The difference in the magnitude of depolarization-field-screening-driven strain in the BaTiO3 and CaTiO3 components can arise from the contribution of the oxygen octahedral rotation patterns at the BaTiO3/CaTiO3 interfaces to the polarization of CaTiO3. The depolarization-field-screening-driven polarization reduction in the CaTiO3 layers points to a new direction for the manipulation of polarization in the component layers of a strongly coupled ferroelectric/dielectric superlattice.

Submitted 3 November, 2022; originally announced November 2022.

Journal ref: Adv. Electron. Mater. 8, 2101051 (2022)

arXiv:2210.07462 [pdf]

Exploiting volumetric wave correlation for enhanced depth imaging in scattering medium

Authors: Ye-Ryoung Lee, Dong-Young Kim, Yonghyeon Jo, Moonseok Kim, Wonshik Choi

Abstract: Imaging an object embedded within a scattering medium requires the correction of complex sample-induced wave distortions. Existing approaches have been designed to resolve them by optimizing signal waves recorded in each 2D image. Here, we present a volumetric image reconstruction framework that merges two fundamental degrees of freedom, the wavelength and propagation angles of light waves, based on the object momentum conservation principle. On this basis, we propose methods for exploiting the correlation of signal waves from volumetric images to better cope with multiple scattering. By constructing experimental systems scanning both wavelength and illumination angle of the light source, we demonstrated a 32-fold increase in the use of signal waves compared with that of existing 2D-based approaches and achieved ultrahigh volumetric resolution (lateral resolution: 0.41 um, axial resolution: 0.60 um) even within complex scattering medium owing to the optimal coherent use of the extremely broad spectral bandwidth (225 nm).

Submitted 13 October, 2022; originally announced October 2022.

arXiv:2210.03029 [pdf, other]

Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt

Authors: Seonghyeon Ye, Joel Jang, Doyoung Kim, Yongrae Jo, Minjoon Seo

Abstract: Enhancing the zero-shot performance of instruction-following models requires heavy computation, either by scaling the total number of training datasets or the model size. In this work, we explore how retrieval of soft prompts obtained through prompt tuning can efficiently assist hard prompts in zero-shot task generalization. Specifically, we train soft prompt embeddings for each prompt through prompt tuning, store the samples of the training instances mapped with the prompt embeddings, and retrieve the corresponding prompt embedding of the training instance closest to the query instance during inference. While only adding 0.007% additional parameters, retrieval of soft prompt enhances the performance of T0 on unseen tasks by outperforming it on 10 out of 11 datasets as well as improving the mean accuracy of T0 on BIG-bench benchmark by 2.39% points. Also, we report an interesting finding that retrieving source embeddings trained on similar answer choice formats is more important than those on similar task types.

Submitted 16 October, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

Comments: EMNLP 2023 Findings

arXiv:2210.01471 [pdf, other]

doi 10.1364/OE.505405

Bound for Gaussian-state Quantum illumination using direct photon measurement

Authors: Su-Yong Lee, Dong Hwan Kim, Yonggi Jo, Taek Jeong, Duk Y. Kim, Zaeill Kim

Abstract: It is important to find feasible measurement bounds for quantum information protocols. We present analytic bounds for quantum illumination with Gaussian states when using an on-off detection or a photon number resolving (PNR) detection, where its performance is evaluated with signal-to-noise ratio. First, for coincidence counting measurement, the best performance is given by the two-mode squeezed vacuum (TMSV) state which outperforms the coherent state and the classically correlated thermal (CCT) state. However, the coherent state can beat the TMSV state with increasing signal mean photon number in the case of the on-off detection. Second, the performance is enhanced by taking Fisher information approach of all counting probabilities including non-detection events. In the Fisher information approach, the TMSV state still presents the best performance but the CCT state can beat the TMSV state with increasing signal mean photon number in the case of the on-off detection. Furthermore, we show that it is useful to take the PNR detection on the signal mode and the on-off detection on the idler mode, which reaches similar performance of using PNR detections on both modes.

Submitted 2 November, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: 7+1 pages, 3 figures, close to the published version

Journal ref: Opt. Express 31, 38977-38988 (2023)

arXiv:2209.14521 [pdf]

doi 10.1021/acs.nanolett.2c04030

Charge transfer dynamics in MoSe$_{2}$/hBN/WSe$_{2}$ heterostructures

Authors: Yoseob Yoon, Zuocheng Zhang, Ruishi Qi, Andrew Y. Joe, Renee Sailus, Kenji Watanabe, Takashi Taniguchi, Sefaattin Tongay, Feng Wang

Abstract: Ultrafast charge transfer processes provide a facile way to create interlayer excitons in directly contacted transition metal dichalcogenide (TMD) layers. More sophisticated heterostructures composed of TMD/hBN/TMD enable new ways to control interlayer exciton properties and achieve novel exciton phenomena, such as exciton insulators and condensates, where longer lifetimes are desired. In this work, we experimentally study the charge transfer dynamics in a heterostructure composed of a 1 nm thick hBN spacer between MoSe$_{2}$ and WSe$_{2}$ monolayers. We observe the hole transfer from MoSe$_{2}$ to WSe$_{2}$ through the hBN barrier with a time constant of 500 ps, which is over 3 orders of magnitude slower than that between TMD layers without a spacer. Furthermore, we observe strong competition between the interlayer charge transfer and intralayer exciton-exciton annihilation processes at high excitation densities. Our work opens possibilities to understand charge transfer pathways in TMD/hBN/TMD heterostructures for the efficient generation and control of interlayer excitons.

Submitted 21 December, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

Journal ref: Nano Lett. 22, 10140 (2022)

Showing 1–50 of 251 results for author: Joo, Y