Search | arXiv e-print repository

Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences

Authors: Seungwook Kim, Kejie Li, Xueqing Deng, Yichun Shi, Minsu Cho, Peng Wang

Abstract: Leveraging multi-view diffusion models as priors for 3D optimization has alleviated the problem of 3D consistency, e.g., the Janus face problem or the content drift problem, in zero-shot text-to-3D models. However, the 3D geometric fidelity of the output remains an unresolved issue; although the rendered 2D views are realistic, the underlying geometry may contain errors such as unreasonable concavities. In this work, we propose CorrespondentDream, an effective method to leverage annotation-free, cross-view correspondences yielded from the diffusion U-Net to provide an additional 3D prior to the NeRF optimization process. We find that these correspondences are strongly consistent with human perception, and by adopting them in our loss design, we are able to produce NeRF models with geometries that are more coherent with common sense, e.g., smoother object surfaces, yielding higher 3D fidelity. We demonstrate the efficacy of our approach through various comparative qualitative results and a solid user study.

Submitted 16 April, 2024; originally announced April 2024.

Comments: 25 pages, 22 figures, accepted to CVPR 2024

arXiv:2404.10346 [pdf, other]

Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards

Authors: Hyeonbin Hwang, Doyoung Kim, Seungone Kim, Seonghyeon Ye, Minjoon Seo

Abstract: Training on large amounts of rationales (i.e., CoT Fine-tuning) is effective at improving the reasoning capabilities of large language models (LLMs). However, acquiring human-authored rationales or augmenting rationales from proprietary models is costly and not scalable. In this paper, we study the problem of whether LLMs could self-improve their reasoning capabilities. To this end, we propose Self-Explore, where the LLM is tasked to explore the first wrong step (i.e., the first pit) within the rationale and use such signals as fine-grained rewards for further improvement. On the GSM8K and MATH test sets, Self-Explore achieves 11.57% and 2.89% improvement on average across three LLMs compared to supervised fine-tuning (SFT). Our code is available at https://github.com/hbin0701/Self-Explore.

Submitted 16 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: Preprint Under Review

arXiv:2404.09920 [pdf, other]

Combined Pre-Supernova Alert System with KamLAND and Super-Kamiokande

Authors: KamLAND, Super-Kamiokande Collaborations: Seisho Abe, Minori Eizuka, Sawako Futagi, Azusa Gando, Yoshihito Gando, Shun Goto, Takahiko Hachiya, Kazumi Hata, Koichi Ichimura, Sei Ieki, Haruo Ikeda, Kunio Inoue, Koji Ishidoshiro, Yuto Kamei, Nanami Kawada, Yasuhiro Kishimoto, Masayuki Koga, Maho Kurasawa, Tadao Mitsui, Haruhiko Miyake, Daisuke Morita, Takeshi Nakahata, et al. (290 additional authors not shown)

Abstract: Preceding a core-collapse supernova, various processes produce an increasing amount of neutrinos of all flavors characterized by mounting energies from the interior of massive stars. Among them, the electron antineutrinos are potentially detectable by terrestrial neutrino experiments such as KamLAND and Super-Kamiokande via inverse beta decay interactions. Once these pre-supernova neutrinos are observed, an early warning of the upcoming core-collapse supernova can be provided. In light of this, KamLAND and Super-Kamiokande, both located in the Kamioka mine in Japan, have been monitoring pre-supernova neutrinos since 2015 and 2021, respectively. Recently, we performed a joint study between KamLAND and Super-Kamiokande on pre-supernova neutrino detection. A pre-supernova alert system combining the KamLAND detector and the Super-Kamiokande detector was developed and put into operation, which can provide a supernova alert to the astrophysics community. Fully leveraging the complementary properties of these two detectors, the combined alert is expected to resolve a pre-supernova neutrino signal from a 15 M$_{\odot}$ star within 510 pc of the Earth, at a significance level corresponding to a false alarm rate of no more than 1 per century. For a Betelgeuse-like model with optimistic parameters, it can provide early warnings up to 12 hours in advance.

Submitted 1 July, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: Resubmitted to ApJ. 22 pages, 16 figures, for more information about the combined pre-supernova alert system, see https://www.lowbg.org/presnalarm/

arXiv:2404.09161 [pdf, other]

Coreset Selection for Object Detection

Authors: Hojun Lee, Suyoung Kim, Junhoo Lee, Jaeyoung Yoo, Nojun Kwak

Abstract: Coreset selection is a method for selecting a small, representative subset of an entire dataset. It has been primarily researched in image classification, assuming there is only one object per image. However, coreset selection for object detection is more challenging as an image can contain multiple objects. As a result, much research has yet to be done on this topic. Therefore, we introduce a new approach, Coreset Selection for Object Detection (CSOD). CSOD generates imagewise and classwise representative feature vectors for multiple objects of the same class within each image. Subsequently, we adopt submodular optimization for considering both representativeness and diversity and utilize the representative vectors in the submodular optimization process to select a subset. When we evaluated CSOD on the Pascal VOC dataset, CSOD outperformed random selection by +6.4%p in AP$_{50}$ when selecting 200 images.
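
The abstract above names submodular optimization as the selection step but does not state the objective. As a rough illustration only, the sketch below uses a facility-location objective with greedy selection, which is one common submodular choice and not necessarily what CSOD uses. The function name `greedy_facility_location` and the precomputed similarity matrix `sim` are hypothetical.

```python
def greedy_facility_location(sim, k):
    """Greedily pick k items that maximize a facility-location objective.

    sim[i][j] is an assumed, precomputed similarity between items i and j
    (for example, between representative feature vectors). This objective
    is an illustrative stand-in, not the objective from the paper.
    """
    n = len(sim)
    selected = []
    # coverage[i] = best similarity between item i and any selected item
    coverage = [0.0] * n
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much adding j improves total coverage
            gain = sum(max(sim[i][j] - coverage[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        coverage = [max(coverage[i], sim[i][best_j]) for i in range(n)]
    return selected


# Two tight clusters: items {0, 1} and items {2, 3}
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
picked = greedy_facility_location(sim, 2)
```

With a budget of two, the greedy rule takes one item from each cluster, because a second item from an already covered cluster adds almost no marginal gain. This is how a single objective can reward representativeness and diversity at once.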

Submitted 14 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024: 1st Workshop on Dataset Distillation for Computer Vision

arXiv:2404.08725 [pdf, other]

Development of a data overflow protection system for Super-Kamiokande to maximize data from nearby supernovae

Authors: M. Mori, K. Abe, Y. Hayato, K. Hiraide, K. Hosokawa, K. Ieki, M. Ikeda, J. Kameda, Y. Kanemura, R. Kaneshima, Y. Kashiwagi, Y. Kataoka, S. Miki, S. Mine, M. Miura, S. Moriyama, Y. Nakano, M. Nakahata, S. Nakayama, Y. Noguchi, K. Okamoto, K. Sato, H. Sekiya, H. Shiba, K. Shimizu, et al. (230 additional authors not shown)

Abstract: Neutrinos from very nearby supernovae, such as Betelgeuse, are expected to generate more than ten million events over 10 s in Super-Kamiokande (SK). At such large event rates, the buffers of the SK analog-to-digital conversion board (QBEE) will overflow, causing random loss of data that is critical for understanding the dynamics of the supernova explosion mechanism. In order to solve this problem, two new DAQ modules were developed to aid in the observation of very nearby supernovae. The first of these, the SN module, is designed to save only the number of hit PMTs during a supernova burst and the second, the Veto module, prescales the high rate neutrino events to prevent the QBEE from overflowing based on information from the SN module. In the event of a very nearby supernova, these modules allow SK to reconstruct the time evolution of the neutrino event rate from beginning to end using both QBEE and SN module data. This paper presents the development and testing of these modules together with an analysis of supernova-like data generated with a flashing laser diode. We demonstrate that the Veto module successfully prevents DAQ overflows for Betelgeuse-like supernovae as well as the long-term stability of the new modules. During normal running the Veto module is found to issue DAQ vetos a few times per month resulting in a total dead time less than 1 ms, and does not influence ordinary operations. Additionally, using simulation data we find that supernovae closer than 800 pc will trigger the Veto module, resulting in a prescaling of the observed neutrino data.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 28 pages, 18 figures. Submitted to PTEP

arXiv:2404.08672 [pdf, other]

Taxonomy and Analysis of Sensitive User Queries in Generative AI Search

Authors: Hwiyeol Jo, Taiwoo Park, Nayoung Choi, Changbong Kim, Ohjoon Kwon, Donghyeon Jeon, Hyunwoo Lee, Eui-Hyeon Lee, Kyoungho Shin, Sun Suk Lim, Kyungmi Kim, Jihye Lee, Sun Kim

Abstract: Although there has been a growing interest among industries to integrate generative LLMs into their services, limited experience and scarcity of resources act as a barrier in launching and servicing large-scale LLM-based conversational services. In this paper, we share our experiences in developing and operating generative AI models within a national-scale search engine, with a specific focus on the sensitiveness of user queries. We propose a taxonomy for sensitive search queries, outline our approaches, and present a comprehensive analysis report on sensitive queries from actual users.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.08460 [pdf, other]

Superbunched radiation of a tunnel junction due to charge quantization

Authors: Steven Kim, Fabian Hassler

Abstract: A chaotic light source is characterized by the fact that many independent emitters radiate photons with a random optical phase. This is similar to a tunnel junction, where many independent channels are able to emit photons due to a coupling to an electromagnetic environment. However, in a recent experiment it has been observed that a tunnel junction can deviate from the expectation of chaotic light and is able to emit strongly correlated, superbunched photons. Motivated by this, we study the correlation of the radiation and show that the superbunching originates from the emission of multiple photons, which is possible due to the quantization of charge.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 5 pages, 2 figures, 3 pages supplemental material

arXiv:2404.07947 [pdf, other]

ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference

Authors: Hyungjun Oh, Kihong Kim, Jaemin Kim, Sungkyun Kim, Junyeol Lee, Du-seong Chang, Jiwon Seo

Abstract: This paper presents ExeGPT, a distributed system designed for constraint-aware LLM inference. ExeGPT finds and runs with an optimal execution schedule to maximize inference throughput while satisfying a given latency constraint. By leveraging the distribution of input and output sequences, it effectively allocates resources and determines optimal execution configurations, including batch sizes and partial tensor parallelism. We also introduce two scheduling strategies based on Round-Robin Allocation and Workload-Aware Allocation policies, suitable for different NLP workloads. We evaluate ExeGPT on six LLM instances of T5, OPT, and GPT-3 and five NLP tasks, each with four distinct latency constraints. Compared to FasterTransformer, ExeGPT achieves up to 15.2x improvements in throughput and 6x improvements in latency. Overall, ExeGPT achieves an average throughput gain of 2.9x across twenty evaluation scenarios. Moreover, when adapting to changing sequence distributions, the cost of adjusting the schedule in ExeGPT is reasonably modest. ExeGPT proves to be an effective solution for optimizing and executing LLM inference for diverse NLP workloads and serving conditions.

Submitted 15 March, 2024; originally announced April 2024.

Comments: Accepted to ASPLOS 2024 (summer cycle)

Journal ref: 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 24 summer cycle), Volume 2, Nov 15, 2023 (Notification Date)

arXiv:2404.07622 [pdf, other]

Multi-Image Visual Question Answering for Unsupervised Anomaly Detection

Authors: Jun Li, Cosmin I. Bercea, Philip Müller, Lina Felsner, Suhwan Kim, Daniel Rueckert, Benedikt Wiestler, Julia A. Schnabel

Abstract: Unsupervised anomaly detection enables the identification of potential pathological areas by juxtaposing original images with their pseudo-healthy reconstructions generated by models trained exclusively on normal images. However, the clinical interpretation of resultant anomaly maps presents a challenge due to a lack of detailed, understandable explanations. Recent advancements in language models have shown the capability of mimicking human-like understanding and providing detailed descriptions. This raises an interesting question: How can language models be employed to make the anomaly maps more explainable? To the best of our knowledge, we are the first to leverage a language model for unsupervised anomaly detection, for which we construct a dataset with different questions and answers. Additionally, we present a novel multi-image visual question answering framework tailored for anomaly detection, incorporating diverse feature fusion strategies to enhance visual knowledge extraction. Our experiments reveal that the framework, augmented by our new Knowledge Q-Former module, adeptly answers questions on the anomaly detection dataset. Besides, integrating anomaly maps as inputs distinctly aids in improving the detection of unseen pathologies.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 13 pages, 8 figures

arXiv:2404.07610 [pdf, other]

Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval

Authors: Minkuk Kim, Hyeon Bae Kim, Jinyoung Moon, Jinwoo Choi, Seong Tae Kim

Abstract: There has been significant attention to the research on dense video captioning, which aims to automatically localize and caption all events within untrimmed video. Several studies introduce methods by designing dense video captioning as a multitasking problem of event localization and event captioning to consider inter-task relations. However, addressing both tasks using only visual input is challenging due to the lack of semantic content. In this study, we address this by proposing a novel framework inspired by the cognitive information processing of humans. Our model utilizes external memory to incorporate prior knowledge. The memory retrieval method is proposed with cross-modal video-to-text matching. To effectively incorporate retrieved text features, the versatile encoder and the decoder with visual and textual cross-attention modules are designed. Comparative experiments have been conducted to show the effectiveness of the proposed method on ActivityNet Captions and YouCook2 datasets. Experimental results show promising performance of our model without extensive pretraining from a large video dataset.

Submitted 11 April, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2404.07481 [pdf, other]

Adiabatic State Preparation in a Quantum Ising Spin Chain

Authors: Sooshin Kim, Alexander Lukin, Matthew Rispoli, M. Eric Tai, Adam M. Kaufman, Perrin Segura, Yanfei Li, Joyce Kwan, Julian Léonard, Brice Bakkali-Hassani, Markus Greiner

Abstract: We report on adiabatic state preparation in the one-dimensional quantum Ising model using ultracold bosons in a tilted optical lattice. We prepare many-body ground states of controllable system sizes and observe enhanced fluctuations around the transition between paramagnetic and antiferromagnetic states, marking the precursor of quantum critical behavior. Furthermore, we find evidence for superpositions of domain walls and study their effect on the many-body ground state by measuring the populations of each spin configuration across the transition. These results shed new light on the effect of boundary conditions in finite-size quantum systems.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 5+5 pages, 4+8 figures

arXiv:2404.07263 [pdf, other]

The planted directed polymer: inferring a random walk from noisy images

Authors: Sun Woo P. Kim, Austen Lamacraft

Abstract: We introduce and study the planted directed polymer, in which the path of a random walker is inferred from noisy 'images' accumulated at each timestep. Formulated as a nonlinear problem of Bayesian inference for a hidden Markov model, this problem is a generalization of the directed polymer problem of statistical physics, coinciding with it in the limit of zero signal to noise. For a 1D walker we present numerical investigations and analytical arguments that no phase transition is present. When formulated on a Cayley tree, methods developed for the directed polymer are used to show that there is a transition with decreasing signal to noise where effective inference becomes impossible, meaning that the average fractional overlap between the inferred and true paths falls from one to zero.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 11 pages, 12 figures

arXiv:2404.06902 [pdf, other]

Spatiotemporal Analysis of Shared Situation Awareness among Connected Vehicles

Authors: Seungmo Kim

Abstract: Shared situation awareness (SSA) has been garnering explosive interest in various applications for intelligent transportation systems (ITS). In addition, the delay-constrained nature of supporting vehicular networks makes it critical to precisely analyze the performance of an SSA procedure. Extending the relevant literature, this paper provides an analysis framework that evaluates the performance of SSA in spatial and temporal aspects simultaneously. Specifically, this paper provides a closed-form probability distribution for the length of time taken to constitute an SSA among a group of connected vehicles. This paper extends the calculation to investigate the feasibility of SSA in supporting various types of safety messages defined by the SAE J2735.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06731 [pdf]

Accuracy of a Large Language Model in Distinguishing Anti- And Pro-vaccination Messages on Social Media: The Case of Human Papillomavirus Vaccination

Authors: Soojong Kim, Kwanho Kim, Claire Wonjeong Jo

Abstract: Objective. Vaccination has engendered a spectrum of public opinions, with social media acting as a crucial platform for health-related discussions. The emergence of artificial intelligence technologies, such as large language models (LLMs), offers a novel opportunity to efficiently investigate public discourses. This research assesses the accuracy of ChatGPT, a widely used and freely available service built upon an LLM, for sentiment analysis to discern different stances toward Human Papillomavirus (HPV) vaccination. Methods. Messages related to HPV vaccination were collected from social media supporting different message formats: Facebook (long format) and Twitter (short format). A selection of 1,000 human-evaluated messages was input into the LLM, which generated multiple response instances containing its classification results. Accuracy was measured for each message as the level of concurrence between human and machine decisions, ranging between 0 and 1. Results. Average accuracy was notably high when 20 response instances were used to determine the machine decision of each message: .882 (SE = .021) and .750 (SE = .029) for anti- and pro-vaccination long-form; .773 (SE = .027) and .723 (SE = .029) for anti- and pro-vaccination short-form, respectively. Using only three or even one instance did not lead to a severe decrease in accuracy. However, for long-form messages, the language model exhibited significantly lower accuracy in categorizing pro-vaccination messages than anti-vaccination ones. Conclusions. ChatGPT shows potential in analyzing public opinions on HPV vaccination using social media content. However, understanding the characteristics and limitations of a language model within specific public health contexts remains imperative.
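
The abstract above does not spell out how multiple response instances are combined into one machine decision. The sketch below assumes a simple majority vote and scores each message 1 when the vote matches the human label and 0 otherwise, then reports the mean and its standard error. Both the aggregation rule and the function names are assumptions for illustration, not the study's procedure.

```python
from collections import Counter
from math import sqrt


def machine_decision(responses):
    """Majority label among response instances (assumed aggregation rule).

    Ties go to the label that appears first in the list.
    """
    return Counter(responses).most_common(1)[0][0]


def mean_accuracy(human_labels, response_sets):
    """Return (mean, standard error) of per-message agreement scores."""
    scores = [
        1.0 if machine_decision(r) == h else 0.0
        for h, r in zip(human_labels, response_sets)
    ]
    n = len(scores)
    mean = sum(scores) / n
    # Sample variance with Bessel's correction; zero for a single message
    var = sum((s - mean) ** 2 for s in scores) / (n - 1) if n > 1 else 0.0
    return mean, sqrt(var / n)


# Toy data: four messages, three response instances each (made up)
human = ["anti", "pro", "anti", "pro"]
responses = [
    ["anti", "anti", "pro"],
    ["pro", "pro", "pro"],
    ["pro", "pro", "anti"],
    ["pro", "anti", "pro"],
]
mean, se = mean_accuracy(human, responses)
```

Under this reading, using fewer response instances only changes the length of each inner list, so the same function covers the one-instance and three-instance cases.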

Submitted 10 April, 2024; originally announced April 2024.

Comments: Forthcoming in Preventive Medicine Reports

arXiv:2404.06621 [pdf, other]

What is Your Favorite Gender, MLM? Gender Bias Evaluation in Multilingual Masked Language Models

Authors: Jeongrok Yu, Seong Ug Kim, Jacob Choi, Jinho D. Choi

Abstract: Bias is a disproportionate prejudice in favor of one side against another. Due to the success of transformer-based Masked Language Models (MLMs) and their impact on many NLP tasks, a systematic evaluation of bias in these models is needed more than ever. While many studies have evaluated gender bias in English MLMs, only a few works have been conducted for the task in other languages. This paper proposes a multilingual approach to estimate gender bias in MLMs from 5 languages: Chinese, English, German, Portuguese, and Spanish. Unlike previous work, our approach does not depend on parallel corpora coupled with English to detect gender bias in other languages using multilingual lexicons. Moreover, a novel model-based method is presented to generate sentence pairs for a more robust analysis of gender bias, compared to the traditional lexicon-based method. For each language, both the lexicon-based and model-based methods are applied to create two datasets respectively, which are used to evaluate gender bias in an MLM specifically trained for that language using one existing and 3 new scoring metrics. Our results show that the previous approach is data-sensitive and not stable as it does not remove contextual dependencies irrelevant to gender. In fact, the results often flip when different scoring metrics are used on the same dataset, suggesting that gender bias should be studied on a large dataset using multiple evaluation metrics for best practice.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.05980 [pdf, other]

Tackling Structural Hallucination in Image Translation with Local Diffusion

Authors: Seunghoi Kim, Chen Jin, Tom Diethe, Matteo Figini, Henry F. J. Tregidgo, Asher Mullokandov, Philip Teare, Daniel C. Alexander

Abstract: Recent developments in diffusion models have advanced conditioned image generation, yet they struggle with reconstructing out-of-distribution (OOD) images, such as unseen tumors in medical images, causing "image hallucination" and risking misdiagnosis. We hypothesize such hallucinations result from local OOD regions in the conditional images. We verify that partitioning the OOD region and conducting separate image generations alleviates hallucinations in several applications. From this, we propose a training-free diffusion framework that reduces hallucination with multiple Local Diffusion processes. Our approach involves OOD estimation followed by two modules: a "branching" module generates locally both within and outside OOD regions, and a "fusion" module integrates these predictions into one. Our evaluation shows our method mitigates hallucination over baseline models quantitatively and qualitatively, reducing misdiagnosis by 40% and 25% in the real-world medical and natural image datasets, respectively. It also demonstrates compatibility with various pre-trained diffusion models.

Submitted 23 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.05916 [pdf, other]

Prompt-driven Universal Model for View-Agnostic Echocardiography Analysis

Authors: Sekeun Kim, Hui Ren, Peng Guo, Abder-Rahman Ali, Patrick Zhang, Kyungsang Kim, Xiang Li, Quanzheng Li

Abstract: Echocardiography segmentation for cardiac analysis is time-consuming and resource-intensive due to the variability in image quality and the necessity to process scans from various standard views. While current automated segmentation methods in echocardiography show promising performance, they are trained on specific scan views to analyze corresponding data. However, this solution has a limitation as the number of required models increases with the number of standard views. To address this, in this paper, we present a prompt-driven universal method for view-agnostic echocardiography analysis. Considering the domain shift between standard views, we first introduce a method called prompt matching, aimed at learning prompts specific to different views by matching prompts and querying input embeddings using a pre-trained vision model. Then, we utilize a pre-trained medical language model to align textual information with pixel data for accurate segmentation. Extensive experiments on three standard views show that our approach significantly outperforms the state-of-the-art universal methods and achieves comparable or even better performance than the segmentation model trained and tested on the same views.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.05912 [pdf, ps, other]

OGLE-2018-BLG-0971, MOA-2023-BLG-065, and OGLE-2023-BLG-0136: Microlensing events with prominent orbital effects

Authors: Cheongho Han, Andrzej Udalski, Ian A. Bond, Chung-Uk Lee, Andrew Gould, Michael D. Albrow, Sun-Ju Chung, Kyu-Ha Hwang, Youn Kil Jung, Hyoun-Woo Kim, Yoon-Hyun Ryu, Yossi Shvartzvald, In-Gu Shin, Jennifer C. Yee, Hongjing Yang, Weicheng Zang, Sang-Mok Cha, Doeon Kim, Dong-Jin Kim, Seung-Lee Kim, Dong-Joo Lee, Yongseok Lee, Byeong-Gon Park, Richard W. Pogge, Przemek Mróz, et al. (38 additional authors not shown)

Abstract: We undertake a project to reexamine microlensing data gathered from high-cadence surveys. The aim of the project is to reinvestigate lensing events with light curves exhibiting intricate anomaly features associated with caustics, yet lacking prior proposed models to explain these features. Through detailed reanalyses considering higher-order effects, we identify that accounting for orbital motions of lenses is vital in accurately explaining the anomaly features observed in the light curves of the lensing events OGLE-2018-BLG-0971, MOA-2023-BLG-065, and OGLE-2023-BLG-0136. We estimate the masses and distances to the lenses by conducting Bayesian analyses using the lensing parameters of the newly found lensing solutions. From these analyses, we identify that the lenses of the events OGLE-2018-BLG-0971 and MOA-2023-BLG-065 are binaries composed of M dwarfs, while the lens of OGLE-2023-BLG-0136 is likely to be a binary composed of an early K-dwarf primary and a late M-dwarf companion. For all lensing events, the probability of the lens residing in the bulge is considerably higher than that of it being located in the disk.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 11 pages, 13 figures, 6 tables

arXiv:2404.05687 [pdf, other]

Retrieval-Augmented Open-Vocabulary Object Detection

Authors: Jooyeon Kim, Eulrang Cho, Sehyung Kim, Hyunwoo J. Kim

Abstract: Open-vocabulary object detection (OVD) has been studied with Vision-Language Models (VLMs) to detect novel objects beyond the pre-trained categories. Previous approaches improve the generalization ability to expand the knowledge of the detector, using 'positive' pseudo-labels with additional 'class' names, e.g., sock, iPod, and alligator. To extend the previous methods in two aspects, we propose Retrieval-Augmented Losses and visual Features (RALF). Our method retrieves related 'negative' classes and augments loss functions. Also, visual features are augmented with 'verbalized concepts' of classes, e.g., worn on the feet, handheld music player, and sharp teeth. Specifically, RALF consists of two modules: Retrieval Augmented Losses (RAL) and Retrieval-Augmented visual Features (RAF). RAL constitutes two losses reflecting the semantic similarity with negative vocabularies. In addition, RAF augments visual features with the verbalized concepts from a large language model (LLM). Our experiments demonstrate the effectiveness of RALF on COCO and LVIS benchmark datasets. We achieve improvement up to 3.4 box AP$_{50}^{\text{N}}$ on novel categories of the COCO dataset and 3.6 mask AP$_{\text{r}}$ gains on the LVIS dataset. Code is available at https://github.com/mlvlab/RALF .

Submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted paper at CVPR 2024

arXiv:2404.05514 [pdf, ps, other]

Explicit constructions of Diophantine tuples over finite fields

Authors: Seoyoung Kim, Chi Hoi Yip, Semin Yoo

Abstract: A Diophantine $m$-tuple over a finite field $\mathbb{F}_q$ is a set $\{a_1,\ldots, a_m\}$ of $m$ distinct elements in $\mathbb{F}_{q}^{*}$ such that $a_{i}a_{j}+1$ is a square in $\mathbb{F}_q$ whenever $i\neq j$. In this paper, we study $M(q)$, the maximum size of a Diophantine tuple over $\mathbb{F}_q$, assuming the characteristic of $\mathbb{F}_q$ is fixed and $q \to \infty$. By explicit constructions, we improve the lower bound on $M(q)$. In particular, this improves a recent result of Dujella and Kazalicki by a multiplicative factor.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 9 pages

MSC Class: 11D72; 11D45; 11T24; 11B83

Journal ref: Ramanujan J., 2024
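
The definition in the abstract above can be checked directly for small cases. The following is a minimal sketch, not the paper's construction: it assumes $q = p$ is an odd prime (so arithmetic is plain modular arithmetic), treats 0 as a square, and uses hypothetical helper names (`is_square_mod_p`, `is_diophantine_tuple`, `max_tuple_size`). The brute-force search for $M(p)$ is only feasible for very small $p$.

```python
from itertools import combinations


def is_square_mod_p(x, p):
    """Euler's criterion for an odd prime p; 0 is counted as a square (assumption)."""
    x %= p
    return x == 0 or pow(x, (p - 1) // 2, p) == 1


def is_diophantine_tuple(elems, p):
    """True if elems are distinct nonzero residues with a*b + 1 a square for all pairs."""
    residues = [a % p for a in elems]
    if 0 in residues or len(set(residues)) != len(residues):
        return False
    return all(is_square_mod_p(a * b + 1, p) for a, b in combinations(residues, 2))


def max_tuple_size(p):
    """Brute-force M(p). Subsets of a Diophantine tuple are Diophantine tuples,
    so the search can stop at the first size with no valid tuple."""
    best = 0
    for m in range(1, p):
        if any(is_diophantine_tuple(c, p) for c in combinations(range(1, p), m)):
            best = m
        else:
            break
    return best


if __name__ == "__main__":
    print(is_diophantine_tuple([2, 3, 5], 7))
    print(max_tuple_size(7))
```

For example, over $\mathbb{F}_7$ the set $\{2, 3, 5\}$ qualifies because $2\cdot 3+1 \equiv 0$, $2\cdot 5+1 \equiv 4$ and $3\cdot 5+1 \equiv 2$ are all squares modulo 7.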

arXiv:2404.05364 [pdf, other]

Autoregressive Search of Gravitational Waves: Denoising

Authors: Sangin Kim, C. Y. Hui, Jianqi Yan, Alex P. Leung, Kwangmin Oh, A. K. H. Kong, L. C. -C. Lin, Kwan-Lok Li

Abstract: Because of the small strain amplitudes of gravitational-wave (GW) signals, unveiling them in the presence of detector/environmental noise is challenging. For visualizing the signals and extracting its waveform for a comparison with theoretical prediction, a frequency-domain whitening process is commonly adopted for filtering the data. In this work, we propose an alternative template-free framework based on autoregressive modeling for denoising the GW data and extracting the waveform. We have tested our framework on extracting the injected signals from the simulated data as well as a series of known compact binary coalescence (CBC) events from the LIGO data. Comparing with the conventional whitening procedure, our methodology generally yields improved cross-correlation and reduced root mean square errors with respect to the signal model.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Phys. Rev. D in press, 16 pages, 11 figures, 1 table

arXiv:2404.05238 [pdf, other]

Allowing humans to interactively guide machines where to look does not always improve human-AI team's classification accuracy

Authors: Giang Nguyen, Mohammad Reza Taesiri, Sunnie S. Y. Kim, Anh Nguyen

Abstract: Via thousands of papers in Explainable AI (XAI), attention maps \cite{vaswani2017attention} and feature importance maps \cite{bansal2020sam} have been established as a common means for finding how important each input feature is to an AI's decisions. It is an interesting, unexplored question whether allowing users to edit the feature importance at test time would improve a human-AI team's accuracy on downstream tasks. In this paper, we address this question by leveraging CHM-Corr, a state-of-the-art, ante-hoc explainable classifier \cite{taesiri2022visual} that first predicts patch-wise correspondences between the input and training-set images, and then bases on them to make classification decisions. We build CHM-Corr++, an interactive interface for CHM-Corr, enabling users to edit the feature importance map provided by CHM-Corr and observe updated model decisions. Via CHM-Corr++, users can gain insights into if, when, and how the model changes its outputs, improving their understanding beyond static explanations. However, our study with 18 expert users who performed 1,400 decisions finds no statistical significance that our interactive approach improves user accuracy on CUB-200 bird image classification over static explanations. This challenges the hypothesis that interactivity can boost human-AI team accuracy and raises needs for future research. We open-source CHM-Corr++, an interactive tool for editing image classifier attention (see an interactive demo here: http://137.184.82.109:7080/). We release code and data on github: https://github.com/anguyen8/chm-corr-interactive.

Submitted 20 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted for presentation at the XAI4CV Workshop, part of the CVPR 2024 proceedings

arXiv:2404.05144 [pdf, other]

Enhancing Clinical Efficiency through LLM: Discharge Note Generation for Cardiac Patients

Authors: HyoJe Jung, Yunha Kim, Heejung Choi, Hyeram Seo, Minkyoung Kim, JiYe Han, Gaeun Kee, Seohyun Park, Soyoung Ko, Byeolhee Kim, Suyeon Kim, Tae Joon Jun, Young-Hak Kim

Abstract: Medical documentation, including discharge notes, is crucial for ensuring patient care quality, continuity, and effective medical communication. However, the manual creation of these documents is not only time-consuming but also prone to inconsistencies and potential errors. The automation of this documentation process using artificial intelligence (AI) represents a promising area of innovation in healthcare. This study directly addresses the inefficiencies and inaccuracies in creating discharge notes manually, particularly for cardiac patients, by employing AI techniques, specifically large language model (LLM). Utilizing a substantial dataset from a cardiology center, encompassing wide-ranging medical records and physician assessments, our research evaluates the capability of LLM to enhance the documentation process. Among the various models assessed, Mistral-7B distinguished itself by accurately generating discharge notes that significantly improve both documentation efficiency and the continuity of care for patients. These notes underwent rigorous qualitative evaluation by medical expert, receiving high marks for their clinical relevance, completeness, readability, and contribution to informed decision-making and care planning. Coupled with quantitative analyses, these results confirm Mistral-7B's efficacy in distilling complex medical information into concise, coherent summaries. Overall, our findings illuminate the considerable promise of specialized LLM, such as Mistral-7B, in refining healthcare documentation workflows and advancing patient care. This study lays the groundwork for further integrating advanced AI technologies in healthcare, demonstrating their potential to revolutionize patient documentation and support better care outcomes.

Submitted 7 April, 2024; originally announced April 2024.

Comments: 10 pages, 1 figure, 3 tables, conference

arXiv:2404.05019 [pdf, other]

Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts

Authors: Weilin Cai, Juyong Jiang, Le Qin, Junwei Cui, Sunghun Kim, Jiayi Huang

Abstract: Expert parallelism has been introduced as a strategy to distribute the computational workload of sparsely-gated mixture-of-experts (MoE) models across multiple computing devices, facilitating the execution of these increasingly large-scale models. However, the All-to-All communication intrinsic to expert parallelism constitutes a significant overhead, diminishing the MoE models' efficiency. Current optimization approaches offer some relief, yet they are constrained by the sequential interdependence of communication and computation operations. To address this limitation, we present a novel shortcut-connected MoE architecture with overlapping parallel strategy, designated as ScMoE, which effectively decouples communication from its conventional sequence, allowing for a substantial overlap of 70% to 100% with computation. When compared with the prevalent top-2 MoE architecture, ScMoE demonstrates training speed improvements of 30% and 11%, and inference improvements of 40% and 15%, in our PCIe and NVLink hardware environments, respectively, where communication constitutes 60% and 15% of the total MoE time consumption. On the other hand, extensive experiments and theoretical analyses indicate that ScMoE not only achieves comparable but in some instances surpasses the model quality of existing approaches in vision and language tasks.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04153 [pdf, other]

Evaluation of the performance of the event reconstruction algorithms in the JSNS$^2$ experiment using a $^{252}$Cf calibration source

Authors: D. H. Lee, M. K. Cheoun, J. H. Choi, J. Y. Choi, T. Dodo, J. Goh, K. Haga, M. Harada, S. Hasegawa, W. Hwang, T. Iida, H. I. Jang, J. S. Jang, K. K. Joo, D. E. Jung, S. K. Kang, Y. Kasugai, T. Kawasaki, E. J. Kim, J. Y. Kim, S. B Kim, W. Kim, H. Kinoshita, T. Konno, I. T. Lim , et al. (28 additional authors not shown)

Abstract: JSNS$^2$ searches for short baseline neutrino oscillations with a baseline of 24 meters and a target of 17 tonnes of the Gd-loaded liquid scintillator. The correct algorithm on the event reconstruction of events, which determines the position and energy of neutrino interactions in the detector, are essential for the physics analysis of the data from the experiment. Therefore, the performance of the event reconstruction is carefully checked with calibrations using $^{252}$Cf source. This manuscript describes the methodology and the performance of the event reconstruction.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.04091 [pdf, ps, other]

Bijections on pattern avoiding inversion sequences and related objects

Authors: JiSun Huh, Sangwook Kim, Seunghyun Seo, Heesung Shin

Abstract: The number of inversion sequences avoiding two patterns $101$ and $102$ is known to be the same as the number of permutations avoiding three patterns $2341$, $2431$, and $3241$. This sequence also counts the number of Schröder paths without triple descents, restricted bicolored Dyck paths, $(101,021)$-avoiding inversion sequences, and weighted ordered trees. We provide bijections to integrate them together by introducing $F$-paths. Moreover, we define three kinds of statistics for each of the objects and count the number of each object with respect to these statistics. We also discuss direct sums of each object.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 29 pages, 12 figures, 3 tables

MSC Class: Primary 05A19; Secondary 05A05; 05A15
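
The first object counted in the abstract above can be enumerated by brute force for small lengths. This is only an illustrative sketch under the usual convention (an assumption here, since the abstract does not spell it out) that an inversion sequence of length $n$ is $(e_1,\ldots,e_n)$ with $0 \le e_i < i$, and that a sequence contains a pattern when some subsequence is order-isomorphic to it. The function names are hypothetical.

```python
from itertools import combinations, product


def inversion_sequences(n):
    """All sequences (e_1, ..., e_n) with 0 <= e_i < i."""
    return product(*(range(i) for i in range(1, n + 1)))


def contains_101_or_102(e):
    """A triple a, b, c (in positional order) matches 101 when a == c > b
    and matches 102 when b < a < c; together: b < a <= c."""
    return any(b < a <= c for a, b, c in combinations(e, 3))


def count_avoiders(n):
    """Number of inversion sequences of length n avoiding both 101 and 102."""
    return sum(1 for e in inversion_sequences(n) if not contains_101_or_102(e))


if __name__ == "__main__":
    print([count_avoiders(n) for n in range(1, 5)])
```

For length 4 exactly three of the 24 inversion sequences are excluded, namely (0,1,0,1), (0,1,0,2) and (0,1,0,3), which agrees with the 21 permutations of length 4 that avoid the three length-4 patterns named in the abstract.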

arXiv:2404.03691 [pdf, other]

Upgrade of NaI(Tl) crystal encapsulation for the NEON experiment

Authors: J. J. Choi, E. J. Jeon, J. Y. Kim, K. W. Kim, S. H. Kim, S. K. Kim, Y. D. Kim, Y. J. Ko, B. C. Koh, C. Ha, B. J. Park, S. H. Lee, I. S. Lee, H. Lee, H. S. Lee, J. Lee, Y. M. Oh

Abstract: The Neutrino Elastic-scattering Observation with NaI(Tl) experiment (NEON) aims to detect coherent elastic neutrino-nucleus scattering (CE$\nu$NS) in a NaI(Tl) crystal using reactor anti-electron neutrinos at the Hanbit nuclear power plant complex. A total of 13.3 kg of NaI(Tl) crystals were initially installed in December 2020 at the tendon gallery, 23.7$\pm$0.3 m away from the reactor core, which operates at a thermal power of 2.8 GW. Initial engineering operation was performed from May 2021 to March 2022 and observed unexpected photomultiplier-induced noise and a decreased light yield that were caused by leakage of liquid scintillator into the detector due to weakness of detector encapsulation. We upgraded the detector encapsulation design to prevent the leakage of the liquid scintillator. Meanwhile two small-sized detectors were replaced with larger ones resulting in a total mass of 16.7 kg. With this new design implementation, the detector system has been operating stably since April 2022 for over a year without detector gain drop. In this paper, we present an improved crystal encapsulation design and stability of the NEON experiment.

Submitted 28 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.03679 [pdf, other]

Pulse Shape Discrimination in JSNS$^2$

Authors: T. Dodo, M. K. Cheoun, J. H. Choi, J. Y. Choi, J. Goh, K. Haga, M. Harada, S. Hasegawa, W. Hwang, T. Iida, H. I. Jang, J. S. Jang, K. K. Joo, D. E. Jung, S. K. Kang, Y. Kasugai, T. Kawasaki, E. J. Kim, J. Y. Kim, S. B. Kim, W. Kim, H. Kinoshita, T. Konno, D. H. Lee, I. T. Lim , et al. (29 additional authors not shown)

Abstract: JSNS$^2$ (J-PARC Sterile Neutrino Search at J-PARC Spallation Neutron Source) is an experiment that is searching for sterile neutrinos via the observation of $\bar{\nu}_\mu \rightarrow \bar{\nu}_e$ appearance oscillations using neutrinos with muon decay-at-rest. For this search, rejecting cosmic-ray-induced neutron events by Pulse Shape Discrimination (PSD) is essential because the JSNS$^2$ detector is located above ground, on the third floor of the building. We have achieved 95% rejection of neutron events while keeping 90% of signal, electron-like events using a data driven likelihood method.

Submitted 28 March, 2024; originally announced April 2024.

Comments: arXiv admin note: text overlap with arXiv:2111.07482, arXiv:2308.02722

arXiv:2404.03613 [pdf, other]

Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting

Authors: Jeongmin Bae, Seoha Kim, Youngsik Yun, Hahyun Lee, Gun Bang, Youngjung Uh

Abstract: As 3D Gaussian Splatting (3DGS) provides fast and high-quality novel view synthesis, it is a natural extension to deform a canonical 3DGS to multiple frames. However, previous works fail to accurately reconstruct dynamic scenes, especially 1) static parts moving along nearby dynamic parts, and 2) some dynamic areas are blurry. We attribute the failure to the wrong design of the deformation field, which is built as a coordinate-based function. This approach is problematic because 3DGS is a mixture of multiple fields centered at the Gaussians, not just a single coordinate-based framework. To resolve this problem, we define the deformation as a function of per-Gaussian embeddings and temporal embeddings. Moreover, we decompose deformations as coarse and fine deformations to model slow and fast movements, respectively. Also, we introduce an efficient training strategy for faster convergence and higher quality. Project page: https://jeongminb.github.io/e-d3dgs/

Submitted 4 April, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2404.02575 [pdf, other]

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

Authors: Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo

Abstract: Algorithmic reasoning refers to the ability to understand the complex patterns behind the problem and decompose them into a sequence of reasoning steps towards the solution. Such nature of algorithmic reasoning makes it a challenge for large language models (LLMs), even though they have demonstrated promising performance in other reasoning tasks. Within this context, some recent studies use programming languages (e.g., Python) to express the necessary logic for solving a given instance/question (e.g., Program-of-Thought) as inspired by their strict and precise syntaxes. However, it is non-trivial to write an executable code that expresses the correct logic on the fly within a single inference call. Also, the code generated specifically for an instance cannot be reused for others, even if they are from the same task and might require identical logic to solve. This paper presents Think-and-Execute, a novel framework that decomposes the reasoning process of language models into two steps. (1) In Think, we discover a task-level logic that is shared across all instances for solving a given task and then express the logic with pseudocode; (2) In Execute, we further tailor the generated pseudocode to each instance and simulate the execution of the code. With extensive experiments on seven algorithmic reasoning tasks, we demonstrate the effectiveness of Think-and-Execute. Our approach better improves LMs' reasoning compared to several strong baselines performing instance-specific reasoning (e.g., CoT and PoT), suggesting the helpfulness of discovering task-level logic. Also, we show that compared to natural language, pseudocode can better guide the reasoning of LMs, even though they are trained to follow natural language instructions.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 38 pages, 4 figures

arXiv:2404.02483 [pdf, ps, other]

Refined canonical stable Grothendieck polynomials and their duals, Part 2

Authors: Byung-Hak Hwang, Jihyeug Jang, Jang Soo Kim, Minho Song, U-Keun Song

Abstract: This paper is the sequel of the paper under the same title with part 1, where we introduced refined canonical stable Grothendieck polynomials and their duals with two families of infinite parameters. In this paper we give combinatorial interpretations for these polynomials using generalizations of set-valued tableaux and reverse plane partitions, respectively. Our results extend to their flagged and skew versions.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 34 pages. This is the sequel of the manuscript (arXiv:2104.04251)

MSC Class: 05E05

arXiv:2404.02413 [pdf, ps, other]

Generalization of Spivey's recurrence relation

Authors: Taekyun Kim, Dae San Kim

Abstract: In 2008, Spivey found a recurrence relation for the Bell numbers. We consider the probabilistic r-Bell polynomials associated with Y, which are a probabilistic extension of the r-Bell polynomials. Here Y is a random variable whose moment generating function exists in some neighborhood of the origin. The aim of this paper is to generalize the relation for the Bell numbers to that for the probabilistic r-Bell polynomials associated with Y.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 10 pages

MSC Class: 11B73; 11B83
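
For orientation, the classical relation that the abstract above refers to is Spivey's formula $B_{n+m} = \sum_{k=0}^{n}\sum_{j=0}^{m} j^{\,n-k}\, S(m,j)\binom{n}{k} B_k$, where $B_n$ are the Bell numbers and $S(m,j)$ the Stirling numbers of the second kind. The sketch below checks this classical identity numerically for small indices; it does not implement the probabilistic r-Bell generalization of the paper, and the function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling numbers of the second kind via S(n,k) = k*S(n-1,k) + S(n-1,k-1)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def bell(n):
    """Bell number B_n as the sum of S(n, k) over k."""
    return sum(stirling2(n, k) for k in range(n + 1))


def spivey_rhs(n, m):
    """Right-hand side of Spivey's formula for B_{n+m} (Python evaluates 0**0 as 1)."""
    return sum(
        j ** (n - k) * stirling2(m, j) * comb(n, k) * bell(k)
        for k in range(n + 1)
        for j in range(m + 1)
    )


if __name__ == "__main__":
    print(all(spivey_rhs(n, m) == bell(n + m) for n in range(6) for m in range(6)))
```

Setting $m = 0$ reduces the right-hand side to $B_n$, and setting $m = 1$ recovers the familiar recurrence $B_{n+1} = \sum_{k}\binom{n}{k}B_k$.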

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.01628 [pdf, other]

Learning Equi-angular Representations for Online Continual Learning

Authors: Minhyuk Seo, Hyunseo Koh, Wonje Jeung, Minjae Lee, San Kim, Hankook Lee, Sungjun Cho, Sungik Choi, Hyunwoo Kim, Jonghyun Choi

Abstract: Online continual learning suffers from an underfitted solution due to insufficient training for prompt model update (e.g., single-epoch training). To address the challenge, we propose an efficient online continual learning method using the neural collapse phenomenon. In particular, we induce neural collapse to form a simplex equiangular tight frame (ETF) structure in the representation space so that the continuously learned model with a single epoch can better fit to the streamed data by proposing preparatory data training and residual correction in the representation space. With an extensive set of empirical validations using CIFAR-10/100, TinyImageNet, ImageNet-200, and ImageNet-1K, we show that our proposed method outperforms state-of-the-art methods by a noticeable margin in various online continual learning scenarios such as disjoint and Gaussian scheduled continuous (i.e., boundary-free) data setups.

Submitted 2 April, 2024; originally announced April 2024.

Comments: CVPR 2024
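
The simplex equiangular tight frame (ETF) named in the abstract above is a set of $K$ unit vectors whose pairwise inner products all equal $-1/(K-1)$. The following minimal sketch builds one standard instance, the columns of $\sqrt{K/(K-1)}\,(I_K - \tfrac{1}{K}\mathbf{1}\mathbf{1}^\top)$, in plain Python. It only illustrates the geometric structure; the embedding dimension (taken equal to $K$ here) and the function names are assumptions, and nothing below reflects the paper's training procedure.

```python
import math


def simplex_etf(num_classes):
    """Return K vectors in R^K forming a simplex ETF (requires K >= 2).
    Vector i has entries sqrt(K/(K-1)) * (delta_ij - 1/K)."""
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Plain inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    vectors = simplex_etf(4)
    print(round(dot(vectors[0], vectors[0]), 6))  # squared norm of one vector
    print(round(dot(vectors[0], vectors[1]), 6))  # inner product of two distinct vectors
```

With $K = 4$ each vector has unit norm and any two distinct vectors have inner product $-1/3$, the most negative common value that $K$ unit vectors can share.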

arXiv:2404.01588 [pdf, other]

Hallucination Diversity-Aware Active Learning for Text Summarization

Authors: Yu Xia, Xu Liu, Tong Yu, Sungchul Kim, Ryan A. Rossi, Anup Rao, Tung Mai, Shuai Li

Abstract: Large Language Models (LLMs) have shown propensity to generate hallucinated outputs, i.e., texts that are factually incorrect or unsupported. Existing methods for alleviating hallucinations typically require costly human annotations to identify and correct hallucinations in LLM outputs. Moreover, most of these methods focus on a specific type of hallucination, e.g., entity or token errors, which limits their effectiveness in addressing various types of hallucinations exhibited in LLM outputs. To our best knowledge, in this paper we propose the first active learning framework to alleviate LLM hallucinations, reducing costly human annotations of hallucination needed. By measuring fine-grained hallucinations from errors in semantic frame, discourse and content verifiability in text summarization, we propose HAllucination Diversity-Aware Sampling (HADAS) to select diverse hallucinations for annotations in active learning for LLM finetuning. Extensive experiments on three datasets and different backbone models demonstrate advantages of our method in effectively and efficiently mitigating LLM hallucinations.

Submitted 1 April, 2024; originally announced April 2024.

Comments: Accepted to NAACL 2024

arXiv:2404.01579 [pdf]

Diffusion Deepfake

Authors: Chaitali Bhattacharyya, Hanxiao Wang, Feng Zhang, Sungho Kim, Xiatian Zhu

Abstract: Recent progress in generative AI, primarily through diffusion models, presents significant challenges for real-world deepfake detection. The increased realism in image details, diverse content, and widespread accessibility to the general public complicates the identification of these sophisticated deepfakes. Acknowledging the urgency to address the vulnerability of current deepfake detectors to this evolving threat, our paper introduces two extensive deepfake datasets generated by state-of-the-art diffusion models as other datasets are less diverse and low in quality. Our extensive experiments also showed that our dataset is more challenging compared to the other face deepfake datasets. Our strategic dataset creation not only challenges the deepfake detectors but also sets a new benchmark for more evaluation. Our comprehensive evaluation reveals the struggle of existing detection methods, often optimized for specific image domains and manipulations, to effectively adapt to the intricate nature of diffusion deepfakes, limiting their practical utility. To address this critical issue, we investigate the impact of enhancing training data diversity on representative detection methods. This involves expanding the diversity of both manipulation techniques and image domains. Our findings underscore that increasing training data diversity results in improved generalizability. Moreover, we propose a novel momentum difficulty boosting strategy to tackle the additional challenge posed by training data heterogeneity. This strategy dynamically assigns appropriate sample weights based on learning difficulty, enhancing the model's adaptability to both easy and challenging samples. Extensive experiments on both existing and newly proposed benchmarks demonstrate that our model optimization approach surpasses prior alternatives significantly.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 28 pages including Supplementary material

arXiv:2404.01040 [pdf, ps, other]

Monge-Ampère equations with right-hand sides of polynomial growth

Authors: Beomjun Choi, Kyeongsu Choi, Soojung Kim

Abstract: We study the regularity and the growth rates of solutions to two-dimensional Monge-Ampère equations with the right-hand side exhibiting polynomial growth. Utilizing this analysis, we demonstrate that the translators for the flow by sub-affine-critical powers of the Gauss curvature are smooth, strictly convex entire graphs. These graphs exhibit specific growth rates that depend solely on the power of the flow.

Submitted 1 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: A part of this work was introduced in arXiv:2104.13186v1; however, we have separated and further elaborated the result to accommodate issues that will be dealt in the revision of arXiv:2104.13186. In v2, we generalized the main theorems slightly so that they apply for broader class of equations

MSC Class: 53E99; 35J96

arXiv:2404.01039 [pdf, other]

A Survey on Hypergraph Neural Networks: An In-Depth and Step-By-Step Guide

Authors: Sunwoo Kim, Soo Yong Lee, Yue Gao, Alessia Antelmi, Mirko Polato, Kijung Shin

Abstract: Higher-order interactions (HOIs) are ubiquitous in real-world complex systems and applications, and thus investigation of deep learning for HOIs has become a valuable agenda for the data mining and machine learning communities. As networks of HOIs are expressed mathematically as hypergraphs, hypergraph neural networks (HNNs) have emerged as a powerful tool for representation learning on hypergraphs. Given the emerging trend, we present the first survey dedicated to HNNs, with an in-depth and step-by-step guide. Broadly, the present survey overviews HNN architectures, training strategies, and applications. First, we break existing HNNs down into four design components: (i) input features, (ii) input structures, (iii) message-passing schemes, and (iv) training strategies. Second, we examine how HNNs address and learn HOIs with each of their components. Third, we overview the recent applications of HNNs in recommendation, biological and medical science, time series analysis, and computer vision. Lastly, we conclude with a discussion on limitations and future directions.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00995 [pdf, other]

PosterLlama: Bridging Design Ability of Language Model to Contents-Aware Layout Generation

Authors: Jaejung Seol, Seojun Kim, Jaejun Yoo

Abstract: Visual layout plays a critical role in graphic design fields such as advertising, posters, and web UI design. The recent trend towards content-aware layout generation through generative models has shown promise, yet it often overlooks the semantic intricacies of layout design by treating it as a simple numerical optimization. To bridge this gap, we introduce PosterLlama, a network designed for generating visually and textually coherent layouts by reformatting layout elements into HTML code and leveraging the rich design knowledge embedded within language models. Furthermore, we enhance the robustness of our model with a unique depth-based poster augmentation strategy. This ensures our generated layouts remain semantically rich but also visually appealing, even with limited data. Our extensive evaluations across several benchmarks demonstrate that PosterLlama outperforms existing methods in producing authentic and content-aware layouts. It supports an unparalleled range of conditions, including but not limited to unconditional layout generation, element conditional layout generation, layout completion, among others, serving as a highly versatile user manipulation tool.

Submitted 2 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00802 [pdf, other]

The Mechanics and Physics of Twisted and Coiled Polymer Actuators

Authors: Qiong Wang, Anan Ghrayeb, SeongHyeon Kim, Liuyang Cheng, Sameh Tawfick

Abstract: Twisted and coiled polymer actuators (TCPAs) generate large contractile mechanical work mimicking natural muscles, which makes them suitable for robotics and health-assistive devices. Understanding the mechanism of nylon TCPA remains challenging due to the interplay between their intricate geometry, chirality, residual stresses, and material microstructure. This study integrates a material microstructure model with rod theory to analytically predict the equilibrium helical shape of the nylon TCPA after fabrication and to explain the observed contraction mechanism upon stimulation. The first ingredient of the model is to treat nylon as a two-phase thermomechanical microstructure system capable of storing strain energy and exchanging it among the two phases. This is validated by characterizing the torsional actuation response of twisted and annealed nylon fibers. The second ingredient of the model is to use the classic Kirchhoff Rod Theory and add a necessary term that couples the bending and twisting energy. Validation with experiments shows that the model captures the equilibrium and longitudinal stiffness of the TCPA in both active and passive states, and the stimulated contraction under external load. Importantly, the model quantifies the influence of the stored energy level on the actuation performance. These concepts can be extended to other types of TCPAs and could enable new material design.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2404.00638 [pdf, other]

HypeBoy: Generative Self-Supervised Representation Learning on Hypergraphs

Authors: Sunwoo Kim, Shinhwan Kang, Fanchen Bu, Soo Yong Lee, Jaemin Yoo, Kijung Shin

Abstract: Hypergraphs are marked by complex topology, expressing higher-order interactions among multiple nodes with hyperedges, and better capturing the topology is essential for effective representation learning. Recent advances in generative self-supervised learning (SSL) suggest that hypergraph neural networks learned from generative self supervision have the potential to effectively encode the complex hypergraph topology. Designing a generative SSL strategy for hypergraphs, however, is not straightforward. Questions remain with regard to its generative SSL task, connection to downstream tasks, and empirical properties of learned representations. In light of the promises and challenges, we propose a novel generative SSL strategy for hypergraphs. We first formulate a generative SSL task on hypergraphs, hyperedge filling, and highlight its theoretical connection to node classification. Based on the generative SSL task, we propose a hypergraph SSL method, HypeBoy. HypeBoy learns effective general-purpose hypergraph representations, outperforming 16 baseline methods across 11 benchmark datasets.

Submitted 31 March, 2024; originally announced April 2024.

Comments: Published as a conference paper at ICLR 2024

arXiv:2404.00496 [pdf, ps, other]

Reducing of the Uncertainty Product of Coherent Light through Multi-Photon Interference

Authors: Sangbae Kim, Joachim Stohr, Fabian Rotermund, Byoung S. Ham

Abstract: We demonstrate theoretically and experimentally how the diffraction and interferometric resolution limit for single-mode coherent cw laser light can be overcome by multi-photon interference. By use of a Mach-Zehnder interferometer, operated in the single input and single or double output port geometries, we observe a fringe width reduction of the conventional interference pattern, predicted by the wave or single photon quantum theory, by a factor of up to $1/\sqrt{2N}$ through coincident detection of $N=2,3,4$ photons. Our scheme does not require squeezed or entangled light to overcome the standard quantum limit and greatly facilitates precision interferometry experiments.

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2404.00384 [pdf, other]

TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias

Authors: Sanghyun Jo, Soohyun Ryu, Sungyub Kim, Eunho Yang, Kyungsu Kim

Abstract: We identify a critical bias in contemporary CLIP-based models, which we denote as single tag bias. This bias manifests as a disproportionate focus on a singular tag (word) while neglecting other pertinent tags, stemming from CLIP's text embeddings that prioritize one specific tag in image-text relationships. When deconstructing text into individual tags, only one tag tends to have high relevancy with CLIP's image embedding, leading to biased tag relevancy. In this paper, we introduce a novel two-step fine-tuning approach, Text-Tag Self-Distillation (TTD), to address this challenge. TTD first extracts image-relevant tags from text based on their similarity to the nearest pixels then employs a self-distillation strategy to align combined masks with the text-derived mask. This approach ensures the unbiased image-text alignment of the CLIP-based models using only image-text pairs without necessitating additional supervision. Our technique demonstrates model-agnostic improvements in multi-tag classification and segmentation tasks, surpassing competing methods that rely on external resources. The code is available at https://github.com/shjo-april/TTD.

Submitted 20 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

arXiv:2404.00238 [pdf, other]

Flattening a trapped atomic gas using a programmable optical potential in a feedback loop

Authors: Sol Kim, Kyuhwan Lee, Jongmin Kim, Y. Shin

Abstract: We present a method for producing a flat, large-area Fermi gas of $^6$Li with a uniform area density. The method uses a programmable optical potential within a feedback loop to flatten the in-plane trapping potential for atoms. The optical potential is generated using a laser beam, whose intensity profile is adjusted by a spatial light modulator and optimized through measurements of the density distribution of the sample. The resulting planar sample exhibits a uniform area density within a region of about 480 $μ$m in diameter and the standard deviation of the trap bottom potential is estimated to be $\approx k_B \times$ 6.1 nK, which is less than 20$\%$ of the transverse confinement energy. We discuss a dimensional crossover toward 2D regime by reducing the number of atoms in the planar trap, including the effect of the spatial variation of the transverse trapping frequency in the large-area sample.

Submitted 30 March, 2024; originally announced April 2024.

Comments: 8 pages, 6 figures

arXiv:2403.20153 [pdf, other]

Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior

Authors: Jaehoon Ko, Kyusun Cho, Joungbin Lee, Heeji Yoon, Sangmin Lee, Sangjun Ahn, Seungryong Kim

Abstract: Recent methods for audio-driven talking head synthesis often optimize neural radiance fields (NeRF) on a monocular talking portrait video, leveraging its capability to render high-fidelity and 3D-consistent novel-view frames. However, they often struggle to reconstruct complete face geometry due to the absence of comprehensive 3D information in the input monocular videos. In this paper, we introduce a novel audio-driven talking head synthesis framework, called Talk3D, that can faithfully reconstruct its plausible facial geometries by effectively adopting the pre-trained 3D-aware generative prior. Given the personalized 3D generative model, we present a novel audio-guided attention U-Net architecture that predicts the dynamic face variations in the NeRF space driven by audio. Furthermore, our model is further modulated by audio-unrelated conditioning tokens which effectively disentangle variations unrelated to audio features. Compared to existing methods, our method excels in generating realistic facial geometries even under extreme head poses. We also conduct extensive experiments showing our approach surpasses state-of-the-art benchmarks in terms of both quantitative and qualitative evaluations.

Submitted 29 March, 2024; originally announced March 2024.

Comments: Project page: https://ku-cvlab.github.io/Talk3D/

arXiv:2403.19997 [pdf, other]

Size-dependent fracture in elastomers: experiments and continuum modeling

Authors: Jaehee Lee, Jeongun Lee, Seounghee Yun, Sanha Kim, Shawn A. Chester, Hansohl Cho

Abstract: Elastomeric materials display a complicated set of stretchability and fracture properties that strongly depend on the flaw size, which has long been of interest to engineers and materials scientists. Here, we combine experiments and numerical simulations for a comprehensive understanding of the nonlocal, size-dependent features of fracture in elastomers. We show the size-dependent fracture behavior is quantitatively described through a nonlocal continuum model. The key ingredient of the nonlocal model is the use of an intrinsic length scale associated with a finite fracture process zone, which is inferred from experiments. Of particular importance, our experimental and theoretical approach passes the critical set of capturing key aspects of the size-dependent fracture in elastomers. Applications to a wide range of synthetic elastomers that exhibit moderate (~100%) to extreme stretchability (~1000%) are presented, which is also used to demonstrate the applicability of our approach in elastomeric specimens with complex geometries.

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.19270 [pdf, other]

sDPO: Don't Use Your Data All at Once

Authors: Dahyun Kim, Yungi Kim, Wonho Song, Hyeonwoo Kim, Yunsu Kim, Sanghoon Kim, Chanjun Park

Abstract: As development of large language models (LLM) progresses, aligning them with human preferences has become increasingly important. We propose stepwise DPO (sDPO), an extension of the recently popularized direct preference optimization (DPO) for alignment tuning. This approach involves dividing the available preference datasets and utilizing them in a stepwise manner, rather than employing it all at once. We demonstrate that this method facilitates the use of more precisely aligned reference models within the DPO training framework. Furthermore, sDPO trains the final model to be more performant, even outperforming other popular LLMs with more parameters.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19144 [pdf, other]

MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation

Authors: Seyeon Kim, Siyoon Jin, Jihye Park, Kihong Kim, Jiyoung Kim, Jisu Nam, Seungryong Kim

Abstract: Conventional GAN-based models for talking head generation often suffer from limited quality and unstable training. Recent approaches based on diffusion models aimed to address these limitations and improve fidelity. However, they still face challenges, including extensive sampling times and difficulties in maintaining temporal consistency due to the high stochasticity of diffusion models. To overcome these challenges, we propose a novel motion-disentangled diffusion model for high-quality talking head generation, dubbed MoDiTalker. We introduce the two modules: audio-to-motion (AToM), designed to generate a synchronized lip motion from audio, and motion-to-video (MToV), designed to produce high-quality head video following the generated motion. AToM excels in capturing subtle lip movements by leveraging an audio attention mechanism. In addition, MToV enhances temporal consistency by leveraging an efficient tri-plane representation. Our experiments conducted on standard benchmarks demonstrate that our model achieves superior performance compared to existing models. We also provide comprehensive ablation studies and user study results.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.18881 [pdf]

Transmission IR Microscopy for the Quantitation of Biomolecular Mass In Live Cells

Authors: Yow-Ren Chang, Seong-Min Kim, Young Jong Lee

Abstract: Absolute quantity imaging of biomolecules on a single cell level is critical for measurement assurance in biosciences and bioindustries. While infrared (IR) transmission microscopy is a powerful label-free imaging modality capable of chemical quantification, its applicability to hydrated biological samples remains challenging due to the strong water absorption. We overcome this challenge by applying a solvent absorption compensation (SAC) technique to a home-built quantum cascade laser IR microscope. SAC-IR microscopy improves the chemical sensitivity considerably by adjusting the incident light intensity to pre-compensate the IR absorption by water while retaining the full dynamic range. We demonstrate the label-free chemical imaging of key biomolecules of a cell, such as protein, fatty acid, and nucleic acid, with sub-cellular spatial resolution. By imaging live fibroblast cells over twelve hours, we monitor the mass change of the three molecular species of single cells at various phases, including cell division. While the current live-cell imaging demonstration involved three wavenumbers, more wavenumber images could measure more biomolecules in live cells with higher accuracy. As a label-free method to measure absolute quantities of various molecules in a cell, SAC-IR microscopy can potentially become a standard chemical characterization tool for live cells in biology, medicine, and biotechnology.

Submitted 27 March, 2024; originally announced March 2024.

Comments: Body: 19 pages, 5 figures. Supplemental: 11 pages, 6 figures

arXiv:2403.18411 [pdf, ps, other]

Role of hidden-color components in the tetraquark mixing model

Authors: Hungchong Kim, K. S. Kim

Abstract: Multiquarks can have two-hadron components and hidden-color components in their wave functions. The presence of two-hadron components in multiquarks introduces a potential source of confusion, particularly with respect to their resemblance to hadronic molecules. On the other hand, hidden-color components are essential for distinguishing between multiquarks and hadronic molecules. In this work, we study the hidden-color components in the wave functions of the tetraquark mixing model, a model that has been proposed as a suitable framework for describing the properties of two nonets in the $J^P=0^+$ channel: the light nonet [$a_0 (980)$, $K_0^* (700)$, $f_0 (500)$, $f_0 (980)$] and the heavy nonet [$a_0 (1450)$, $K_0^* (1430)$, $f_0 (1370)$, $f_0 (1500)$]. Our analysis reveals a substantial presence of hidden-color components within the tetraquark wave functions. To elucidate the impact of hidden-color components on physical quantities, we conduct computations of the hyperfine masses, $\langle V_{CS}\rangle$, for the two nonets, considering scenarios involving only the two-meson components and those incorporating the hidden-color components. We demonstrate that the hidden-color components constitute an important part of the hyperfine masses, such that the mass difference formula, $ΔM\approx Δ\langle V_{CS}\rangle$, which has been successful for the two nonets, cannot be achieved without the hidden-color contributions. This can provide another evidence supporting the tetraquark nature of the two nonets.

Submitted 29 June, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: 10 pages, no figure. The version accepted for publication in EPJC

Showing 201–250 of 7,472 results for author: Kim, S