Search | arXiv e-print repository

doi 10.1109/ICAIIC60209.2024.10463391

RISC-V R-Extension: Advancing Efficiency with Rented-Pipeline for Edge DNN Processing

Authors: Won Hyeok Kim, Hyeong Jin Kim, Tae Hee Han

Abstract: The proliferation of edge devices necessitates efficient computational architectures for lightweight tasks, particularly deep neural network (DNN) inference. Traditional NPUs, though effective for such operations, face challenges in power, cost, and area when integrated into lightweight edge devices. The RISC-V architecture, known for its modularity and open-source nature, offers a viable alternative. This paper introduces the RISC-V R-extension, a novel approach to enhancing DNN process efficiency on edge devices. The extension features rented-pipeline stages and architectural pipeline registers (APR), which optimize critical operation execution, thereby reducing latency and memory access frequency. Furthermore, this extension includes new custom instructions to support these architectural improvements. Through comprehensive analysis, this study demonstrates the boost of R-extension in edge device processing, setting the stage for more responsive and intelligent edge applications.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 6 pages, 6 figures, ICAIIC 2024

arXiv:2406.16275 [pdf, other]

Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection

Authors: Choonghyun Park, Hyuhng Joon Kim, Junyeob Kim, Youna Kim, Taeuk Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-goo Lee, Kang Min Yoo

Abstract: AI Generated Text (AIGT) detectors are developed with texts from humans and LLMs of common tasks. Despite the diversity of plausible prompt choices, these datasets are generally constructed with a limited number of prompts. The lack of prompt variation can introduce prompt-specific shortcut features that exist in data collected with the chosen prompt, but do not generalize to others. In this paper, we analyze the impact of such shortcuts in AIGT detection. We propose Feedback-based Adversarial Instruction List Optimization (FAILOpt), an attack that searches for instructions deceptive to AIGT detectors exploiting prompt-specific shortcuts. FAILOpt effectively drops the detection performance of the target detector, comparable to other attacks based on adversarial in-context examples. We also utilize our method to enhance the robustness of the detector by mitigating the shortcuts. Based on the findings, we further train the classifier with the dataset augmented by FAILOpt prompt. The augmented classifier exhibits improvements across generation models, tasks, and attacks. Our code will be available at https://github.com/zxcvvxcz/FAILOpt.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 19 pages, 3 figures, 13 tables, under review

arXiv:2406.09905 [pdf, other]

Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild

Authors: Lingni Ma, Yuting Ye, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Rowan Postyeni, Luis Pesqueira, Alexander Gamino, Vijay Baiyya, Hyo Jin Kim, Kevin Bailey, David Soriano Fosas, C. Karen Liu, Ziwei Liu, Jakob Engel, Renzo De Nardi, Richard Newcombe

Abstract: We introduce Nymeria - a large-scale, diverse, richly annotated human motion dataset collected in the wild with multiple multimodal egocentric devices. The dataset comes with a) full-body 3D motion ground truth; b) egocentric multimodal recordings from Project Aria devices with RGB, grayscale, eye-tracking cameras, IMUs, magnetometer, barometer, and microphones; and c) an additional "observer" device providing a third-person viewpoint. We compute world-aligned 6DoF transformations for all sensors, across devices and capture sessions. The dataset also provides 3D scene point clouds and calibrated gaze estimation. We derive a protocol to annotate hierarchical language descriptions of in-context human motion, from fine-grain pose narrations, to atomic actions and activity summarization. To the best of our knowledge, the Nymeria dataset is the world largest in-the-wild collection of human motion with natural and diverse activities; first of its kind to provide synchronized and localized multi-device multimodal egocentric data; and the world largest dataset with motion-language descriptions. It contains 1200 recordings of 300 hours of daily activities from 264 participants across 50 locations, travelling a total of 399Km. The motion-language descriptions provide 310.5K sentences in 8.64M words from a vocabulary size of 6545. To demonstrate the potential of the dataset we define key research tasks for egocentric body tracking, motion synthesis, and action recognition and evaluate several state-of-the-art baseline algorithms. Data and code will be open-sourced.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09698 [pdf, other]

Projected background and sensitivity of AMoRE-II

Authors: A. Agrawal, V. V. Alenkov, P. Aryal, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, Seonho Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Y. M. Gavrilyuk, A. M. Gezhaev, O. Gileva, et al. (81 additional authors not shown)

Abstract: AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in the cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located approximately 1000 meters deep in Jeongseon, Korea. The goal of AMoRE-II is to reach up to $T^{0νββ}_{1/2}$ $\sim$ 6 $\times$ 10$^{26}$ years, corresponding to an effective Majorana mass of 15 - 29 meV, covering all the inverted mass hierarchy regions. To achieve this, the background level of the experimental configurations and possible background sources of gamma and beta events should be well understood. We have intensively performed Monte Carlo simulations using the GEANT4 toolkit in all the experimental configurations with potential sources. We report the estimated background level that meets the 10$^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) requirement for AMoRE-II in the region of interest (ROI) and show the projected half-life sensitivity based on the simulation study.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08176 [pdf, other]

Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment

Authors: Taekbeom Lee, Youngseok Jang, H. Jin Kim

Abstract: Neural implicit representation has attracted attention in 3D reconstruction through various success cases. For further applications such as scene understanding or editing, several works have shown progress towards object compositional reconstruction. Despite their superior performance in observed regions, their performance is still limited in reconstructing objects that are partially observed. To better treat this problem, we introduce category-level neural fields that learn meaningful common 3D information among objects belonging to the same category present in the scene. Our key idea is to subcategorize objects based on their observed shape for better training of the category-level model. Then we take advantage of the neural field to conduct the challenging task of registering partially observed objects by selecting and aligning against representative objects selected by ray-based uncertainty. Experiments on both simulation and real-world datasets demonstrate that our method improves the reconstruction of unobserved parts for several categories.

Submitted 12 June, 2024; originally announced June 2024.

Comments: RA-L. 8 pages, 8 figures, 4 tables

arXiv:2406.08140 [pdf]

Functional voxel hierarchy and afferent capacity revealed mental state transition on dynamic correlation resting-state fMRI

Authors: Dong Soo Lee, Hyun Joo Kim, Youngmin Huh, Yeon Koo Kang, Wonseok Whi, Hyekyoung Lee, Hyejin Kang

Abstract: Voxel hierarchy on dynamic brain graphs is produced by k core percolation on functional dynamic amplitude correlation of resting-state fMRI. Directed graphs and their afferent/efferent capacities are produced by Markov modeling of the universal cover of undirected graphs simultaneously with the calculation of volume entropy. Positive and unsigned negative brain graphs were analyzed separately on sliding-window representation to underpin the visualization and quantitation of mental dynamic states with their transitions. Voxel hierarchy animation maps of positive graphs revealed abrupt changes in coreness k and kmaxcore, which we called mental state transitions. Afferent voxel capacities of the positive graphs also revealed transient modules composed of dominating voxels/independent components and their exchanges representing mental state transitions. Animation and quantification plots of voxel hierarchy and afferent capacity corroborated each other in underpinning mental state transitions and afferent module exchange on the positive directed functional connectivity graphs. We propose the use of spatiotemporal trajectories of voxels on positive dynamic graphs to construct hierarchical structures by k core percolation and quantified in- and out-flows of information of voxels by volume entropy/directed graphs to subserve diverse resting mental state transitions on resting-state fMRI graphs in normal human individuals.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.15591 [pdf, ps, other]

Constraints for electron-capture decays mimicking production of axion-like particles in nuclei

Authors: Aagrah Agnihotri, Jouni Suhonen, Hong Joo Kim

Abstract: We give for the first time, theoretical estimates of ground-state-to-ground-state (GS-to-GS) electron-capture (EC) branch decay rates of $^{44}$Ti, $^{57}$Co, and $^{139}$Ce. The nuclear-structure calculations have been done exploiting the nuclear shell model (NSM) with well-established Hamiltonians and an advanced theory of $β$ decay. In the absence of experimental measurements of these GS-to-GS branches, these estimates are of utmost importance for terrestrial searches of axion-like particles (ALPs). Predictions are made for EC-decay rates of 2$^{nd}$-forbidden unique (FU) and 2$^{nd}$-forbidden non-unique (FNU) EC transitions that can potentially mimic nuclear axion production in experiments designed to detect ALPs in nuclear environments.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.01361 [pdf, other]

Haptic-Based Bilateral Teleoperation of Aerial Manipulator for Extracting Wedged Object with Compensation of Human Reaction Time

Authors: Jeonghyun Byun, Dohyun Eom, H. Jin Kim

Abstract: Bilateral teleoperation of an aerial manipulator facilitates the execution of industrial missions thanks to the combination of the aerial platform's maneuverability and the ability to conduct complex tasks with human supervision. Heretofore, research on such operations has focused on flying without any physical interaction or exerting a pushing force on a contact surface that does not involve abrupt changes in the interaction force. In this paper, we propose a human reaction time compensating haptic-based bilateral teleoperation strategy for an aerial manipulator extracting a wedged object from a static structure (i.e., plug-pulling), which incurs an abrupt decrease in the interaction force and causes additional difficulty for an aerial platform. A haptic device composed of a 4-degree-of-freedom robotic arm and a gripper is made for the teleoperation of aerial wedged object-extracting tasks, and a haptic-based teleoperation method to execute the aerial manipulator by the haptic device is introduced. We detect the extraction of the object by the estimation of the external force exerted on the aerial manipulator and generate reference trajectories for both the aerial manipulator and the haptic device after the extraction. As an example of the extraction of a wedged object, we conduct comparative plug-pulling experiments with a quadrotor-based aerial manipulator. The results validate that the proposed bilateral teleoperation method reduces the overshoot in the aerial manipulator's position and ensures fast recovery to its initial position after extracting the wedged object.

Submitted 2 May, 2024; originally announced May 2024.

Comments: to be presented in 2024 IEEE International Conference on Unmanned Aircraft Systems (ICUAS), Chania, Crete, Greece, 2024

arXiv:2404.14220 [pdf, other]

Robust electrothermal switching of optical phase change materials through computer-aided adaptive pulse optimization

Authors: Parth Garud, Kiumars Aryana, Cosmin Constantin Popescu, Steven Vitale, Rashi Sharma, Kathleen Richardson, Tian Gu, Juejun Hu, Hyun Jung Kim

Abstract: Electrically tunable optical devices present diverse functionalities for manipulating electromagnetic waves by leveraging elements capable of reversibly switching between different optical states. This adaptability in adjusting their responses to electromagnetic waves after fabrication is crucial for developing more efficient and compact optical systems for a broad range of applications including sensing, imaging, telecommunications, and data storage. Chalcogenide-based phase change materials (PCMs) have shown great promise due to their stable, non-volatile phase transition between amorphous and crystalline states. Nonetheless, optimizing the switching parameters of PCM devices and maintaining their stable operation over thousands of cycles with minimal variation can be challenging. In this paper, we report on the critical role of PCM pattern as well as electrical pulse form in achieving reliable and stable switching, extending the operational lifetime of the device beyond 13,000 switching events. To achieve this, we have developed a computer-aided algorithm that monitors optical changes in the device and adjusts the applied voltage in accordance with the phase transformation process, thereby significantly enhancing the lifetime of these reconfigurable devices. Our findings reveal that patterned PCM structures show significantly higher endurance compared to blanket PCM thin films.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.11972 [pdf, other]

Aligning Language Models to Explicitly Handle Ambiguity

Authors: Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim

Abstract: In interactions between users and language model agents, user utterances frequently exhibit ellipsis (omission of words or phrases) or imprecision (lack of exactness) to prioritize efficiency. This can lead to varying interpretations of the same input based on different assumptions or background knowledge. It is thus crucial for agents to adeptly handle the inherent ambiguity in queries to ensure reliability. However, even state-of-the-art large language models (LLMs) still face challenges in such scenarios, primarily due to the following hurdles: (1) LLMs are not explicitly trained to deal with ambiguous utterances; (2) the degree of ambiguity perceived by the LLMs may vary depending on the possessed knowledge. To address these issues, we propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns LLMs to manage ambiguous queries by leveraging their own assessment of ambiguity (i.e., perceived ambiguity). Experimental results on question-answering datasets demonstrate that APA empowers LLMs to explicitly detect and manage ambiguous queries while retaining the ability to answer clear questions. Furthermore, our finding proves that APA excels beyond training with gold-standard labels, especially in out-of-distribution scenarios.

Submitted 16 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11320 [pdf, other]

Saturated RISE control for considering rotor thrust saturation of fully actuated multirotor

Authors: Dongjae Lee, H. Jin Kim

Abstract: This work proposes a saturated robust controller for a fully actuated multirotor that takes disturbance rejection and rotor thrust saturation into account. A disturbance rejection controller is required to prevent performance degradation in the presence of parametric uncertainty and external disturbance. Furthermore, rotor saturation should be properly addressed in a controller to avoid performance degradation or even instability due to a gap between the commanded input and the actual input during saturation. To address these issues, we present a modified saturated RISE (Robust Integral of the Sign of the Error) control method. The proposed modified saturated RISE controller is developed for expansion to a system with a non-diagonal, state-dependent input matrix. Next, we present reformulation of the system dynamics of a fully actuated multirotor, and apply the control law to the system. The proposed method is validated in simulation where the proposed controller outperforms the existing one thanks to the capability of handling the input matrix.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 6 pages, 5 figures, 2024 International Conference on Unmanned Aircraft Systems (ICUAS) accepted

arXiv:2404.11310 [pdf, other]

Autonomous aerial perching and unperching using omnidirectional tiltrotor and switching controller

Authors: Dongjae Lee, Sunwoo Hwang, Jeonghyun Byun, Seung Jae Lee, H. Jin Kim

Abstract: Aerial unperching of multirotors has received little attention as opposed to perching that has been investigated to elongate operation time. This study presents a new aerial robot capable of both perching and unperching autonomously on/from a ferromagnetic surface during flight, and a switching controller to avoid rotor saturation and mitigate overshoot during transition between free-flight and perching. To enable stable perching and unperching maneuvers on/from a vertical surface, a lightweight ($\approx 1$ kg), fully actuated tiltrotor that can hover at $90^\circ$ pitch angle is first developed. We design a perching/unperching module composed of a single servomotor and a magnet, which is then mounted on the tiltrotor. A switching controller including exclusive control modes for transitions between free-flight and perching is proposed. Lastly, we propose a simple yet effective strategy to ensure robust perching in the presence of measurement and control errors and avoid collisions with the perching site immediately after unperching. We validate the proposed framework in experiments where the tiltrotor successfully performs perching and unperching on/from a vertical surface during flight. We further show effectiveness of the proposed transition mode in the switching controller by ablation studies where large overshoot and even collision with a perching site occur. To the best of the authors' knowledge, this work presents the first autonomous aerial unperching framework using a fully actuated tiltrotor.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 7 pages, 10 figures, 2024 IEEE International Conference on Robotics and Automation (ICRA) accepted

arXiv:2404.11104 [pdf, other]

Object Remover Performance Evaluation Methods using Class-wise Object Removal Images

Authors: Changsuk Oh, Dongseok Shim, Taekbeom Lee, H. Jin Kim

Abstract: Object removal refers to the process of erasing designated objects from an image while preserving the overall appearance, and it is one area where image inpainting is widely used in real-world applications. The performance of an object remover is quantitatively evaluated by measuring the quality of object removal results, similar to how the performance of an image inpainter is gauged. Current works reporting quantitative performance evaluations utilize original images as references. In this letter, to validate the current evaluation methods cannot properly evaluate the performance of an object remover, we create a dataset with object removal ground truth and compare the evaluations made by the current methods using original images to those utilizing object removal ground truth images. The disparities between two evaluation sets validate that the current methods are not suitable for measuring the performance of an object remover. Additionally, we propose new evaluation methods tailored to gauge the performance of an object remover. The proposed methods evaluate the performance through class-wise object removal results and utilize images without the target class objects as a comparison set. We confirm that the proposed methods can make judgments consistent with human evaluators in the COCO dataset, and that they can produce measurements aligning with those using object removal ground truth in the self-acquired dataset.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.05687 [pdf, other]

Retrieval-Augmented Open-Vocabulary Object Detection

Authors: Jooyeon Kim, Eulrang Cho, Sehyung Kim, Hyunwoo J. Kim

Abstract: Open-vocabulary object detection (OVD) has been studied with Vision-Language Models (VLMs) to detect novel objects beyond the pre-trained categories. Previous approaches improve the generalization ability to expand the knowledge of the detector, using 'positive' pseudo-labels with additional 'class' names, e.g., sock, iPod, and alligator. To extend the previous methods in two aspects, we propose Retrieval-Augmented Losses and visual Features (RALF). Our method retrieves related 'negative' classes and augments loss functions. Also, visual features are augmented with 'verbalized concepts' of classes, e.g., worn on the feet, handheld music player, and sharp teeth. Specifically, RALF consists of two modules: Retrieval Augmented Losses (RAL) and Retrieval-Augmented visual Features (RAF). RAL constitutes two losses reflecting the semantic similarity with negative vocabularies. In addition, RAF augments visual features with the verbalized concepts from a large language model (LLM). Our experiments demonstrate the effectiveness of RALF on COCO and LVIS benchmark datasets. We achieve improvement up to 3.4 box AP$_{50}^{\text{N}}$ on novel categories of the COCO dataset and 3.6 mask AP$_{\text{r}}$ gains on the LVIS dataset. Code is available at https://github.com/mlvlab/RALF.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted paper at CVPR 2024

arXiv:2404.00851 [pdf, other]

Prompt Learning via Meta-Regularization

Authors: Jinyoung Park, Juyeon Ko, Hyunwoo J. Kim

Abstract: Pre-trained vision-language models have shown impressive success on various computer vision tasks with their zero-shot generalizability. Recently, prompt learning approaches have been explored to efficiently and effectively adapt the vision-language models to a variety of downstream tasks. However, most existing prompt learning methods suffer from task overfitting since the general knowledge of the pre-trained vision language models is forgotten while the prompts are finetuned on a small data set from a specific target task. To address this issue, we propose a Prompt Meta-Regularization (ProMetaR) to improve the generalizability of prompt learning for vision-language models. Specifically, ProMetaR meta-learns both the regularizer and the soft prompts to harness the task-specific knowledge from the downstream tasks and task-agnostic general knowledge from the vision-language models. Further, ProMetaR augments the task to generate multiple virtual tasks to alleviate the meta-overfitting. In addition, we provide the analysis to comprehend how ProMetaR improves the generalizability of prompt tuning in the perspective of the gradient alignment. Our extensive experiments demonstrate that our ProMetaR improves the generalizability of conventional prompt learning methods under base-to-base/base-to-new and domain generalization settings. The code of ProMetaR is available at https://github.com/mlvlab/ProMetaR.

Submitted 31 March, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2403.17709 [pdf, other]

Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection

Authors: Jongha Kim, Jihwan Park, Jinyoung Park, Jinyoung Kim, Sehyung Kim, Hyunwoo J. Kim

Abstract: Visual Relationship Detection (VRD) has seen significant advancements with Transformer-based architectures recently. However, we identify two key limitations in a conventional label assignment for training Transformer-based VRD models, which is a process of mapping a ground-truth (GT) to a prediction. Under the conventional assignment, an unspecialized query is trained since a query is expected to detect every relation, which makes it difficult for a query to specialize in specific relations. Furthermore, a query is also insufficiently trained since a GT is assigned only to a single prediction, therefore near-correct or even correct predictions are suppressed by being assigned no relation as a GT. To address these issues, we propose Groupwise Query Specialization and Quality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization trains a specialized query by dividing queries and relations into disjoint groups and directing a query in a specific query group solely toward relations in the corresponding relation group. Quality-Aware Multi-Assignment further facilitates the training by assigning a GT to multiple predictions that are significantly close to a GT in terms of a subject, an object, and the relation in between. Experimental results and analyses show that SpeaQ effectively trains specialized queries, which better utilize the capacity of a model, resulting in consistent performance gains with zero additional inference cost across multiple VRD models and benchmarks. Code is available at https://github.com/mlvlab/SpeaQ.

Submitted 26 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2403.13347 [pdf, other]

vid-TLDR: Training Free Token merging for Light-weight Video Transformer

Authors: Joonmyung Choi, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, Hyunwoo J. Kim

Abstract: Video Transformers have become the prevalent solution for various video downstream tasks with superior expressive power and flexibility. However, these video transformers suffer from heavy computational costs induced by the massive number of tokens across the entire video frames, which has been the major barrier to training the model. Further, the patches irrelevant to the main contents, e.g., backgrounds, degrade the generalization performance of models. To tackle these issues, we propose training free token merging for lightweight video Transformer (vid-TLDR) that aims to enhance the efficiency of video Transformers by merging the background tokens without additional training. For vid-TLDR, we introduce a novel approach to capture the salient regions in videos only with the attention map. Further, we introduce the saliency-aware token merging strategy by dropping the background tokens and sharpening the object scores. Our experiments show that vid-TLDR significantly mitigates the computational complexity of video Transformers while achieving competitive performance compared to the base model without vid-TLDR. Code is available at https://github.com/mlvlab/vid-TLDR.

Submitted 30 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: Conference on Computer Vision and Pattern Recognition (CVPR), 2024

arXiv:2403.12726 [pdf]

Small Distance Increment Method for Measuring Complex Permittivity With mmWave Radar

Authors: Hang Song, Hyun Joon Kim, Mingxia Wan, Bo Wei, Takamaro Kikkawa, Jun-ichi Takada

Abstract: Measuring the complex permittivity of a material is essential in many scenarios such as quality checks and component analysis. Generally, measurement methods for characterizing the material are based on a vector network analyzer, which is large and ill-suited to on-site measurement, especially in high-frequency ranges such as millimeter wave (mmWave). In addition, some measurement methods require the destruction of samples, which is not suitable for non-destructive inspection. In this work, a small distance increment (SDI) method is proposed to non-destructively measure the complex permittivity of a material. In SDI, the transmitter and receiver form a monostatic radar facing the material under test (MUT). During the measurement, the distance between the radar and the MUT changes in small increments and the signals are recorded at each position. A mathematical model is formulated to depict the relationship among the complex permittivity, distance increment, and measured signals. By fitting the model, the complex permittivity of the MUT is estimated. To implement and evaluate the proposed SDI method, a commercial off-the-shelf mmWave radar is utilized and the measurement system is developed. The evaluation was then carried out on an acrylic plate. With the proposed method, the estimated complex permittivity of the acrylic plate shows good agreement with the literature values, demonstrating the efficacy of the SDI method for characterizing the complex permittivity of a material.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.10030 [pdf, other]

Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers

Authors: Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim

Abstract: Vision Transformer (ViT) has emerged as a prominent backbone for computer vision. For more efficient ViTs, recent works lessen the quadratic cost of the self-attention layer by pruning or fusing the redundant tokens. However, these works faced the speed-accuracy trade-off caused by the loss of information. Here, we argue that token fusion needs to consider diverse relations between tokens to minimize information loss. In this paper, we propose a Multi-criteria Token Fusion (MCTF), that gradually fuses the tokens based on multi-criteria (e.g., similarity, informativeness, and size of fused tokens). Further, we utilize the one-step-ahead attention, which is the improved approach to capture the informativeness of the tokens. By training the model equipped with MCTF using a token reduction consistency, we achieve the best speed-accuracy trade-off in the image classification (ImageNet1K). Experimental results prove that MCTF consistently surpasses the previous reduction methods with and without training. Specifically, DeiT-T and DeiT-S with MCTF reduce FLOPs by about 44% while improving the performance (+0.5%, and +0.3%) over the base model, respectively. We also demonstrate the applicability of MCTF in various Vision Transformers (e.g., T2T-ViT, LV-ViT), achieving at least 31% speedup without performance degradation. Code is available at https://github.com/mlvlab/MCTF.
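The general idea of fusing redundant tokens can be illustrated with a minimal sketch. It is not the MCTF algorithm from the paper: it uses only two of the criteria named in the abstract (cosine similarity to pick a pair, and token size to weight the average), omits informativeness and one-step-ahead attention entirely, and the function name `fuse_most_similar_pair` and its arguments are hypothetical.

```python
import numpy as np

def fuse_most_similar_pair(tokens, sizes):
    """Fuse the two most cosine-similar tokens into one size-weighted average.

    tokens: (n, d) float array of token embeddings (illustrative name).
    sizes:  (n,) array counting how many original tokens each row represents.
    Returns the reduced token array and the updated sizes.
    """
    unit = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # never pair a token with itself
    i, j = np.unravel_index(np.argmax(sim), sim.shape)

    # Weight by size so a token that already stands for many patches
    # contributes proportionally more to the fused embedding.
    merged = (sizes[i] * tokens[i] + sizes[j] * tokens[j]) / (sizes[i] + sizes[j])

    keep = [k for k in range(len(tokens)) if k not in (i, j)]
    new_tokens = np.vstack([tokens[keep], merged[None, :]])
    new_sizes = np.append(sizes[keep], sizes[i] + sizes[j])
    return new_tokens, new_sizes
```

Each call removes one token, so applying it repeatedly shrinks the sequence that later self-attention layers have to process, which is where the FLOPs saving in this family of methods comes from.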

Submitted 1 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: Conference on Computer Vision and Pattern Recognition (CVPR), 2024

arXiv:2403.06397 [pdf, other]

DeepSafeMPC: Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning

Authors: Xuefeng Wang, Henglin Pu, Hyung Jun Kim, Husheng Li

Abstract: Safe Multi-agent reinforcement learning (safe MARL) has increasingly gained attention in recent years, emphasizing the need for agents to not only optimize the global return but also adhere to safety requirements through behavioral constraints. Some recent work has integrated control theory with multi-agent reinforcement learning to address the challenge of ensuring safety. However, there have been only very limited applications of Model Predictive Control (MPC) methods in this domain, primarily due to the complex and implicit dynamics characteristic of multi-agent environments. To bridge this gap, we propose a novel method called Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning (DeepSafeMPC). The key insight of DeepSafeMPC is leveraging a centralized deep learning model to accurately predict environmental dynamics. Our method applies MARL principles to search for optimal solutions. Through the employment of MPC, the actions of agents can be restricted within safe states concurrently. We demonstrate the effectiveness of our approach using the Safe Multi-agent MuJoCo environment, showcasing significant advancements in addressing safety concerns in MARL.

Submitted 11 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: 8 pages, 5 figures

arXiv:2403.03181 [pdf, other]

Behavior Generation with Latent Actions

Authors: Seungjae Lee, Yibin Wang, Haritheja Etukuru, H. ** Kim, Nur Muhammad Mahi Shafiullah, Lerrel Pinto

Abstract: Generative modeling of complex behaviors from labeled datasets has been a longstanding problem in decision making. Unlike language or image generation, decision making requires modeling actions - continuous-valued vectors that are multimodal in their distribution, potentially drawn from uncurated sources, where generation errors can compound in sequential prediction. A recent class of models called Behavior Transformers (BeT) addresses this by discretizing actions using k-means clustering to capture different modes. However, k-means struggles to scale for high-dimensional action spaces or long sequences, and lacks gradient information, and thus BeT suffers in modeling long-range actions. In this work, we present Vector-Quantized Behavior Transformer (VQ-BeT), a versatile model for behavior generation that handles multimodal action prediction, conditional generation, and partial observations. VQ-BeT augments BeT by tokenizing continuous actions with a hierarchical vector quantization module. Across seven environments including simulated manipulation, autonomous driving, and robotics, VQ-BeT improves on state-of-the-art models such as BeT and Diffusion Policies. Importantly, we demonstrate VQ-BeT's improved ability to capture behavior modes while accelerating inference speed 5x over Diffusion Policies. Videos and code can be found at https://sjlee.cc/vq-bet
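To make "tokenizing continuous actions with a hierarchical vector quantization module" concrete, here is a minimal sketch of one common hierarchical scheme, residual vector quantization. Whether VQ-BeT uses exactly this scheme is an assumption; the abstract does not say. The codebooks below are fixed, hand-written arrays rather than learned ones, and all function names are hypothetical.

```python
import numpy as np

def nearest_code(x, codebook):
    """Index of the codebook row closest to x in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

def residual_quantize(action, codebooks):
    """Encode an action as one code index per level.

    Each level quantizes the residual left over by the previous level,
    so early levels capture coarse modes and later ones refine them.
    """
    residual = np.asarray(action, dtype=float)
    indices = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        indices.append(idx)
        residual = residual - codebook[idx]
    return indices

def decode(indices, codebooks):
    """Reconstruct the action by summing the selected code vectors."""
    return sum(cb[i] for i, cb in zip(indices, codebooks))

# Illustrative two-level codebooks (assumed values, not from the paper).
coarse = np.array([[0.0, 0.0], [1.0, 1.0]])
fine = np.array([[0.0, 0.0], [0.1, -0.1]])
codes = residual_quantize([1.1, 0.9], [coarse, fine])
```

The resulting list of discrete indices is the kind of token sequence a transformer can predict, and `decode` maps predicted tokens back to a continuous action.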

Submitted 28 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: Github repo: https://github.com/jayLEE0301/vq_bet_official

arXiv:2402.16506 [pdf, other]

Stochastic Conditional Diffusion Models for Robust Semantic Image Synthesis

Authors: Juyeon Ko, Inho Kong, Dogyun Park, Hyunwoo J. Kim

Abstract: Semantic image synthesis (SIS) is a task to generate realistic images corresponding to semantic maps (labels). However, in real-world applications, SIS often encounters noisy user inputs. To address this, we propose Stochastic Conditional Diffusion Model (SCDM), which is a robust conditional diffusion model that features novel forward and generation processes tailored for SIS with noisy labels. It enhances robustness by stochastically perturbing the semantic label maps through Label Diffusion, which diffuses the labels with discrete diffusion. Through the diffusion of labels, the noisy and clean semantic maps become similar as the timestep increases, eventually becoming identical at $t=T$. This facilitates the generation of an image close to a clean image, enabling robust generation. Furthermore, we propose a class-wise noise schedule to differentially diffuse the labels depending on the class. We demonstrate that the proposed method generates high-quality samples through extensive experiments and analyses on benchmark datasets, including a novel experimental setup simulating human errors during real-world applications. Code is available at https://github.com/mlvlab/SCDM.

Submitted 3 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: ICML 2024

arXiv:2402.15122 [pdf, other]

Measurements of low energy nuclear recoil quenching factors for Na and I recoils in the NaI(Tl) scintillator

Authors: S. H. Lee, H. W. Joo, H. J. Kim, K. W. Kim, S. K. Kim, Y. D. Kim, Y. J. Ko, H. S. Lee, J. Y. Lee, H. S. Park, Y. S. Yoon

Abstract: Elastic scattering off nuclei in target detectors, involving interactions with dark matter and coherent elastic neutrino nuclear recoil (CE$ν$NS), results in the deposition of low energy within the nuclei, dissipating rapidly through a combination of heat and ionization. The primary energy loss mechanism for nuclear recoil is heat, leading to consistently smaller measurable scintillation signals compared to electron recoils of the same energy. The nuclear recoil quenching factor (QF), representing the ratio of scintillation light yield produced by nuclear recoil to that of electron recoil at the same energy, is a critical parameter for understanding dark matter and neutrino interactions with nuclei. The low energy QF of NaI(Tl) crystals, commonly employed in dark matter searches and CE$ν$NS measurements, is of substantial importance. Previous low energy QF measurements were constrained by contamination from photomultiplier tube (PMT)-induced noise, resulting in an observed light yield of approximately 15 photoelectrons per keVee (kilo-electron-volt electron-equivalent energy) and nuclear recoil energy above 5 keVnr (kilo-electron-volt nuclear recoil energy). Through enhanced crystal encapsulation, an increased light yield of around 26 photoelectrons per keVee is achieved. This improvement enables the measurement of the nuclear recoil QF for sodium nuclei at an energy of 3.8 $\pm$ 0.6 keVnr with a QF of 11.2 $\pm$ 1.7%. Furthermore, a reevaluation of previously reported QF results is conducted, incorporating enhancements in low energy events based on waveform simulation. The outcomes are generally consistent with various recent QF measurements for sodium and iodine.
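As a rough illustration of what the quoted numbers imply, the quenching-factor definition from the abstract can be written out and evaluated at the central values. The symbols $L_{\mathrm{nr}}$, $L_{\mathrm{er}}$, $E_{\mathrm{ee}}$ and $N_{\mathrm{pe}}$ are notation introduced here, not taken from the paper, and the quoted uncertainties are ignored, so this is a back-of-the-envelope estimate only.

```latex
\begin{align*}
\mathrm{QF}(E) &= \frac{L_{\mathrm{nr}}(E)}{L_{\mathrm{er}}(E)}
  && \text{(light yield of a nuclear vs. an electron recoil at energy } E\text{)} \\
E_{\mathrm{ee}} &\approx \mathrm{QF} \times E_{\mathrm{nr}}
  = 0.112 \times 3.8\ \mathrm{keV_{nr}} \approx 0.43\ \mathrm{keV_{ee}} \\
N_{\mathrm{pe}} &\approx 26\ \mathrm{PE/keV_{ee}} \times 0.43\ \mathrm{keV_{ee}} \approx 11\ \mathrm{PE}
\end{align*}
```

Under these assumptions, a 3.8 keVnr sodium recoil yields on the order of ten photoelectrons, which suggests why the higher light yield matters at this energy.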

Submitted 28 February, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.14579 [pdf, other]

Text Role Classification in Scientific Charts Using Multimodal Transformers

Authors: Hye ** Kim, Nicolas Lell, Ansgar Scherp

Abstract: Text role classification involves classifying the semantic role of textual elements within scientific charts. For this task, we propose to finetune two pretrained multimodal document layout analysis models, LayoutLMv3 and UDOP, on chart datasets. The transformers utilize the three modalities of text, image, and layout as input. We further investigate whether data augmentation and balancing methods help the performance of the models. The models are evaluated on various chart datasets, and results show that LayoutLMv3 outperforms UDOP in all experiments. LayoutLMv3 achieves the highest F1-macro score of 82.87 on the ICPR22 test dataset, beating the best-performing model from the ICPR22 CHART-Infographics challenge. Moreover, the robustness of the models is tested on a synthetic noisy dataset ICPR22-N. Finally, the generalizability of the models is evaluated on three chart datasets, CHIME-R, DeGruyter, and EconBiz, for which we added labels for the text roles. Findings indicate that even in cases where there is limited training data, transformers can be used with the help of data augmentation and balancing methods. The source code and datasets are available on GitHub under https://github.com/hjkimk/text-role-classification

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.12097 [pdf, other]

Holographic dual effective field theory for an SYK model

Authors: Yoon-Seok Choun, Hyeon Jung Kim, Ki-Seok Kim

Abstract: We derive an emergent holographic dual description for an SYK model, where the renormalization group (RG) flows of collective bi-local fields appear manifestly in the bulk effective action with an emergent extradimension. This holographic dual effective field theory reproduces $1/N$ quantum corrections given by the Schwarzian action when we take the UV limit in the bulk effective action. Going into the IR regime in the extradimension, we observe that the field theoretic $1/N$, $1/N^{2}$, ... quantum corrections are resummed in the all-loop order and reorganized to form a holographic dual effective field theory in a large $N$ fashion living on the one-higher dimensional spacetime. Taking the large $N$ limit in the holographic dual effective field theory, we obtain nonlinearly coupled second-order bulk differential equations of motion for the three bi-local order-parameter fields of fermion self-energy, Green's function, and polarization function. Here, both UV and IR boundary conditions are derived self-consistently from the boundary effective action. We solve these highly intertwined nonlinear differential equations based on the so-called matching method. Our ansatz for the bi-local order-parameter fields coincides with the conformally invariant solution of the field theoretic large $N$ limit in the UV limit, but their overall coefficients RG-flow along the extradimensional space, respectively, reflecting effects of higher-order quantum corrections. As a result, we find an insulating behavior, where the self-energy diverges at IR. To confirm this insulating physics, we investigate thermodynamics. We obtain an effective free energy functional in terms of such bi-local dual order-parameter fields, which satisfy the Hamilton-Jacobi equation of the holographic dual effective field theory. ...

Submitted 21 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Regularizations have been clarified

arXiv:2401.12517 [pdf, other]

DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations

Authors: Dogyun Park, Sihyeon Kim, So** Lee, Hyunwoo J. Kim

Abstract: Recent studies have introduced a new class of generative models for synthesizing implicit neural representations (INRs) that capture arbitrary continuous signals in various domains. These models opened the door for domain-agnostic generative models, but they often fail to achieve high-quality generation. We observed that the existing methods generate the weights of neural networks to parameterize INRs and evaluate the network with fixed positional embeddings (PEs). Arguably, this architecture limits the expressive power of generative models and results in low-quality INR generation. To address this limitation, we propose Domain-agnostic Latent Diffusion Model for INRs (DDMI) that generates adaptive positional embeddings instead of neural networks' weights. Specifically, we develop a Discrete-to-continuous space Variational AutoEncoder (D2C-VAE), which seamlessly connects discrete data and the continuous signal functions in the shared latent space. Additionally, we introduce a novel conditioning mechanism for evaluating INRs with the hierarchically decomposed PEs to further enhance expressive power. Extensive experiments across four modalities, e.g., 2D images, 3D shapes, Neural Radiance Fields, and videos, with seven benchmark datasets, demonstrate the versatility of DDMI and its superior performance compared to the existing INR generative models.

Submitted 20 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.07476 [pdf, other]

Background study of the AMoRE-pilot experiment

Authors: A. Agrawal, V. V. Alenkov, P. Aryal, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, Seonho Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Yu. M. Gavrilyuk, A. M. Gezhaev, O. Gileva , et al. (83 additional authors not shown)

Abstract: We report a study on the background of the Advanced Molybdenum-Based Rare process Experiment (AMoRE), a search for neutrinoless double beta decay ($0νββ$) of $^{100}$Mo. The pilot stage of the experiment was conducted using $\sim$1.9 kg of CaMoO$_4$ crystals at the Yangyang Underground Laboratory, South Korea, from 2015 to 2018. We compared the measured $β/γ$ energy spectra in three experimental configurations with the results of Monte Carlo simulations and identified the background sources in each configuration. We replaced several detector components and enhanced the neutron shielding to lower the background level between configurations. A limit on the half-life of $0νββ$ decay of $^{100}$Mo was found at $T_{1/2}^{0ν} \ge 3.0\times 10^{23}$ years at 90% confidence level, based on the measured background and its modeling. Further reductions of the background rate in AMoRE-I and AMoRE-II are discussed.

Submitted 7 April, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

arXiv:2401.07462 [pdf, other]

doi 10.1140/epjc/s10052-024-12770-1

Nonproportionality of NaI(Tl) Scintillation Detector for Dark Matter Search Experiments

Authors: S. M. Lee, G. Adhikari, N. Carlin, J. Y. Cho, J. J. Choi, S. Choi, A. C. Ezeribe, L. E. França, C. Ha, I. S. Hahn, S. J. Hollick, E. J. Jeon, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, S. W. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim, et al. (37 additional authors not shown)

Abstract: We present a comprehensive study of the nonproportionality of NaI(Tl) scintillation detectors within the context of dark matter search experiments. Our investigation, which integrates COSINE-100 data with supplementary $γ$ spectroscopy, measures light yields across diverse energy levels from full-energy $γ$ peaks produced by the decays of various isotopes. These $γ$ peaks of interest were produced by decays supported by both long and short-lived isotopes. Analyzing peaks from decays supported only by short-lived isotopes presented a unique challenge due to their limited statistics and overlapping energies, which was overcome by long-term data collection and a time-dependent analysis. A key achievement is the direct measurement of the 0.87 keV light yield, resulting from the cascade following electron capture decay of $^{22}$Na from internal contamination. This measurement, previously accessible only indirectly, deepens our understanding of NaI(Tl) scintillator behavior in the region of interest for dark matter searches. This study holds substantial implications for background modeling and the interpretation of dark matter signals in NaI(Tl) experiments.

Submitted 10 May, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

Comments: 12 pages, 7 figures

Journal ref: Eur. Phys. J. C 84 (2024) 484

arXiv:2401.01873 [pdf, other]

Observation of the Magnonic Dicke Superradiant Phase Transition

Authors: Dasom Kim, Sohail Dasgupta, Xiaoxuan Ma, Joong-Mok Park, Hao-Tian Wei, Liang Luo, Jacques Doumani, Xinwei Li, Wanting Yang, Di Cheng, Richard H. J. Kim, Henry O. Everitt, Shojiro Kimura, Hiroyuki Nojiri, Jigang Wang, Shixun Cao, Motoaki Bamba, Kaden R. A. Hazzard, Junichiro Kono

Abstract: Two-level atoms coupled with single-mode cavity photons are predicted to exhibit a quantum phase transition when the coupling strength exceeds a critical value, entering a phase in which atomic polarization and photonic field are finite even at zero temperature and without external driving. However, this phenomenon, the superradiant phase transition (SRPT), is forbidden by a no-go theorem due to the existence of the diamagnetic term in the Hamiltonian. Here, we present spectroscopic evidence for a magnonic SRPT in ErFeO$_3$, where the role of the photonic mode (two-level atoms) in the photonic SRPT is played by an Fe$^{3+}$ magnon mode (Er$^{3+}$ spins). The absence of the diamagnetic term in the Fe$^{3+}$-Er$^{3+}$ exchange coupling ensures that the no-go theorem does not apply. Terahertz and gigahertz magnetospectroscopy experiments revealed the signatures of the SRPT -- a kink and a softening, respectively, of two spin-magnon hybridized modes at the critical point.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.12664 [pdf, other]

UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection

Authors: Bumsoo Kim, Taeho Choi, Jaewoo Kang, Hyunwoo J. Kim

Abstract: Recent advances in deep neural networks have achieved significant progress in detecting individual objects from an image. However, object detection is not sufficient to fully understand a visual scene. Towards a deeper visual understanding, the interactions between objects, especially humans and objects are essential. Most prior works have obtained this information with a bottom-up approach, where the objects are first detected and the interactions are predicted sequentially by pairing the objects. This is a major bottleneck in HOI detection inference time. To tackle this problem, we propose UnionDet, a one-stage meta-architecture for HOI detection powered by a novel union-level detector that eliminates this additional inference stage by directly capturing the region of interaction. Our one-stage detector for human-object interaction shows a significant reduction in interaction prediction time 4x~14x while outperforming state-of-the-art methods on two public datasets: V-COCO and HICO-DET.
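The "region of interaction" can be pictured as the smallest box that covers both the human and the object. The sketch below computes that box from two given boxes; it only illustrates the geometric notion and is not the detector itself, which per the abstract predicts such regions directly instead of pairing detections. The `(x_min, y_min, x_max, y_max)` box layout and the function name `union_box` are assumptions.

```python
def union_box(human_box, object_box):
    """Smallest axis-aligned box covering both inputs.

    Boxes are (x_min, y_min, x_max, y_max) tuples; this layout is assumed
    for illustration and is not taken from the paper.
    """
    hx1, hy1, hx2, hy2 = human_box
    ox1, oy1, ox2, oy2 = object_box
    return (min(hx1, ox1), min(hy1, oy1), max(hx2, ox2), max(hy2, oy2))

# Example: a person and a partially overlapping object to their right.
region = union_box((10, 20, 50, 80), (40, 10, 90, 60))
```

In a bottom-up pipeline this computation would run once per candidate human-object pair, which is the pairing cost the abstract identifies as the inference bottleneck.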

Submitted 19 December, 2023; originally announced December 2023.

Comments: ECCV 2020

arXiv:2312.11979 [pdf, other]

Origin of chirality in transition-metal dichalcogenides

Authors: Kwangrae Kim, Hyun-Woo J. Kim, Seunghyeok Ha, Hoon Kim, **-Kwang Kim, Jaehwon Kim, Hyunsung Kim, Junyoung Kwon, Jihoon Seol, Saegyeol Jung, Changyoung Kim, Ahmet Alatas, Ayman Said, Michael Merz, Matthieu Le Tacon, ** Mo Bok, Ki-Seok Kim, B. J. Kim

Abstract: Chirality is a ubiquitous phenomenon in which a symmetry between left- and right-handed objects is broken, examples in nature ranging from subatomic particles and molecules to living organisms. In particle physics, the weak force is responsible for the symmetry breaking and parity violation in beta decay, but in condensed matter systems interactions that lead to chirality remain poorly understood. Here, we unravel the mechanism of chiral charge density wave formation in the transition-metal dichalcogenide 1T-TiSe2. Using representation analysis, we show that charge density modulations and ionic displacements, which transform as a continuous scalar field and a vector field on a discrete lattice, respectively, follow different irreducible representations of the space group, despite the fact that they propagate with the same wave-vectors and are strongly coupled to each other. This charge-lattice symmetry frustration is resolved by further breaking of all symmetries not common to both sectors through induced lattice distortions, thus leading to chirality. Our theory is verified using Raman spectroscopy and inelastic x-ray scattering, which reveal that all but translation symmetries are broken at a level not resolved by state-of-the-art diffraction techniques.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 10 pages, 3 figures, 1 table

arXiv:2312.10912 [pdf, other]

Discovery of an Unconventional Quantum Echo by Interference of Higgs Coherence

Authors: C. Huang, M. Mootz, L. Luo, D. Cheng, J. M. Park, R. H. J. Kim, Y. Qiang, V. L. Quito, Yongxin Yao, P. P. Orth, I. E. Perakis, J. Wang

Abstract: Nonlinearities in quantum systems are fundamentally characterized by the interplay of phase coherences, their interference, and state transition amplitudes. Yet the question of how quantum coherence and interference manifest in transient, massive Higgs excitations, prevalent within both the quantum vacuum and superconductors, remains elusive. One hallmark example is photon echo, enabled by the generation, preservation, and retrieval of phase coherences amid multiple excitations. Here we reveal an unconventional quantum echo arising from the Higgs coherence in superconductors, and identify distinctive signatures attributed to Higgs anharmonicity. A terahertz pulse-pair modulation of the superconducting gap generates a "time grating" of coherent Higgs population, which scatters echo signals distinct from conventional spin- and photon-echoes in atoms and semiconductors. These manifestations appear as Higgs echo spectral peaks occurring at frequencies forbidden by equilibrium particle-hole symmetry, an asymmetric delay in the echo formation from the dynamics of the "reactive" superconducting state, and negative time signals arising from Higgs-quasiparticle anharmonic coupling. The Higgs interference and anharmonicity control the decoherence of driven superconductivity and may enable applications in quantum memory and entanglement.

Submitted 17 December, 2023; originally announced December 2023.

arXiv:2312.10468 [pdf, other]

Electrically reconfigurable phase-change transmissive metasurface

Authors: Cosmin Constantin Popescu, Kiumars Aryana, Parth Garud, Khoi Phuong Dao, Steven Vitale, Vladimir Liberman, Hyung-Bin Bae, Tae-Woo Lee, Myungkoo Kang, Kathleen A. Richardson, Carlos A. Rios Ocampo, Yifei Zhang, Tian Gu, Juejun Hu, Hyun Jung Kim

Abstract: Programmable and reconfigurable optics hold significant potential for transforming a broad spectrum of applications, spanning space explorations to biomedical imaging, gas sensing, and optical cloaking. The ability to adjust the optical properties of components like filters, lenses, and beam steering devices could result in dramatic reductions in size, weight, and power consumption in future optoelectronic devices. Among the potential candidates for reconfigurable optics, chalcogenide-based phase change materials (PCMs) offer great promise due to their non-volatile and analogue switching characteristics. Although PCMs have found widespread use in electronic data storage, these memory devices are deeply sub-micron-sized. To incorporate phase change materials into free-space optical components, it is essential to scale them up to beyond several hundreds of microns while maintaining reliable switching characteristics. This study demonstrated a non-mechanical, non-volatile transmissive filter based on low-loss PCMs with a 200 $\mu$m $\times$ 200 $\mu$m switching area. The device/metafilter can be consistently switched between low- and high-transmission states using electrical pulses with a switching contrast ratio of 5.5 dB. The device was reversibly switched for 1250 cycles before accelerated degradation took place. The work represents an important step toward realizing free-space reconfigurable optics based on PCMs.

Submitted 16 December, 2023; originally announced December 2023.
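
The two figures quoted in the abstract above (a 5.5 dB switching contrast and a 200 µm × 200 µm switching area) can be put on a linear scale with a short calculation. This is only an illustrative sketch: it assumes the contrast is a ratio of transmitted optical power, which the abstract does not state explicitly, and the variable and function names are hypothetical.

```python
# Values quoted in the abstract above.
CONTRAST_DB = 5.5   # switching contrast ratio, in dB
SIDE_UM = 200.0     # side length of the square switching area, in micrometres


def db_to_linear_power_ratio(db: float) -> float:
    """Convert a decibel figure to a linear power (intensity) ratio."""
    return 10.0 ** (db / 10.0)


# Assumption: the contrast refers to transmitted optical power,
# so the 10*log10 convention applies (not 20*log10 for field amplitude).
contrast_linear = db_to_linear_power_ratio(CONTRAST_DB)

# Convert the side length to millimetres, then square it for the area.
area_mm2 = (SIDE_UM * 1e-3) ** 2

print(f"High/low transmission ratio: {contrast_linear:.2f}x")
print(f"Switching area: {area_mm2:.3f} mm^2")
```

Under that assumption, the high-transmission state passes roughly three and a half times the power of the low-transmission state.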

arXiv:2312.07957 [pdf, other]

Scintillation characteristics of an undoped CsI crystal at low-temperature for dark matter search

Authors: W. K. Kim, H. Y. Lee, K. W. Kim, Y. J. Ko, J. A. Jeon, H. J. Kim, H. S. Lee

Abstract: The scintillation characteristics of an undoped CsI crystal with dimensions of 5.8 mm $\times$ 5.9 mm $\times$ 7.0 mm, corresponding to a weight of 1.0 g, were studied by directly coupling two silicon photomultipliers (SiPMs) over a temperature range from room temperature (300 K) to a low temperature of 86 K. The scintillation decay time and light output were measured using x-ray (23 keV) and gamma-ray (88 keV) peaks from a $^{109}$Cd radioactive source. An increase in decay time was observed as the temperature decreased from room temperature to 86 K, ranging from 76 ns to 605 ns. Correspondingly, the light output increased as well, reaching 37.9 $\pm$ 1.5 photoelectrons per keV electron-equivalent at 86 K, which is approximately 18 times higher than the light yield at room temperature. Leveraging the significantly enhanced scintillation light output of the undoped CsI crystal at the low temperature, coupling it with SiPMs makes it a promising candidate for the future dark matter search detector, benefiting from the low threshold owing to the high light output. The odd proton numbers from both cesium and iodine provide an advantage for the WIMP-proton spin-dependent interaction. We evaluated the sensitivity of low-mass dark matter on WIMP-proton spin-dependent interaction with the Migdal process, assuming 200 kg of undoped CsI crystals for the dark matter search. We conclude that undoped CsI crystal detectors exhibit world-competitive sensitivities for low-mass dark matter detection, particularly for the WIMP-proton spin-dependent interaction.

Submitted 13 December, 2023; originally announced December 2023.
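
The numbers in the abstract above imply a few derived quantities, which the following sketch works out. It is a back-of-the-envelope calculation, not the authors' analysis: it assumes full energy deposition and a linear detector response, treats the quoted factor of "approximately 18" as exact, and uses hypothetical variable names.

```python
# Values quoted in the abstract above.
LIGHT_YIELD_86K = 37.9   # photoelectrons per keV (electron-equivalent) at 86 K
GAIN_VS_ROOM_TEMP = 18.0 # "approximately 18 times" the room-temperature yield
DECAY_NS_ROOM = 76.0     # decay time at room temperature, in ns
DECAY_NS_86K = 605.0     # decay time at 86 K, in ns
PEAK_ENERGIES_KEV = (23.0, 88.0)  # x-ray and gamma-ray peaks from 109Cd

# Implied room-temperature light yield (approximate, since the gain is rounded).
room_temp_yield = LIGHT_YIELD_86K / GAIN_VS_ROOM_TEMP

# Expected mean photoelectron counts at 86 K for each calibration peak,
# assuming the full photon energy is deposited and the response is linear.
expected_pe = {e: LIGHT_YIELD_86K * e for e in PEAK_ENERGIES_KEV}

# How much slower the scintillation decay becomes on cooling.
decay_ratio = DECAY_NS_86K / DECAY_NS_ROOM

print(f"Implied room-temperature yield: {room_temp_yield:.1f} PE/keV")
for energy, pe in expected_pe.items():
    print(f"{energy:.0f} keV peak at 86 K: about {pe:.0f} photoelectrons")
print(f"Decay time grows by a factor of about {decay_ratio:.1f}")
```

The implied room-temperature yield of roughly 2 photoelectrons per keV helps explain why cooling is what makes a low energy threshold plausible.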

arXiv:2312.04490 [pdf, other]

doi 10.1021/acsmacrolett.3c00617

Assembling PNIPAM-Capped Gold Nanoparticles in Aqueous Solutions

Authors: Binay P. Nayak, Hyeong ** Kim, Srikanth Nayak, Wenjie Wang, Wei Bu, Surya K. Mallapragada, David Vaknin

Abstract: Employing small angle X-ray scattering (SAXS), we explore the conditions under which the assembly of gold nanoparticles (AuNPs) grafted with the thermo-sensitive polymer Poly(N-isopropylacrylamide) (PNIPAM) emerges. We find that short-range order assembly emerges by combining the addition of electrolytes or poly-electrolytes with raising the temperature of the suspensions above the lower-critical solution temperature (LCST) of PNIPAM. Our results show that the longer the PNIPAM chain is, the better the organization in the assembled clusters. Interestingly, without added electrolytes, there is no evidence of AuNP assembly as a function of temperature, although untethered PNIPAM is known to undergo a coil-to-globule transition above its LCST. This study demonstrates another approach to assembling potential thermo-sensitive nanostructures for devices by leveraging the unique properties of PNIPAM.

Submitted 7 December, 2023; originally announced December 2023.

Comments: Published at ACS Macro Letters, DOI - https://doi.org/10.1021/acsmacrolett.3c00617

Journal ref: ACS Macro Lett. 2023, 12, XXX, 1659 to 1664

arXiv:2311.09762 [pdf, other]

Graph Elicitation for Guiding Multi-Step Reasoning in Large Language Models

Authors: **young Park, Ameen Patel, Omar Zia Khan, Hyunwoo J. Kim, Joo-Kyung Kim

Abstract: Chain-of-Thought (CoT) prompting along with sub-question generation and answering has enhanced multi-step reasoning capabilities of Large Language Models (LLMs). However, prompting the LLMs to directly generate sub-questions is suboptimal since they sometimes generate redundant or irrelevant questions. To address this, we propose a GE-Reasoning method, which directs LLMs to generate proper sub-questions and corresponding answers. Concretely, given an input question, we first prompt the LLM to generate knowledge triplets, forming a graph representation of the question. Unlike conventional knowledge triplets, our approach allows variables as head or tail entities, effectively representing a question as knowledge triplets. Second, for each triplet, the LLM generates a corresponding sub-question and answer while using knowledge retrieval. If the prediction confidence exceeds a threshold, the sub-question and prediction are incorporated into the prompt for subsequent processing. This approach encourages sub-questions to be grounded in the extracted knowledge triplets, reducing redundancy and irrelevance. Our experiments demonstrate that our approach outperforms previous CoT prompting methods and their variants on multi-hop question answering benchmark datasets.

Submitted 22 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: Preprint

arXiv:2311.05010 [pdf, other]

doi 10.1016/j.astropartphys.2024.102945

Alpha backgrounds in NaI(Tl) crystals of COSINE-100

Authors: G. Adhikari, N. Carlin, D. F. F. S. Cavalcante, J. Y. Cho, J. J. Choi, S. Choi, A. C. Ezeribe, L. E. Franca, C. Ha, I. S. Hahn, S. J. Hollick, E. J. Jeon, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, S. W. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim , et al. (38 additional authors not shown)

Abstract: COSINE-100 is a dark matter direct detection experiment with 106 kg NaI(Tl) as the target material. 210Pb and daughter isotopes are a dominant background in the WIMP region of interest and are detected via beta decay and alpha decay. Analysis of the alpha channel complements the background model as observed in the beta/gamma channel. We present the measurement of the quenching factors and Monte Carlo simulation results and activity quantification of the alpha decay components of the COSINE-100 NaI(Tl) crystals. The data strongly indicate that the alpha decays probabilistically undergo two possible quenching factors but require further investigation. The fitted results are consistent with independent measurements and improve the overall understanding of the COSINE-100 backgrounds. Furthermore, the half-life of 216Po has been measured to be 143.4 +/- 1.2 ms, which is consistent with and more precise than recent measurements.

Submitted 30 January, 2024; v1 submitted 8 November, 2023; originally announced November 2023.
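
The half-life quoted in the abstract above can be converted to a decay constant and a mean lifetime using the standard exponential-decay relations (λ = ln 2 / T½, τ = 1 / λ). The sketch below applies only those textbook relations with first-order uncertainty propagation. It does not reproduce the collaboration's fit, and the variable names are hypothetical.

```python
import math

# Values quoted in the abstract above.
HALF_LIFE_MS = 143.4      # measured 216Po half-life, in milliseconds
HALF_LIFE_ERR_MS = 1.2    # quoted uncertainty, in milliseconds

half_life_s = HALF_LIFE_MS / 1000.0

# Decay constant: lambda = ln(2) / T_half, in inverse seconds.
decay_constant = math.log(2) / half_life_s

# Mean lifetime: tau = 1 / lambda = T_half / ln(2), reported in milliseconds.
mean_lifetime_ms = HALF_LIFE_MS / math.log(2)

# First-order propagation: the relative uncertainty carries over unchanged,
# because both quantities are proportional to (or inversely proportional to) T_half.
relative_err = HALF_LIFE_ERR_MS / HALF_LIFE_MS
decay_constant_err = decay_constant * relative_err
mean_lifetime_err_ms = mean_lifetime_ms * relative_err

print(f"Decay constant: {decay_constant:.3f} +/- {decay_constant_err:.3f} 1/s")
print(f"Mean lifetime: {mean_lifetime_ms:.1f} +/- {mean_lifetime_err_ms:.1f} ms")
print(f"Relative uncertainty: {100 * relative_err:.2f} %")
```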

arXiv:2311.03784 [pdf, other]

UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields

Authors: Injae Kim, Minhyuk Choi, Hyunwoo J. Kim

Abstract: Neural Radiance Field (NeRF) has enabled novel view synthesis with high fidelity given images and camera poses. Subsequent works even succeeded in eliminating the necessity of pose priors by jointly optimizing NeRF and camera pose. However, these works are limited to relatively simple settings such as photometrically consistent and occluder-free image collections or a sequence of images from a video. So they have difficulty handling unconstrained images with varying illumination and transient occluders. In this paper, we propose $\textbf{UP-NeRF}$ ($\textbf{U}$nconstrained $\textbf{P}$ose-prior-free $\textbf{Ne}$ural $\textbf{R}$adiance $\textbf{F}$ields) to optimize NeRF with unconstrained image collections without camera pose prior. We tackle these challenges with surrogate tasks that optimize color-insensitive feature fields and a separate module for transient occluders to block their influence on pose estimation. In addition, we introduce a candidate head to enable more robust pose estimation and transient-aware depth supervision to minimize the effect of incorrect prior. Our experiments verify the superior performance of our method compared to the baselines including BARF and its variants in a challenging internet photo collection, $\textit{Phototourism}$ dataset.

Submitted 7 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: Neural Information Processing Systems (NeurIPS), 2023. The code is available at https://github.com/mlvlab/UP-NeRF

arXiv:2310.20258 [pdf, other]

Advancing Bayesian Optimization via Learning Correlated Latent Space

Authors: Seunghun Lee, Jaewon Chu, Sihyeon Kim, Juyeon Ko, Hyunwoo J. Kim

Abstract: Bayesian optimization is a powerful method for optimizing black-box functions with limited function evaluations. Recent works have shown that optimization in a latent space through deep generative models such as variational autoencoders leads to effective and efficient Bayesian optimization for structured or discrete data. However, as the optimization does not take place in the input space, it leads to an inherent gap that results in potentially suboptimal solutions. To alleviate the discrepancy, we propose Correlated latent space Bayesian Optimization (CoBO), which focuses on learning correlated latent spaces characterized by a strong correlation between the distances in the latent space and the distances within the objective function. Specifically, our method introduces Lipschitz regularization, loss weighting, and trust region recoordination to minimize the inherent gap around the promising areas. We demonstrate the effectiveness of our approach on several optimization tasks in discrete data, such as molecule design and arithmetic expression fitting, and achieve high performance within a small budget.

Submitted 19 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.19261 [pdf, other]

Diversify & Conquer: Outcome-directed Curriculum RL via Out-of-Distribution Disagreement

Authors: Daesol Cho, Seungjae Lee, H. ** Kim

Abstract: Reinforcement learning (RL) often faces the challenges of uninformed search problems where the agent should explore without access to the domain knowledge such as characteristics of the environment or external rewards. To tackle these challenges, this work proposes a new approach for curriculum RL called Diversify for Disagreement & Conquer (D2C). Unlike previous curriculum learning methods, D2C requires only a few examples of desired outcomes and works in any environment, regardless of its geometry or the distribution of the desired outcome examples. The proposed method performs diversification of the goal-conditional classifiers to identify similarities between visited and desired outcome states and ensures that the classifiers disagree on states from out-of-distribution, which enables quantifying the unexplored region and designing an arbitrary goal-conditioned intrinsic reward signal in a simple and intuitive way. The proposed method then employs bipartite matching to define a curriculum learning objective that produces a sequence of well-adjusted intermediate goals, which enable the agent to automatically explore and conquer the unexplored region. We present experimental results demonstrating that D2C outperforms prior curriculum RL methods in both quantitative and qualitative aspects, even with the arbitrarily distributed desired outcome examples.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.19185 [pdf, other]

Robotic Barrier Construction through Weaved, Inflatable Tubes

Authors: H. J. Kim, H. Abdel-Raziq, X. Liu, A. Y. Siskovic, S. Patil, K. H. Petersen, H. L. Kao

Abstract: In this article, we present a mechanism and related path planning algorithm to construct light-duty barriers out of extruded, inflated tubes weaved around existing environmental features. Our extruded tubes are based on everted vine-robots and in this context, we present a new method to steer their growth. We characterize the mechanism in terms of accuracy, resilience, and, towards their use as barriers, the ability of the tubes to withstand distributed loads. We further explore an algorithm which, given a feature map and the size and direction of the external load, can determine where and how to extrude the barrier. Finally, we showcase the potential of this method in an autonomously extruded two-layer wall weaved around three pipes. While preliminary, our work indicates that this method has the potential for barrier construction in cluttered environments, e.g. shelters against wind or snow. Future work may show how to achieve tighter weaves, how to leverage weave friction for improved strength, how to assess barrier performance for feedback control, and how to operate the extrusion mechanism off of a mobile robot.

Submitted 29 October, 2023; originally announced October 2023.

arXiv:2310.19131 [pdf]

Versatile spaceborne photonics with chalcogenide phase-change materials

Authors: Hyun Jung Kim, Matthew Julian, Calum Williams, David Bombara, Juejun Hu, Tian Gu, Kiumars Aryana, Godfrey Sauti, William Humphreys

Abstract: Recent growth in space systems has seen increasing capabilities packed into smaller and lighter Earth observation and deep space mission spacecraft. Phase-change materials (PCMs) are nonvolatile, reconfigurable, fast-switching, and have recently shown a high degree of space radiation tolerance, thereby making them an attractive materials platform for spaceborne photonics applications. They promise robust, lightweight, and energy-efficient reconfigurable optical systems whose functions can be dynamically defined on-demand and on orbit to deliver enhanced science or mission support in harsh environments on lean power budgets. This comment aims to discuss the recent advances in rapidly growing PCM research and its potential to transition from conventional terrestrial optoelectronics materials platforms to versatile spaceborne photonic materials platforms for current and next-generation space and science missions. Materials International Space Station Experiment-14 (MISSE-14) mission-flown PCMs outside of the International Space Station (ISS) and key results and NASA examples are highlighted to provide strong evidence of the applicability of spaceborne photonics.

Submitted 29 October, 2023; originally announced October 2023.

Comments: 16 pages, 4 figures

arXiv:2310.17330 [pdf, other]

CQM: Curriculum Reinforcement Learning with a Quantized World Model

Authors: Seungjae Lee, Daesol Cho, Jonghae Park, H. ** Kim

Abstract: Recent curriculum Reinforcement Learning (RL) has shown notable progress in solving complex tasks by proposing sequences of surrogate tasks. However, the previous approaches often face challenges when they generate curriculum goals in a high-dimensional space. Thus, they usually rely on manually specified goal spaces. To alleviate this limitation and improve the scalability of the curriculum, we propose a novel curriculum method that automatically defines the semantic goal space which contains vital information for the curriculum process, and suggests curriculum goals over it. To define the semantic goal space, our method discretizes continuous observations via vector quantized-variational autoencoders (VQ-VAE) and restores the temporal relations between the discretized observations by a graph. Concurrently, our method suggests uncertainty and temporal distance-aware curriculum goals that converge to the final goals over the automatically composed goal space. We demonstrate that the proposed method allows efficient explorations in an uninformed environment with raw goal examples only. Our method also outperforms the state-of-the-art curriculum RL methods on data efficiency and performance, in various goal-reaching tasks even with ego-centric visual inputs.

Submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2310.15747 [pdf, other]

Large Language Models are Temporal and Causal Reasoners for Video Question Answering

Authors: Dohwan Ko, Ji Soo Lee, Wooyoung Kang, Byungseok Roh, Hyunwoo J. Kim

Abstract: Large Language Models (LLMs) have shown remarkable performances on a wide range of natural language understanding and generation tasks. We observe that the LLMs provide effective priors in exploiting $\textit{linguistic shortcuts}$ for temporal and causal reasoning in Video Question Answering (VideoQA). However, such priors often cause suboptimal results on VideoQA by leading the model to over-rely on questions, $\textit{i.e.}$, $\textit{linguistic bias}$, while ignoring visual content. This is also known as `ungrounded guesses' or `hallucinations'. To address this problem while leveraging LLMs' prior on VideoQA, we propose a novel framework, Flipped-VQA, encouraging the model to predict all the combinations of $\langle$V, Q, A$\rangle$ triplet by flipping the source pair and the target label to understand their complex relationships, $\textit{i.e.}$, predict A, Q, and V given the VQ, VA, and QA pairs, respectively. In this paper, we develop LLaMA-VQA by applying Flipped-VQA to LLaMA, and it outperforms both LLMs-based and non-LLMs-based models on five challenging VideoQA benchmarks. Furthermore, our Flipped-VQA is a general framework that is applicable to various LLMs (OPT and GPT-J) and consistently improves their performances. We empirically demonstrate that Flipped-VQA not only enhances the exploitation of linguistic shortcuts but also mitigates the linguistic bias, which causes incorrect answers over-relying on the question. Code is available at https://github.com/mlvlab/Flipped-VQA.

Submitted 6 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted paper at EMNLP 2023 Main

arXiv:2310.15484 [pdf, other]

NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA

Authors: Hyeong Kyu Choi, Seunghun Lee, Jaewon Chu, Hyunwoo J. Kim

Abstract: Multi-hop Knowledge Graph Question Answering (KGQA) is a task that involves retrieving nodes from a knowledge graph (KG) to answer natural language questions. Recent GNN-based approaches formulate this task as a KG path searching problem, where messages are sequentially propagated from the seed node towards the answer nodes. However, these messages are past-oriented, and they do not consider the full KG context. To make matters worse, KG nodes often represent proper noun entities and are sometimes encrypted, being uninformative in selecting between paths. To address these problems, we propose Neural Tree Search (NuTrea), a tree search-based GNN model that incorporates the broader KG context. Our model adopts a message-passing scheme that probes the unreached subtree regions to boost the past-oriented embeddings. In addition, we introduce the Relation Frequency-Inverse Entity Frequency (RF-IEF) node embedding that considers the global KG context to better characterize ambiguous KG nodes. The general effectiveness of our approach is demonstrated through experiments on three major multi-hop KGQA benchmark datasets, and our extensive analyses further validate its expressiveness and robustness. Overall, NuTrea provides a powerful means to query the KG with complex natural language questions. Code is available at https://github.com/mlvlab/NuTrea.

Submitted 23 October, 2023; originally announced October 2023.

Comments: Neural Information Processing Systems (NeurIPS) 2023

arXiv:2310.14849 [pdf, other]

Universal Domain Adaptation for Robust Handling of Distributional Shifts in NLP

Authors: Hyuhng Joon Kim, Hyunsoo Cho, Sang-Woo Lee, Junyeob Kim, Choonghyun Park, Sang-goo Lee, Kang Min Yoo, Taeuk Kim

Abstract: When deploying machine learning systems to the wild, it is highly desirable for them to effectively leverage prior knowledge to the unfamiliar domain while also firing alarms to anomalous inputs. In order to address these requirements, Universal Domain Adaptation (UniDA) has emerged as a novel research area in computer vision, focusing on achieving both adaptation ability and robustness (i.e., the ability to detect out-of-distribution samples). While UniDA has led to significant progress in computer vision, its application to language input still needs to be explored despite its feasibility. In this paper, we propose a comprehensive benchmark for natural language that offers thorough viewpoints of the model's generalizability and robustness. Our benchmark encompasses multiple datasets with varying difficulty levels and characteristics, including temporal shifts and diverse domains. On top of our testbed, we validate existing UniDA methods from computer vision and state-of-the-art domain adaptation techniques from NLP literature, yielding valuable findings: We observe that UniDA methods originally designed for image input can be effectively transferred to the natural language domain while also underscoring the effect of adaptation difficulty in determining the model's performance.

Submitted 23 October, 2023; originally announced October 2023.

Comments: Findings of EMNLP 2023

arXiv:2310.14207 [pdf]

doi 10.1016/j.cap.2022.11.014

Atomic arrangement of van der Waals heterostructures using X-ray scattering and crystal truncation rod analysis

Authors: Ryung Kim, Byoung Ki Choi, Kyeong Jun Lee, Hyuk ** Kim, Hyun Hwi Lee, Tae Gyu Rhee, Yeong Gwang Khim, Young Jun Chang, Seo Hyoung Chang

Abstract: Vanadium diselenide (VSe2) has intriguing physical properties such as unexpected ferromagnetism at the two-dimensional limit. However, the experimental results for room temperature ferromagnetism are still controversial and depend on the detailed crystal structure and stoichiometry. Here we introduce crystal truncation rod (CTR) analysis to investigate the atomic arrangement of bilayer VSe2 and bilayer graphene (BLG) hetero-structures grown on a 6H-SiC(0001) substrate. Using non-destructive CTR analysis, we were able to obtain electron density profiles and detailed crystal structure of the VSe2/BLG heterostructures. Specifically, the out-of-plane lattice parameters of each VSe2 layer were modulated by the interface compared to that of the bulk VSe2 1T phase. The atomic arrangement of the VSe2/BLG heterostructure provides deeper understanding and insight for elucidating the magnetic properties of the van der Waals heterostructure.

Submitted 22 October, 2023; originally announced October 2023.

Comments: 17 pages, 4 figures

Journal ref: Current Applied Physics 46, 70 (2023)

arXiv:2310.14205 [pdf]

doi 10.1186/s40580-023-00359-5

Machine-learning-assisted analysis of transition metal dichalcogenide thin-film growth

Authors: Hyuk ** Kim, Minsu Chong, Tae Gyu Rhee, Yeong Gwang Khim, Min-Hyoung Jung, Young-Min Kim, Hu Young Jeong, Byoung Ki Choi, Young Jun Chang

Abstract: In situ reflective high-energy electron diffraction (RHEED) is widely used to monitor the surface crystalline state during thin-film growth by molecular beam epitaxy (MBE) and pulsed laser deposition. With the recent development of machine learning (ML), ML-assisted analysis of RHEED videos aids in interpreting the complete RHEED data of oxide thin films. The quantitative analysis of RHEED data allows us to characterize and categorize the growth modes step by step, and extract hidden knowledge of the epitaxial film growth process. In this study, we employed the ML-assisted RHEED analysis method to investigate the growth of 2D thin films of transition metal dichalcogenides (ReSe2) on graphene substrates by MBE. Principal component analysis (PCA) and K-means clustering were used to separate statistically important patterns and visualize the trend of pattern evolution without any notable loss of information. Using the modified PCA, we could monitor the diffraction intensity of solely the ReSe2 layers by filtering out the substrate contribution. These findings demonstrate that ML analysis can be successfully employed to examine and understand the film-growth dynamics of 2D materials. Further, the ML-based method can pave the way for the development of advanced real-time monitoring and autonomous material synthesis techniques.

Submitted 22 October, 2023; originally announced October 2023.

Comments: 21 pages, 4 figures

Journal ref: Nano Convergence 10, 10 (2023)

arXiv:2310.07974 [pdf, other]

Causality-based Cost Allocation for Peer-to-Peer Energy Trading in Distribution System

Authors: Hyun Joong Kim, Yong Hyun Song, Jip Kim

Abstract: While peer-to-peer energy trading has the potential to harness the capabilities of small-scale energy resources, a peer-matching process often overlooks power grid conditions, yielding increased losses, line congestion, and voltage problems. This imposes a great challenge on the distribution system operator (DSO), which can eventually limit peer-to-peer energy trading. To align the peer-matching process with the physical grid conditions, this paper proposes a cost causality-based network cost allocation method and the grid-aware peer-matching process. Building on the cost causality principle, the proposed model utilizes the network cost (loss, congestion, and voltage) as a signal to encourage peers to adjust their preferences ensuring that matches are more in line with grid conditions, leading to enhanced social welfare. Additionally, this paper presents mathematical proof showing the superiority of the causality-based cost allocation over existing methods.

Submitted 20 February, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 7 pages, 7 figures

arXiv:2310.00886 [pdf, other]

doi 10.1038/s41586-023-06829-4

Quantum spin nematic phase in a square-lattice iridate

Authors: Hoon Kim, **-Kwang Kim, Jimin Kim, Hyun-Woo J. Kim, Seunghyeok Ha, Kwangrae Kim, Wonjun Lee, Jonghwan Kim, Gil Young Cho, Hyeokjun Heo, Joonho Jang, J. Strempfer, G. Fabbris, Y. Choi, D. Haskel, Jungho Kim, J. -W. Kim, B. J. Kim

Abstract: Spin nematic (SN) is a magnetic analog of classical liquid crystals, a fourth state of matter exhibiting characteristics of both liquid and solid. Particularly intriguing is a valence-bond SN, in which spins are quantum entangled to form a multi-polar order without breaking time-reversal symmetry, but its unambiguous experimental realization remains elusive. Here, we establish a SN phase in the square-lattice iridate Sr$_2$IrO$_4$, which approximately realizes a pseudospin one-half Heisenberg antiferromagnet (AF) in the strong spin-orbit coupling limit. Upon cooling, the transition into the SN phase at T$_C$ $\approx$ 263 K is marked by a divergence in the static spin quadrupole susceptibility extracted from our Raman spectra, and concomitant emergence of a collective mode associated with the spontaneous breaking of rotational symmetries. The quadrupolar order persists in the antiferromagnetic (AF) phase below T$_N$ $\approx$ 230 K, and becomes directly observable through its interference with the AF order in resonant x-ray diffraction, which allows us to uniquely determine its spatial structure. Further, we find using resonant inelastic x-ray scattering a complete breakdown of coherent magnon excitations at short-wavelength scales, suggesting a resonating-valence-bond-like quantum entanglement in the AF state. Taken together, our results reveal a quantum order underlying the Néel AF that is widely believed to be intimately connected to the mechanism of high temperature superconductivity (HTSC).

Submitted 14 December, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Published in https://www.nature.com/articles/s41586-023-06829-4

Showing 1–50 of 535 results for author: Kim, H J