-
RISC-V R-Extension: Advancing Efficiency with Rented-Pipeline for Edge DNN Processing
Authors:
Won Hyeok Kim,
Hyeong Jin Kim,
Tae Hee Han
Abstract:
The proliferation of edge devices necessitates efficient computational architectures for lightweight tasks, particularly deep neural network (DNN) inference. Traditional NPUs, though effective for such operations, face challenges in power, cost, and area when integrated into lightweight edge devices. The RISC-V architecture, known for its modularity and open-source nature, offers a viable alternative. This paper introduces the RISC-V R-extension, a novel approach to enhancing DNN process efficiency on edge devices. The extension features rented-pipeline stages and architectural pipeline registers (APR), which optimize critical operation execution, thereby reducing latency and memory access frequency. Furthermore, this extension includes new custom instructions to support these architectural improvements. Through comprehensive analysis, this study demonstrates the boost of R-extension in edge device processing, setting the stage for more responsive and intelligent edge applications.
Submitted 2 July, 2024;
originally announced July 2024.
-
Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection
Authors:
Choonghyun Park,
Hyuhng Joon Kim,
Junyeob Kim,
Youna Kim,
Taeuk Kim,
Hyunsoo Cho,
Hwiyeol Jo,
Sang-goo Lee,
Kang Min Yoo
Abstract:
AI Generated Text (AIGT) detectors are developed with texts from humans and LLMs of common tasks. Despite the diversity of plausible prompt choices, these datasets are generally constructed with a limited number of prompts. The lack of prompt variation can introduce prompt-specific shortcut features that exist in data collected with the chosen prompt, but do not generalize to others. In this paper, we analyze the impact of such shortcuts in AIGT detection. We propose Feedback-based Adversarial Instruction List Optimization (FAILOpt), an attack that searches for instructions deceptive to AIGT detectors exploiting prompt-specific shortcuts. FAILOpt effectively drops the detection performance of the target detector, comparable to other attacks based on adversarial in-context examples. We also utilize our method to enhance the robustness of the detector by mitigating the shortcuts. Based on the findings, we further train the classifier with the dataset augmented by FAILOpt prompt. The augmented classifier exhibits improvements across generation models, tasks, and attacks. Our code will be available at https://github.com/zxcvvxcz/FAILOpt.
Submitted 23 June, 2024;
originally announced June 2024.
-
Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild
Authors:
Lingni Ma,
Yuting Ye,
Fangzhou Hong,
Vladimir Guzov,
Yifeng Jiang,
Rowan Postyeni,
Luis Pesqueira,
Alexander Gamino,
Vijay Baiyya,
Hyo Jin Kim,
Kevin Bailey,
David Soriano Fosas,
C. Karen Liu,
Ziwei Liu,
Jakob Engel,
Renzo De Nardi,
Richard Newcombe
Abstract:
We introduce Nymeria - a large-scale, diverse, richly annotated human motion dataset collected in the wild with multiple multimodal egocentric devices. The dataset comes with a) full-body 3D motion ground truth; b) egocentric multimodal recordings from Project Aria devices with RGB, grayscale, eye-tracking cameras, IMUs, magnetometer, barometer, and microphones; and c) an additional "observer" device providing a third-person viewpoint. We compute world-aligned 6DoF transformations for all sensors, across devices and capture sessions. The dataset also provides 3D scene point clouds and calibrated gaze estimation. We derive a protocol to annotate hierarchical language descriptions of in-context human motion, from fine-grained pose narrations, to atomic actions and activity summarization. To the best of our knowledge, the Nymeria dataset is the world's largest in-the-wild collection of human motion with natural and diverse activities; the first of its kind to provide synchronized and localized multi-device multimodal egocentric data; and the world's largest dataset with motion-language descriptions. It contains 1200 recordings of 300 hours of daily activities from 264 participants across 50 locations, travelling a total of 399 km. The motion-language descriptions provide 310.5K sentences in 8.64M words from a vocabulary size of 6545. To demonstrate the potential of the dataset, we define key research tasks for egocentric body tracking, motion synthesis, and action recognition and evaluate several state-of-the-art baseline algorithms. Data and code will be open-sourced.
Submitted 14 June, 2024;
originally announced June 2024.
-
Projected background and sensitivity of AMoRE-II
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (81 additional authors not shown)
Abstract:
AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in the cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located approximately 1000 meters deep in Jeongseon, Korea. The goal of AMoRE-II is to reach up to $T^{0νββ}_{1/2}$ $\sim$ 6 $\times$ 10$^{26}$ years, corresponding to an effective Majorana mass of 15 - 29 meV, covering all the inverted mass hierarchy regions. To achieve this, the background level of the experimental configurations and possible background sources of gamma and beta events should be well understood. We have intensively performed Monte Carlo simulations using the GEANT4 toolkit in all the experimental configurations with potential sources. We report the estimated background level that meets the 10$^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) requirement for AMoRE-II in the region of interest (ROI) and show the projected half-life sensitivity based on the simulation study.
Submitted 13 June, 2024;
originally announced June 2024.
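For orientation, the half-life goal and the effective Majorana mass quoted in the AMoRE-II abstract can be connected by the standard relation for light-Majorana-neutrino exchange. The abstract does not state this formula; the symbols $G^{0\nu}$ (phase-space factor, with the axial coupling assumed absorbed into it), $M^{0\nu}$ (nuclear matrix element), and $m_e$ (electron mass) are introduced here purely for illustration.

$$
\left(T^{0\nu\beta\beta}_{1/2}\right)^{-1}
  = G^{0\nu}\,\left|M^{0\nu}\right|^{2}
    \left(\frac{\langle m_{\beta\beta}\rangle}{m_e}\right)^{2}
\quad\Longrightarrow\quad
\langle m_{\beta\beta}\rangle
  = \frac{m_e}{\left|M^{0\nu}\right|\sqrt{G^{0\nu}\,T^{0\nu\beta\beta}_{1/2}}}
$$

Under this relation a single half-life value maps onto a range of masses because different nuclear-structure models give different values of $M^{0\nu}$, which is presumably why the abstract quotes an interval (15 - 29 meV) rather than a single number.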
-
Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment
Authors:
Taekbeom Lee,
Youngseok Jang,
H. Jin Kim
Abstract:
Neural implicit representation has attracted attention in 3D reconstruction through various success cases. For further applications such as scene understanding or editing, several works have shown progress towards object compositional reconstruction. Despite their superior performance in observed regions, their performance is still limited in reconstructing objects that are partially observed. To better treat this problem, we introduce category-level neural fields that learn meaningful common 3D information among objects belonging to the same category present in the scene. Our key idea is to subcategorize objects based on their observed shape for better training of the category-level model. Then we take advantage of the neural field to conduct the challenging task of registering partially observed objects by selecting and aligning against representative objects selected by ray-based uncertainty. Experiments on both simulation and real-world datasets demonstrate that our method improves the reconstruction of unobserved parts for several categories.
Submitted 12 June, 2024;
originally announced June 2024.
-
Functional voxel hierarchy and afferent capacity revealed mental state transition on dynamic correlation resting-state fMRI
Authors:
Dong Soo Lee,
Hyun Joo Kim,
Youngmin Huh,
Yeon Koo Kang,
Wonseok Whi,
Hyekyoung Lee,
Hyejin Kang
Abstract:
Voxel hierarchy on dynamic brain graphs is produced by k core percolation on functional dynamic amplitude correlation of resting-state fMRI. Directed graphs and their afferent/efferent capacities are produced by Markov modeling of the universal cover of undirected graphs simultaneously with the calculation of volume entropy. Positive and unsigned negative brain graphs were analyzed separately on sliding-window representation to underpin the visualization and quantitation of mental dynamic states with their transitions. Voxel hierarchy animation maps of positive graphs revealed abrupt changes in coreness k and kmaxcore, which we called mental state transitions. Afferent voxel capacities of the positive graphs also revealed transient modules composed of dominating voxels/independent components and their exchanges representing mental state transitions. Animation and quantification plots of voxel hierarchy and afferent capacity corroborated each other in underpinning mental state transitions and afferent module exchange on the positive directed functional connectivity graphs. We propose the use of spatiotemporal trajectories of voxels on positive dynamic graphs to construct hierarchical structures by k core percolation and quantified in- and out-flows of information of voxels by volume entropy/directed graphs to subserve diverse resting mental state transitions on resting-state fMRI graphs in normal human individuals.
Submitted 12 June, 2024;
originally announced June 2024.
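The k-core percolation named in the abstract above can be illustrated with a minimal sketch of the generic peeling algorithm on an undirected graph. This is not the authors' pipeline: the function name `core_numbers`, the edge-list input, and the toy graph are assumptions for illustration, and how a voxel graph would be obtained from correlation data (for example by thresholding) is left out because the abstract does not specify it.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: coreness} by repeatedly peeling a minimum-degree node."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(neighbours) for node, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Pick any node of minimum remaining degree.
        node = min(remaining, key=degree.__getitem__)
        # Coreness never decreases along the peeling order.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adj[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


# Toy graph (hypothetical): a triangle a-b-c with a pendant node d attached to a.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
core = core_numbers(edges)
k_max = max(core.values())  # d has coreness 1; a, b and c have coreness 2
```

Applied to each window of a sliding-window graph sequence, per-node coreness and the maximum coreness could be tracked over time; the simple `min` scan used here is quadratic in the number of nodes and is chosen for readability, not for voxel-scale graphs.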
-
Constraints for electron-capture decays mimicking production of axion-like particles in nuclei
Authors:
Aagrah Agnihotri,
Jouni Suhonen,
Hong Joo Kim
Abstract:
We give, for the first time, theoretical estimates of ground-state-to-ground-state (GS-to-GS) electron-capture (EC) branch decay rates of $^{44}$Ti, $^{57}$Co, and $^{139}$Ce. The nuclear-structure calculations have been done exploiting the nuclear shell model (NSM) with well-established Hamiltonians and an advanced theory of $β$ decay. In the absence of experimental measurements of these GS-to-GS branches, these estimates are of utmost importance for terrestrial searches of axion-like particles (ALPs). Predictions are made for EC-decay rates of 2$^{nd}$-forbidden unique (FU) and 2$^{nd}$-forbidden non-unique (FNU) EC transitions that can potentially mimic nuclear axion production in experiments designed to detect ALPs in nuclear environments.
Submitted 24 May, 2024;
originally announced May 2024.
-
Haptic-Based Bilateral Teleoperation of Aerial Manipulator for Extracting Wedged Object with Compensation of Human Reaction Time
Authors:
Jeonghyun Byun,
Dohyun Eom,
H. Jin Kim
Abstract:
Bilateral teleoperation of an aerial manipulator facilitates the execution of industrial missions thanks to the combination of the aerial platform's maneuverability and the ability to conduct complex tasks with human supervision. Heretofore, research on such operations has focused on flying without any physical interaction or exerting a pushing force on a contact surface that does not involve abrupt changes in the interaction force. In this paper, we propose a human reaction time compensating haptic-based bilateral teleoperation strategy for an aerial manipulator extracting a wedged object from a static structure (i.e., plug-pulling), which incurs an abrupt decrease in the interaction force and causes additional difficulty for an aerial platform. A haptic device composed of a 4-degree-of-freedom robotic arm and a gripper is made for the teleoperation of aerial wedged object-extracting tasks, and a haptic-based teleoperation method to execute the aerial manipulator by the haptic device is introduced. We detect the extraction of the object by the estimation of the external force exerted on the aerial manipulator and generate reference trajectories for both the aerial manipulator and the haptic device after the extraction. As an example of the extraction of a wedged object, we conduct comparative plug-pulling experiments with a quadrotor-based aerial manipulator. The results validate that the proposed bilateral teleoperation method reduces the overshoot in the aerial manipulator's position and ensures fast recovery to its initial position after extracting the wedged object.
Submitted 2 May, 2024;
originally announced May 2024.
-
Robust electrothermal switching of optical phase change materials through computer-aided adaptive pulse optimization
Authors:
Parth Garud,
Kiumars Aryana,
Cosmin Constantin Popescu,
Steven Vitale,
Rashi Sharma,
Kathleen Richardson,
Tian Gu,
Juejun Hu,
Hyun Jung Kim
Abstract:
Electrically tunable optical devices present diverse functionalities for manipulating electromagnetic waves by leveraging elements capable of reversibly switching between different optical states. This adaptability in adjusting their responses to electromagnetic waves after fabrication is crucial for developing more efficient and compact optical systems for a broad range of applications including sensing, imaging, telecommunications, and data storage. Chalcogenide-based phase change materials (PCMs) have shown great promise due to their stable, non-volatile phase transition between amorphous and crystalline states. Nonetheless, optimizing the switching parameters of PCM devices and maintaining their stable operation over thousands of cycles with minimal variation can be challenging. In this paper, we report on the critical role of PCM pattern as well as electrical pulse form in achieving reliable and stable switching, extending the operational lifetime of the device beyond 13,000 switching events. To achieve this, we have developed a computer-aided algorithm that monitors optical changes in the device and adjusts the applied voltage in accordance with the phase transformation process, thereby significantly enhancing the lifetime of these reconfigurable devices. Our findings reveal that patterned PCM structures show significantly higher endurance compared to blanket PCM thin films.
Submitted 22 April, 2024;
originally announced April 2024.
-
Aligning Language Models to Explicitly Handle Ambiguity
Authors:
Hyuhng Joon Kim,
Youna Kim,
Cheonbok Park,
Junyeob Kim,
Choonghyun Park,
Kang Min Yoo,
Sang-goo Lee,
Taeuk Kim
Abstract:
In interactions between users and language model agents, user utterances frequently exhibit ellipsis (omission of words or phrases) or imprecision (lack of exactness) to prioritize efficiency. This can lead to varying interpretations of the same input based on different assumptions or background knowledge. It is thus crucial for agents to adeptly handle the inherent ambiguity in queries to ensure reliability. However, even state-of-the-art large language models (LLMs) still face challenges in such scenarios, primarily due to the following hurdles: (1) LLMs are not explicitly trained to deal with ambiguous utterances; (2) the degree of ambiguity perceived by the LLMs may vary depending on the possessed knowledge. To address these issues, we propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns LLMs to manage ambiguous queries by leveraging their own assessment of ambiguity (i.e., perceived ambiguity). Experimental results on question-answering datasets demonstrate that APA empowers LLMs to explicitly detect and manage ambiguous queries while retaining the ability to answer clear questions. Furthermore, our finding proves that APA excels beyond training with gold-standard labels, especially in out-of-distribution scenarios.
Submitted 16 June, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Saturated RISE control for considering rotor thrust saturation of fully actuated multirotor
Authors:
Dongjae Lee,
H. Jin Kim
Abstract:
This work proposes a saturated robust controller for a fully actuated multirotor that takes disturbance rejection and rotor thrust saturation into account. A disturbance rejection controller is required to prevent performance degradation in the presence of parametric uncertainty and external disturbance. Furthermore, rotor saturation should be properly addressed in a controller to avoid performance degradation or even instability due to a gap between the commanded input and the actual input during saturation. To address these issues, we present a modified saturated RISE (Robust Integral of the Sign of the Error) control method. The proposed modified saturated RISE controller is developed for expansion to a system with a non-diagonal, state-dependent input matrix. Next, we present reformulation of the system dynamics of a fully actuated multirotor, and apply the control law to the system. The proposed method is validated in simulation where the proposed controller outperforms the existing one thanks to the capability of handling the input matrix.
Submitted 17 April, 2024;
originally announced April 2024.
-
Autonomous aerial perching and unperching using omnidirectional tiltrotor and switching controller
Authors:
Dongjae Lee,
Sunwoo Hwang,
Jeonghyun Byun,
Seung Jae Lee,
H. Jin Kim
Abstract:
Aerial unperching of multirotors has received little attention as opposed to perching that has been investigated to elongate operation time. This study presents a new aerial robot capable of both perching and unperching autonomously on/from a ferromagnetic surface during flight, and a switching controller to avoid rotor saturation and mitigate overshoot during transition between free-flight and perching. To enable stable perching and unperching maneuvers on/from a vertical surface, a lightweight ($\approx 1$ kg), fully actuated tiltrotor that can hover at $90^\circ$ pitch angle is first developed. We design a perching/unperching module composed of a single servomotor and a magnet, which is then mounted on the tiltrotor. A switching controller including exclusive control modes for transitions between free-flight and perching is proposed. Lastly, we propose a simple yet effective strategy to ensure robust perching in the presence of measurement and control errors and avoid collisions with the perching site immediately after unperching. We validate the proposed framework in experiments where the tiltrotor successfully performs perching and unperching on/from a vertical surface during flight. We further show the effectiveness of the proposed transition mode in the switching controller by ablation studies where large overshoot and even collision with a perching site occur. To the best of the authors' knowledge, this work presents the first autonomous aerial unperching framework using a fully actuated tiltrotor.
Submitted 17 April, 2024;
originally announced April 2024.
-
Object Remover Performance Evaluation Methods using Class-wise Object Removal Images
Authors:
Changsuk Oh,
Dongseok Shim,
Taekbeom Lee,
H. Jin Kim
Abstract:
Object removal refers to the process of erasing designated objects from an image while preserving the overall appearance, and it is one area where image inpainting is widely used in real-world applications. The performance of an object remover is quantitatively evaluated by measuring the quality of object removal results, similar to how the performance of an image inpainter is gauged. Current works reporting quantitative performance evaluations utilize original images as references. In this letter, to show that the current evaluation methods cannot properly evaluate the performance of an object remover, we create a dataset with object removal ground truth and compare the evaluations made by the current methods using original images to those utilizing object removal ground truth images. The disparities between the two evaluation sets validate that the current methods are not suitable for measuring the performance of an object remover. Additionally, we propose new evaluation methods tailored to gauge the performance of an object remover. The proposed methods evaluate the performance through class-wise object removal results and utilize images without the target class objects as a comparison set. We confirm that the proposed methods can make judgments consistent with human evaluators in the COCO dataset, and that they can produce measurements aligning with those using object removal ground truth in the self-acquired dataset.
Submitted 17 April, 2024;
originally announced April 2024.
-
Retrieval-Augmented Open-Vocabulary Object Detection
Authors:
Jooyeon Kim,
Eulrang Cho,
Sehyung Kim,
Hyunwoo J. Kim
Abstract:
Open-vocabulary object detection (OVD) has been studied with Vision-Language Models (VLMs) to detect novel objects beyond the pre-trained categories. Previous approaches improve the generalization ability to expand the knowledge of the detector, using 'positive' pseudo-labels with additional 'class' names, e.g., sock, iPod, and alligator. To extend the previous methods in two aspects, we propose Retrieval-Augmented Losses and visual Features (RALF). Our method retrieves related 'negative' classes and augments loss functions. Also, visual features are augmented with 'verbalized concepts' of classes, e.g., worn on the feet, handheld music player, and sharp teeth. Specifically, RALF consists of two modules: Retrieval Augmented Losses (RAL) and Retrieval-Augmented visual Features (RAF). RAL constitutes two losses reflecting the semantic similarity with negative vocabularies. In addition, RAF augments visual features with the verbalized concepts from a large language model (LLM). Our experiments demonstrate the effectiveness of RALF on COCO and LVIS benchmark datasets. We achieve improvement up to 3.4 box AP$_{50}^{\text{N}}$ on novel categories of the COCO dataset and 3.6 mask AP$_{\text{r}}$ gains on the LVIS dataset. Code is available at https://github.com/mlvlab/RALF .
Submitted 8 April, 2024;
originally announced April 2024.
-
Prompt Learning via Meta-Regularization
Authors:
Jinyoung Park,
Juyeon Ko,
Hyunwoo J. Kim
Abstract:
Pre-trained vision-language models have shown impressive success on various computer vision tasks with their zero-shot generalizability. Recently, prompt learning approaches have been explored to efficiently and effectively adapt the vision-language models to a variety of downstream tasks. However, most existing prompt learning methods suffer from task overfitting since the general knowledge of the pre-trained vision language models is forgotten while the prompts are finetuned on a small data set from a specific target task. To address this issue, we propose a Prompt Meta-Regularization (ProMetaR) to improve the generalizability of prompt learning for vision-language models. Specifically, ProMetaR meta-learns both the regularizer and the soft prompts to harness the task-specific knowledge from the downstream tasks and task-agnostic general knowledge from the vision-language models. Further, ProMetaR augments the task to generate multiple virtual tasks to alleviate the meta-overfitting. In addition, we provide the analysis to comprehend how ProMetaR improves the generalizability of prompt tuning in the perspective of the gradient alignment. Our extensive experiments demonstrate that our ProMetaR improves the generalizability of conventional prompt learning methods under base-to-base/base-to-new and domain generalization settings. The code of ProMetaR is available at https://github.com/mlvlab/ProMetaR.
Submitted 31 March, 2024;
originally announced April 2024.
-
Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection
Authors:
Jongha Kim,
Jihwan Park,
Jinyoung Park,
Jinyoung Kim,
Sehyung Kim,
Hyunwoo J. Kim
Abstract:
Visual Relationship Detection (VRD) has seen significant advancements with Transformer-based architectures recently. However, we identify two key limitations in a conventional label assignment for training Transformer-based VRD models, which is a process of mapping a ground-truth (GT) to a prediction. Under the conventional assignment, an unspecialized query is trained since a query is expected to detect every relation, which makes it difficult for a query to specialize in specific relations. Furthermore, a query is also insufficiently trained since a GT is assigned only to a single prediction, therefore near-correct or even correct predictions are suppressed by being assigned no relation as a GT. To address these issues, we propose Groupwise Query Specialization and Quality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization trains a specialized query by dividing queries and relations into disjoint groups and directing a query in a specific query group solely toward relations in the corresponding relation group. Quality-Aware Multi-Assignment further facilitates the training by assigning a GT to multiple predictions that are significantly close to a GT in terms of a subject, an object, and the relation in between. Experimental results and analyses show that SpeaQ effectively trains specialized queries, which better utilize the capacity of a model, resulting in consistent performance gains with zero additional inference cost across multiple VRD models and benchmarks. Code is available at https://github.com/mlvlab/SpeaQ.
Submitted 26 March, 2024;
originally announced March 2024.
-
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Authors:
Joonmyung Choi,
Sanghyeok Lee,
Jaewon Chu,
Minhyuk Choi,
Hyunwoo J. Kim
Abstract:
Video Transformers have become the prevalent solution for various video downstream tasks with superior expressive power and flexibility. However, these video transformers suffer from heavy computational costs induced by the massive number of tokens across the entire video frames, which has been the major barrier to training the model. Further, the patches irrelevant to the main contents, e.g., backgrounds, degrade the generalization performance of models. To tackle these issues, we propose training free token merging for lightweight video Transformer (vid-TLDR) that aims to enhance the efficiency of video Transformers by merging the background tokens without additional training. For vid-TLDR, we introduce a novel approach to capture the salient regions in videos only with the attention map. Further, we introduce the saliency-aware token merging strategy by dropping the background tokens and sharpening the object scores. Our experiments show that vid-TLDR significantly mitigates the computational complexity of video Transformers while achieving competitive performance compared to the base model without vid-TLDR. Code is available at https://github.com/mlvlab/vid-TLDR.
Submitted 30 March, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Small Distance Increment Method for Measuring Complex Permittivity With mmWave Radar
Authors:
Hang Song,
Hyun Joon Kim,
Mingxia Wan,
Bo Wei,
Takamaro Kikkawa,
Jun-ichi Takada
Abstract:
Measuring the complex permittivity of material is essential in many scenarios such as quality check and component analysis. Generally, measurement methods for characterizing the material are based on the usage of vector network analyzer, which is large and not easy for on-site measurement, especially in high frequency range such as millimeter wave (mmWave). In addition, some measurement methods require the destruction of samples, which is not suitable for non-destructive inspection. In this work, a small distance increment (SDI) method is proposed to non-destructively measure the complex permittivity of material. In SDI, the transmitter and receiver are formed as the monostatic radar, which is facing towards the material under test (MUT). During the measurement, the distance between radar and MUT changes with small increments and the signals are recorded at each position. A mathematical model is formulated to depict the relationship among the complex permittivity, distance increment, and measured signals. By fitting the model, the complex permittivity of MUT is estimated. To implement and evaluate the proposed SDI method, a commercial off-the-shelf mmWave radar is utilized and the measurement system is developed. Then, the evaluation was carried out on the acrylic plate. With the proposed method, the estimated complex permittivity of acrylic plate shows good agreement with the literature values, demonstrating the efficacy of SDI method for characterizing the complex permittivity of material.
Submitted 19 March, 2024;
originally announced March 2024.
-
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Authors:
Sanghyeok Lee,
Joonmyung Choi,
Hyunwoo J. Kim
Abstract:
Vision Transformer (ViT) has emerged as a prominent backbone for computer vision. For more efficient ViTs, recent works lessen the quadratic cost of the self-attention layer by pruning or fusing the redundant tokens. However, these works faced the speed-accuracy trade-off caused by the loss of information. Here, we argue that token fusion needs to consider diverse relations between tokens to minimize information loss. In this paper, we propose a Multi-criteria Token Fusion (MCTF), that gradually fuses the tokens based on multi-criteria (e.g., similarity, informativeness, and size of fused tokens). Further, we utilize the one-step-ahead attention, which is the improved approach to capture the informativeness of the tokens. By training the model equipped with MCTF using a token reduction consistency, we achieve the best speed-accuracy trade-off in the image classification (ImageNet1K). Experimental results prove that MCTF consistently surpasses the previous reduction methods with and without training. Specifically, DeiT-T and DeiT-S with MCTF reduce FLOPs by about 44% while improving the performance (+0.5%, and +0.3%) over the base model, respectively. We also demonstrate the applicability of MCTF in various Vision Transformers (e.g., T2T-ViT, LV-ViT), achieving at least 31% speedup without performance degradation. Code is available at https://github.com/mlvlab/MCTF.
Submitted 1 April, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
DeepSafeMPC: Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning
Authors:
Xuefeng Wang,
Henglin Pu,
Hyung Jun Kim,
Husheng Li
Abstract:
Safe Multi-agent reinforcement learning (safe MARL) has increasingly gained attention in recent years, emphasizing the need for agents to not only optimize the global return but also adhere to safety requirements through behavioral constraints. Some recent work has integrated control theory with multi-agent reinforcement learning to address the challenge of ensuring safety. However, there have been only very limited applications of Model Predictive Control (MPC) methods in this domain, primarily due to the complex and implicit dynamics characteristic of multi-agent environments. To bridge this gap, we propose a novel method called Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning (DeepSafeMPC). The key insight of DeepSafeMPC is leveraging a centralized deep learning model to accurately predict environmental dynamics. Our method applies MARL principles to search for optimal solutions. Through the employment of MPC, the actions of agents can be restricted within safe states concurrently. We demonstrate the effectiveness of our approach using the Safe Multi-agent MuJoCo environment, showcasing significant advancements in addressing safety concerns in MARL.
Submitted 11 March, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
Behavior Generation with Latent Actions
Authors:
Seungjae Lee,
Yibin Wang,
Haritheja Etukuru,
H. Jin Kim,
Nur Muhammad Mahi Shafiullah,
Lerrel Pinto
Abstract:
Generative modeling of complex behaviors from labeled datasets has been a longstanding problem in decision making. Unlike language or image generation, decision making requires modeling actions - continuous-valued vectors that are multimodal in their distribution, potentially drawn from uncurated sources, where generation errors can compound in sequential prediction. A recent class of models called Behavior Transformers (BeT) addresses this by discretizing actions using k-means clustering to capture different modes. However, k-means struggles to scale for high-dimensional action spaces or long sequences, and lacks gradient information, and thus BeT suffers in modeling long-range actions. In this work, we present Vector-Quantized Behavior Transformer (VQ-BeT), a versatile model for behavior generation that handles multimodal action prediction, conditional generation, and partial observations. VQ-BeT augments BeT by tokenizing continuous actions with a hierarchical vector quantization module. Across seven environments including simulated manipulation, autonomous driving, and robotics, VQ-BeT improves on state-of-the-art models such as BeT and Diffusion Policies. Importantly, we demonstrate VQ-BeT's improved ability to capture behavior modes while accelerating inference speed 5x over Diffusion Policies. Videos and code can be found at https://sjlee.cc/vq-bet
Submitted 28 June, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Stochastic Conditional Diffusion Models for Robust Semantic Image Synthesis
Authors:
Juyeon Ko,
Inho Kong,
Dogyun Park,
Hyunwoo J. Kim
Abstract:
Semantic image synthesis (SIS) is a task to generate realistic images corresponding to semantic maps (labels). However, in real-world applications, SIS often encounters noisy user inputs. To address this, we propose Stochastic Conditional Diffusion Model (SCDM), which is a robust conditional diffusion model that features novel forward and generation processes tailored for SIS with noisy labels. It enhances robustness by stochastically perturbing the semantic label maps through Label Diffusion, which diffuses the labels with discrete diffusion. Through the diffusion of labels, the noisy and clean semantic maps become similar as the timestep increases, eventually becoming identical at $t=T$. This facilitates the generation of an image close to a clean image, enabling robust generation. Furthermore, we propose a class-wise noise schedule to differentially diffuse the labels depending on the class. We demonstrate that the proposed method generates high-quality samples through extensive experiments and analyses on benchmark datasets, including a novel experimental setup simulating human errors during real-world applications. Code is available at https://github.com/mlvlab/SCDM.
Submitted 3 June, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Measurements of low energy nuclear recoil quenching factors for Na and I recoils in the NaI(Tl) scintillator
Authors:
S. H. Lee,
H. W. Joo,
H. J. Kim,
K. W. Kim,
S. K. Kim,
Y. D. Kim,
Y. J. Ko,
H. S. Lee,
J. Y. Lee,
H. S. Park,
Y. S. Yoon
Abstract:
Elastic scattering off nuclei in target detectors, involving interactions with dark matter and coherent elastic neutrino nuclear recoil (CE$ν$NS), results in the deposition of low energy within the nuclei, dissipating rapidly through a combination of heat and ionization. The primary energy loss mechanism for nuclear recoil is heat, leading to consistently smaller measurable scintillation signals compared to electron recoils of the same energy. The nuclear recoil quenching factor (QF), representing the ratio of scintillation light yield produced by nuclear recoil to that of electron recoil at the same energy, is a critical parameter for understanding dark matter and neutrino interactions with nuclei. The low energy QF of NaI(Tl) crystals, commonly employed in dark matter searches and CE$ν$NS measurements, is of substantial importance. Previous low energy QF measurements were constrained by contamination from photomultiplier tube (PMT)-induced noise, resulting in an observed light yield of approximately 15 photoelectrons per keVee (kilo-electron-volt electron-equivalent energy) and nuclear recoil energy above 5 keVnr (kilo-electron-volt nuclear recoil energy). Through enhanced crystal encapsulation, an increased light yield of around 26 photoelectrons per keVee is achieved. This improvement enables the measurement of the nuclear recoil QF for sodium nuclei at an energy of 3.8 $\pm$ 0.6 keVnr with a QF of 11.2 $\pm$ 1.7%. Furthermore, a reevaluation of previously reported QF results is conducted, incorporating enhancements in low energy events based on waveform simulation. The outcomes are generally consistent with various recent QF measurements for sodium and iodine.
Submitted 28 February, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
Text Role Classification in Scientific Charts Using Multimodal Transformers
Authors:
Hye Jin Kim,
Nicolas Lell,
Ansgar Scherp
Abstract:
Text role classification involves classifying the semantic role of textual elements within scientific charts. For this task, we propose to finetune two pretrained multimodal document layout analysis models, LayoutLMv3 and UDOP, on chart datasets. The transformers utilize the three modalities of text, image, and layout as input. We further investigate whether data augmentation and balancing methods help the performance of the models. The models are evaluated on various chart datasets, and results show that LayoutLMv3 outperforms UDOP in all experiments. LayoutLMv3 achieves the highest F1-macro score of 82.87 on the ICPR22 test dataset, beating the best-performing model from the ICPR22 CHART-Infographics challenge. Moreover, the robustness of the models is tested on a synthetic noisy dataset ICPR22-N. Finally, the generalizability of the models is evaluated on three chart datasets, CHIME-R, DeGruyter, and EconBiz, for which we added labels for the text roles. Findings indicate that even in cases where there is limited training data, transformers can be used with the help of data augmentation and balancing methods. The source code and datasets are available on GitHub under https://github.com/hjkimk/text-role-classification
Submitted 8 February, 2024;
originally announced February 2024.
-
Holographic dual effective field theory for an SYK model
Authors:
Yoon-Seok Choun,
Hyeon Jung Kim,
Ki-Seok Kim
Abstract:
We derive an emergent holographic dual description for an SYK model, where the renormalization group (RG) flows of collective bi-local fields appear manifestly in the bulk effective action with an emergent extradimension. This holographic dual effective field theory reproduces $1/N$ quantum corrections given by the Schwarzian action when we take the UV limit in the bulk effective action. Going into the IR regime in the extradimension, we observe that the field theoretic $1/N$, $1/N^{2}$, ... quantum corrections are resummed in the all-loop order and reorganized to form a holographic dual effective field theory in a large $N$ fashion living on the one-higher dimensional spacetime. Taking the large $N$ limit in the holographic dual effective field theory, we obtain nonlinearly coupled second-order bulk differential equations of motion for the three bi-local order-parameter fields of fermion self-energy, Green's function, and polarization function. Here, both UV and IR boundary conditions are derived self-consistently from the boundary effective action. We solve these highly intertwined nonlinear differential equations based on the so-called matching method. Our ansatz for the bi-local order-parameter fields coincides with the conformally invariant solution of the field theoretic large $N$ limit in the UV limit, but their overall coefficients RG-flow along the extradimensional space, respectively, reflecting effects of higher-order quantum corrections. As a result, we find an insulating behavior, where the self-energy diverges at IR. To confirm this insulating physics, we investigate thermodynamics. We obtain an effective free energy functional in terms of such bi-local dual order-parameter fields, which satisfy the Hamilton-Jacobi equation of the holographic dual effective field theory. ...
Submitted 21 June, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations
Authors:
Dogyun Park,
Sihyeon Kim,
Sojin Lee,
Hyunwoo J. Kim
Abstract:
Recent studies have introduced a new class of generative models for synthesizing implicit neural representations (INRs) that capture arbitrary continuous signals in various domains. These models opened the door for domain-agnostic generative models, but they often fail to achieve high-quality generation. We observed that the existing methods generate the weights of neural networks to parameterize INRs and evaluate the network with fixed positional embeddings (PEs). Arguably, this architecture limits the expressive power of generative models and results in low-quality INR generation. To address this limitation, we propose Domain-agnostic Latent Diffusion Model for INRs (DDMI) that generates adaptive positional embeddings instead of neural networks' weights. Specifically, we develop a Discrete-to-continuous space Variational AutoEncoder (D2C-VAE), which seamlessly connects discrete data and the continuous signal functions in the shared latent space. Additionally, we introduce a novel conditioning mechanism for evaluating INRs with the hierarchically decomposed PEs to further enhance expressive power. Extensive experiments across four modalities, e.g., 2D images, 3D shapes, Neural Radiance Fields, and videos, with seven benchmark datasets, demonstrate the versatility of DDMI and its superior performance compared to the existing INR generative models.
Submitted 20 March, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Background study of the AMoRE-pilot experiment
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Yu. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (83 additional authors not shown)
Abstract:
We report a study on the background of the Advanced Molybdenum-Based Rare process Experiment (AMoRE), a search for neutrinoless double beta decay ($0νββ$) of $^{100}$Mo. The pilot stage of the experiment was conducted using $\sim$1.9 kg of CaMoO$_4$ crystals at the Yangyang Underground Laboratory, South Korea, from 2015 to 2018. We compared the measured $β/γ$ energy spectra in three experimental configurations with the results of Monte Carlo simulations and identified the background sources in each configuration. We replaced several detector components and enhanced the neutron shielding to lower the background level between configurations. A limit on the half-life of $0νββ$ decay of $^{100}$Mo was found at $T_{1/2}^{0ν} \ge 3.0\times 10^{23}$ years at 90% confidence level, based on the measured background and its modeling. Further reductions of the background rate in AMoRE-I and AMoRE-II are discussed.
Submitted 7 April, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
Nonproportionality of NaI(Tl) Scintillation Detector for Dark Matter Search Experiments
Authors:
S. M. Lee,
G. Adhikari,
N. Carlin,
J. Y. Cho,
J. J. Choi,
S. Choi,
A. C. Ezeribe,
L. E. França,
C. Ha,
I. S. Hahn,
S. J. Hollick,
E. J. Jeon,
H. W. Joo,
W. G. Kang,
M. Kauer,
B. H. Kim,
H. J. Kim,
J. Kim,
K. W. Kim,
S. H. Kim,
S. K. Kim,
S. W. Kim,
W. K. Kim,
Y. D. Kim,
Y. H. Kim
, et al. (37 additional authors not shown)
Abstract:
We present a comprehensive study of the nonproportionality of NaI(Tl) scintillation detectors within the context of dark matter search experiments. Our investigation, which integrates COSINE-100 data with supplementary $γ$ spectroscopy, measures light yields across diverse energy levels from full-energy $γ$ peaks produced by the decays of various isotopes. These $γ$ peaks of interest were produced by decays supported by both long and short-lived isotopes. Analyzing peaks from decays supported only by short-lived isotopes presented a unique challenge due to their limited statistics and overlapping energies, which was overcome by long-term data collection and a time-dependent analysis. A key achievement is the direct measurement of the 0.87 keV light yield, resulting from the cascade following electron capture decay of $^{22}$Na from internal contamination. This measurement, previously accessible only indirectly, deepens our understanding of NaI(Tl) scintillator behavior in the region of interest for dark matter searches. This study holds substantial implications for background modeling and the interpretation of dark matter signals in NaI(Tl) experiments.
Submitted 10 May, 2024; v1 submitted 14 January, 2024;
originally announced January 2024.
-
Observation of the Magnonic Dicke Superradiant Phase Transition
Authors:
Dasom Kim,
Sohail Dasgupta,
Xiaoxuan Ma,
Joong-Mok Park,
Hao-Tian Wei,
Liang Luo,
Jacques Doumani,
Xinwei Li,
Wanting Yang,
Di Cheng,
Richard H. J. Kim,
Henry O. Everitt,
Shojiro Kimura,
Hiroyuki Nojiri,
Jigang Wang,
Shixun Cao,
Motoaki Bamba,
Kaden R. A. Hazzard,
Junichiro Kono
Abstract:
Two-level atoms coupled with single-mode cavity photons are predicted to exhibit a quantum phase transition when the coupling strength exceeds a critical value, entering a phase in which atomic polarization and photonic field are finite even at zero temperature and without external driving. However, this phenomenon, the superradiant phase transition (SRPT), is forbidden by a no-go theorem due to the existence of the diamagnetic term in the Hamiltonian. Here, we present spectroscopic evidence for a magnonic SRPT in ErFeO$_3$, where the role of the photonic mode (two-level atoms) in the photonic SRPT is played by an Fe$^{3+}$ magnon mode (Er$^{3+}$ spins). The absence of the diamagnetic term in the Fe$^{3+}$-Er$^{3+}$ exchange coupling ensures that the no-go theorem does not apply. Terahertz and gigahertz magnetospectroscopy experiments revealed the signatures of the SRPT -- a kink and a softening, respectively, of two spin-magnon hybridized modes at the critical point.
Submitted 3 January, 2024;
originally announced January 2024.
-
UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection
Authors:
Bumsoo Kim,
Taeho Choi,
Jaewoo Kang,
Hyunwoo J. Kim
Abstract:
Recent advances in deep neural networks have achieved significant progress in detecting individual objects from an image. However, object detection is not sufficient to fully understand a visual scene. Towards a deeper visual understanding, the interactions between objects, especially humans and objects are essential. Most prior works have obtained this information with a bottom-up approach, where the objects are first detected and the interactions are predicted sequentially by pairing the objects. This is a major bottleneck in HOI detection inference time. To tackle this problem, we propose UnionDet, a one-stage meta-architecture for HOI detection powered by a novel union-level detector that eliminates this additional inference stage by directly capturing the region of interaction. Our one-stage detector for human-object interaction shows a significant reduction in interaction prediction time 4x~14x while outperforming state-of-the-art methods on two public datasets: V-COCO and HICO-DET.
Submitted 19 December, 2023;
originally announced December 2023.
-
Origin of chirality in transition-metal dichalcogenides
Authors:
Kwangrae Kim,
Hyun-Woo J. Kim,
Seunghyeok Ha,
Hoon Kim,
Jin-Kwang Kim,
Jaehwon Kim,
Hyunsung Kim,
Junyoung Kwon,
Jihoon Seol,
Saegyeol Jung,
Changyoung Kim,
Ahmet Alatas,
Ayman Said,
Michael Merz,
Matthieu Le Tacon,
Jin Mo Bok,
Ki-Seok Kim,
B. J. Kim
Abstract:
Chirality is a ubiquitous phenomenon in which a symmetry between left- and right-handed objects is broken, examples in nature ranging from subatomic particles and molecules to living organisms. In particle physics, the weak force is responsible for the symmetry breaking and parity violation in beta decay, but in condensed matter systems interactions that lead to chirality remain poorly understood. Here, we unravel the mechanism of chiral charge density wave formation in the transition-metal dichalcogenide 1T-TiSe2. Using representation analysis, we show that charge density modulations and ionic displacements, which transform as a continuous scalar field and a vector field on a discrete lattice, respectively, follow different irreducible representations of the space group, despite the fact that they propagate with the same wave-vectors and are strongly coupled to each other. This charge-lattice symmetry frustration is resolved by further breaking of all symmetries not common to both sectors through induced lattice distortions, thus leading to chirality. Our theory is verified using Raman spectroscopy and inelastic x-ray scattering, which reveal that all but translation symmetries are broken at a level not resolved by state-of-the-art diffraction techniques.
Submitted 19 December, 2023;
originally announced December 2023.
-
Discovery of an Unconventional Quantum Echo by Interference of Higgs Coherence
Authors:
C. Huang,
M. Mootz,
L. Luo,
D. Cheng,
J. M. Park,
R. H. J. Kim,
Y. Qiang,
V. L. Quito,
Yongxin Yao,
P. P. Orth,
I. E. Perakis,
J. Wang
Abstract:
Nonlinearities in quantum systems are fundamentally characterized by the interplay of phase coherences, their interference, and state transition amplitudes. Yet the question of how quantum coherence and interference manifest in transient, massive Higgs excitations, prevalent within both the quantum vacuum and superconductors, remains elusive. One hallmark example is photon echo, enabled by the generation, preservation, and retrieval of phase coherences amid multiple excitations. Here we reveal an unconventional quantum echo arising from the Higgs coherence in superconductors, and identify distinctive signatures attributed to Higgs anharmonicity. A terahertz pulse-pair modulation of the superconducting gap generates a "time grating" of coherent Higgs population, which scatters echo signals distinct from conventional spin- and photon-echoes in atoms and semiconductors. These manifestations appear as Higgs echo spectral peaks occurring at frequencies forbidden by equilibrium particle-hole symmetry, an asymmetric delay in the echo formation from the dynamics of the "reactive" superconducting state, and negative time signals arising from Higgs-quasiparticle anharmonic coupling. The Higgs interference and anharmonicity control the decoherence of driven superconductivity and may enable applications in quantum memory and entanglement.
Submitted 17 December, 2023;
originally announced December 2023.
-
Electrically reconfigurable phase-change transmissive metasurface
Authors:
Cosmin Constantin Popescu,
Kiumars Aryana,
Parth Garud,
Khoi Phuong Dao,
Steven Vitale,
Vladimir Liberman,
Hyung-Bin Bae,
Tae-Woo Lee,
Myungkoo Kang,
Kathleen A. Richardson,
Carlos A. Rios Ocampo,
Yifei Zhang,
Tian Gu,
Juejun Hu,
Hyun Jung Kim
Abstract:
Programmable and reconfigurable optics hold significant potential for transforming a broad spectrum of applications, spanning space explorations to biomedical imaging, gas sensing, and optical cloaking. The ability to adjust the optical properties of components like filters, lenses, and beam steering devices could result in dramatic reductions in size, weight, and power consumption in future optoelectronic devices. Among the potential candidates for reconfigurable optics, chalcogenide-based phase change materials (PCMs) offer great promise due to their non-volatile and analogue switching characteristics. Although PCMs have found widespread use in electronic data storage, these memory devices are deeply sub-micron-sized. To incorporate phase change materials into free-space optical components, it is essential to scale them up to beyond several hundreds of microns while maintaining reliable switching characteristics. This study demonstrated a non-mechanical, non-volatile transmissive filter based on low-loss PCMs with a 200 $\mu$m $\times$ 200 $\mu$m switching area. The device/metafilter can be consistently switched between low- and high-transmission states using electrical pulses with a switching contrast ratio of 5.5 dB. The device was reversibly switched for 1250 cycles before accelerated degradation took place. The work represents an important step toward realizing free-space reconfigurable optics based on PCMs.
Submitted 16 December, 2023;
originally announced December 2023.
-
Scintillation characteristics of an undoped CsI crystal at low-temperature for dark matter search
Authors:
W. K. Kim,
H. Y. Lee,
K. W. Kim,
Y. J. Ko,
J. A. Jeon,
H. J. Kim,
H. S. Lee
Abstract:
The scintillation characteristics of an undoped CsI crystal with dimensions of 5.8 mm $\times$ 5.9 mm $\times$ 7.0 mm, corresponding to a weight of 1.0 g, were studied by directly coupling two silicon photomultipliers (SiPMs) over a temperature range from room temperature (300 K) to a low temperature of 86 K. The scintillation decay time and light output were measured using x-ray (23 keV) and gamma-ray (88 keV) peaks from a $^{109}$Cd radioactive source. An increase in decay time was observed as the temperature decreased from room temperature to 86 K, ranging from 76 ns to 605 ns. Correspondingly, the light output increased as well, reaching 37.9 $\pm$ 1.5 photoelectrons per keV electron-equivalent at 86 K, which is approximately 18 times higher than the light yield at room temperature. Leveraging the significantly enhanced scintillation light output of the undoped CsI crystal at the low temperature, coupling it with SiPMs makes it a promising candidate for the future dark matter search detector, benefiting from the low threshold owing to the high light output. The odd proton numbers from both cesium and iodine provide an advantage for the WIMP-proton spin-dependent interaction. We evaluated the sensitivity of low-mass dark matter on WIMP-proton spin-dependent interaction with the Migdal process, assuming 200 kg of undoped CsI crystals for the dark matter search. We conclude that undoped CsI crystal detectors exhibit world-competitive sensitivities for low-mass dark matter detection, particularly for the WIMP-proton spin-dependent interaction.
Submitted 13 December, 2023;
originally announced December 2023.
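As a rough consistency check on the numbers quoted in the abstract above, the room-temperature light yield implied by the "approximately 18 times" factor can be estimated. The symbols $Y_{86\,\mathrm{K}}$ and $Y_{300\,\mathrm{K}}$ are introduced here for illustration only, and the result is only as precise as the rounded factor of 18:

```latex
\[
Y_{300\,\mathrm{K}} \approx \frac{Y_{86\,\mathrm{K}}}{18}
  = \frac{37.9\ \mathrm{PE/keV_{ee}}}{18}
  \approx 2.1\ \mathrm{PE/keV_{ee}},
\qquad
\frac{\tau_{86\,\mathrm{K}}}{\tau_{300\,\mathrm{K}}}
  = \frac{605\ \mathrm{ns}}{76\ \mathrm{ns}} \approx 8.0 .
\]
```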
-
Assembling PNIPAM-Capped Gold Nanoparticles in Aqueous Solutions
Authors:
Binay P. Nayak,
Hyeong Jin Kim,
Srikanth Nayak,
Wenjie Wang,
Wei Bu,
Surya K. Mallapragada,
David Vaknin
Abstract:
Employing small angle X-ray scattering (SAXS), we explore the conditions under which the assembly of gold nanoparticles (AuNPs) grafted with the thermo-sensitive polymer Poly(N-isopropylacrylamide) (PNIPAM) emerges. We find that short-range order assembly emerges by combining the addition of electrolytes or poly-electrolytes with raising the temperature of the suspensions above the lower-critical solution temperature (LCST) of PNIPAM. Our results show that the longer the PNIPAM chain is, the better the organization in the assembled clusters. Interestingly, without added electrolytes, there is no evidence of AuNP assembly as a function of temperature, although untethered PNIPAM is known to undergo a coil-to-globule transition above its LCST. This study demonstrates another approach to assembling potential thermo-sensitive nanostructures for devices by leveraging the unique properties of PNIPAM.
Submitted 7 December, 2023;
originally announced December 2023.
-
Graph Elicitation for Guiding Multi-Step Reasoning in Large Language Models
Authors:
Jinyoung Park,
Ameen Patel,
Omar Zia Khan,
Hyunwoo J. Kim,
Joo-Kyung Kim
Abstract:
Chain-of-Thought (CoT) prompting along with sub-question generation and answering has enhanced multi-step reasoning capabilities of Large Language Models (LLMs). However, prompting the LLMs to directly generate sub-questions is suboptimal since they sometimes generate redundant or irrelevant questions. To address this, we propose a GE-Reasoning method, which directs LLMs to generate proper sub-questions and corresponding answers. Concretely, given an input question, we first prompt the LLM to generate knowledge triplets, forming a graph representation of the question. Unlike conventional knowledge triplets, our approach allows variables as head or tail entities, effectively representing a question as knowledge triplets. Second, for each triplet, the LLM generates a corresponding sub-question and answer, using knowledge retrieval. If the prediction confidence exceeds a threshold, the sub-question and prediction are incorporated into the prompt for subsequent processing. This approach encourages sub-questions to be grounded in the extracted knowledge triplets, reducing redundancy and irrelevance. Our experiments demonstrate that our approach outperforms previous CoT prompting methods and their variants on multi-hop question answering benchmark datasets.
Submitted 22 June, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
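The confidence-gated step described in the abstract above (answer a sub-question per triplet, and keep it in the prompt only if the confidence exceeds a threshold) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `answer_with_threshold`, `toy_answerer`, the `#1` variable notation, the canned answers, and the threshold value are all hypothetical stand-ins, and no real LLM or retriever is called.

```python
from typing import Callable, List, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail); "#k" marks a variable
# Hypothetical answerer: returns (sub_question, prediction, confidence).
Answerer = Callable[[Triplet, List[str]], Tuple[str, str, float]]


def answer_with_threshold(
    triplets: List[Triplet], answerer: Answerer, threshold: float = 0.5
) -> List[str]:
    """Walk the triplets in order and keep only confident sub-answers."""
    accepted: List[str] = []
    for triplet in triplets:
        sub_question, prediction, confidence = answerer(triplet, accepted)
        # Only confident predictions are carried into the running context.
        if confidence > threshold:
            accepted.append(f"Q: {sub_question} A: {prediction}")
    return accepted


def toy_answerer(triplet: Triplet, context: List[str]) -> Tuple[str, str, float]:
    """Stand-in for an LLM plus retrieval: canned answers with fixed confidences."""
    head, relation, _tail = triplet
    canned = {
        ("film X", "director"): ("person A", 0.9),
        ("#1", "birthplace"): ("city B", 0.3),
    }
    prediction, confidence = canned[(head, relation)]
    return f"What is the {relation} of {head}?", prediction, confidence


triplets = [("film X", "director", "#1"), ("#1", "birthplace", "#2")]
context = answer_with_threshold(triplets, toy_answerer, threshold=0.5)
print(context)
```

In this toy run only the first sub-question clears the threshold, so the low-confidence second prediction is never added to the context.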
-
Alpha backgrounds in NaI(Tl) crystals of COSINE-100
Authors:
G. Adhikari,
N. Carlin,
D. F. F. S. Cavalcante,
J. Y. Cho,
J. J. Choi,
S. Choi,
A. C. Ezeribe,
L. E. Franca,
C. Ha,
I. S. Hahn,
S. J. Hollick,
E. J. Jeon,
H. W. Joo,
W. G. Kang,
M. Kauer,
B. H. Kim,
H. J. Kim,
J. Kim,
K. W. Kim,
S. H. Kim,
S. K. Kim,
S. W. Kim,
W. K. Kim,
Y. D. Kim,
Y. H. Kim
, et al. (38 additional authors not shown)
Abstract:
COSINE-100 is a dark matter direct detection experiment with 106 kg NaI(Tl) as the target material. 210Pb and daughter isotopes are a dominant background in the WIMP region of interest and are detected via beta decay and alpha decay. Analysis of the alpha channel complements the background model as observed in the beta/gamma channel. We present the measurement of the quenching factors and Monte Carlo simulation results and activity quantification of the alpha decay components of the COSINE-100 NaI(Tl) crystals. The data strongly indicate that the alpha decays probabilistically undergo two possible quenching factors but require further investigation. The fitted results are consistent with independent measurements and improve the overall understanding of the COSINE-100 backgrounds. Furthermore, the half-life of 216Po has been measured to be 143.4 +/- 1.2 ms, which is consistent with and more precise than recent measurements.
Submitted 30 January, 2024; v1 submitted 8 November, 2023;
originally announced November 2023.
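For readers who prefer the mean lifetime or decay constant to the half-life quoted in the abstract above, the standard exponential-decay relations give the following. The symbols $\tau$ and $\lambda$ are introduced here for illustration, and the uncertainty is simply scaled by the same factor:

```latex
\[
\tau = \frac{T_{1/2}}{\ln 2}
     = \frac{143.4 \pm 1.2\ \mathrm{ms}}{0.6931}
     \approx 206.9 \pm 1.7\ \mathrm{ms},
\qquad
\lambda = \frac{1}{\tau} \approx 4.83\ \mathrm{s^{-1}} .
\]
```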
-
UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields
Authors:
Injae Kim,
Minhyuk Choi,
Hyunwoo J. Kim
Abstract:
Neural Radiance Field (NeRF) has enabled novel view synthesis with high fidelity given images and camera poses. Subsequent works even succeeded in eliminating the necessity of pose priors by jointly optimizing NeRF and camera pose. However, these works are limited to relatively simple settings such as photometrically consistent and occluder-free image collections or a sequence of images from a video. So they have difficulty handling unconstrained images with varying illumination and transient occluders. In this paper, we propose $\textbf{UP-NeRF}$ ($\textbf{U}$nconstrained $\textbf{P}$ose-prior-free $\textbf{Ne}$ural $\textbf{R}$adiance $\textbf{F}$ields) to optimize NeRF with unconstrained image collections without camera pose prior. We tackle these challenges with surrogate tasks that optimize color-insensitive feature fields and a separate module for transient occluders to block their influence on pose estimation. In addition, we introduce a candidate head to enable more robust pose estimation and transient-aware depth supervision to minimize the effect of incorrect prior. Our experiments verify the superior performance of our method compared to the baselines including BARF and its variants in a challenging internet photo collection, $\textit{Phototourism}$ dataset.
Submitted 7 November, 2023; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Advancing Bayesian Optimization via Learning Correlated Latent Space
Authors:
Seunghun Lee,
Jaewon Chu,
Sihyeon Kim,
Juyeon Ko,
Hyunwoo J. Kim
Abstract:
Bayesian optimization is a powerful method for optimizing black-box functions with limited function evaluations. Recent works have shown that optimization in a latent space through deep generative models such as variational autoencoders leads to effective and efficient Bayesian optimization for structured or discrete data. However, as the optimization does not take place in the input space, it leads to an inherent gap that results in potentially suboptimal solutions. To alleviate the discrepancy, we propose Correlated latent space Bayesian Optimization (CoBO), which focuses on learning correlated latent spaces characterized by a strong correlation between the distances in the latent space and the distances within the objective function. Specifically, our method introduces Lipschitz regularization, loss weighting, and trust region recoordination to minimize the inherent gap around the promising areas. We demonstrate the effectiveness of our approach on several optimization tasks in discrete data, such as molecule design and arithmetic expression fitting, and achieve high performance within a small budget.
Submitted 19 November, 2023; v1 submitted 31 October, 2023;
originally announced October 2023.
-
Diversify & Conquer: Outcome-directed Curriculum RL via Out-of-Distribution Disagreement
Authors:
Daesol Cho,
Seungjae Lee,
H. Jin Kim
Abstract:
Reinforcement learning (RL) often faces the challenges of uninformed search problems where the agent should explore without access to the domain knowledge such as characteristics of the environment or external rewards. To tackle these challenges, this work proposes a new approach for curriculum RL called Diversify for Disagreement & Conquer (D2C). Unlike previous curriculum learning methods, D2C requires only a few examples of desired outcomes and works in any environment, regardless of its geometry or the distribution of the desired outcome examples. The proposed method performs diversification of the goal-conditional classifiers to identify similarities between visited and desired outcome states and ensures that the classifiers disagree on states from out-of-distribution, which enables quantifying the unexplored region and designing an arbitrary goal-conditioned intrinsic reward signal in a simple and intuitive way. The proposed method then employs bipartite matching to define a curriculum learning objective that produces a sequence of well-adjusted intermediate goals, which enable the agent to automatically explore and conquer the unexplored region. We present experimental results demonstrating that D2C outperforms prior curriculum RL methods in both quantitative and qualitative aspects, even with the arbitrarily distributed desired outcome examples.
Submitted 30 October, 2023;
originally announced October 2023.
-
Robotic Barrier Construction through Weaved, Inflatable Tubes
Authors:
H. J. Kim,
H. Abdel-Raziq,
X. Liu,
A. Y. Siskovic,
S. Patil,
K. H. Petersen,
H. L. Kao
Abstract:
In this article, we present a mechanism and related path planning algorithm to construct light-duty barriers out of extruded, inflated tubes weaved around existing environmental features. Our extruded tubes are based on everted vine-robots and in this context, we present a new method to steer their growth. We characterize the mechanism in terms of accuracy, resilience, and, towards their use as barriers, the ability of the tubes to withstand distributed loads. We further explore an algorithm which, given a feature map and the size and direction of the external load, can determine where and how to extrude the barrier. Finally, we showcase the potential of this method in an autonomously extruded two-layer wall weaved around three pipes. While preliminary, our work indicates that this method has the potential for barrier construction in cluttered environments, e.g. shelters against wind or snow. Future work may show how to achieve tighter weaves, how to leverage weave friction for improved strength, how to assess barrier performance for feedback control, and how to operate the extrusion mechanism off of a mobile robot.
Submitted 29 October, 2023;
originally announced October 2023.
-
Versatile spaceborne photonics with chalcogenide phase-change materials
Authors:
Hyun Jung Kim,
Matthew Julian,
Calum Williams,
David Bombara,
Juejun Hu,
Tian Gu,
Kiumars Aryana,
Godfrey Sauti,
William Humphreys
Abstract:
Recent growth in space systems has seen increasing capabilities packed into smaller and lighter Earth observation and deep space mission spacecraft. Phase-change materials (PCMs) are nonvolatile, reconfigurable, fast-switching, and have recently shown a high degree of space radiation tolerance, thereby making them an attractive materials platform for spaceborne photonics applications. They promise robust, lightweight, and energy-efficient reconfigurable optical systems whose functions can be dynamically defined on-demand and on orbit to deliver enhanced science or mission support in harsh environments on lean power budgets. This comment aims to discuss the recent advances in rapidly growing PCM research and its potential to transition from conventional terrestrial optoelectronics materials platforms to versatile spaceborne photonic materials platforms for current and next-generation space and science missions. Materials International Space Station Experiment-14 (MISSE-14) mission-flown PCMs outside of the International Space Station (ISS) and key results and NASA examples are highlighted to provide strong evidence of the applicability of spaceborne photonics.
Submitted 29 October, 2023;
originally announced October 2023.
-
CQM: Curriculum Reinforcement Learning with a Quantized World Model
Authors:
Seungjae Lee,
Daesol Cho,
Jonghae Park,
H. Jin Kim
Abstract:
Recent curriculum Reinforcement Learning (RL) has shown notable progress in solving complex tasks by proposing sequences of surrogate tasks. However, the previous approaches often face challenges when they generate curriculum goals in a high-dimensional space. Thus, they usually rely on manually specified goal spaces. To alleviate this limitation and improve the scalability of the curriculum, we propose a novel curriculum method that automatically defines the semantic goal space which contains vital information for the curriculum process, and suggests curriculum goals over it. To define the semantic goal space, our method discretizes continuous observations via vector quantized-variational autoencoders (VQ-VAE) and restores the temporal relations between the discretized observations by a graph. Concurrently, our method suggests uncertainty and temporal distance-aware curriculum goals that converge to the final goals over the automatically composed goal space. We demonstrate that the proposed method allows efficient explorations in an uninformed environment with raw goal examples only. Also, our method outperforms the state-of-the-art curriculum RL methods on data efficiency and performance, in various goal-reaching tasks even with ego-centric visual inputs.
Submitted 26 October, 2023;
originally announced October 2023.
-
Large Language Models are Temporal and Causal Reasoners for Video Question Answering
Authors:
Dohwan Ko,
Ji Soo Lee,
Wooyoung Kang,
Byungseok Roh,
Hyunwoo J. Kim
Abstract:
Large Language Models (LLMs) have shown remarkable performances on a wide range of natural language understanding and generation tasks. We observe that the LLMs provide effective priors in exploiting $\textit{linguistic shortcuts}$ for temporal and causal reasoning in Video Question Answering (VideoQA). However, such priors often cause suboptimal results on VideoQA by leading the model to over-rely on questions, $\textit{i.e.}$, $\textit{linguistic bias}$, while ignoring visual content. This is also known as `ungrounded guesses' or `hallucinations'. To address this problem while leveraging LLMs' prior on VideoQA, we propose a novel framework, Flipped-VQA, encouraging the model to predict all the combinations of $\langle$V, Q, A$\rangle$ triplet by flipping the source pair and the target label to understand their complex relationships, $\textit{i.e.}$, predict A, Q, and V given VQ, VA, and QA pairs, respectively. In this paper, we develop LLaMA-VQA by applying Flipped-VQA to LLaMA, and it outperforms both LLMs-based and non-LLMs-based models on five challenging VideoQA benchmarks. Furthermore, our Flipped-VQA is a general framework that is applicable to various LLMs (OPT and GPT-J) and consistently improves their performances. We empirically demonstrate that Flipped-VQA not only enhances the exploitation of linguistic shortcuts but also mitigates the linguistic bias, which causes incorrect answers over-relying on the question. Code is available at https://github.com/mlvlab/Flipped-VQA.
Submitted 6 November, 2023; v1 submitted 24 October, 2023;
originally announced October 2023.
-
NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA
Authors:
Hyeong Kyu Choi,
Seunghun Lee,
Jaewon Chu,
Hyunwoo J. Kim
Abstract:
Multi-hop Knowledge Graph Question Answering (KGQA) is a task that involves retrieving nodes from a knowledge graph (KG) to answer natural language questions. Recent GNN-based approaches formulate this task as a KG path searching problem, where messages are sequentially propagated from the seed node towards the answer nodes. However, these messages are past-oriented, and they do not consider the full KG context. To make matters worse, KG nodes often represent proper noun entities and are sometimes encrypted, being uninformative in selecting between paths. To address these problems, we propose Neural Tree Search (NuTrea), a tree search-based GNN model that incorporates the broader KG context. Our model adopts a message-passing scheme that probes the unreached subtree regions to boost the past-oriented embeddings. In addition, we introduce the Relation Frequency-Inverse Entity Frequency (RF-IEF) node embedding that considers the global KG context to better characterize ambiguous KG nodes. The general effectiveness of our approach is demonstrated through experiments on three major multi-hop KGQA benchmark datasets, and our extensive analyses further validate its expressiveness and robustness. Overall, NuTrea provides a powerful means to query the KG with complex natural language questions. Code is available at https://github.com/mlvlab/NuTrea.
Submitted 23 October, 2023;
originally announced October 2023.
-
Universal Domain Adaptation for Robust Handling of Distributional Shifts in NLP
Authors:
Hyuhng Joon Kim,
Hyunsoo Cho,
Sang-Woo Lee,
Junyeob Kim,
Choonghyun Park,
Sang-goo Lee,
Kang Min Yoo,
Taeuk Kim
Abstract:
When deploying machine learning systems to the wild, it is highly desirable for them to effectively leverage prior knowledge to the unfamiliar domain while also firing alarms to anomalous inputs. In order to address these requirements, Universal Domain Adaptation (UniDA) has emerged as a novel research area in computer vision, focusing on achieving both adaptation ability and robustness (i.e., the ability to detect out-of-distribution samples). While UniDA has led to significant progress in computer vision, its application on language input still needs to be explored despite its feasibility. In this paper, we propose a comprehensive benchmark for natural language that offers thorough viewpoints of the model's generalizability and robustness. Our benchmark encompasses multiple datasets with varying difficulty levels and characteristics, including temporal shifts and diverse domains. On top of our testbed, we validate existing UniDA methods from computer vision and state-of-the-art domain adaptation techniques from NLP literature, yielding valuable findings: We observe that UniDA methods originally designed for image input can be effectively transferred to the natural language domain while also underscoring the effect of adaptation difficulty in determining the model's performance.
Submitted 23 October, 2023;
originally announced October 2023.
-
Atomic arrangement of van der Waals heterostructures using X-ray scattering and crystal truncation rod analysis
Authors:
Ryung Kim,
Byoung Ki Choi,
Kyeong Jun Lee,
Hyuk Jin Kim,
Hyun Hwi Lee,
Tae Gyu Rhee,
Yeong Gwang Khim,
Young Jun Chang,
Seo Hyoung Chang
Abstract:
Vanadium diselenide (VSe2) has intriguing physical properties such as unexpected ferromagnetism at the two-dimensional limit. However, the experimental results for room temperature ferromagnetism are still controversial and depend on the detailed crystal structure and stoichiometry. Here we introduce crystal truncation rod (CTR) analysis to investigate the atomic arrangement of bilayer VSe2 and bilayer graphene (BLG) hetero-structures grown on a 6H-SiC(0001) substrate. Using non-destructive CTR analysis, we were able to obtain electron density profiles and detailed crystal structure of the VSe2/BLG heterostructures. Specifically, the out-of-plane lattice parameters of each VSe2 layer were modulated by the interface compared to that of the bulk VSe2 1T phase. The atomic arrangement of the VSe2/BLG heterostructure provides deeper understanding and insight for elucidating the magnetic properties of the van der Waals heterostructure.
Submitted 22 October, 2023;
originally announced October 2023.
-
Machine-learning-assisted analysis of transition metal dichalcogenide thin-film growth
Authors:
Hyuk Jin Kim,
Minsu Chong,
Tae Gyu Rhee,
Yeong Gwang Khim,
Min-Hyoung Jung,
Young-Min Kim,
Hu Young Jeong,
Byoung Ki Choi,
Young Jun Chang
Abstract:
In situ reflective high-energy electron diffraction (RHEED) is widely used to monitor the surface crystalline state during thin-film growth by molecular beam epitaxy (MBE) and pulsed laser deposition. With the recent development of machine learning (ML), ML-assisted analysis of RHEED videos aids in interpreting the complete RHEED data of oxide thin films. The quantitative analysis of RHEED data allows us to characterize and categorize the growth modes step by step, and extract hidden knowledge of the epitaxial film growth process. In this study, we employed the ML-assisted RHEED analysis method to investigate the growth of 2D thin films of transition metal dichalcogenides (ReSe2) on graphene substrates by MBE. Principal component analysis (PCA) and K-means clustering were used to separate statistically important patterns and visualize the trend of pattern evolution without any notable loss of information. Using the modified PCA, we could monitor the diffraction intensity of solely the ReSe2 layers by filtering out the substrate contribution. These findings demonstrate that ML analysis can be successfully employed to examine and understand the film-growth dynamics of 2D materials. Further, the ML-based method can pave the way for the development of advanced real-time monitoring and autonomous material synthesis techniques.
Submitted 22 October, 2023;
originally announced October 2023.
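The PCA plus K-means pipeline mentioned in the abstract above can be illustrated on synthetic data. This is a minimal sketch under stated assumptions, not the authors' method: the "frames" are made-up 8x8 images with one of two stripe patterns plus noise (standing in for RHEED video frames), the functions `pca` and `kmeans` are plain NumPy implementations written for this example, and the "modified PCA" from the abstract is not reproduced.

```python
import numpy as np


def pca(frames: np.ndarray, n_components: int):
    """Flatten frames, center them, and project onto the leading components."""
    X = frames.reshape(len(frames), -1).astype(float)
    X = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return scores, Vt[:n_components], explained[:n_components]


def kmeans(points: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Plain Lloyd's algorithm with centers initialized from the data."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic "video": 20 frames of pattern A followed by 20 frames of pattern B.
rng = np.random.default_rng(0)
pattern_a = np.zeros((8, 8))
pattern_a[:, 2] = 1.0
pattern_b = np.zeros((8, 8))
pattern_b[:, 5] = 1.0
frames = np.stack([pattern_a] * 20 + [pattern_b] * 20)
frames = frames + 0.05 * rng.standard_normal(frames.shape)

scores, components, explained = pca(frames, n_components=2)
labels, centers = kmeans(scores, k=2)
print(labels)
```

Clustering in the low-dimensional score space rather than on raw pixels is the design choice being illustrated: the first component captures the change of pattern, so the cluster labels switch where the synthetic "growth stage" changes.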
-
Causality-based Cost Allocation for Peer-to-Peer Energy Trading in Distribution System
Authors:
Hyun Joong Kim,
Yong Hyun Song,
Jip Kim
Abstract:
While peer-to-peer energy trading has the potential to harness the capabilities of small-scale energy resources, a peer-matching process often overlooks power grid conditions, yielding increased losses, line congestion, and voltage problems. This imposes a great challenge on the distribution system operator (DSO), which can eventually limit peer-to-peer energy trading. To align the peer-matching process with the physical grid conditions, this paper proposes a cost causality-based network cost allocation method and the grid-aware peer-matching process. Building on the cost causality principle, the proposed model utilizes the network cost (loss, congestion, and voltage) as a signal to encourage peers to adjust their preferences ensuring that matches are more in line with grid conditions, leading to enhanced social welfare. Additionally, this paper presents mathematical proof showing the superiority of the causality-based cost allocation over existing methods.
Submitted 20 February, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Quantum spin nematic phase in a square-lattice iridate
Authors:
Hoon Kim,
**-Kwang Kim,
Jimin Kim,
Hyun-Woo J. Kim,
Seunghyeok Ha,
Kwangrae Kim,
Wonjun Lee,
Jonghwan Kim,
Gil Young Cho,
Hyeokjun Heo,
Joonho Jang,
J. Strempfer,
G. Fabbris,
Y. Choi,
D. Haskel,
Jungho Kim,
J. -W. Kim,
B. J. Kim
Abstract:
Spin nematic (SN) is a magnetic analog of classical liquid crystals, a fourth state of matter exhibiting characteristics of both liquid and solid. Particularly intriguing is a valence-bond SN, in which spins are quantum entangled to form a multi-polar order without breaking time-reversal symmetry, but its unambiguous experimental realization remains elusive. Here, we establish an SN phase in the square-lattice iridate Sr$_2$IrO$_4$, which approximately realizes a pseudospin one-half Heisenberg antiferromagnet (AF) in the strong spin-orbit coupling limit. Upon cooling, the transition into the SN phase at $T_C \approx 263$ K is marked by a divergence in the static spin quadrupole susceptibility extracted from our Raman spectra, and by the concomitant emergence of a collective mode associated with the spontaneous breaking of rotational symmetries. The quadrupolar order persists in the AF phase below $T_N \approx 230$ K, and becomes directly observable through its interference with the AF order in resonant x-ray diffraction, which allows us to uniquely determine its spatial structure. Further, using resonant inelastic x-ray scattering, we find a complete breakdown of coherent magnon excitations at short-wavelength scales, suggesting a resonating-valence-bond-like quantum entanglement in the AF state. Taken together, our results reveal a quantum order underlying the Néel AF that is widely believed to be intimately connected to the mechanism of high temperature superconductivity (HTSC).
Submitted 14 December, 2023; v1 submitted 2 October, 2023;
originally announced October 2023.