-
A Remark on Fujino's work on the canonical bundle formula via period maps
Authors:
Hyunsuk Kim
Abstract:
Fujino gave a proof in [Fuj03] for the semi-ampleness of the moduli part in the canonical bundle formula in the case when the general fibers are K3 surfaces or Abelian varieties. We show a similar statement when the general fibers are primitive symplectic varieties with mild singularities. This answers a question of Fujino raised in the same article. Moreover, using the structure theory of varieties with trivial first Chern class, we reduce the question of semi-ampleness in the case of families of K-trivial varieties to a question when the general fibers satisfy a slightly weaker Calabi-Yau condition.
Submitted 8 May, 2024;
originally announced May 2024.
-
Gravitational Deflection of Light: A Heuristic Derivation at the Undergraduate Level
Authors:
Hongbin Kim,
Dong-han Yeom,
Jong Hyun Kim
Abstract:
In this paper, we present a new heuristic derivation of the gravitational deflection of light around the Sun at the undergraduate level. Instead of solving the geodesic equation directly, we compute the correct deflection angle by focusing on the acceleration term of null geodesics. Using this heuristic derivation, we expect that undergraduate students who have not learned general relativity will be able to carry out this computation, which is one of the most remarkable pieces of evidence for general relativity.
Submitted 1 May, 2024;
originally announced May 2024.
-
All-in-One Deep Learning Framework for MR Image Reconstruction
Authors:
Geunu Jeong,
Hyeonsoo Kim,
Joonyoung Yang,
Kyungeun Jang,
Jeewook Kim
Abstract:
We introduce a novel, all-in-one deep learning framework for MR image reconstruction, enabling a single model to enhance image quality across multiple aspects of k-space sampling and to be effective across a wide range of clinical and technical scenarios. This DICOM-based algorithm serves as the core of SwiftMR (AIRS Medical, Seoul, Korea), which is FDA-cleared, CE-certified, and commercially available. We first detail the comprehensive development process of the model, including data collection, training pair preparation, model architecture design, and DICOM inference. We then assess the model's capability to enhance image quality in a multi-dimensional manner, specifically across various aspects of k-space sampling. Subsequently, we evaluate several features of the multi-dimensional enhancement: the accuracy of tunable denoising, the effectiveness of super-resolution in each encoding direction, and the reduction of artifacts that become more prominent at lower spatial resolutions. Additionally, we assess its compatibility with various scan parameter sets and its generalizability across scanner vendors not seen during training. Finally, we present specific cases demonstrating the model's utility in reducing scan time across anatomical regions in conjunction with protocol optimization. The proposed model is compatible with a broad spectrum of scenarios, including various vendors, pulse sequences, scan parameters, and anatomical regions. Its DICOM-based operation particularly enhances its applicability for real-world applications. Given its demonstrated effectiveness and versatility, we expect its use to expand in the field of clinical MRI.
Submitted 26 May, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
Using magnetic dynamics to measure the spin gap in a candidate Kitaev material
Authors:
Xinyi Jiang,
Qingzheng Qiu,
Cheng Peng,
Hoyoung Jang,
Wenjie Chen,
Xianghong Jin,
Li Yue,
Byungjune Lee,
Sang-Youn Park,
Minseok Kim,
Hyeong-Do Kim,
Xinqiang Cai,
Qizhi Li,
Tao Dong,
Nanlin Wang,
Joshua J. Turner,
Yuan Li,
Yao Wang,
Yingying Peng
Abstract:
Materials potentially hosting Kitaev spin-liquid states are considered crucial for realizing topological quantum computing. However, the intricate nature of spin interactions within these materials complicates the precise measurement of low-energy spin excitations indicative of fractionalized excitations. Using Na$_{2}$Co$_{2}$TeO$_{6}$ as an example, we study these low-energy spin excitations using time-resolved resonant elastic x-ray scattering (tr-REXS). Our observations unveil remarkably slow spin dynamics at the magnetic peak, whose recovery timescale is several nanoseconds. This timescale aligns with the extrapolated spin gap of $\sim$ 1 $\mu$eV, obtained by density matrix renormalization group (DMRG) simulations in the thermodynamic limit. The consistency demonstrates the efficacy of tr-REXS in discerning low-energy spin gaps inaccessible to conventional spectroscopic techniques.
Submitted 6 May, 2024;
originally announced May 2024.
-
Active Neural 3D Reconstruction with Colorized Surface Voxel-based View Selection
Authors:
Hyunseo Kim,
Hyeonseo Yang,
Taekyung Kim,
YoonSung Kim,
Jin-Hwa Kim,
Byoung-Tak Zhang
Abstract:
Active view selection in 3D scene reconstruction has been widely studied since training on informative views is critical for reconstruction. Recently, Neural Radiance Fields (NeRF) variants have shown promising results in active 3D reconstruction using uncertainty-guided view selection. They utilize uncertainties estimated with neural networks that encode scene geometry and appearance. However, the choice of uncertainty integration methods, either voxel-based or neural rendering, has conventionally depended on the types of scene uncertainty being estimated, whether geometric or appearance-related. In this paper, we introduce Colorized Surface Voxel (CSV)-based view selection, a new next-best view (NBV) selection method exploiting surface voxel-based measurement of uncertainty in scene appearance. CSV encapsulates the uncertainty of estimated scene appearance (e.g., color uncertainty) and estimated geometric information (e.g., surface). Using the geometry information, we interpret the uncertainty of scene appearance 3D-wise during the aggregation of the per-voxel uncertainty. Consequently, the uncertainty from occluded and complex regions is recognized under challenging scenarios with limited input data. Our method outperforms previous works on popular datasets, DTU and Blender, and our new dataset with imbalanced viewpoints, showing that the CSV-based view selection significantly improves performance by up to 30%.
Submitted 10 June, 2024; v1 submitted 4 May, 2024;
originally announced May 2024.
-
Multitask Extension of Geometrically Aligned Transfer Encoder
Authors:
Sung Moon Ko,
Sumin Lee,
Dae-Woong Jeong,
Hyunseung Kim,
Chanhui Lee,
Soorin Yim,
Sehui Han
Abstract:
Molecular datasets often suffer from a lack of data. It is well-known that gathering data is difficult due to the complexity of experimentation or simulation involved. Here, we leverage mutual information across different tasks in molecular data to address this issue. We extend an algorithm that utilizes the geometric characteristics of the encoding space, known as the Geometrically Aligned Transfer Encoder (GATE), to a multi-task setup. Thus, we connect multiple molecular tasks by aligning the curved coordinates onto locally flat coordinates, ensuring the flow of information from source tasks to support performance on target data.
Submitted 3 May, 2024;
originally announced May 2024.
-
Haptic-Based Bilateral Teleoperation of Aerial Manipulator for Extracting Wedged Object with Compensation of Human Reaction Time
Authors:
Jeonghyun Byun,
Dohyun Eom,
H. Jin Kim
Abstract:
Bilateral teleoperation of an aerial manipulator facilitates the execution of industrial missions thanks to the combination of the aerial platform's maneuverability and the ability to conduct complex tasks with human supervision. Heretofore, research on such operations has focused on flying without any physical interaction or exerting a pushing force on a contact surface that does not involve abrupt changes in the interaction force. In this paper, we propose a human reaction time compensating haptic-based bilateral teleoperation strategy for an aerial manipulator extracting a wedged object from a static structure (i.e., plug-pulling), which incurs an abrupt decrease in the interaction force and causes additional difficulty for an aerial platform. A haptic device composed of a 4-degree-of-freedom robotic arm and a gripper is made for the teleoperation of aerial wedged object-extracting tasks, and a haptic-based teleoperation method to execute the aerial manipulator by the haptic device is introduced. We detect the extraction of the object by the estimation of the external force exerted on the aerial manipulator and generate reference trajectories for both the aerial manipulator and the haptic device after the extraction. As an example of the extraction of a wedged object, we conduct comparative plug-pulling experiments with a quadrotor-based aerial manipulator. The results validate that the proposed bilateral teleoperation method reduces the overshoot in the aerial manipulator's position and ensures fast recovery to its initial position after extracting the wedged object.
Submitted 2 May, 2024;
originally announced May 2024.
-
Multi-intent-aware Session-based Recommendation
Authors:
Minjin Choi,
Hye-young Kim,
Hyunsouk Cho,
Jongwuk Lee
Abstract:
Session-based recommendation (SBR) aims to predict the next item a user will interact with during an ongoing session. Most existing SBR models focus on designing sophisticated neural-based encoders to learn a session representation, capturing the relationship among session items. However, they tend to focus on the last item, neglecting diverse user intents that may exist within a session. This limitation leads to significant performance drops, especially for longer sessions. To address this issue, we propose a novel SBR model, called Multi-intent-aware Session-based Recommendation Model (MiaSRec). It adopts frequency embedding vectors indicating the item frequency in the session to enhance the information about repeated items. MiaSRec represents various user intents by deriving multiple session representations centered on each item and dynamically selecting the important ones. Extensive experimental results show that MiaSRec outperforms existing state-of-the-art SBR models on six datasets, particularly those with longer average session length, achieving up to 6.27% and 24.56% gains for MRR@20 and Recall@20, respectively. Our code is available at https://github.com/**530/MiaSRec.
Submitted 1 May, 2024;
originally announced May 2024.
-
CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions
Authors:
Donghee Choi,
Mogan Gim,
Donghyeon Park,
Mujeen Sung,
Hyunjae Kim,
Jaewoo Kang,
Jihun Choi
Abstract:
This paper introduces CookingSense, a descriptive collection of knowledge assertions in the culinary domain extracted from various sources, including web data, scientific papers, and recipes, from which knowledge covering a broad range of aspects is acquired. CookingSense is constructed through a series of dictionary-based filtering and language model-based semantic filtering techniques, which results in a rich knowledgebase of multidisciplinary food-related assertions. Additionally, we present FoodBench, a novel benchmark to evaluate culinary decision support systems. From evaluations with FoodBench, we empirically prove that CookingSense improves the performance of retrieval augmented language models. We also validate the quality and variety of assertions in CookingSense through qualitative analysis.
Submitted 1 May, 2024;
originally announced May 2024.
-
Robust Pedestrian Detection via Constructing Versatile Pedestrian Knowledge Bank
Authors:
Sungjune Park,
Hyunjun Kim,
Yong Man Ro
Abstract:
Pedestrian detection is a crucial field of computer vision research that can be adopted in various real-world applications (e.g., self-driving systems). However, despite the noticeable evolution of pedestrian detection, pedestrian representations learned within a detection framework are usually limited to the particular scene data on which they were trained. Therefore, in this paper, we propose a novel approach to construct a versatile pedestrian knowledge bank containing representative pedestrian knowledge that is applicable to various detection frameworks and can be adopted in diverse scenes. We extract generalized pedestrian knowledge from a large-scale pretrained model, and we curate it by quantizing the most representative features and guiding them to be distinguishable from background scenes. Finally, we construct a versatile pedestrian knowledge bank composed of such representations, and then leverage it to complement and enhance pedestrian features within a pedestrian detection framework. Through comprehensive experiments, we validate the effectiveness of our method, demonstrating its versatility and showing that it outperforms state-of-the-art detection methods.
Submitted 30 April, 2024;
originally announced April 2024.
-
Joint Pricing and Matching for Resource Allocation Platforms via Min-cost Flow Problem
Authors:
Yuya Hikima,
Yasunori Akagi,
Hideaki Kim
Abstract:
Stochastic matching is the stochastic version of the well-known matching problem, which consists in maximizing the rewards of a matching under a set of probability distributions associated with the nodes and edges. In most stochastic matching problems, the probability distributions inherent in the nodes and edges are set a priori and are not controllable. However, many resource allocation platforms can control the probability distributions by changing prices. For example, a rideshare platform can control the distribution of the number of requesters by setting the fare to maximize the reward of a taxi-requester matching. Although several methods for optimizing price have been developed, price optimization that takes the matching problem into account is still in its infancy. In this paper, we tackle the problem of optimizing price in consideration of the resulting bipartite graph matching, given the effect of the price on the probabilistic uncertainty in the graph. Even though our problem involves hard-to-evaluate objective values and is non-convex, we construct a (1-1/e)-approximation algorithm under the assumption that a convex min-cost flow problem can be solved exactly.
Submitted 29 April, 2024;
originally announced April 2024.
-
Distributed Stochastic Optimization of a Neural Representation Network for Time-Space Tomography Reconstruction
Authors:
K. Aditya Mohan,
Massimiliano Ferrucci,
Chuck Divin,
Garrett A. Stevenson,
Hyojin Kim
Abstract:
4D time-space reconstruction of dynamic events or deforming objects using X-ray computed tomography (CT) is an extremely ill-posed inverse problem. Existing approaches assume that the object remains static for the duration of several tens or hundreds of X-ray projection measurement images (reconstruction of consecutive limited-angle CT scans). However, this is an unrealistic assumption for many in-situ experiments that causes spurious artifacts and inaccurate morphological reconstructions of the object. To solve this problem, we propose to perform a 4D time-space reconstruction using a distributed implicit neural representation (DINR) network that is trained using a novel distributed stochastic training algorithm. Our DINR network learns to reconstruct the object at its output by iterative optimization of its network parameters such that the measured projection images best match the output of the CT forward measurement model. We use a continuous time and space forward measurement model that is a function of the DINR outputs at a sparsely sampled set of continuous valued object coordinates. Unlike existing state-of-the-art neural representation architectures that forward and back propagate through dense voxel grids that sample the object's entire time-space coordinates, we only propagate through the DINR at a small subset of object coordinates in each iteration resulting in an order-of-magnitude reduction in memory and compute for training. DINR leverages distributed computation across several compute nodes and GPUs to produce high-fidelity 4D time-space reconstructions even for extremely large CT data sizes. We use both simulated parallel-beam and experimental cone-beam X-ray CT datasets to demonstrate the superior performance of our approach.
Submitted 29 April, 2024;
originally announced April 2024.
-
Diffusion-Aided Joint Source Channel Coding For High Realism Wireless Image Transmission
Authors:
Mingyu Yang,
Bowen Liu,
Boyang Wang,
Hun-Seok Kim
Abstract:
Deep learning-based joint source-channel coding (deep JSCC) has been demonstrated as an effective approach for wireless image transmission. Nevertheless, current research has concentrated on minimizing a standard distortion metric such as Mean Squared Error (MSE), which does not necessarily improve the perceptual quality. To address this issue, we propose DiffJSCC, a novel framework that leverages pre-trained text-to-image diffusion models to enhance the realism of images transmitted over the channel. The proposed DiffJSCC utilizes prior deep JSCC frameworks to deliver an initial reconstructed image at the receiver. Then, the spatial and textual features are extracted from the initial reconstruction, which, together with the channel state information (e.g., signal-to-noise ratio, SNR), are passed to a control module to fine-tune the pre-trained Stable Diffusion model. Extensive experiments on the Kodak dataset reveal that our method significantly surpasses both conventional methods and prior deep JSCC approaches on perceptual metrics such as LPIPS and FID scores, especially under poor channel conditions and limited bandwidth. Notably, DiffJSCC can achieve highly realistic reconstructions for 768x512 pixel Kodak images with only 3072 symbols (<0.008 symbols per pixel) under 1 dB SNR. Our code will be released at https://github.com/mingyuyng/DiffJSCC.
Submitted 26 April, 2024;
originally announced April 2024.
-
Embedded FPGA Developments in 130nm and 28nm CMOS for Machine Learning in Particle Detector Readout
Authors:
Julia Gonski,
Aseem Gupta,
Haoyi Jia,
Hyunjoon Kim,
Lorenzo Rota,
Larry Ruckman,
Angelo Dragone,
Ryan Herbst
Abstract:
Embedded field programmable gate array (eFPGA) technology allows the implementation of reconfigurable logic within the design of an application-specific integrated circuit (ASIC). This approach offers the low power and efficiency of an ASIC along with the ease of FPGA configuration, particularly beneficial for the use case of machine learning in the data pipeline of next-generation collider experiments. An open-source framework called "FABulous" was used to design eFPGAs using 130 nm and 28 nm CMOS technology nodes, which were subsequently fabricated and verified through testing. The capability of an eFPGA to act as a front-end readout chip was assessed using simulation of high energy particles passing through a silicon pixel sensor. A machine learning-based classifier, designed for reduction of sensor data at the source, was synthesized and configured onto the eFPGA. A successful proof-of-concept was demonstrated through reproduction of the expected algorithm result on the eFPGA with perfect accuracy. Further development of the eFPGA technology and its application to collider detector readout is discussed.
Submitted 1 July, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
NeuroNet: A Novel Hybrid Self-Supervised Learning Framework for Sleep Stage Classification Using Single-Channel EEG
Authors:
Cheol-Hui Lee,
Hakseung Kim,
Hyun-jee Han,
Min-Kyung Jung,
Byung C. Yoon,
Dong-Joo Kim
Abstract:
The classification of sleep stages is a pivotal aspect of diagnosing sleep disorders and evaluating sleep quality. However, the conventional manual scoring process, conducted by clinicians, is time-consuming and prone to human bias. Recent advancements in deep learning have substantially propelled the automation of sleep stage classification. Nevertheless, challenges persist, including the need for large datasets with labels and the inherent biases in human-generated annotations. This paper introduces NeuroNet, a self-supervised learning (SSL) framework designed to effectively harness unlabeled single-channel sleep electroencephalogram (EEG) signals by integrating contrastive learning tasks and masked prediction tasks. NeuroNet demonstrates superior performance over existing SSL methodologies through extensive experimentation conducted across three polysomnography (PSG) datasets. Additionally, this study proposes a Mamba-based temporal context module to capture the relationships among diverse EEG epochs. Combining NeuroNet with the Mamba-based temporal context module has demonstrated the capability to achieve, or even surpass, the performance of the latest supervised learning methodologies, even with a limited amount of labeled data. This study is expected to establish a new benchmark in sleep stage classification, promising to guide future research and applications in the field of sleep analysis.
Submitted 13 May, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods
Authors:
Min Kyu Shin,
Su-Jeong Park,
Seung-Keol Ryu,
Heeyeon Kim,
Han-Lim Choi
Abstract:
This paper presents a novel learning approach for Dubins Traveling Salesman Problems (DTSP) with Neighborhoods (DTSPN) to quickly produce a tour of a non-holonomic vehicle passing through neighborhoods of given task points. The method involves two learning phases: initially, a model-free reinforcement learning approach leverages privileged information to distill knowledge from expert trajectories generated by the Lin-Kernighan heuristic (LKH) algorithm. Subsequently, a supervised learning phase trains an adaptation network to solve problems independently of privileged information. Before the first learning phase, a parameter initialization technique using the demonstration data was also devised to enhance training efficiency. The proposed learning method produces a solution about 50 times faster than LKH and substantially outperforms other imitation learning and RL with demonstration schemes, most of which fail to sense all the task points.
Submitted 25 April, 2024;
originally announced April 2024.
-
Stability of the Standard Model vacuum with respect to vacuum tunneling to the Komatsu vacuum in the cMSSM
Authors:
Hyukjung Kim,
Ewan D. Stewart,
Heeseung Zoe
Abstract:
We investigate the stability of the Standard Model vacuum with respect to vacuum tunneling to the Komatsu vacuum, which exists when $m_L^2 + m_{H_u}^2 < 0$, in the cMSSM. Employing the numerical tools SARAH, SPheno and CosmoTransitions, we scan and constrain the parameter space of the cMSSM up to 10 TeV. Regions excluded due to having a vacuum tunneling half-life less than the age of the observable universe are concentrated near the regions where the Standard Model vacuum is tachyonic and are more stringent at smaller $m_0$, larger and negative $A_0$, and larger $\tan\beta$. New excluded regions, which satisfy $m_h \simeq 125\,\text{GeV}$, are found.
Submitted 26 April, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains
Authors:
Eunsu Baek,
Keondo Park,
Jiyoon Kim,
Hyung-Sin Kim
Abstract:
Computer vision applications predict on digital images acquired by a camera from physical scenes through light. However, conventional robustness benchmarks rely on perturbations in digitized images, diverging from distribution shifts occurring in the image acquisition process. To bridge this gap, we introduce a new distribution shift dataset, ImageNet-ES, comprising variations in environmental and camera sensor factors by directly capturing 202k images with a real camera in a controllable testbed. With the new dataset, we evaluate out-of-distribution (OOD) detection and model robustness. We find that existing OOD detection methods do not cope with the covariate shifts in ImageNet-ES, implying that the definition and detection of OOD should be revisited to embrace real-world distribution shifts. We also observe that the model becomes more robust in both ImageNet-C and -ES by learning environment and sensor variations in addition to existing digital augmentations. Lastly, our results suggest that effective shift mitigation via camera sensor control can significantly improve performance without increasing model size. With these findings, our benchmark may aid future research on robustness, OOD, and camera sensor control for computer vision. Our code and dataset are available at https://github.com/Edw2n/ImageNet-ES.
Submitted 25 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Fast Ensembling with Diffusion Schrödinger Bridge
Authors:
Hyunsu Kim,
Jongmin Yoon,
Juho Lee
Abstract:
The Deep Ensemble (DE) approach is a straightforward technique used to enhance the performance of deep neural networks by training them from different initial points, converging towards various local optima. However, a limitation of this methodology lies in its high computational overhead for inference, arising from the necessity to store numerous learned parameters and execute individual forward passes for each parameter during the inference stage. We propose a novel approach called Diffusion Bridge Network (DBN) to address this challenge. Based on the theory of the Schrödinger bridge, this method directly learns to simulate a Stochastic Differential Equation (SDE) that connects the output distribution of a single ensemble member to the output distribution of the ensembled model, allowing us to obtain the ensemble prediction without having to invoke a forward pass through all the ensemble models. By substituting the heavy ensembles with this lightweight neural network constructing DBN, we achieved inference with reduced computational cost while maintaining accuracy and uncertainty scores on benchmark datasets such as CIFAR-10, CIFAR-100, and TinyImageNet. Our implementation is available at https://github.com/kim-hyunsu/dbn.
Submitted 24 April, 2024;
originally announced April 2024.
-
Lessons Learned in Performing a Trustworthy AI and Fundamental Rights Assessment
Authors:
Marjolein Boonstra,
Frédérick Bruneault,
Subrata Chakraborty,
Tjitske Faber,
Alessio Gallucci,
Eleanore Hickman,
Gerard Kema,
Hee** Kim,
Jaap Kooiker,
Elisabeth Hildt,
Annegret Lamadé,
Emilie Wiinblad Mathez,
Florian Möslein,
Genien Pathuis,
Giovanni Sartor,
Marijke Steege,
Alice Stocco,
Willy Tadema,
Jarno Tuimala,
Isabel van Vledder,
Dennis Vetter,
Jana Vetter,
Magnus Westerlund,
Roberto V. Zicari
Abstract:
This report shares the experiences, results and lessons learned in conducting a pilot project "Responsible use of AI" in cooperation with the Province of Friesland, Rijks ICT Gilde, part of the Ministry of the Interior and Kingdom Relations (BZK) (both in The Netherlands), and a group of members of the Z-Inspection® Initiative. The pilot project took place from May 2022 through January 2023. During the pilot, the practical application of a deep learning algorithm from the province of Fryslân was assessed. The AI maps heathland grassland by means of satellite images for monitoring nature reserves. Environmental monitoring is one of the crucial activities carried on by society for several purposes ranging from maintaining standards on drinkable water to quantifying the CO2 emissions of a particular state or region. Using satellite imagery and machine learning to support decisions is becoming an important part of environmental monitoring. The main focus of this report is to share the experiences, results and lessons learned from performing both a Trustworthy AI assessment using the Z-Inspection® process and the EU framework for Trustworthy AI, and combining it with a Fundamental Rights assessment using the Fundamental Rights and Algorithms Impact Assessment (FRAIA) as recommended by the Dutch government for the use of AI algorithms by the Dutch public authorities.
Submitted 22 April, 2024;
originally announced April 2024.
-
Robust electrothermal switching of optical phase change materials through computer-aided adaptive pulse optimization
Authors:
Parth Garud,
Kiumars Aryana,
Cosmin Constantin Popescu,
Steven Vitale,
Rashi Sharma,
Kathleen Richardson,
Tian Gu,
Juejun Hu,
Hyun Jung Kim
Abstract:
Electrically tunable optical devices present diverse functionalities for manipulating electromagnetic waves by leveraging elements capable of reversibly switching between different optical states. This adaptability in adjusting their responses to electromagnetic waves after fabrication is crucial for developing more efficient and compact optical systems for a broad range of applications including sensing, imaging, telecommunications, and data storage. Chalcogenide-based phase change materials (PCMs) have shown great promise due to their stable, non-volatile phase transition between amorphous and crystalline states. Nonetheless, optimizing the switching parameters of PCM devices and maintaining their stable operation over thousands of cycles with minimal variation can be challenging. In this paper, we report on the critical role of PCM pattern as well as electrical pulse form in achieving reliable and stable switching, extending the operational lifetime of the device beyond 13,000 switching events. To achieve this, we have developed a computer-aided algorithm that monitors optical changes in the device and adjusts the applied voltage in accordance with the phase transformation process, thereby significantly enhancing the lifetime of these reconfigurable devices. Our findings reveal that patterned PCM structures show significantly higher endurance compared to blanket PCM thin films.
Submitted 22 April, 2024;
originally announced April 2024.
-
Anisotropic electron-phonon interactions in 2D lead-halide perovskites
Authors:
Jaco J. Geuchies,
Johan Klarbring,
Lucia Di Virgillio,
Shuai Fu,
Sheng Qu,
Guangyu Liu,
Hai Wang,
Jarvist M. Frost,
Aron Walsh,
Mischa Bonn,
Heejae Kim
Abstract:
Two-dimensional hybrid organic-inorganic metal halide perovskites offer enhanced stability for perovskite-based applications. Their crystal structure's soft and ionic nature gives rise to strong interactions between charge carriers and ionic rearrangements. Here, we investigate the interaction of photo-generated electrons and ionic polarizations in single-crystal 2D perovskite butylammonium lead iodide, varying the inorganic lamellae thickness in the 2D single crystals. We determined the directionality of the transition dipole moments of the relevant phonon modes (in the 0.3-3 THz range) by angle- and polarization-dependent THz transmission measurements. We find a clear anisotropy of the in-plane photoconductivity, with a 10% reduction along the axis parallel with the transition dipole moment of the most strongly coupled phonon. Detailed calculations, based on Feynman polaron theory, indicate that the anisotropy originates from directional electron-phonon interactions.
Submitted 19 April, 2024;
originally announced April 2024.
-
Pinwheel Outflow induced by Stellar Mass Loss in a Coplanar Triple System
Authors:
Hyosun Kim,
Mark R. Morris,
Jongsoo Kim,
**hua He
Abstract:
We develop a physical framework for interpreting complex circumstellar patterns whorled around asymptotic giant branch (AGB) stars by investigating stable, coplanar triple systems using hydrodynamic and particle simulations. The introduction of a close tertiary body causes an additional periodic variation in the orbital velocity and trajectory of the AGB star. As a result, the circumstellar outflow builds a fine non-Archimedean spiral pattern superimposed upon the Archimedean spiral produced by the outer binary alone. This fine spiral can be approximated by off-centered circular rings that become tangent to each other at the location of the Archimedean spiral. The superimposed fine pattern fades out relatively quickly as a function of distance from the center of the system, in contrast to the dominant Archimedean spiral pattern, which presents a much slower fractional density decrease with radius. The different rates of radial decrease of the density contrast in the two superimposed patterns, coupled with their different time and spatial scales, lead to an apparent, but illusory radial change in the observed pattern interval, as has been reported, for example, in CW Leo. The function describing the detailed radial dependence of the expansion velocity is different in the two patterns, which may be used to distinguish them. The shape of the circumstellar whorled pattern is further explored as a function of the orbital eccentricity and the inner companion's mass. Although this study is confined to stable, coplanar triple systems, the results are likely applicable to moderately noncoplanar systems and open interesting avenues for studying noncoplanar systems.
Submitted 18 April, 2024;
originally announced April 2024.
-
Aligning Language Models to Explicitly Handle Ambiguity
Authors:
Hyuhng Joon Kim,
Youna Kim,
Cheonbok Park,
Junyeob Kim,
Choonghyun Park,
Kang Min Yoo,
Sang-goo Lee,
Taeuk Kim
Abstract:
In interactions between users and language model agents, user utterances frequently exhibit ellipsis (omission of words or phrases) or imprecision (lack of exactness) to prioritize efficiency. This can lead to varying interpretations of the same input based on different assumptions or background knowledge. It is thus crucial for agents to adeptly handle the inherent ambiguity in queries to ensure reliability. However, even state-of-the-art large language models (LLMs) still face challenges in such scenarios, primarily due to the following hurdles: (1) LLMs are not explicitly trained to deal with ambiguous utterances; (2) the degree of ambiguity perceived by the LLMs may vary depending on the possessed knowledge. To address these issues, we propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns LLMs to manage ambiguous queries by leveraging their own assessment of ambiguity (i.e., perceived ambiguity). Experimental results on question-answering datasets demonstrate that APA empowers LLMs to explicitly detect and manage ambiguous queries while retaining the ability to answer clear questions. Furthermore, our finding proves that APA excels beyond training with gold-standard labels, especially in out-of-distribution scenarios.
Submitted 16 June, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Saturated RISE control for considering rotor thrust saturation of fully actuated multirotor
Authors:
Dongjae Lee,
H. ** Kim
Abstract:
This work proposes a saturated robust controller for a fully actuated multirotor that takes disturbance rejection and rotor thrust saturation into account. A disturbance rejection controller is required to prevent performance degradation in the presence of parametric uncertainty and external disturbance. Furthermore, rotor saturation should be properly addressed in a controller to avoid performance degradation or even instability due to a gap between the commanded input and the actual input during saturation. To address these issues, we present a modified saturated RISE (Robust Integral of the Sign of the Error) control method. The proposed modified saturated RISE controller is developed for expansion to a system with a non-diagonal, state-dependent input matrix. Next, we present reformulation of the system dynamics of a fully actuated multirotor, and apply the control law to the system. The proposed method is validated in simulation where the proposed controller outperforms the existing one thanks to the capability of handling the input matrix.
Submitted 17 April, 2024;
originally announced April 2024.
-
Autonomous aerial perching and unperching using omnidirectional tiltrotor and switching controller
Authors:
Dongjae Lee,
Sunwoo Hwang,
Jeonghyun Byun,
Seung Jae Lee,
H. ** Kim
Abstract:
Aerial unperching of multirotors has received little attention as opposed to perching that has been investigated to elongate operation time. This study presents a new aerial robot capable of both perching and unperching autonomously on/from a ferromagnetic surface during flight, and a switching controller to avoid rotor saturation and mitigate overshoot during transition between free-flight and perching. To enable stable perching and unperching maneuvers on/from a vertical surface, a lightweight ($\approx 1$ kg), fully actuated tiltrotor that can hover at $90^\circ$ pitch angle is first developed. We design a perching/unperching module composed of a single servomotor and a magnet, which is then mounted on the tiltrotor. A switching controller including exclusive control modes for transitions between free-flight and perching is proposed. Lastly, we propose a simple yet effective strategy to ensure robust perching in the presence of measurement and control errors and avoid collisions with the perching site immediately after unperching. We validate the proposed framework in experiments where the tiltrotor successfully performs perching and unperching on/from a vertical surface during flight. We further show effectiveness of the proposed transition mode in the switching controller by ablation studies where large overshoot and even collision with a perching site occur. To the best of the authors' knowledge, this work presents the first autonomous aerial unperching framework using a fully actuated tiltrotor.
Submitted 17 April, 2024;
originally announced April 2024.
-
Object Remover Performance Evaluation Methods using Class-wise Object Removal Images
Authors:
Changsuk Oh,
Dongseok Shim,
Taekbeom Lee,
H. ** Kim
Abstract:
Object removal refers to the process of erasing designated objects from an image while preserving the overall appearance, and it is one area where image inpainting is widely used in real-world applications. The performance of an object remover is quantitatively evaluated by measuring the quality of object removal results, similar to how the performance of an image inpainter is gauged. Current works reporting quantitative performance evaluations utilize original images as references. In this letter, to show that the current evaluation methods cannot properly evaluate the performance of an object remover, we create a dataset with object removal ground truth and compare the evaluations made by the current methods using original images to those utilizing object removal ground truth images. The disparities between the two evaluation sets validate that the current methods are not suitable for measuring the performance of an object remover. Additionally, we propose new evaluation methods tailored to gauge the performance of an object remover. The proposed methods evaluate the performance through class-wise object removal results and utilize images without the target class objects as a comparison set. We confirm that the proposed methods can make judgments consistent with human evaluators in the COCO dataset, and that they can produce measurements aligning with those using object removal ground truth in the self-acquired dataset.
Submitted 17 April, 2024;
originally announced April 2024.
-
ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation
Authors:
Iaroslav Melekhov,
Anand Umashankar,
Hyeong-** Kim,
Vladislav Serkov,
Dusty Argyle
Abstract:
We introduce ECLAIR (Extended Classification of Lidar for AI Recognition), a new outdoor large-scale aerial LiDAR dataset designed specifically for advancing research in point cloud semantic segmentation. As the most extensive and diverse collection of its kind to date, the dataset covers a total area of 10 km$^2$ with close to 600 million points and features eleven distinct object categories. To guarantee the dataset's quality and utility, we have thoroughly curated the point labels through an internal team of experts, ensuring accuracy and consistency in semantic labeling. The dataset is engineered to move forward the fields of 3D urban modeling, scene understanding, and utility infrastructure management by presenting new challenges and potential applications. As a benchmark, we report qualitative and quantitative analysis of a voxel-based point cloud segmentation approach based on the Minkowski Engine.
Submitted 16 April, 2024;
originally announced April 2024.
-
CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting
Authors:
Huihan Li,
Liwei Jiang,
Jena D. Huang,
Hyunwoo Kim,
Sebastin Santy,
Taylor Sorensen,
Bill Yuchen Lin,
Nouha Dziri,
Xiang Ren,
Ye** Choi
Abstract:
As the utilization of large language models (LLMs) has proliferated worldwide, it is crucial for them to have adequate knowledge and fair representation for diverse global cultures. In this work, we uncover culture perceptions of three SOTA models on 110 countries and regions on 8 culture-related topics through culture-conditioned generations, and extract symbols from these generations that are associated with each culture by the LLM. We discover that culture-conditioned generations consist of linguistic "markers" that distinguish marginalized cultures from default cultures. We also discover that LLMs have an uneven degree of diversity in the culture symbols, and that cultures from different geographic regions have different presence in LLMs' culture-agnostic generation. Our findings promote further research in studying the knowledge and fairness of global culture perception in LLMs. Code and data can be found at: https://github.com/huihanlhh/Culture-Gen/
Submitted 26 April, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
The Cost of Entanglement Renormalization on a Fault-Tolerant Quantum Computer
Authors:
Joshua Job,
Isaac H. Kim,
Eric Johnston,
Steve Adachi
Abstract:
We perform a detailed resource estimate for the prospect of using deep entanglement renormalization ansatz (DMERA) on a fault-tolerant quantum computer, focusing on the regime in which the target system is large. For probing a relatively large system size ($64\times 64$), we observe up to an order of magnitude reduction in the number of qubits, compared to the approaches based on quantum phase estimation (QPE). We discuss two complementary strategies to measure the energy. The first approach is based on a random sampling of the local terms of the Hamiltonian, requiring $\mathcal{O}(1/ε^2)$ invocations of quantum circuits, each of which has depth at most $\mathcal{O}(\log N)$, where $ε$ is the relative precision in the energy and $N$ is the system size. The second approach is based on a coherent estimation of the expectation value of observables averaged over space, which achieves the Heisenberg scaling while incurring only a logarithmic cost in the system size. For estimating the energy per site of $ε$, $\mathcal{O}\left(\frac{\log N}{ε}\right)$ $T$ gates and $\mathcal{O}\left(\log N\right)$ qubits suffice. The constant factor of the leading contribution is shown to be determined by the depth of the DMERA circuit, the gates used in the ansatz, and the periodicity of the circuit. We also derive tight bounds on the variance of the energy gradient, assuming the gates are random Pauli rotations.
Submitted 16 April, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Observation of Cooper-pair density modulation state
Authors:
Lingyuan Kong,
Michał Papaj,
Hyun** Kim,
Yiran Zhang,
Eli Baum,
Hui Li,
Kenji Watanabe,
Takashi Taniguchi,
Genda Gu,
Patrick A. Lee,
Stevan Nadj-Perge
Abstract:
Superconducting states that break space-group symmetries of the underlying crystal can exhibit nontrivial spatial modulation of the order parameter. Previously, such remarkable states were intimately associated with the breaking of translational symmetry, giving rise to the density-wave orders, with wavelengths spanning several unit cells. However, a related basic concept has been long overlooked: when only intra-unit-cell symmetries of the space group are broken, the superconducting states can display a distinct type of nontrivial modulation preserving long-range lattice translation. Here, we refer to this new concept as the pair density modulation (PDM), and report the first observation of a PDM state in exfoliated thin flakes of iron-based superconductor FeTe$_{0.55}$Se$_{0.45}$. Using scanning tunneling microscopy, we discover robust superconducting gap modulation with the wavelength corresponding to the lattice periodicity and the amplitude exceeding 30% of the gap average. Importantly, we find that the observed modulation originates from the large difference in superconducting gaps on the two nominally equivalent iron sublattices. The experimental findings, backed up by model calculations, suggest that in contrast to the density-wave orders, the PDM state is driven by the interplay of sublattice symmetry breaking and a peculiar nematic distortion specific to the thin flakes. Our results establish new frontiers for exploring the intertwined orders in strongly correlated electronic systems and open a new chapter for iron-based superconductors.
Submitted 15 April, 2024;
originally announced April 2024.
-
Disorder Chaos in Short-Range, Diluted, and Lévy Spin Glasses
Authors:
Wei-Kuo Chen,
Heejune Kim,
Arnab Sen
Abstract:
In a recent breakthrough [arXiv:2301.04112], Chatterjee proved site disorder chaos in the Edwards-Anderson (EA) short-range spin glass model utilizing the Hermite spectral method. In this paper, we demonstrate the further usefulness of this Hermite spectral approach by extending the validity of site disorder chaos in three related spin glass models.
The first, called the mixed even $p$-spin short-range model, is a generalization of the EA model where the underlying graph is a deterministic bounded degree hypergraph consisting of hyperedges with even number of vertices. The second model is the diluted mixed $p$-spin model, which is allowed to have hyperedges with both odd and even number of vertices. For both models, our results hold under general symmetric disorder distributions. The main novelty of our argument is played by an elementary algebraic equation for the Fourier-Hermite series coefficients for the two-spin correlation functions. It allows us to deduce necessary geometric conditions to determine the contributing coefficients in the overlap function, which in spirit is the same as the crucial Lemma 1 in [arXiv:2301.04112]. Finally, we also establish disorder chaos in the Lévy model with stable index $α\in (1, 2)$.
Submitted 13 June, 2024; v1 submitted 14 April, 2024;
originally announced April 2024.
-
PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
Authors:
Si Ung Noh,
Junguk Hong,
Chaemin Lim,
Seongyeon Park,
Jeehyun Kim,
Hanjun Kim,
Youngsok Kim,
**ho Lee
Abstract:
Recent dual in-line memory modules (DIMMs) are starting to support processing-in-memory (PIM) by associating their memory banks with processing elements (PEs), allowing applications to overcome the data movement bottleneck by offloading memory-intensive operations to the PEs. Many highly parallel applications have been shown to benefit from these PIM-enabled DIMMs, but further speedup is often limited by the huge overhead of inter-PE communication. This mainly comes from the slow CPU-mediated inter-PE communication methods, which incur significant performance overheads, making it difficult for PIM-enabled DIMMs to accelerate a wider range of applications. Prior studies have tried to alleviate the communication bottleneck, but they lack enough flexibility and performance to be used for a wide range of applications. In this paper, we present PID-Comm, a fast and flexible collective inter-PE communication framework for commodity PIM-enabled DIMMs. The key idea of PID-Comm is to abstract the PEs as a multi-dimensional hypercube and allow multiple instances of collective inter-PE communication between the PEs belonging to certain dimensions of the hypercube. Leveraging this abstraction, PID-Comm first defines eight collective inter-PE communication patterns that allow applications to easily express their complex communication patterns. Then, PID-Comm provides high-performance implementations of the collective inter-PE communication patterns optimized for the DIMMs. Our evaluation using 16 UPMEM DIMMs and representative parallel algorithms shows that PID-Comm greatly improves the performance by up to 4.20x compared to the existing inter-PE communication implementations. The implementation of PID-Comm is available at https://github.com/AIS-SNU/PID-Comm.
Submitted 12 April, 2024;
originally announced April 2024.
-
Absolute dimensions of solar-type eclipsing binaries. NY Hya: A test for magnetic stellar evolution models
Authors:
T. C. Hinse,
O. Baştürk,
J. Southworth,
G. A. Feiden,
J. Tregloan-Reed,
V. B. Kostov,
J. Livingston,
E. M. Esmer,
Mesut Yılmaz,
Selçuk Yalçınkaya,
Şeyma Torun,
J. Vos,
D. F. Evans,
J. C. Morales,
J. C. A. Wolf,
E. H. Olsen,
J. V. Clausen,
B. E. Helt,
C. T. K. Lý,
O. Stahl,
R. Wells,
M. Herath,
U. G. Jørgensen,
M. Dominik,
J. Skottfelt
, et al. (7 additional authors not shown)
Abstract:
The binary star NY Hya is a bright, detached, double-lined eclipsing system with an orbital period of just under five days with two components each nearly identical to the Sun and located in the solar neighbourhood.
The objective of this study is to test and confront various stellar evolution models for solar-type stars based on accurate measurements of stellar mass and radius.
We present new ground-based spectroscopic and photometric as well as high-precision space-based photometric and astrometric data from which we derive orbital as well as physical properties of the components via the method of least-squares minimisation based on a standard binary model valid for two detached components. Classic statistical techniques were invoked to test the significance of model parameters. Additional empirical evidence was compiled from the public domain; the derived system properties were compared with archival broad-band photometry data enabling a measurement of the system's spectral energy distribution that allowed an independent estimate of stellar properties. We also utilised semi-empirical calibration methods to derive atmospheric properties from Strömgren photometry and related colour indices. Data was used to confront the observed physical properties with classic and magnetic stellar evolution models.
Submitted 12 April, 2024;
originally announced April 2024.
-
A Novel Vision Transformer based Load Profile Analysis using Load Images as Inputs
Authors:
Hyeonjin Kim,
Yi Hu,
Kai Ye,
Ning Lu
Abstract:
This paper introduces ViT4LPA, an innovative Vision Transformer (ViT) based approach for Load Profile Analysis (LPA). We transform time-series load profiles into load images. This allows us to leverage the ViT architecture, originally designed for image processing, as a pre-trained image encoder to uncover latent patterns within load data. ViT is pre-trained using an extensive load image dataset, comprising 1M load images derived from smart meter data collected over a two-year period from 2,000 residential users. The training methodology is self-supervised, masked image modeling, wherein masked load images are restored to reveal hidden relationships among image patches. The pre-trained ViT encoder is then applied to various downstream tasks, including the identification of electric vehicle (EV) charging loads and behind-the-meter solar photovoltaic (PV) systems and load disaggregation. Simulation results illustrate ViT4LPA's superior performance compared to existing neural network models in downstream tasks. Additionally, we conduct an in-depth analysis of the attention weights within the ViT4LPA model to gain insights into its information flow mechanisms.
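The abstract does not say how a time series is laid out as a load image, so the following is only a minimal sketch under stated assumptions: one image row per day, one column per intraday sample, and min-max scaling to [0, 1]. The function name `load_profile_to_image` and the sample values are hypothetical and not taken from the paper.

```python
def load_profile_to_image(readings, samples_per_day):
    """Reshape a flat load series into a 2D grid (assumed layout: rows = days).

    Values are min-max scaled to [0, 1] so they can be treated as pixel
    intensities. This layout is an illustrative assumption, not ViT4LPA's.
    """
    if not readings:
        raise ValueError("readings must not be empty")
    if samples_per_day <= 0 or len(readings) % samples_per_day != 0:
        raise ValueError("readings must cover a whole number of days")

    lo, hi = min(readings), max(readings)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat profile
    scaled = [(r - lo) / span for r in readings]

    # Cut the scaled series into consecutive day-long rows.
    return [
        scaled[start:start + samples_per_day]
        for start in range(0, len(scaled), samples_per_day)
    ]


# Two hypothetical days with four samples each (arbitrary units).
two_days = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]
image = load_profile_to_image(two_days, samples_per_day=4)
for row in image:
    print(row)
```

A grid like this could then be split into patches and masked for self-supervised pre-training, as the abstract describes; the patching and masking steps are not shown here.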
Submitted 11 April, 2024;
originally announced April 2024.
-
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Authors:
Minkuk Kim,
Hyeon Bae Kim,
Jinyoung Moon,
Jinwoo Choi,
Seong Tae Kim
Abstract:
Dense video captioning, which aims to automatically localize and caption all events within untrimmed video, has attracted significant research attention. Several studies formulate dense video captioning as a multitask problem of event localization and event captioning in order to consider inter-task relations. However, addressing both tasks using only visual input is challenging due to the lack of semantic content. In this study, we address this by proposing a novel framework inspired by the cognitive information processing of humans. Our model utilizes external memory to incorporate prior knowledge. We propose a memory retrieval method based on cross-modal video-to-text matching. To effectively incorporate the retrieved text features, we design a versatile encoder and a decoder with visual and textual cross-attention modules. Comparative experiments on the ActivityNet Captions and YouCook2 datasets show the effectiveness of the proposed method. Experimental results show promising performance of our model without extensive pretraining on a large video dataset.
Submitted 11 April, 2024;
originally announced April 2024.
-
Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing
Authors:
Jaemin Kang,
Hoeseok Yang,
Hyungshin Kim
Abstract:
Deep learning has been successfully applied to object detection from remotely sensed images. Images are typically processed on the ground rather than on-board due to the computation power of the ground system. Such offloaded processing causes delays in acquiring target mission information, which hinders its application to real-time use cases. For on-device object detection, research has been conducted on designing efficient detectors or on model compression to reduce inference latency. However, highly accurate two-stage detectors still need further exploitation for acceleration. In this paper, we propose a model simplification method for two-stage object detectors. Instead of constructing a general feature pyramid, we utilize only one feature extraction in the two-stage detector. To compensate for the accuracy drop, we apply a high-pass filter to the RPN's score map. Our approach is applicable to any two-stage detector using a feature pyramid network. In experiments with state-of-the-art two-stage detectors such as ReDet, Oriented-RCNN, and LSKNet, our method reduced computation costs by up to 61.2% with an accuracy loss within 2.1% on the DOTAv1.5 dataset. Source code will be released.
Submitted 10 April, 2024;
originally announced April 2024.
-
A 4x32Gb/s 1.8pJ/bit Collaborative Baud-Rate CDR with Background Eye-Climbing Algorithm and Low-Power Global Clock Distribution
Authors:
Jihee Kim,
Jia Park,
Jiwon Shin,
Hanseok Kim,
Kahyun Kim,
Haengbeom Shin,
Ha-Jung Park,
Woo-Seok Choi
Abstract:
This paper presents design techniques for an energy-efficient multi-lane receiver (RX) with baud-rate clock and data recovery (CDR), which is essential for high-throughput low-latency communication in high-performance computing systems. The proposed low-power global clock distribution not only significantly reduces power consumption across multi-lane RXs but is also capable of compensating for the frequency offset without any phase interpolators. To this end, a fractional divider controlled by the CDR is placed close to the global phase-locked loop. Moreover, in order to address the sub-optimal lock point of conventional baud-rate phase detectors, the proposed CDR employs a background eye-climbing algorithm, which optimizes the sampling phase and maximizes the vertical eye margin (VEM). Fabricated in a 28nm CMOS process, the proposed 4x32Gb/s RX shows a low integrated fractional spur of -40.4dBc at a 2500ppm frequency offset. Furthermore, it improves bit-error-rate performance by increasing the VEM by 17%. The entire RX achieves an energy efficiency of 1.8pJ/bit at an aggregate data rate of 128Gb/s.
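As a quick consistency check on the figures quoted in this abstract, multiplying energy per bit by the aggregate data rate gives the implied receiver power. The sketch below is only that unit arithmetic; the function name `implied_power_watts` is illustrative and does not come from the paper.

```python
def implied_power_watts(energy_pj_per_bit: float, rate_gbps: float) -> float:
    """Power in watts = energy per bit (J/bit) * bit rate (bit/s)."""
    joules_per_bit = energy_pj_per_bit * 1e-12
    bits_per_second = rate_gbps * 1e9
    return joules_per_bit * bits_per_second


lanes = 4
per_lane_gbps = 32.0
aggregate_gbps = lanes * per_lane_gbps  # matches the 128 Gb/s in the abstract

power_w = implied_power_watts(1.8, aggregate_gbps)
print(f"{aggregate_gbps:.0f} Gb/s at 1.8 pJ/bit -> {power_w * 1e3:.1f} mW")
```

Under these numbers the whole four-lane receiver would draw roughly 230 mW, which is simply 1.8 pJ/bit times 128 Gb/s.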
Submitted 22 April, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
PAAM: A Framework for Coordinated and Priority-Driven Accelerator Management in ROS 2
Authors:
Daniel Enright,
Yecheng Xiang,
Hyunjong Choi,
Hyoseung Kim
Abstract:
This paper proposes a Priority-driven Accelerator Access Management (PAAM) framework for multi-process robotic applications built on top of the Robot Operating System (ROS) 2 middleware platform. The framework addresses the issue of predictable execution of time- and safety-critical callback chains that require hardware accelerators such as GPUs and TPUs. PAAM provides a standalone ROS executor that acts as an accelerator resource server, arbitrating accelerator access requests from all other callbacks at the application layer. This approach enables coordinated and priority-driven accelerator access management in multi-process robotic systems. The framework design is directly applicable to all types of accelerators and enables granular control over how specific chains access accelerators, making it possible to achieve predictable real-time support for accelerators used by safety-critical callback chains without making changes to underlying accelerator device drivers. The paper shows that PAAM also offers a theoretical analysis that can upper bound the worst-case response time of safety-critical callback chains that necessitate accelerator access. This paper also demonstrates that complex robotic systems with extensive accelerator usage that are integrated with PAAM may achieve up to a 91% reduction in end-to-end response time of their critical callback chains.
Submitted 9 April, 2024;
originally announced April 2024.
-
Efficient Quantum Circuits for Machine Learning Activation Functions including Constant T-depth ReLU
Authors:
Wei Zi,
Siyi Wang,
Hyunji Kim,
Xiaoming Sun,
Anupam Chattopadhyay,
Patrick Rebentrost
Abstract:
In recent years, Quantum Machine Learning (QML) has increasingly captured the interest of researchers. Among the components in this domain, activation functions hold a fundamental and indispensable role. Our research focuses on the development of quantum circuits for activation functions, intended for integration into fault-tolerant quantum computing architectures, with an emphasis on minimizing $T$-depth. Specifically, we present novel implementations of ReLU and leaky ReLU activation functions, achieving constant $T$-depths of 4 and 8, respectively. Leveraging quantum lookup tables, we extend our exploration to other activation functions such as the sigmoid. This approach enables us to customize precision and $T$-depth by adjusting the number of qubits, making our results more adaptable to various application scenarios. This study represents a significant advancement towards enhancing the practicality and application of quantum machine learning.
Submitted 9 April, 2024;
originally announced April 2024.
-
OGLE-2018-BLG-0971, MOA-2023-BLG-065, and OGLE-2023-BLG-0136: Microlensing events with prominent orbital effects
Authors:
Cheongho Han,
Andrzej Udalski,
Ian A. Bond,
Chung-Uk Lee,
Andrew Gould,
Michael D. Albrow,
Sun-Ju Chung,
Kyu-Ha Hwang,
Youn Kil Jung,
Hyoun-Woo Kim,
Yoon-Hyun Ryu,
Yossi Shvartzvald,
In-Gu Shin,
Jennifer C. Yee,
Hongjing Yang,
Weicheng Zang,
Sang-Mok Cha,
Doeon Kim,
Dong-Jin Kim,
Seung-Lee Kim,
Dong-Joo Lee,
Yongseok Lee,
Byeong-Gon Park,
Richard W. Pogge,
Przemek Mróz
, et al. (38 additional authors not shown)
Abstract:
We undertake a project to reexamine microlensing data gathered from high-cadence surveys. The aim of the project is to reinvestigate lensing events with light curves exhibiting intricate anomaly features associated with caustics, yet lacking prior proposed models to explain these features. Through detailed reanalyses considering higher-order effects, we identify that accounting for orbital motions of lenses is vital in accurately explaining the anomaly features observed in the light curves of the lensing events OGLE-2018-BLG-0971, MOA-2023-BLG-065, and OGLE-2023-BLG-0136. We estimate the masses and distances to the lenses by conducting Bayesian analyses using the lensing parameters of the newly found lensing solutions. From these analyses, we identify that the lenses of the events OGLE-2018-BLG-0971 and MOA-2023-BLG-065 are binaries composed of M dwarfs, while the lens of OGLE-2023-BLG-0136 is likely to be a binary composed of an early K-dwarf primary and a late M-dwarf companion. For all lensing events, the probability of the lens residing in the bulge is considerably higher than that of it being located in the disk.
Submitted 8 April, 2024;
originally announced April 2024.
-
Strict area law implies commuting parent Hamiltonian
Authors:
Isaac H. Kim,
Ting-Chun Lin,
Daniel Ranard,
Bowen Shi
Abstract:
We show that in two spatial dimensions, when a quantum state has entanglement entropy obeying a strict area law, meaning $S(A)=\alpha|\partial A| - \gamma$ for constants $\alpha, \gamma$ independent of lattice region $A$, then it admits a commuting parent Hamiltonian. More generally, we prove that the entanglement bootstrap axioms in 2D imply the existence of a commuting, local parent Hamiltonian with a stable spectral gap. We also extend our proof to states that describe gapped domain walls. Physically, these results imply that the states studied in the entanglement bootstrap program correspond to ground states of some local Hamiltonian, describing a stable phase of matter. Our result also suggests that systems with chiral gapless edge modes cannot obey a strict area law provided they have finite local Hilbert space.
Submitted 8 April, 2024;
originally announced April 2024.
-
Retrieval-Augmented Open-Vocabulary Object Detection
Authors:
Jooyeon Kim,
Eulrang Cho,
Sehyung Kim,
Hyunwoo J. Kim
Abstract:
Open-vocabulary object detection (OVD) has been studied with Vision-Language Models (VLMs) to detect novel objects beyond the pre-trained categories. Previous approaches improve the generalization ability to expand the knowledge of the detector, using 'positive' pseudo-labels with additional 'class' names, e.g., sock, iPod, and alligator. To extend the previous methods in two aspects, we propose Retrieval-Augmented Losses and visual Features (RALF). Our method retrieves related 'negative' classes and augments loss functions. Also, visual features are augmented with 'verbalized concepts' of classes, e.g., worn on the feet, handheld music player, and sharp teeth. Specifically, RALF consists of two modules: Retrieval Augmented Losses (RAL) and Retrieval-Augmented visual Features (RAF). RAL constitutes two losses reflecting the semantic similarity with negative vocabularies. In addition, RAF augments visual features with the verbalized concepts from a large language model (LLM). Our experiments demonstrate the effectiveness of RALF on COCO and LVIS benchmark datasets. We achieve improvement up to 3.4 box AP$_{50}^{\text{N}}$ on novel categories of the COCO dataset and 3.6 mask AP$_{\text{r}}$ gains on the LVIS dataset. Code is available at https://github.com/mlvlab/RALF .
Submitted 8 April, 2024;
originally announced April 2024.
-
STITCH: Augmented Dexterity for Suture Throws Including Thread Coordination and Handoffs
Authors:
Kush Hari,
Hansoul Kim,
Will Panitch,
Kishore Srinivas,
Vincent Schorp,
Karthik Dharmarajan,
Shreya Ganti,
Tara Sadjadpour,
Ken Goldberg
Abstract:
We present STITCH: an augmented dexterity pipeline that performs Suture Throws Including Thread Coordination and Handoffs. STITCH iteratively performs needle insertion, thread sweeping, needle extraction, suture cinching, needle handover, and needle pose correction with failure recovery policies. We introduce a novel visual 6D needle pose estimation framework using a stereo camera pair and new suturing motion primitives. We compare STITCH to baselines, including a proprioception-only policy and a policy without visual servoing. In physical experiments across 15 trials, STITCH achieves an average of 2.93 sutures without human intervention and 4.47 sutures with human intervention. See https://sites.google.com/berkeley.edu/stitch for code and supplemental materials.
Submitted 7 April, 2024;
originally announced April 2024.
-
A 0.65-pJ/bit 3.6-TB/s/mm I/O Interface with XTalk Minimizing Affine Signaling for Next-Generation HBM with High Interconnect Density
Authors:
Hyunjun Park,
Jiwon Shin,
Hanseok Kim,
Jihee Kim,
Haengbeom Shin,
Taehoon Kim,
Jung-Hun Park,
Woo-Seok Choi
Abstract:
This paper presents an I/O interface with Xtalk Minimizing Affine Signaling (XMAS), which is designed to support high-speed data transmission in die-to-die communication over silicon interposers or similar high-density interconnects susceptible to crosstalk. The operating principles of XMAS are elucidated through rigorous analyses, and its advantages over existing signaling are validated through numerical experiments. XMAS not only demonstrates exceptional crosstalk-removing capabilities but also exhibits robustness against noise, especially simultaneous switching noise. Fabricated in a 28-nm CMOS process, the prototype XMAS transceiver achieves an edge density of 3.6TB/s/mm and an energy efficiency of 0.65pJ/b. Compared to single-ended signaling, the crosstalk-induced peak-to-peak jitter of the received eye with XMAS is reduced by 75% at a 10GS/s/pin data rate, and the horizontal eye opening extends to 0.2UI at a bit error rate < $10^{-12}$.
Submitted 7 April, 2024;
originally announced April 2024.
-
The intersection cohomology Hodge module of toric varieties
Authors:
Hyunsuk Kim,
Sridhar Venkatesh
Abstract:
We study the Hodge filtration of the intersection cohomology Hodge module for toric varieties. More precisely, we study the cohomology sheaves of the graded de Rham complex of the intersection cohomology Hodge module and give a precise formula relating it with the stalks of the intersection cohomology as a constructible complex. The main idea is to use the Ishida complex in order to compute the higher direct images of the sheaf of reflexive differentials.
Submitted 22 May, 2024; v1 submitted 6 April, 2024;
originally announced April 2024.
-
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
Authors:
Gwanghyun Kim,
Hayeon Kim,
Hoigi Seo,
Dong Un Kang,
Se Young Chun
Abstract:
Generating higher-resolution human-centric scenes with details and controls remains a challenge for existing text-to-image diffusion models. This challenge stems from limited training image size, text encoder capacity (limited tokens), and the inherent difficulty of generating complex scenes involving multiple humans. While current methods have attempted to address only the training size limit, they often yield human-centric scenes with severe artifacts. We propose BeyondScene, a novel framework that overcomes prior limitations, generating exquisite higher-resolution (over 8K) human-centric scenes with exceptional text-image correspondence and naturalness using existing pretrained diffusion models. BeyondScene employs a staged and hierarchical approach. It first generates a detailed base image focusing on crucial elements in instance creation for multiple humans and on detailed descriptions beyond the token limit of the diffusion model. It then seamlessly converts the base image to a higher-resolution output, exceeding the training image size and incorporating details aware of text and instances via our novel instance-aware hierarchical enlargement process, which consists of our proposed high-frequency injected forward diffusion and adaptive joint diffusion. BeyondScene surpasses existing methods in terms of correspondence with detailed text descriptions and naturalness, paving the way for advanced applications in higher-resolution human-centric scene creation beyond the capacity of pretrained diffusion models without costly retraining. Project page: https://janeyeon.github.io/beyond-scene.
Submitted 6 April, 2024;
originally announced April 2024.
-
$K_{1}^{\pm}$ mesons moving in nuclear matter
Authors:
Seokwoo Yeo,
HyungJoo Kim,
Su Houng Lee
Abstract:
Observing the mass shifts of mesons immersed in nuclear matter is interesting, as the changes are expected to shed light on the effects of chiral symmetry breaking on the origin of hadron masses. At the same time, it is important to understand the momentum dependence of the masses for spin-1 mesons, as the changes manifest differently across the two polarization modes. Here, the mass shifts of $K_{1}^{\pm}$ mesons with finite three-momentum in a nuclear medium are studied in the QCD sum rule approach. We find that the mass of the $K_{1}^{+}$ ($K_{1}^{-}$) meson is increased (decreased) by the non-trivial momentum effect in both the transverse and longitudinal modes. Specifically, compared to its rest mass in the nuclear medium, in the transverse mode, the mass of $K_{1}^{+}$ ($K_{1}^{-}$) is observed to shift by +2 (-55) MeV, while in the longitudinal mode, the mass shift is +13 (-11) MeV, all at a momentum of 0.5 GeV. Exploring the medium modifications of the $K_{1}$ meson through kaon beams at J-PARC will provide insights into the partial restoration of chiral symmetry in nuclear matter.
Submitted 10 April, 2024; v1 submitted 6 April, 2024;
originally announced April 2024.
-
Machine Learning-Aided Cooperative Localization under Dense Urban Environment
Authors:
Hoon Lee,
Hong Ki Kim,
Seung Hyun Oh,
Sang Hyun Lee
Abstract:
Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and a quality driving experience, can be leveraged if machine learning models guarantee robustness in performing core functions including localization and controls. Location awareness, in particular, lends itself to the deployment of location-specific services and the improvement of operation performance. Localization entails direct communication with the network infrastructure, and the resulting centralized positioning solutions readily become intractable as the network scales up. As an alternative to the centralized solutions, this article addresses a decentralized principle of vehicular localization reinforced by machine learning techniques in dense urban environments where reliable measurements are frequently inaccessible. As such, the collaboration of multiple vehicles enhances the positioning performance of machine learning approaches. A virtual testbed is developed to validate this machine learning model for real-map vehicular networks. Numerical results demonstrate the universal feasibility of cooperative localization, in particular for dense urban area configurations.
Submitted 5 April, 2024;
originally announced April 2024.
-
SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models
Authors:
Hyeonwoo Kim,
Gyoungjin Gim,
Yungi Kim,
Jihoo Kim,
Byungju Kim,
Wonseok Lee,
Chanjun Park
Abstract:
This study presents a novel learning approach designed to enhance both mathematical reasoning and problem-solving abilities of Large Language Models (LLMs). We focus on integrating the Chain-of-Thought (CoT) and the Program-of-Thought (PoT) learning, hypothesizing that prioritizing the learning of mathematical reasoning ability is helpful for the amplification of problem-solving ability. Thus, the initial learning with CoT is essential for solving challenging mathematical problems. To this end, we propose a sequential learning approach, named SAAS (Solving Ability Amplification Strategy), which strategically transitions from CoT learning to PoT learning. Our empirical study, involving an extensive performance comparison using several benchmarks, demonstrates that our SAAS achieves state-of-the-art (SOTA) performance. The results underscore the effectiveness of our sequential learning approach, marking a significant advancement in the field of mathematical reasoning in LLMs.
Submitted 24 April, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.