Conformal geometry from entanglement

Authors: Isaac H. Kim, Xiang Li, Ting-Chun Lin, John McGreevy, Bowen Shi

Abstract: In a physical system with conformal symmetry, observables depend on cross-ratios, measures of distance invariant under global conformal transformations (conformal geometry for short). We identify a quantum information-theoretic mechanism by which the conformal geometry emerges at the gapless edge of a 2+1D quantum many-body system with a bulk energy gap. We introduce a novel pair of information-theoretic quantities $(\mathfrak{c}_{\mathrm{tot}}, η)$ that can be defined locally on the edge from the wavefunction of the many-body system, without prior knowledge of any distance measure. We posit that, for a topological groundstate, the quantity $\mathfrak{c}_{\mathrm{tot}}$ is stationary under arbitrary variations of the quantum state, and study the logical consequences. We show that stationarity, modulo an entanglement-based assumption about the bulk, implies (i) $\mathfrak{c}_{\mathrm{tot}}$ is a non-negative constant that can be interpreted as the total central charge of the edge theory. (ii) $η$ is a cross-ratio, obeying the full set of mathematical consistency rules, which further indicates the existence of a distance measure of the edge with global conformal invariance. Thus, the conformal geometry emerges from a simple assumption on groundstate entanglement. We show that stationarity of $\mathfrak{c}_{\mathrm{tot}}$ is equivalent to a vector fixed-point equation involving $η$, making our assumption locally checkable. We also derive similar results for 1+1D systems under a suitable set of assumptions.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 48+31 pages, 25 figures

arXiv:2404.03691 [pdf, other]

Upgrade of NaI(Tl) crystal encapsulation for the NEON experiment

Authors: J. J. Choi, E. J. Jeon, J. Y. Kim, K. W. Kim, S. H. Kim, S. K. Kim, Y. D. Kim, Y. J. Ko, B. C. Koh, C. Ha, B. J. Park, S. H. Lee, I. S. Lee, H. Lee, H. S. Lee, J. Lee, Y. M. Oh

Abstract: The Neutrino Elastic-scattering Observation with NaI(Tl) experiment (NEON) aims to detect coherent elastic neutrino-nucleus scattering (CE$\nu$NS) in a NaI(Tl) crystal using reactor anti-electron neutrinos at the Hanbit nuclear power plant complex. A total of 13.3 kg of NaI(Tl) crystals were initially installed in December 2020 at the tendon gallery, 23.7$\pm$0.3 m away from the reactor core, which operates at a thermal power of 2.8 GW. Initial engineering operation was performed from May 2021 to March 2022 and observed unexpected photomultiplier-induced noise and a decreased light yield that were caused by leakage of liquid scintillator into the detector due to weakness of detector encapsulation. We upgraded the detector encapsulation design to prevent the leakage of the liquid scintillator. Meanwhile two small-sized detectors were replaced with larger ones resulting in a total mass of 16.7 kg. With this new design implementation, the detector system has been operating stably since April 2022 for over a year without detector gain drop. In this paper, we present an improved crystal encapsulation design and stability of the NEON experiment.

Submitted 28 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.03138 [pdf, other]

Discontinuity-preserving Normal Integration with Auxiliary Edges

Authors: Hyomin Kim, Yucheol Jung, Seungyong Lee

Abstract: Many surface reconstruction methods incorporate normal integration, which is a process to obtain a depth map from surface gradients. In this process, the input may represent a surface with discontinuities, e.g., due to self-occlusion. To reconstruct an accurate depth map from the input normal map, hidden surface gradients occurring from the jumps must be handled. To model these jumps correctly, we design a novel discretization scheme for the domain of normal integration. Our key idea is to introduce auxiliary edges, which bridge between piecewise-smooth patches in the domain so that the magnitude of hidden jumps can be explicitly expressed. Using the auxiliary edges, we design a novel algorithm to optimize the discontinuity and the depth map from the input normal map. Our method optimizes discontinuities by using a combination of iterative re-weighted least squares and iterative filtering of the jump magnitudes on auxiliary edges to provide strong sparsity regularization. Compared to previous discontinuity-preserving normal integration methods, which model the magnitudes of jumps only implicitly, our method reconstructs subtle discontinuities accurately thanks to our explicit representation of jumps allowing for strong sparsity regularization.

Submitted 3 April, 2024; originally announced April 2024.

Comments: To appear at CVPR 2024. For supplementary video, see https://youtu.be/MTTcW5kAOFE

ACM Class: I.4.5

arXiv:2404.02405 [pdf, other]

TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression

Authors: Ho-Joong Kim, Jung-Ho Hong, Heejo Kong, Seong-Whan Lee

Abstract: In this paper, we investigate that the normalized coordinate expression is a key factor as reliance on hand-crafted components in query-based detectors for temporal action detection (TAD). Despite significant advancements towards an end-to-end framework in object detection, query-based detectors have been limited in achieving full end-to-end modeling in TAD. To address this issue, we propose TE-TAD, a full end-to-end temporal action detection transformer that integrates time-aligned coordinate expression. We reformulate coordinate expression utilizing actual timeline values, ensuring length-invariant representations from the extremely diverse video duration environment. Furthermore, our proposed adaptive query selection dynamically adjusts the number of queries based on video length, providing a suitable solution for varying video durations compared to a fixed query set. Our approach not only simplifies the TAD process by eliminating the need for hand-crafted components but also significantly improves the performance of query-based detectors. Our TE-TAD outperforms the previous query-based detectors and achieves competitive performance compared to state-of-the-art methods on popular benchmark datasets. Code is available at: https://github.com/Dotori-HJ/TE-TAD

Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.02342 [pdf, other]

A Computational Analysis of Lyric Similarity Perception

Authors: Haven Kim, Taketo Akama

Abstract: In musical compositions that include vocals, lyrics significantly contribute to artistic expression. Consequently, previous studies have introduced the concept of a recommendation system that suggests lyrics similar to a user's favorites or personalized preferences, aiding in the discovery of lyrics among millions of tracks. However, many of these systems do not fully consider human perceptions of lyric similarity, primarily due to limited research in this area. To bridge this gap, we conducted a comparative analysis of computational methods for modeling lyric similarity with human perception. Results indicated that computational models based on similarities between embeddings from pre-trained BERT-based models, the audio from which the lyrics are derived, and phonetic components are indicative of perceptual lyric similarity. This finding underscores the importance of semantic, stylistic, and phonetic similarities in human perception about lyric similarity. We anticipate that our findings will enhance the development of similarity-based lyric recommendation systems by offering pseudo-labels for neural network development and introducing objective evaluation metrics.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seong** Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.01807 [pdf, other]

Stacking of charge-density waves in 2H-NbSe$_2$ bilayers

Authors: Fabrizio Cossu, Dhani Nafday, Krisztian Palotás, Mehdi Biderang, Heung-Sik Kim, Alireza Akbari, Igor Di Marco

Abstract: We employ ab-initio electronic structure calculations to investigate the charge-density waves and periodic lattice distortions in bilayer 2H-NbSe$_2$. We demonstrate that the vertical stacking can give rise to a variety of patterns that may lower the symmetry of the CDW exhibited separately by the two composing 1H-NbSe$_2$ monolayers. The general tendency to a spontaneous symmetry breaking observed in the ground state and the first excited states is shown to originate from a non-negligible inter-layer coupling. Simulated images for scanning tunnelling microscopy (STM) as well as diffraction/scattering patterns show signatures of the different stacking orders. This may not only be useful to reinterpret past experiments on surfaces and thin films, but may also be exploited to devise ad-hoc experiments for the investigation of the stacking order in 2H-NbSe$_2$. We anticipate that our analysis does not only apply to the 2H-NbSe$_2$ bilayer, but is also relevant for thin films and bulk, whose smallest centro-symmetric component is indeed the bilayer. Finally, our results illustrate clearly that the vertical stacking is not only important for 1T structures, as exemplified by the metal-to-insulator transition observed in 1T-TaS$_2$, but seems to be a general feature of metallic layered transition metal dichalcogenides as well.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 11 pages, 5 figures

arXiv:2404.01661 [pdf, other]

Interaction-Aware Vehicle Motion Planning with Collision Avoidance Constraints in Highway Traffic

Authors: Dongryul Kim, Hyeonjeong Kim, Kyoungseok Han

Abstract: This paper proposes collision-free optimal trajectory planning for autonomous vehicles in highway traffic, where vehicles need to deal with the interaction among each other. To address this issue, a novel optimal control framework is suggested, which couples the trajectory of surrounding vehicles with collision avoidance constraints. Additionally, we describe a trajectory optimization technique under state constraints, utilizing a planner based on Pontryagin's Minimum Principle, capable of numerically solving collision avoidance scenarios with surrounding vehicles. Simulation results demonstrate the effectiveness of the proposed approach regarding interaction-based motion planning for different scenarios.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01628 [pdf, other]

Learning Equi-angular Representations for Online Continual Learning

Authors: Minhyuk Seo, Hyunseo Koh, Wonje Jeung, Minjae Lee, San Kim, Hankook Lee, Sungjun Cho, Sungik Choi, Hyunwoo Kim, Jonghyun Choi

Abstract: Online continual learning suffers from an underfitted solution due to insufficient training for prompt model update (e.g., single-epoch training). To address the challenge, we propose an efficient online continual learning method using the neural collapse phenomenon. In particular, we induce neural collapse to form a simplex equiangular tight frame (ETF) structure in the representation space so that the continuously learned model with a single epoch can better fit to the streamed data by proposing preparatory data training and residual correction in the representation space. With an extensive set of empirical validations using CIFAR-10/100, TinyImageNet, ImageNet-200, and ImageNet-1K, we show that our proposed method outperforms state-of-the-art methods by a noticeable margin in various online continual learning scenarios such as disjoint and Gaussian scheduled continuous (i.e., boundary-free) data setups.

Submitted 2 April, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2404.01042 [pdf, ps, other]

Multiplicative Hecke operators and their applications

Authors: Gyucheol Shin, Chang Heon Kim

Abstract: In this paper, we define the multiplicative Hecke operators $\mathcal{T}(n)$ for any positive integer on the integral weight meromorphic modular forms for $Γ_{0}(N)$. We then show that they have properties similar to those of additive Hecke operators. Moreover, we prove that multiplicative Hecke eigenforms with integer Fourier coefficients are eta quotients, and vice versa. In addition, we prove that the Borcherds product and logarithmic derivative are Hecke equivariant with the multiplicative Hecke operators and the Hecke operators on the half-integral weight harmonic weak Maass forms and weight 2 meromorphic modular forms.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 22 pages

MSC Class: 11F03; 11F12; 11F20; 11F25; 11F37

arXiv:2404.00851 [pdf, other]

Prompt Learning via Meta-Regularization

Authors: **young Park, Juyeon Ko, Hyunwoo J. Kim

Abstract: Pre-trained vision-language models have shown impressive success on various computer vision tasks with their zero-shot generalizability. Recently, prompt learning approaches have been explored to efficiently and effectively adapt the vision-language models to a variety of downstream tasks. However, most existing prompt learning methods suffer from task overfitting since the general knowledge of the pre-trained vision language models is forgotten while the prompts are finetuned on a small data set from a specific target task. To address this issue, we propose a Prompt Meta-Regularization (ProMetaR) to improve the generalizability of prompt learning for vision-language models. Specifically, ProMetaR meta-learns both the regularizer and the soft prompts to harness the task-specific knowledge from the downstream tasks and task-agnostic general knowledge from the vision-language models. Further, ProMetaR augments the task to generate multiple virtual tasks to alleviate the meta-overfitting. In addition, we provide the analysis to comprehend how ProMetaR improves the generalizability of prompt tuning in the perspective of the gradient alignment. Our extensive experiments demonstrate that our ProMetaR improves the generalizability of conventional prompt learning methods under base-to-base/base-to-new and domain generalization settings. The code of ProMetaR is available at https://github.com/mlvlab/ProMetaR.

Submitted 31 March, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2404.00830 [pdf, other]

2D Ego-Motion with Yaw Estimation using Only mmWave Radars via Two-Way weighted ICP

Authors: Hojune Kim, Hyesu Jang, Ayoung Kim

Abstract: The interest in single-chip mmWave Radar is driven by their compact form factor, cost-effectiveness, and robustness under harsh environmental conditions. Despite its promising attributes, the principal limitation of mmWave radar lies in its capacity for autonomous yaw rate estimation. Conventional solutions have often resorted to integrating inertial measurement unit (IMU) or deploying multiple radar units to circumvent this shortcoming. This paper introduces an innovative methodology for two-dimensional ego-motion estimation, focusing on yaw rate deduction, utilizing solely mmWave radar sensors. By applying a weighted Iterated Closest Point (ICP) algorithm to register processed points derived from heatmap data, our method facilitates 2D ego-motion estimation devoid of prior information. Through experimental validation, we verified the effectiveness and promise of our technique for ego-motion estimation using exclusively radar data.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2404.00678 [pdf, other]

OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees

Authors: Hakyeong Kim, Andreas Meuleman, Hyeonjoong Jang, James Tompkin, Min H. Kim

Abstract: We present a method to reconstruct indoor and outdoor static scene geometry and appearance from an omnidirectional video moving in a small circular sweep. This setting is challenging because of the small baseline and large depth ranges, making it difficult to find ray crossings. To better constrain the optimization, we estimate geometry as a signed distance field within a spherical binoctree data structure and use a complementary efficient tree traversal strategy based on a breadth-first search for sampling. Unlike regular grids or trees, the shape of this structure well-matches the camera setting, creating a better memory-quality trade-off. From an initial depth estimate, the binoctree is adaptively subdivided throughout the optimization; previous methods use a fixed depth that leaves the scene undersampled. In comparison with three neural optimization methods and two non-neural methods, ours shows decreased geometry error on average, especially in a detailed scene, while significantly reducing the required number of voxels to represent such details.

Submitted 31 March, 2024; originally announced April 2024.

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2404.00676 [pdf, other]

OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos

Authors: Dongyoung Choi, Hyeonjoong Jang, Min H. Kim

Abstract: Omnidirectional cameras are extensively used in various applications to provide a wide field of vision. However, they face a challenge in synthesizing novel views due to the inevitable presence of dynamic objects, including the photographer, in their wide field of view. In this paper, we introduce a new approach called Omnidirectional Local Radiance Fields (OmniLocalRF) that can render static-only scene views, removing and inpainting dynamic objects simultaneously. Our approach combines the principles of local radiance fields with the bidirectional optimization of omnidirectional rays. Our input is an omnidirectional video, and we evaluate the mutual observations of the entire angle between the previous and current frames. To reduce ghosting artifacts of dynamic objects and inpaint occlusions, we devise a multi-resolution motion mask prediction module. Unlike existing methods that primarily separate dynamic components through the temporal domain, our method uses multi-resolution neural feature planes for precise segmentation, which is more suitable for long 360-degree videos. Our experiments validate that OmniLocalRF outperforms existing methods in both qualitative and quantitative metrics, especially in scenarios with complex real-world scenes. In particular, our approach eliminates the need for manual interaction, such as drawing motion masks by hand and additional pose estimation, making it a highly effective and efficient solution.

Submitted 31 March, 2024; originally announced April 2024.

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2404.00376 [pdf, other]

Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks

Authors: Hyunjae Kim, Hyeon Hwang, Jiwoo Lee, Sihyeon Park, Dain Kim, Taewhoo Lee, Chanwoong Yoon, Jiwoong Sohn, Donghee Choi, Jaewoo Kang

Abstract: While recent advancements in commercial large language models (LM) have shown promising results in medical tasks, their closed-source nature poses significant privacy and security concerns, hindering their widespread use in the medical field. Despite efforts to create open-source models, their limited parameters often result in insufficient multi-step reasoning capabilities required for solving complex medical problems. To address this, we introduce Meerkat, a new family of medical AI systems ranging from 7 to 70 billion parameters. The models were trained using our new synthetic dataset consisting of high-quality chain-of-thought reasoning paths sourced from 18 medical textbooks, along with diverse instruction-following datasets. Our systems achieved remarkable accuracy across six medical benchmarks, surpassing the previous best models such as MediTron and BioMistral, and GPT-3.5 by a large margin. Notably, Meerkat-7B surpassed the passing threshold of the United States Medical Licensing Examination (USMLE) for the first time for a 7B-parameter model, while Meerkat-70B outperformed GPT-4 by an average of 1.3%. Additionally, Meerkat-70B correctly diagnosed 21 out of 38 complex clinical cases, outperforming humans' 13.8 and closely matching GPT-4's 21.8. Our systems offered more detailed free-form responses to clinical queries compared to existing small models, approaching the performance level of large commercial models. This significantly narrows the performance gap with large LMs, showcasing its effectiveness in addressing complex medical challenges.

Submitted 30 June, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: Added new LLaMA-3-based models and experiments on NEJM case challenges

arXiv:2404.00201 [pdf, other]

Angular analysis of $B \to K^* e^+ e^-$ in the low-$q^2$ region with new electron identification at Belle

Authors: Belle Collaboration, D. Ferlewicz, P. Urquijo, I. Adachi, K. Adamczyk, H. Aihara, D. M. Asner, H. Atmacan, R. Ayad, V. Babu, Sw. Banerjee, P. Behera, K. Belous, J. Bennett, M. Bessner, V. Bhardwaj, B. Bhuyan, T. Bilka, D. Biswas, D. Bodrov, M. Bračko, P. Branchini, T. E. Browder, A. Budano, M. Campajola , et al. (145 additional authors not shown)

Abstract: We perform an angular analysis of the $B\to K^* e^+ e^-$ decay for the dielectron mass squared, $q^2$, range of $0.0008$ to $1.1200 ~\text{GeV}^2 /c^4$ using the full Belle data set in the $K^{*0} \to K^+ π^-$ and $K^{*+} \to K_S^0 π^+$ channels, incorporating new methods of electron identification to improve the statistical power of the data set. This analysis is sensitive to contributions from right-handed currents from physics beyond the Standard Model by constraining the Wilson coefficients $\mathcal{C}_7^{(\prime)}$. We perform a fit to the $B\to K^* e^+ e^-$ differential decay rate and measure the imaginary component of the transversality amplitude to be $A_T^{\rm Im} = -1.27 \pm 0.52 \pm 0.12$, and the $K^*$ transverse asymmetry to be $A_T^{(2)} = 0.52 \pm 0.53 \pm 0.11$. The resulting constraints on the value of $\mathcal{C}_7^{\prime}$ are consistent with the Standard Model within a $2σ$ confidence interval.

Submitted 2 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

Comments: Submitted to PRD

Report number: Belle preprint 2023-20, KEK preprint 2023-38

arXiv:2403.19785 [pdf, other]

Integrated Communication, Localization, and Sensing in 6G D-MIMO Networks

Authors: Hao Guo, Henk Wymeersch, Behrooz Makki, Hui Chen, Yibo Wu, Giuseppe Durisi, Musa Furkan Keskin, Mohammad H. Moghaddam, Charitha Madapatha, Han Yu, Peter Hammarberg, Hyowon Kim, Tommy Svensson

Abstract: Future generations of mobile networks call for concurrent sensing and communication functionalities in the same hardware and/or spectrum. Compared to communication, sensing services often suffer from limited coverage, due to the high path loss of the reflected signal and the increased infrastructure requirements. To provide a more uniform quality of service, distributed multiple input multiple output (D-MIMO) systems deploy a large number of distributed nodes and efficiently control them, making distributed integrated sensing and communications (ISAC) possible. In this paper, we investigate ISAC in D-MIMO through the lens of different design architectures and deployments, revealing both conflicts and synergies. In addition, simulation and demonstration results reveal both opportunities and challenges towards the implementation of ISAC in D-MIMO.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19270 [pdf, other]

sDPO: Don't Use Your Data All at Once

Authors: Dahyun Kim, Yungi Kim, Wonho Song, Hyeonwoo Kim, Yunsu Kim, Sanghoon Kim, Chanjun Park

Abstract: As development of large language models (LLMs) progresses, aligning them with human preferences has become increasingly important. We propose stepwise DPO (sDPO), an extension of the recently popularized direct preference optimization (DPO) for alignment tuning. This approach involves dividing the available preference datasets and utilizing them in a stepwise manner, rather than employing them all at once. We demonstrate that this method facilitates the use of more precisely aligned reference models within the DPO training framework. Furthermore, sDPO trains the final model to be more performant, even outperforming other popular LLMs with more parameters.

Submitted 28 March, 2024; originally announced March 2024.
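
As background for the sDPO entry above, the standard DPO objective compares how much more the policy prefers the chosen response over the rejected one than a reference model does. The following is a minimal sketch of that per-pair loss only, not the authors' code: the function name `dpo_pair_loss`, the default `beta`, and the sample log-probabilities are illustrative assumptions. In a stepwise scheme of the kind the abstract describes, the reference log-probabilities would presumably come from a model aligned in an earlier step.

```python
import math

def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a full response
    under the policy or the reference model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))

# Policy identical to the reference: the margin is zero, so the loss is log 2.
baseline = dpo_pair_loss(-5.0, -7.0, -5.0, -7.0)
# Policy shifted toward the chosen response: the loss drops below log 2.
improved = dpo_pair_loss(-4.0, -8.0, -5.0, -7.0)
```

The loss depends on the reference model only through the two log-ratios, which is why changing the reference between steps changes the training signal even on the same pair.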

arXiv:2403.18411 [pdf, ps, other]

Role of hidden-color components in the tetraquark mixing model

Authors: Hungchong Kim, K. S. Kim

Abstract: Multiquarks can have two-hadron components and hidden-color components in their wave functions. The presence of two-hadron components in multiquarks introduces a potential source of confusion, particularly with respect to their resemblance to hadronic molecules. On the other hand, hidden-color components are essential for distinguishing between multiquarks and hadronic molecules. In this work, we study the hidden-color components in the wave functions of the tetraquark mixing model, a model that has been proposed as a suitable framework for describing the properties of two nonets in the $J^P=0^+$ channel: the light nonet [$a_0 (980)$, $K_0^* (700)$, $f_0 (500)$, $f_0 (980)$] and the heavy nonet [$a_0 (1450)$, $K_0^* (1430)$, $f_0 (1370)$, $f_0 (1500)$]. Our analysis reveals a substantial presence of hidden-color components within the tetraquark wave functions. To elucidate the impact of hidden-color components on physical quantities, we compute the hyperfine masses, $\langle V_{CS}\rangle$, for the two nonets, considering scenarios involving only the two-meson components and those incorporating the hidden-color components. We demonstrate that the hidden-color components constitute an important part of the hyperfine masses, such that the mass difference formula, $ΔM\approx Δ\langle V_{CS}\rangle$, which has been successful for the two nonets, cannot be achieved without the hidden-color contributions. This provides further evidence supporting the tetraquark nature of the two nonets.

Submitted 29 June, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: 10 pages, no figure. The version accepted for publication in EPJC

arXiv:2403.18410 [pdf, other]

Chiral Virasoro algebra from a single wavefunction

Authors: Isaac H. Kim, Xiang Li, Ting-Chun Lin, John McGreevy, Bowen Shi

Abstract: Chiral edges of 2+1D systems can have very robust emergent conformal symmetry. When the edge is purely chiral, the Hilbert space of low-energy edge excitations can form a representation of a single Virasoro algebra. We propose a method to systematically extract the generators of the Virasoro algebra from a single ground state wavefunction, using entanglement bootstrap and an input from the edge conformal field theory. We corroborate our construction by numerically verifying the commutation relations of the generators. We also study the unitary flows generated by these operators, whose properties (such as energy and state overlap) are shown numerically to agree with our analytical predictions.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 60+20 pages, 28 figures

arXiv:2403.17977 [pdf, ps, other]

Abelian Chern-Simons term as a Kaluza-Klein dimensional reduction of the Gibbons-Hawking surface term

Authors: Hongsu Kim, Jae Sok Oh

Abstract: It is suggested that the original, minimal Kaluza-Klein theory should be extended by adding a 5-dimensional version of the Gibbons-Hawking gravitational surface term. It is then demonstrated that the usual dimensional reduction of the newly added surface (boundary) term leads to the emergence of the famous Abelian Chern-Simons term. It is stressed that the advent of this Chern-Simons term is not merely a parametrization artefact but a real thing. Finally, the issue of finite-ranged electromagnetic interaction due to massive photons on a plane has been interpreted in terms of the violation of the local gauge invariance of this extended version of Kaluza-Klein theory.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 10 pages

arXiv:2403.17709 [pdf, other]

Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection

Authors: Jongha Kim, Jihwan Park, **young Park, **young Kim, Sehyung Kim, Hyunwoo J. Kim

Abstract: Visual Relationship Detection (VRD) has seen significant advancements with Transformer-based architectures recently. However, we identify two key limitations in a conventional label assignment for training Transformer-based VRD models, which is a process of mapping a ground-truth (GT) to a prediction. Under the conventional assignment, an unspecialized query is trained since a query is expected to detect every relation, which makes it difficult for a query to specialize in specific relations. Furthermore, a query is also insufficiently trained since a GT is assigned only to a single prediction, therefore near-correct or even correct predictions are suppressed by being assigned no relation as a GT. To address these issues, we propose Groupwise Query Specialization and Quality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization trains a specialized query by dividing queries and relations into disjoint groups and directing a query in a specific query group solely toward relations in the corresponding relation group. Quality-Aware Multi-Assignment further facilitates the training by assigning a GT to multiple predictions that are significantly close to a GT in terms of a subject, an object, and the relation in between. Experimental results and analyses show that SpeaQ effectively trains specialized queries, which better utilize the capacity of a model, resulting in consistent performance gains with zero additional inference cost across multiple VRD models and benchmarks. Code is available at https://github.com/mlvlab/SpeaQ.

Submitted 26 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2403.17458 [pdf, ps, other]

Expectations Versus Reality: Evaluating Intrusion Detection Systems in Practice

Authors: Jake Hesford, Daniel Cheng, Alan Wan, Larry Huynh, Seungho Kim, Hyoungshick Kim, ** B. Hong

Abstract: Our paper provides empirical comparisons between recent IDSs to help users choose the most appropriate solution based on their requirements. Our results show that no single solution is best; the best choice depends on external variables such as the types of attacks, complexity, and network environment in the dataset. For example, the BoT_IoT and Stratosphere IoT datasets both capture IoT-related attacks, but the deep neural network performed the best when tested using the BoT_IoT dataset while HELAD performed the best when tested using the Stratosphere IoT dataset. So although we found that a deep neural network solution had the highest average F1 scores on tested datasets, it is not always the best-performing one. We further discuss difficulties in using IDS from literature and project repositories, which complicated drawing definitive conclusions regarding IDS selection.

Submitted 28 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: 10 pages

MSC Class: 68M25; 68M20 ACM Class: C.4; D.m

arXiv:2403.16911 [pdf]

Achieving Optical Refractive Index of 10-Plus by Colloidal Self-Assembly

Authors: NaYeoun Kim, Ji-Hyeok Huh, YongDeok Cho, Sung Hun Park, Hyeon Ho Kim, Kyung Hun Rho, Jaewon Lee, Seungwoo Lee

Abstract: This study demonstrates the development of self-assembled optical metasurfaces to overcome inherent limitations in polarization density (P) within natural materials, which hinder achieving high refractive indices (n) at optical frequencies. The Maxwellian macroscopic description establishes a link between P and n, revealing a static limit in natural materials, restricting n to approximately 4.0 at optical frequencies. Optical metasurfaces, utilizing metallic colloids on a deep-subwavelength scale, offer a solution by unnaturally enhancing n through electric dipolar (ED) resonances. Self-assembly enables the creation of nanometer-scale metallic gaps between metallic nanoparticles (NPs), paving the way for achieving exceptionally high n at optical frequencies. This study focuses on assembling polyhedral gold (Au) NPs into a closely packed monolayer by rationally designing the polymeric ligand to balance attractive and repulsive forces, so that polymeric brush-mediated self-assembly of the close-packed Au NP monolayer is robustly achieved over a large area. The resulting monolayer of Au nanospheres (NSs), nanooctahedra (NOs), and nanocubes (NCs) exhibits high macroscopic integrity and crystallinity, sufficient for pushing n to record-high regimes. The study underscores the significance of capacitive coupling in achieving an unnaturally high n and explores fine-tuning Au NC size to optimize this coupling. The achieved n of 10.12 at optical frequencies stands as a benchmark, highlighting the potential of polyhedral Au NPs in advancing optical metasurfaces.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.15725 [pdf, other]

Customizable wave tailoring materials enabled by nonlinear bilevel inverse design

Authors: Brianna MacNider, Haning Xiu, Kai Qian, Ian Frankel, Hyunsun Alicia Kim, Nicholas Boechler

Abstract: Passive transformation of waves via nonlinear systems is ubiquitous in settings ranging from acoustics to optics and electromagnetics. Passivity is of particular importance for responding rapidly to stimuli and nonlinearity enormously expands signal transformability compared to linear systems due to the breaking of superposition. It is well known that different types of nonlinearity yield vastly different effects on propagating signals, which raises the question of "what precise nonlinearity is the best for a given wave tailoring application?" Considering a one-dimensional spring-mass chain as a testbed, we couple the shape optimization of structures for tailored nonlinear constitutive responses with reduced-order nonlinear dynamical inverse design. Using minimization of peak kinetic energy transmission from impact as a case study, we identify ideal nonlinear constitutive responses and the geometries needed to achieve them. As part of this, we show the large sensitivity of this metric to small changes in nonlinearity, and thus the need for high precision, free-form nonlinearity tailoring. We validate our predictions using impact experiments in a chain of nonlinear springs and masses. This work sets the foundation for broader passive nonlinear mechanical wave tailoring material design, with applications to computing, signal processing, shock mitigation, and autonomous materials.

Submitted 30 June, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.15209 [pdf, other]

MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection

Authors: Taeheon Kim, Sangyun Chung, Damin Yeom, Youngjoon Yu, Hak Gu Kim, Yong Man Ro

Abstract: Multispectral pedestrian detection is attractive for around-the-clock applications due to the complementary information between RGB and thermal modalities. However, current models often fail to detect pedestrians in certain cases (e.g., thermal-obscured pedestrians), particularly due to the modality bias learned from statistically biased datasets. In this paper, we investigate how to mitigate modality bias in multispectral pedestrian detection using Large Language Models (LLMs). Accordingly, we design a Multispectral Chain-of-Thought (MSCoT) prompting strategy, which prompts the LLM to perform multispectral pedestrian detection. Moreover, we propose a novel Multispectral Chain-of-Thought Detection (MSCoTDet) framework that integrates MSCoT prompting into multispectral pedestrian detection. To this end, we design a Language-driven Multi-modal Fusion (LMF) strategy that enables fusing the outputs of MSCoT prompting with the detection results of vision-based multispectral pedestrian detection models. Extensive experiments validate that MSCoTDet effectively mitigates modality biases and improves multispectral pedestrian detection.

Submitted 29 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2403.14946 [pdf, other]

A Single Linear Layer Yields Task-Adapted Low-Rank Matrices

Authors: Hwichan Kim, Shota Sasaki, Sho Hoshino, Ukyo Honda

Abstract: Low-Rank Adaptation (LoRA) is a widely used Parameter-Efficient Fine-Tuning (PEFT) method that updates an initial weight matrix $W_0$ with a delta matrix $ΔW$ composed of two low-rank matrices $A$ and $B$. A previous study suggested that there is a correlation between $W_0$ and $ΔW$. In this study, we aim to delve deeper into relationships between $W_0$ and low-rank matrices $A$ and $B$ to further comprehend the behavior of LoRA. In particular, we analyze a conversion matrix that transforms $W_0$ into low-rank matrices, which encapsulates information about the relationships. Our analysis reveals that the conversion matrices are similar across each layer. Inspired by these findings, we hypothesize that a single linear layer, which takes each layer's $W_0$ as input, can yield task-adapted low-rank matrices. To confirm this hypothesis, we devise a method named Conditionally Parameterized LoRA (CondLoRA) that updates initial weight matrices with low-rank matrices derived from a single linear layer. Our empirical results show that CondLoRA maintains a performance on par with LoRA, despite the fact that the trainable parameters of CondLoRA are fewer than those of LoRA. Therefore, we conclude that "a single linear layer yields task-adapted low-rank matrices."

Submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted at LREC-COLING 2024
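
To make the LoRA update mentioned in the entry above concrete, here is a minimal sketch of the standard low-rank update $W = W_0 + BA$ in plain Python. It illustrates LoRA itself, not CondLoRA, whose construction the abstract does not spell out. The helper names (`matmul`, `lora_update`, `lora_trainable_params`) and the toy matrices are assumptions made for illustration.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def lora_update(w0, b, a):
    """Return W0 + B A, where B is (d_out x r) and A is (r x d_in)."""
    delta = matmul(b, a)  # rank of delta is at most r
    return [[w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]

def lora_trainable_params(d_out, d_in, rank):
    """Entries in B plus entries in A (W0 stays frozen)."""
    return rank * (d_out + d_in)

# Toy example: a 3 x 3 frozen weight with a rank-1 update.
w0 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0],
      [0.0, 0.0, 1.0]]
b = [[1.0], [2.0], [3.0]]   # 3 x 1
a = [[0.5, 0.0, -0.5]]      # 1 x 3
w = lora_update(w0, b, a)
```

In this toy case the rank-1 factors hold 6 trainable values against 9 in the full matrix; the saving grows quickly as the layer dimensions increase while the rank stays small.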

arXiv:2403.14902 [pdf, other]

Hydro: Adaptive Query Processing of ML Queries

Authors: Gaurav Tarlok Kakkar, Jiashen Cao, Aubhro Sengupta, Joy Arulraj, Hyesoon Kim

Abstract: Query optimization in relational database management systems (DBMSs) is critical for fast query processing. The query optimizer relies on precise selectivity and cost estimates to effectively optimize queries prior to execution. While this strategy is effective for relational DBMSs, it is not sufficient for DBMSs tailored for processing machine learning (ML) queries. In ML-centric DBMSs, query optimization is challenging for two reasons. First, the performance bottleneck of the queries shifts to user-defined functions (UDFs) that often wrap around deep learning models, making it difficult to accurately estimate UDF statistics without profiling the query. This leads to inaccurate statistics and sub-optimal query plans. Second, the optimal query plan for ML queries is data-dependent, necessitating DBMSs to adapt the query plan on the fly during execution. So, a static query plan is not sufficient for such queries. In this paper, we present Hydro, an ML-centric DBMS that utilizes adaptive query processing (AQP) for efficiently processing ML queries. Hydro is designed to quickly evaluate UDF-based query predicates by ensuring optimal predicate evaluation order and improving the scalability of UDF execution. By integrating AQP, Hydro continuously monitors UDF statistics, routes data to predicates in an optimal order, and dynamically allocates resources for evaluating predicates. We demonstrate Hydro's efficacy through four illustrative use cases, delivering up to 11.52x speedup over a baseline system.

Submitted 21 March, 2024; originally announced March 2024.
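
The idea of a predicate evaluation order in the entry above can be illustrated with the classical rank-ordering rule for independent predicates: evaluate them in ascending order of cost / (1 - selectivity). The sketch below shows that textbook rule only and is not Hydro's algorithm. The predicate names and statistics are made up, and the statistics are fixed here, whereas the abstract describes monitoring them during execution.

```python
def order_predicates(predicates):
    """Sort (name, cost_per_tuple, selectivity) triples by the classical rank.

    Selectivity is the fraction of tuples that pass the predicate.
    A predicate that filters nothing (selectivity 1.0) goes last.
    """
    def rank(pred):
        _, cost, selectivity = pred
        if selectivity >= 1.0:
            return float("inf")
        return cost / (1.0 - selectivity)
    return sorted(predicates, key=rank)

def expected_cost(ordered):
    """Expected per-tuple cost of evaluating predicates in the given order."""
    total, surviving = 0.0, 1.0
    for _, cost, selectivity in ordered:
        total += surviving * cost   # only surviving tuples reach this predicate
        surviving *= selectivity
    return total

# Hypothetical statistics: an expensive, weak filter and a cheap, strong one.
preds = [("object_detector", 10.0, 0.9), ("color_filter", 1.0, 0.1)]
ordered = order_predicates(preds)
```

With these numbers the cheap, selective filter runs first, so the expensive UDF sees only a tenth of the tuples. When per-predicate cost or selectivity drifts with the data, the ranks change, which is one motivation for re-evaluating the order at runtime.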

arXiv:2403.14482 [pdf, other]

Assessing exchange-correlation functionals for heterogeneous catalysis of nitrogen species

Authors: Honghui Kim, Neung-Kyung Yu, Nianhan Tian, Andrew J. Medford

Abstract: Increasing interest in sustainable synthesis of ammonia, nitrates, and urea has led to an increase in studies of catalytic conversion between nitrogen-containing compounds using heterogeneous catalysts. Density functional theory (DFT) is commonly employed to obtain molecular-scale insight into these reactions, but there have been relatively few assessments of the exchange-correlation functionals that are best suited for heterogeneous catalysis of nitrogen compounds. Here, we assess functionals ranging from the generalized gradient approximation (GGA) to the random phase approximation (RPA) for the formation energies of gas-phase nitrogen species, the lattice constants of representative solids from several common classes of catalysts (metals, oxides, and metal-organic frameworks (MOFs)), and the adsorption energies of a range of nitrogen-containing intermediates on these materials. The results reveal that the choice of exchange-correlation functional and van der Waals correction can have a surprisingly large effect and that increasing the level of theory does not always improve the accuracy for nitrogen-containing compounds. This suggests that the selection of functionals should be carefully evaluated on the basis of the specific reaction and material being studied.

Submitted 20 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 44 pages, 20 figures. Figure 4 (MIL-125) data is changed. Relevant contents (texts, tables, figures, SI) are changed. VASP data is shared with accessible Zenodo link

arXiv:2403.14176 [pdf, other]

ReFeree: Radar-based efficient global descriptor using a Feature and Free space for Place Recognition

Authors: Byunghee Choi, Hogyun Kim, Younggun Cho

Abstract: Radar is highlighted for robust sensing capabilities in adverse weather conditions (e.g. dense fog, heavy rain, or snowfall). In addition, Radar can cover wide areas and penetrate small particles. Despite these advantages, Radar-based place recognition remains in the early stages compared to other sensors due to its unique characteristics such as low resolution and significant noise. In this paper, we propose a Radar-based place recognition method utilizing a descriptor called ReFeree using a feature and free space. Unlike traditional methods, we overwhelmingly summarize the Radar image. Despite being lightweight, it contains semi-metric information and is also outstanding from the perspective of place recognition performance. For concrete validation, we test a single session from the MulRan dataset and a multi-session from the Oxford Offroad Radar, Oxford Radar RobotCar, and the Boreas dataset.

Submitted 6 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 5 pages, 4 figures

arXiv:2403.13376 [pdf, other]

Correlation Clustering of Organoid Images

Authors: Jannik Presberger, Rashmiparvathi Keshara, David Stein, Yung Hae Kim, Anne Grapin-Botton, Bjoern Andres

Abstract: In biological and medical research, scientists now routinely acquire microscopy images of hundreds of morphologically heterogeneous organoids and are then faced with the task of finding patterns in the image collection, i.e., subsets of organoids that appear similar and potentially represent the same morphological class. We adopt models and algorithms for correlating organoid images, i.e., for quantifying the similarity in appearance and geometry of the organoids they depict, and for clustering organoid images by consolidating conflicting correlations. For correlating organoid images, we adopt and compare two alternatives, a partial quadratic assignment problem and a twin network. For clustering organoid images, we employ the correlation clustering problem. Empirically, we learn the parameters of these models, infer a clustering of organoid images, and quantify the accuracy of the inferred clusters, with respect to a training set and a test set we contribute of state-of-the-art light microscopy images of organoids clustered manually by biologists.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 19 pages

arXiv:2403.13347 [pdf, other]

vid-TLDR: Training Free Token merging for Light-weight Video Transformer

Authors: Joonmyung Choi, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, Hyunwoo J. Kim

Abstract: Video Transformers have become the prevalent solution for various video downstream tasks with superior expressive power and flexibility. However, these video transformers suffer from heavy computational costs induced by the massive number of tokens across the entire video frames, which has been the major barrier to training the model. Further, the patches irrelevant to the main contents, e.g., backgrounds, degrade the generalization performance of models. To tackle these issues, we propose training free token merging for lightweight video Transformer (vid-TLDR) that aims to enhance the efficiency of video Transformers by merging the background tokens without additional training. For vid-TLDR, we introduce a novel approach to capture the salient regions in videos only with the attention map. Further, we introduce the saliency-aware token merging strategy by dropping the background tokens and sharpening the object scores. Our experiments show that vid-TLDR significantly mitigates the computational complexity of video Transformers while achieving competitive performance compared to the base model without vid-TLDR. Code is available at https://github.com/mlvlab/vid-TLDR.

Submitted 30 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: Conference on Computer Vision and Pattern Recognition (CVPR), 2024

arXiv:2403.12821 [pdf, other]

FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer

Authors: Dongyeong Hwang, Hyunju Kim, Sunwoo Kim, Kijung Shin

Abstract: The success of a specific neural network architecture is closely tied to the dataset and task it tackles; there is no one-size-fits-all solution. Thus, considerable efforts have been made to quickly and accurately estimate the performances of neural architectures, without full training or evaluation, for given tasks and datasets. Neural architecture encoding has played a crucial role in the estimation, and graph-based methods, which treat an architecture as a graph, have shown prominent performance. For enhanced representation learning of neural architectures, we introduce FlowerFormer, a powerful graph transformer that incorporates the information flows within a neural architecture. FlowerFormer consists of two key components: (a) bidirectional asynchronous message passing, inspired by the flows; (b) global attention built on flow-based masking. Our extensive experiments demonstrate the superiority of FlowerFormer over existing neural encoding methods, and its effectiveness extends beyond computer vision models to include graph neural networks and automatic speech recognition models. Our code is available at http://github.com/y0ngjaenius/CVPR2024_FLOWERFormer.

Submitted 21 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: CVPR 2024 Camera-Ready

arXiv:2403.12726 [pdf]

Small Distance Increment Method for Measuring Complex Permittivity With mmWave Radar

Authors: Hang Song, Hyun Joon Kim, Mingxia Wan, Bo Wei, Takamaro Kikkawa, Jun-ichi Takada

Abstract: Measuring the complex permittivity of a material is essential in many scenarios such as quality checks and component analysis. Generally, measurement methods for characterizing the material are based on a vector network analyzer, which is large and not easy to use for on-site measurement, especially in high frequency ranges such as millimeter wave (mmWave). In addition, some measurement methods require the destruction of samples, which is not suitable for non-destructive inspection. In this work, a small distance increment (SDI) method is proposed to non-destructively measure the complex permittivity of a material. In SDI, the transmitter and receiver form a monostatic radar facing the material under test (MUT). During the measurement, the distance between the radar and the MUT changes in small increments and the signals are recorded at each position. A mathematical model is formulated to describe the relationship among the complex permittivity, distance increment, and measured signals. By fitting the model, the complex permittivity of the MUT is estimated. To implement and evaluate the proposed SDI method, a commercial off-the-shelf mmWave radar is utilized and a measurement system is developed. The evaluation was then carried out on an acrylic plate. With the proposed method, the estimated complex permittivity of the acrylic plate shows good agreement with the literature values, demonstrating the efficacy of the SDI method for characterizing the complex permittivity of a material.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12341 [pdf, ps, other]

Diophantine approximation by rational numbers of certain parity types

Authors: Dong Han Kim, Seul Bee Lee, Lingmin Liao

Abstract: For a given irrational number, we consider the properties of best rational approximations of given parities. There are three different kinds of rational numbers according to the parity of the numerator and denominator, say odd/odd, even/odd and odd/even rational numbers. We study algorithms to find best approximations by rational numbers of given parities and compare these algorithms with continued fraction expansions.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 22 pages, 6 figures

MSC Class: 11J04; 11J70

arXiv:2403.11512 [pdf, ps, other]

Linking numbers of Montesinos links

Authors: Hyoungjun Kim, Sungjong No, Hyungkee Yoo

Abstract: The linking number of an oriented two-component link is an invariant indicating how intertwined the two components are. Tuler proved that the linking number of a two-component rational $\frac{p}{q}$-link is $$\sum^{\frac{|p|}{2}}_{k=1} (-1)^{\big\lfloor (2k-1) \frac{q}{p} \big\rfloor }.$$ In this paper, we provide a simple proof of the above result, and introduce a numerical algorithm to find linking numbers of rational links. Using this result, we find linking numbers between any two components in a Montesinos link.
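
The sum quoted in this abstract can be evaluated directly. The following is a minimal sketch, not the authors' algorithm: the function name `rational_link_linking_number` is hypothetical, and the code assumes that $p$ is a nonzero even integer (so that $|p|/2$ is an integer upper limit) and that the floor in the exponent is the usual floor toward negative infinity.

```python
def rational_link_linking_number(p: int, q: int) -> int:
    """Evaluate sum_{k=1}^{|p|/2} (-1)^floor((2k-1) * q / p).

    Assumption: p is a nonzero even integer, so |p|/2 is a valid
    integer upper limit for the sum quoted in the abstract.
    """
    if p == 0 or p % 2 != 0:
        raise ValueError("p must be a nonzero even integer")
    total = 0
    for k in range(1, abs(p) // 2 + 1):
        # Integer floor division rounds toward negative infinity,
        # which matches the floor brackets for negative values too.
        exponent = ((2 * k - 1) * q) // p
        # Use the parity of the exponent instead of (-1) ** exponent,
        # which would return a float for negative exponents.
        total += 1 if exponent % 2 == 0 else -1
    return total


if __name__ == "__main__":
    print(rational_link_linking_number(8, 3))
```

For instance, `rational_link_linking_number(8, 3)` evaluates the four-term sum 1 - 1 - 1 + 1 = 0, since the exponents are 0, 1, 1 and 2.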

Submitted 18 March, 2024; originally announced March 2024.

Comments: 13 pages, 9 figures

MSC Class: 57K10

arXiv:2403.11382 [pdf, other]

Topological singularity-induced self-energy in strongly correlated fermion systems

Authors: Byungkyun Kang, Zachary Brown, Myoung-Hwan Kim, Hyunsoo Kim, Chul Hong Park

Abstract: Employing ab initio many-body perturbation theory combined with dynamical mean field theory, we discovered that in strongly correlated topological semimetals HoPtBi and PrAlGe, which exhibit topological singular points in the vicinity of the Fermi level, the formation of 4$f$ quasiparticles is forbidden. We show that blocking hybridization channels at the topological singular point effectively enhances on-site Coulomb repulsion, resulting in a substantial self-energy. This renders the topological singular point incompatible with the presence of strongly correlated electrons at the Fermi level. In contrast to the Kondo effect, our findings suggest that the topological quasiparticles in close proximity to the singular points do not hybridize with 4$f$ electrons due to the self-energy, thus hindering the manifestation of heavy-fermion behavior when the singular points persist at the Fermi level.

Submitted 6 May, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11316 [pdf]

Multi-Scale Experimental Characterization for LS-DYNA MAT213 Modeling of Composite Structures under High Strain Rate

Authors: Jackob Black, Ryan Premo, Robert K. Goldberg, Trenton M. Ricks, Troy Lyons, Han-Gyu Kim

Abstract: Aerospace structures often experience high strain rate events such as ballistic impact, crash, or crush. A material model has been developed that enhances the capability to simulate the dynamic response of composite materials under these loading conditions. The material model has been implemented into the commercially available transient dynamic finite element code LS-DYNA as MAT213. The model can simulate the nonlinear deformation, damage, and failure that take place in a composite under dynamic loading conditions. The specific goal of this work is to characterize the MAT213 input for the representative material. The specific composite material being examined consists of T700G unidirectional carbon fibers and a low-melt PolyArylEtherKetone (LMPAEK) thermoplastic resin system. It is formally referred to as Toray TC1225 LMPAEK T700G. As the initial part of this work, this paper is focused on characterizing the material parameters for the MAT213 deformation model based on results obtained from multi-scale experimentation. The effort concentrated on characterizing the in-plane material response suitable for use with thin shell elements. For shell elements within MAT213, tabulated stress-strain results from tension and compression tests in the longitudinal and transverse directions and in-plane shear tests are required. Due to the difficulty of measuring small strains in the transverse direction, a multi-scale testing method was developed. Macro-scale testing is performed per the typical ASTM methods while micro-scale testing uses a microscope along with smaller coupon sizes to obtain the smaller strains in the transverse direction of each test. For both testing methods, a VIC-2D camera and software for digital image correlation (DIC) analysis are used. Using the DIC combined with each test fixture, reliable stress and strain data are collected.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.10882 [pdf, other]

Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean

Authors: ChangSu Choi, Yongbin Jeong, Seoyoon Park, InHo Won, HyeonSeok Lim, SangMin Kim, Yejee Kang, Chanhyuk Yoon, Jaewan Park, Yiseul Lee, Hye** Lee, Younggyun Hahm, Hansaem Kim, KyungTae Lim

Abstract: Large language models (LLMs) use pretraining to predict the subsequent word; however, their expansion requires significant computing resources. Numerous big tech companies and research institutes have developed multilingual LLMs (MLLMs) to meet current demands, overlooking less-resourced languages (LRLs). This study proposed three strategies to enhance the performance of LRLs based on the publicly available MLLMs. First, the MLLM vocabularies of LRLs were expanded to enhance expressiveness. Second, bilingual data were used for pretraining to align the high- and less-resourced languages. Third, a high-quality small-scale instruction dataset was constructed and instruction-tuning was performed to augment the LRL. The experiments employed the Llama2 model and Korean was used as the LRL, which was quantitatively evaluated against other developed LLMs across eight tasks. Furthermore, a qualitative assessment was performed based on human evaluation and GPT4. Experimental results showed that our proposed Bllossom model exhibited superior performance in qualitative analyses compared to previously proposed Korean monolingual models.

Submitted 21 March, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.10820 [pdf, other]

Active Label Correction for Semantic Segmentation with Foundation Models

Authors: Hoyoung Kim, Sehyun Hwang, Suha Kwak, Jungseul Ok

Abstract: Training and validating models for semantic segmentation require datasets with pixel-wise annotations, which are notoriously labor-intensive. Although useful priors such as foundation models or crowdsourced datasets are available, they are error-prone. We hence propose an effective framework of active label correction (ALC) based on a design of correction query to rectify pseudo labels of pixels, which in turn is more annotator-friendly than the standard one inquiring to classify a pixel directly according to our theoretical analysis and user study. Specifically, leveraging foundation models providing useful zero-shot predictions on pseudo labels and superpixels, our method comprises two key techniques: (i) an annotator-friendly design of correction query with the pseudo labels, and (ii) an acquisition function looking ahead label expansions based on the superpixels. Experimental results on PASCAL, Cityscapes, and Kvasir-SEG datasets demonstrate the effectiveness of our ALC framework, outperforming prior methods for active semantic segmentation and label correction. Notably, utilizing our method, we obtained a revised dataset of PASCAL by rectifying errors in 2.6 million pixels in PASCAL dataset.

Submitted 4 June, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.10751 [pdf, other]

LIGHTCODE: Light Analytical and Neural Codes for Channels with Feedback

Authors: Sravan Kumar Ankireddy, Krishna Narayanan, Hyeji Kim

Abstract: The design of reliable and efficient codes for channels with feedback remains a longstanding challenge in communication theory. While significant improvements have been achieved by leveraging deep learning techniques, neural codes often suffer from high computational costs, a lack of interpretability, and limited practicality in resource-constrained settings. We focus on designing low-complexity coding schemes that are interpretable and more suitable for communication systems. We advance both analytical and neural codes. First, we demonstrate that POWERBLAST, an analytical coding scheme inspired by Schalkwijk-Kailath (SK) and Gallager-Nakiboglu (GN) schemes, achieves notable reliability improvements over both SK and GN schemes, outperforming neural codes in high signal-to-noise ratio (SNR) regions. Next, to enhance reliability in low-SNR regions, we propose LIGHTCODE, a lightweight neural code that achieves state-of-the-art reliability while using a fraction of memory and compute compared to existing deep-learning-based codes. Finally, we systematically analyze the learned codes, establishing connections between LIGHTCODE and POWERBLAST, identifying components crucial for performance, and providing interpretation aided by linear regression analysis.

Submitted 13 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 13 pages, 11 figures

arXiv:2403.10391 [pdf, other]

CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning

Authors: Hyuck Lee, Heeyoung Kim

Abstract: Pseudo-label-based semi-supervised learning (SSL) algorithms trained on a class-imbalanced set face two cascading challenges: 1) Classifiers tend to be biased towards majority classes, and 2) Biased pseudo-labels are used for training. It is difficult to appropriately re-balance the classifiers in SSL because the class distribution of an unlabeled set is often unknown and could be mismatched with that of a labeled set. We propose a novel class-imbalanced SSL algorithm called class-distribution-mismatch-aware debiasing (CDMAD). For each iteration of training, CDMAD first assesses the classifier's biased degree towards each class by calculating the logits on an image without any patterns (e.g., solid color image), which can be considered irrelevant to the training set. CDMAD then refines biased pseudo-labels of the base SSL algorithm by ensuring the classifier's neutrality. CDMAD uses these refined pseudo-labels during the training of the base SSL algorithm to improve the quality of the representations. In the test phase, CDMAD similarly refines biased class predictions on test samples. CDMAD can be seen as an extension of post-hoc logit adjustment to address a challenge of incorporating the unknown class distribution of the unlabeled set for re-balancing the biased classifier under class distribution mismatch. CDMAD ensures Fisher consistency for the balanced error. Extensive experiments verify the effectiveness of CDMAD.

Submitted 25 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2403.10348 [pdf, other]

Denoising Task Difficulty-based Curriculum for Training Diffusion Models

Authors: Jin-Young Kim, Hyojun Go, Soonwoo Kwon, Hyun-Gyoon Kim

Abstract: Diffusion-based generative models have emerged as powerful tools in the realm of generative modeling. Despite extensive research on denoising across various timesteps and noise levels, a conflict persists regarding the relative difficulties of the denoising tasks. While various studies argue that lower timesteps present more challenging tasks, others contend that higher timesteps are more difficult. To address this conflict, our study undertakes a comprehensive examination of task difficulties, focusing on convergence behavior and changes in relative entropy between consecutive probability distributions across timesteps. Our observational study reveals that denoising at earlier timesteps poses challenges characterized by slower convergence and higher relative entropy, indicating increased task difficulty at these lower timesteps. Building on these observations, we introduce an easy-to-hard learning scheme, drawing from curriculum learning, to enhance the training process of diffusion models. By organizing timesteps or noise levels into clusters and training models with descending orders of difficulty, we facilitate an order-aware training regime, progressing from easier to harder denoising tasks, thereby deviating from the conventional approach of training diffusion models simultaneously across all timesteps. Our approach leads to improved performance and faster convergence by leveraging the benefits of curriculum learning, while maintaining orthogonality with existing improvements in diffusion training techniques. We validate these advantages through comprehensive experiments in image generation tasks, including unconditional, class-conditional, and text-to-image generation.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 22 pages, 8 figures, 5 tables

arXiv:2403.10217 [pdf, other]

Effectiveness of the syndrome extraction circuit with flag qubits on IBM quantum hardware

Authors: Younghun Kim, Hansol Kim, Jeongsoo Kang, Wonjae Choi, Younghun Kwon

Abstract: Large-scale quantum circuits are required to exploit the advantages of quantum computers. Present-day quantum computers have become less reliable with increasing depths of quantum circuits. To overcome this limitation, quantum error-correction codes have been introduced. Although the success of quantum error correction codes has been announced in Google[1, 2] and neutral atom[3] quantum computers, there have been no reports on IBM quantum computers showing error suppression owing to its unique heavy-hexagon structure. This structure restricts connectivity, and quantum error-correction codes on IBM quantum computers require flag qubits. Here, we report the successful implementation of a syndrome extraction circuit with flag qubits on IBM quantum computers. Moreover, we demonstrate its effectiveness by considering the repetition code as a test code among the quantum error-correcting codes. Even though the data qubit is not adjacent to the syndrome qubit, logical error rates diminish exponentially as the distance of the repetition code increases from three to nine. Even when two flag qubits exist between the data and syndrome qubits, the logical error rates decrease as the distance increases similarly. This confirms the successful implementation of the syndrome extraction circuit with flag qubits on the IBM quantum computer.

Submitted 10 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.10041 [pdf, other]

Towards Embedding Dynamic Personas in Interactive Robots: Masquerading Animated Social Kinematics (MASK)

Authors: Jeongeun Park, Taemoon Jeong, Hyeonseong Kim, Taehyun Byun, Seungyoon Shin, Keunjun Choi, Jaewoon Kwon, Taeyoon Lee, Matthew Pan, Sungjoon Choi

Abstract: This paper presents the design and development of an innovative interactive robotic system to enhance audience engagement using character-like personas. Built upon the foundations of persona-driven dialog agents, this work extends the agent application to the physical realm, employing robots to provide a more immersive and interactive experience. The proposed system, named the Masquerading Animated Social Kinematics (MASK), leverages an anthropomorphic robot which interacts with guests using non-verbal interactions, including facial expressions and gestures. A behavior generation system based upon a finite-state machine structure effectively conditions robotic behavior to convey distinct personas. The MASK framework integrates a perception engine, a behavior selection engine, and a comprehensive action library to enable real-time, dynamic interactions with minimal human intervention in behavior design. Throughout the user subject studies, we examined whether the users could recognize the intended character in film-character-based persona conditions. We conclude by discussing the role of personas in interactive agents and the factors to consider for creating an engaging user experience.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 4 pages, 3 figures

arXiv:2403.10030 [pdf, other]

Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers

Authors: Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim

Abstract: Vision Transformer (ViT) has emerged as a prominent backbone for computer vision. For more efficient ViTs, recent works lessen the quadratic cost of the self-attention layer by pruning or fusing the redundant tokens. However, these works faced the speed-accuracy trade-off caused by the loss of information. Here, we argue that token fusion needs to consider diverse relations between tokens to minimize information loss. In this paper, we propose a Multi-criteria Token Fusion (MCTF), that gradually fuses the tokens based on multi-criteria (e.g., similarity, informativeness, and size of fused tokens). Further, we utilize the one-step-ahead attention, which is the improved approach to capture the informativeness of the tokens. By training the model equipped with MCTF using a token reduction consistency, we achieve the best speed-accuracy trade-off in the image classification (ImageNet1K). Experimental results prove that MCTF consistently surpasses the previous reduction methods with and without training. Specifically, DeiT-T and DeiT-S with MCTF reduce FLOPs by about 44% while improving the performance (+0.5%, and +0.3%) over the base model, respectively. We also demonstrate the applicability of MCTF in various Vision Transformers (e.g., T2T-ViT, LV-ViT), achieving at least 31% speedup without performance degradation. Code is available at https://github.com/mlvlab/MCTF.

Submitted 1 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: Conference on Computer Vision and Pattern Recognition (CVPR), 2024

arXiv:2403.09437 [pdf, other]

Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting

Authors: Pawel Knap, Peter Hardy, Alberto Tamajo, Hwasup Lim, Hansung Kim

Abstract: Current human pose estimation systems focus on retrieving an accurate 3D global estimate of a single person. Therefore, this paper presents one of the first 3D multi-person human pose estimation systems that is able to work in real-time and is also able to handle basic forms of occlusion. First, we adjust an off-the-shelf 2D detector and an unsupervised 2D-3D lifting model for use with a 360$^\circ$ panoramic camera and mmWave radar sensors. We then introduce several contributions, including camera and radar calibrations, and the improved matching of people within the image and radar space. The system addresses both the depth and scale ambiguity problems by employing a lightweight 2D-3D pose lifting algorithm that is able to work in real-time while exhibiting accurate performance in both indoor and outdoor environments which offers both an affordable and scalable solution. Notably, our system's time complexity remains nearly constant irrespective of the number of detected individuals, achieving a frame rate of approximately 7-8 fps on a laptop with a commercial-grade GPU.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.09199 [pdf, other]

Customizing Segmentation Foundation Model via Prompt Learning for Instance Segmentation

Authors: Hyung-Il Kim, Kimin Yun, Jun-Seok Yun, Yuseok Bae

Abstract: Recently, foundation models trained on massive datasets to adapt to a wide range of domains have attracted considerable attention and are actively being explored within the computer vision community. Among these, the Segment Anything Model (SAM) stands out for its remarkable progress in generalizability and flexibility for image segmentation tasks, achieved through prompt-based object mask generation. However, despite its strength, SAM faces two key limitations when applied to customized instance segmentation that segments specific objects or those in unique environments not typically present in the training data: 1) the ambiguity inherent in input prompts and 2) the necessity for extensive additional training to achieve optimal segmentation. To address these challenges, we propose a novel method, customized instance segmentation via prompt learning tailored to SAM. Our method involves a prompt learning module (PLM), which adjusts input prompts into the embedding space to better align with user intentions, thereby enabling more efficient training. Furthermore, we introduce a point matching module (PMM) to enhance the feature representation for finer segmentation by ensuring detailed alignment with ground truth boundaries. Experimental results on various customized instance segmentation scenarios demonstrate the effectiveness of the proposed method.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 11 pages, 10 figures

arXiv:2403.08908 [pdf]

doi 10.1002/adfm.202311287

Electrically Tunable Spin Exchange Splitting in Graphene Hybrid Heterostructure

Authors: Dongwon Shin, Hyeonbeom Kim, Sung Ju Hong, Sehwan Song, Yeongju Choi, Youngkuk Kim, Sungkyun Park, Dongseok Suh, Woo Seok Choi

Abstract: Graphene, with spin and valley degrees of freedom, fosters unexpected physical and chemical properties for the realization of next-generation quantum devices. However, the spin symmetry of graphene is rather robustly protected, hampering manipulation of the spin degrees of freedom for the application of spintronic devices such as electric gate tunable spin filters. We demonstrate that a hybrid heterostructure composed of graphene and LaCoO3 epitaxial thin film exhibits an electrically tunable spin exchange splitting. The large and adjustable spin exchange splitting of 155.9 - 306.5 meV was obtained by the characteristic shifts in both the spin symmetry broken quantum Hall states and the Shubnikov-de-Haas oscillations. Strong hybridization induced charge transfer across the hybrid heterointerface has been identified for the observed spin exchange splitting. The substantial and facile controllability of the spin exchange splitting provides an opportunity for spintronics applications with the electrically-tunable spin polarization in hybrid heterostructures.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures, 1 table

Journal ref: Adv. Funct. Mater. 34, 2311287 (2023)

arXiv:2403.08827 [pdf, other]

Locational Scenario-based Pricing in a Bilateral Distribution Energy Market under Uncertainty

Authors: Hien Thanh Doan, Minsoo Kim, Keunju Song, Hongseok Kim

Abstract: In recent years, there has been a significant focus on advancing the next generation of power systems. Despite these efforts, persistent challenges revolve around addressing the operational impact of uncertainty on predicted data, especially concerning economic dispatch and optimal power flow. To tackle these challenges, we introduce a stochastic day-ahead scheduling approach for a community. This method involves iterative improvements in economic dispatch and optimal power flow, aiming to minimize operational costs by incorporating quantile forecasting. Then, we present a real-time market and payment problem to handle optimization in real-time decision-making and payment calculation. We assess the effectiveness of our proposed method against benchmark results and conduct a test using data from 50 real households to demonstrate its practicality. Furthermore, we compare our method with existing studies in the field across two different seasons of the year. In the summer season, our method decreases optimality gap by 60% compared to the baseline, and in the winter season, it reduces optimality gap by 67%. Moreover, our proposed method mitigates the congestion of distribution network by 16.7% within a day caused by uncertain energy, which is a crucial aspect for implementing energy markets in the real world.

Submitted 11 March, 2024; originally announced March 2024.

Showing 151–200 of 5,682 results for author: Kim, H