Search | arXiv e-print repository

$K_{1}^{\pm}$ mesons moving in nuclear matter

Authors: Seokwoo Yeo, HyungJoo Kim, Su Houng Lee

Abstract: Observing the mass shifts of mesons immersed in nuclear matter is interesting, as the changes are expected to shed light on the effects of chiral symmetry breaking on the origin of hadron masses. At the same time, it is important to understand the momentum dependence of the masses for spin-1 mesons, as the changes manifest differently across the two polarization modes. Here, the mass shifts of $K_{1}^{\pm}$ mesons with finite three-momentum in a nuclear medium are studied in the QCD sum rule approach. We find that the mass of the $K_{1}^{+}$ ($K_{1}^{-}$) meson is increased (decreased) by the non-trivial momentum effect in both the transverse and longitudinal modes. Specifically, compared to its rest mass in the nuclear medium, in the transverse mode, the mass of $K_{1}^{+}$ ($K_{1}^{-}$) is observed to shift by +2 (-55) MeV, while in the longitudinal mode, the mass shift is +13 (-11) MeV, all at a momentum of 0.5 GeV. Exploring the medium modifications of the $K_{1}$ meson through kaon beams at J-PARC will provide insights into the partial restoration of chiral symmetry in nuclear matter.

Submitted 10 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Comments: 9 pages, 6 figures, acknowledgments added

arXiv:2404.04153 [pdf, other]

Evaluation of the performance of the event reconstruction algorithms in the JSNS$^2$ experiment using a $^{252}$Cf calibration source

Authors: D. H. Lee, M. K. Cheoun, J. H. Choi, J. Y. Choi, T. Dodo, J. Goh, K. Haga, M. Harada, S. Hasegawa, W. Hwang, T. Iida, H. I. Jang, J. S. Jang, K. K. Joo, D. E. Jung, S. K. Kang, Y. Kasugai, T. Kawasaki, E. J. Kim, J. Y. Kim, S. B. Kim, W. Kim, H. Kinoshita, T. Konno, I. T. Lim, et al. (28 additional authors not shown)

Abstract: JSNS$^2$ searches for short-baseline neutrino oscillations with a baseline of 24 meters and a target of 17 tonnes of Gd-loaded liquid scintillator. A correct event reconstruction algorithm, which determines the position and energy of neutrino interactions in the detector, is essential for the physics analysis of the data from the experiment. Therefore, the performance of the event reconstruction is carefully checked with calibrations using a $^{252}$Cf source. This manuscript describes the methodology and the performance of the event reconstruction.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.04096 [pdf, other]

Machine Learning-Aided Cooperative Localization under Dense Urban Environment

Authors: Hoon Lee, Hong Ki Kim, Seung Hyun Oh, Sang Hyun Lee

Abstract: Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and quality driving experience, can be leveraged if machine learning models guarantee the robustness in performing core functions including localization and controls. Location awareness, in particular, lends itself to the deployment of location-specific services and the improvement of the operation performance. The localization entails direct communication to the network infrastructure, and the resulting centralized positioning solutions readily become intractable as the network scales up. As an alternative to the centralized solutions, this article addresses the decentralized principle of vehicular localization reinforced by machine learning techniques in dense urban environments with frequent inaccessibility to reliable measurement. As such, the collaboration of multiple vehicles enhances the positioning performance of machine learning approaches. A virtual testbed is developed to validate this machine learning model for real-map vehicular networks. Numerical results demonstrate the universal feasibility of cooperative localization, in particular, for dense urban area configurations.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.03691 [pdf, other]

Upgrade of NaI(Tl) crystal encapsulation for the NEON experiment

Authors: J. J. Choi, E. J. Jeon, J. Y. Kim, K. W. Kim, S. H. Kim, S. K. Kim, Y. D. Kim, Y. J. Ko, B. C. Koh, C. Ha, B. J. Park, S. H. Lee, I. S. Lee, H. Lee, H. S. Lee, J. Lee, Y. M. Oh

Abstract: The Neutrino Elastic-scattering Observation with NaI(Tl) experiment (NEON) aims to detect coherent elastic neutrino-nucleus scattering (CE$\nu$NS) in a NaI(Tl) crystal using reactor anti-electron neutrinos at the Hanbit nuclear power plant complex. A total of 13.3 kg of NaI(Tl) crystals were initially installed in December 2020 at the tendon gallery, 23.7$\pm$0.3 m away from the reactor core, which operates at a thermal power of 2.8 GW. Initial engineering operation was performed from May 2021 to March 2022 and revealed unexpected photomultiplier-induced noise and a decreased light yield that were caused by leakage of liquid scintillator into the detector due to weakness of the detector encapsulation. We upgraded the detector encapsulation design to prevent the leakage of the liquid scintillator. Meanwhile, two small-sized detectors were replaced with larger ones, resulting in a total mass of 16.7 kg. With this new design implementation, the detector system has been operating stably since April 2022 for over a year without detector gain drop. In this paper, we present an improved crystal encapsulation design and the stability of the NEON experiment.

Submitted 28 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.03679 [pdf, other]

Pulse Shape Discrimination in JSNS$^2$

Authors: T. Dodo, M. K. Cheoun, J. H. Choi, J. Y. Choi, J. Goh, K. Haga, M. Harada, S. Hasegawa, W. Hwang, T. Iida, H. I. Jang, J. S. Jang, K. K. Joo, D. E. Jung, S. K. Kang, Y. Kasugai, T. Kawasaki, E. J. Kim, J. Y. Kim, S. B. Kim, W. Kim, H. Kinoshita, T. Konno, D. H. Lee, I. T. Lim, et al. (29 additional authors not shown)

Abstract: JSNS$^2$ (J-PARC Sterile Neutrino Search at J-PARC Spallation Neutron Source) is an experiment that is searching for sterile neutrinos via the observation of $\bar{\nu}_{\mu} \rightarrow \bar{\nu}_{e}$ appearance oscillations using neutrinos from muon decay-at-rest. For this search, rejecting cosmic-ray-induced neutron events by Pulse Shape Discrimination (PSD) is essential because the JSNS$^2$ detector is located above ground, on the third floor of the building. We have achieved 95% rejection of neutron events while keeping 90% of the signal (electron-like) events using a data-driven likelihood method.

Submitted 28 March, 2024; originally announced April 2024.

Comments: arXiv admin note: text overlap with arXiv:2111.07482, arXiv:2308.02722

arXiv:2404.03613 [pdf, other]

Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting

Authors: Jeongmin Bae, Seoha Kim, Youngsik Yun, Hahyun Lee, Gun Bang, Youngjung Uh

Abstract: As 3D Gaussian Splatting (3DGS) provides fast and high-quality novel view synthesis, it is a natural extension to deform a canonical 3DGS to multiple frames. However, previous works fail to accurately reconstruct dynamic scenes: in particular, 1) static parts move along with nearby dynamic parts, and 2) some dynamic areas are blurry. We attribute the failure to the wrong design of the deformation field, which is built as a coordinate-based function. This approach is problematic because 3DGS is a mixture of multiple fields centered at the Gaussians, not just a single coordinate-based framework. To resolve this problem, we define the deformation as a function of per-Gaussian embeddings and temporal embeddings. Moreover, we decompose deformations as coarse and fine deformations to model slow and fast movements, respectively. Also, we introduce an efficient training strategy for faster convergence and higher quality. Project page: https://jeongminb.github.io/e-d3dgs/

Submitted 4 April, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2404.03188 [pdf]

Classification of Nasopharyngeal Cases using DenseNet Deep Learning Architecture

Authors: W. S. H. M. W. Ahmad, M. F. A. Fauzi, M. K. Abdullahi, Jenny T. H. Lee, N. S. A. Basry, A. Yahaya, A. M. Ismail, A. Adam, Elaine W. L. Chan, F. S. Abas

Abstract: Nasopharyngeal carcinoma (NPC) is one of the understudied yet deadliest cancers in South East Asia. In Malaysia, the prevalence is identified mainly in Sarawak, among the Bidayuh ethnic group. NPC is often diagnosed late because it is asymptomatic at the early stage. There are several tissue representations from the nasopharynx biopsy, such as nasopharyngeal inflammation (NPI), lymphoid hyperplasia (LHP), nasopharyngeal carcinoma (NPC) and normal tissue. This paper is our first initiative to identify the difference between NPC, NPI and normal cases. Seven whole slide images (WSIs) with gigapixel resolutions from seven different patients and two hospitals were experimented with using two test setups, each consisting of a different set of images. The tissue regions are patched into smaller blocks and classified using a DenseNet architecture with 21 dense layers. Two tests are carried out, one for proof of concept (Test 1) and one for a real-test scenario (Test 2). The accuracy achieved for the NPC class is 94.8% for Test 1 and 67.0% for Test 2.

Submitted 4 April, 2024; originally announced April 2024.

Comments: This article has been accepted in the Journal of Engineering Science and Technology (JESTEC) and is awaiting publication

arXiv:2404.02581 [pdf, other]

Multi-Granularity Guided Fusion-in-Decoder

Authors: Eunseong Choi, Hyeri Lee, Jongwuk Lee

Abstract: In Open-domain Question Answering (ODQA), it is essential to discern relevant contexts as evidence and avoid spurious ones among retrieved results. The model architecture that uses concatenated multiple contexts in the decoding phase, i.e., Fusion-in-Decoder, demonstrates promising performance but generates incorrect outputs from seemingly plausible contexts. To address this problem, we propose the Multi-Granularity guided Fusion-in-Decoder (MGFiD), discerning evidence across multiple levels of granularity. Based on multi-task learning, MGFiD harmonizes passage re-ranking with sentence classification. It aggregates evident sentences into an anchor vector that instructs the decoder. Additionally, it improves decoding efficiency by reusing the results of passage re-ranking for passage pruning. Through our experiments, MGFiD outperforms existing models on the Natural Questions (NQ) and TriviaQA (TQA) datasets, highlighting the benefits of its multi-granularity solution.

Submitted 3 April, 2024; originally announced April 2024.

Comments: Findings of the Association for Computational Linguistics: NAACL 2024; 12 pages; 8 figures and 5 tables. Code and data available at http://github.com/eunseongc/MGFiD

arXiv:2404.02486 [pdf, other]

Joint Optimization on Uplink OFDMA and MU-MIMO for IEEE 802.11ax: Deep Hierarchical Reinforcement Learning Approach

Authors: Hyeonho Noh, Harim Lee, Hyun Jong Yang

Abstract: This letter tackles a joint user scheduling, frequency resource allocation (USRA), multi-input-multi-output mode selection (MIMO MS) between single-user MIMO and multi-user (MU) MIMO, and MU-MIMO user selection problem, integrating uplink orthogonal frequency division multiple access (OFDMA) in IEEE 802.11ax. Specifically, we focus on unsaturated traffic conditions where users' data demands fluctuate. In unsaturated traffic conditions, considering packet volumes per user introduces a combinatorial problem, requiring the simultaneous optimization of MU-MIMO user selection and RA along the time-frequency-space axis. Consequently, dealing with the combinatorial nature of this problem, characterized by a large cardinality of unknown variables, poses a challenge that conventional optimization methods find nearly impossible to address. In response, this letter proposes an approach with deep hierarchical reinforcement learning (DHRL) to solve the joint problem. Rather than simply adopting off-the-shelf DHRL, we tailor the DHRL to the joint USRA and MS problem, thereby significantly improving the convergence speed and throughput. Extensive simulation results show that the proposed algorithm achieves significantly improved throughput compared to the existing schemes under various unsaturated traffic conditions.

Submitted 15 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

arXiv:2404.02157 [pdf, other]

Segment Any 3D Object with Language

Authors: Seungjun Lee, Yuyang Zhao, Gim Hee Lee

Abstract: In this paper, we investigate Open-Vocabulary 3D Instance Segmentation (OV-3DIS) with free-form language instructions. Earlier works that rely on only annotated base categories for training suffer from limited generalization to unseen novel categories. Recent works mitigate poor generalizability to novel categories by generating class-agnostic masks or projecting generalized masks from 2D to 3D, but disregard semantic or geometry information, leading to sub-optimal performance. Instead, generating generalizable but semantic-related masks directly from 3D point clouds would result in superior outcomes. In this paper, we introduce Segment any 3D Object with LanguagE (SOLE), which is a semantic and geometric-aware visual-language learning framework with strong generalizability by generating semantic-related masks directly from 3D point clouds. Specifically, we propose a multimodal fusion network to incorporate multimodal semantics in both backbone and decoder. In addition, to align the 3D segmentation model with various language instructions and enhance the mask quality, we introduce three types of multimodal associations as supervision. Our SOLE outperforms previous methods by a large margin on ScanNetv2, ScanNet200, and Replica benchmarks, and the results are even close to the fully-supervised counterpart despite the absence of class annotations in the training. Furthermore, extensive qualitative results demonstrate the versatility of our SOLE to language instructions.

Submitted 2 April, 2024; originally announced April 2024.

Comments: Project Page: https://cvrp-sole.github.io

arXiv:2404.02072 [pdf, other]

EGTR: Extracting Graph from Transformer for Scene Graph Generation

Authors: Jinbae Im, JeongYeon Nam, Nokyung Park, Hyungmin Lee, Seunghyun Park

Abstract: Scene Graph Generation (SGG) is a challenging task of detecting objects and predicting relationships between objects. After DETR was developed, one-stage SGG models based on a one-stage object detector have been actively studied. However, complex modeling is used to predict the relationship between objects, and the inherent relationship between object queries learned in the multi-head self-attention of the object detector has been neglected. We propose a lightweight one-stage SGG model that extracts the relation graph from the various relationships learned in the multi-head self-attention layers of the DETR decoder. By fully utilizing the self-attention by-products, the relation graph can be extracted effectively with a shallow relation extraction head. Considering the dependency of the relation extraction task on the object detection task, we propose a novel relation smoothing technique that adjusts the relation label adaptively according to the quality of the detected objects. By the relation smoothing, the model is trained according to the continuous curriculum that focuses on object detection task at the beginning of training and performs multi-task learning as the object detection performance gradually improves. Furthermore, we propose a connectivity prediction task that predicts whether a relation exists between object pairs as an auxiliary task of the relation extraction. We demonstrate the effectiveness and efficiency of our method for the Visual Genome and Open Image V6 datasets. Our code is publicly available at https://github.com/naver-ai/egtr.

Submitted 24 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: CVPR 2024 (Best paper award candidate)

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.01690 [pdf, other]

RefQSR: Reference-based Quantization for Image Super-Resolution Networks

Authors: Hongjae Lee, Jun-Sang Yoo, Seung-Won Jung

Abstract: Single image super-resolution (SISR) aims to reconstruct a high-resolution image from its low-resolution observation. Recent deep learning-based SISR models show high performance at the expense of increased computational costs, limiting their use in resource-constrained environments. As a promising solution for computationally efficient network design, network quantization has been extensively studied. However, existing quantization methods developed for SISR have yet to effectively exploit image self-similarity, which is a new direction for exploration in this study. We introduce a novel method called reference-based quantization for image super-resolution (RefQSR) that applies high-bit quantization to several representative patches and uses them as references for low-bit quantization of the rest of the patches in an image. To this end, we design dedicated patch clustering and reference-based quantization modules and integrate them into existing SISR network quantization methods. The experimental results demonstrate the effectiveness of RefQSR on various SISR networks and quantization methods.

Submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted by IEEE Transactions on Image Processing (TIP)

arXiv:2404.01687 [pdf, other]

Search for a sub-eV sterile neutrino using Daya Bay's full dataset

Authors: F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, Y. C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, X. Y. Ding, Y. Y. Ding, et al. (176 additional authors not shown)

Abstract: This Letter presents results of a search for the mixing of a sub-eV sterile neutrino with three active neutrinos based on the full data sample of the Daya Bay Reactor Neutrino Experiment, collected during 3158 days of detector operation, which contains $5.55 \times 10^{6}$ reactor $\bar{\nu}_{e}$ candidates identified as inverse beta-decay interactions followed by neutron-capture on gadolinium. The analysis benefits from a doubling of the statistics of our previous result and from improvements of several important systematic uncertainties. No significant oscillation due to mixing of a sub-eV sterile neutrino with active neutrinos was found. Exclusion limits are set by both Feldman-Cousins and CLs methods. Light sterile neutrino mixing with $\sin^2 2\theta_{14} \gtrsim 0.01$ can be excluded at 95% confidence level in the region of $0.01\ \mathrm{eV}^2 \lesssim |\Delta m^{2}_{41}| \lesssim 0.1\ \mathrm{eV}^2$. This result represents the world-leading constraints in the region of $2 \times 10^{-4}\ \mathrm{eV}^2 \lesssim |\Delta m^{2}_{41}| \lesssim 0.2\ \mathrm{eV}^2$.

Submitted 15 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures, 1 table

arXiv:2404.01628 [pdf, other]

Learning Equi-angular Representations for Online Continual Learning

Authors: Minhyuk Seo, Hyunseo Koh, Wonje Jeung, Minjae Lee, San Kim, Hankook Lee, Sungjun Cho, Sungik Choi, Hyunwoo Kim, Jonghyun Choi

Abstract: Online continual learning suffers from an underfitted solution due to insufficient training for prompt model update (e.g., single-epoch training). To address the challenge, we propose an efficient online continual learning method using the neural collapse phenomenon. In particular, we induce neural collapse to form a simplex equiangular tight frame (ETF) structure in the representation space so that the continuously learned model with a single epoch can better fit to the streamed data by proposing preparatory data training and residual correction in the representation space. With an extensive set of empirical validations using CIFAR-10/100, TinyImageNet, ImageNet-200, and ImageNet-1K, we show that our proposed method outperforms state-of-the-art methods by a noticeable margin in various online continual learning scenarios such as disjoint and Gaussian scheduled continuous (i.e., boundary-free) data setups.

Submitted 2 April, 2024; originally announced April 2024.

Comments: CVPR 2024
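
Note on the simplex ETF mentioned in the abstract above: as a rough illustration only, the sketch below builds the standard textbook simplex equiangular tight frame for K classes, i.e. K unit-norm vectors whose pairwise inner products all equal -1/(K-1). It is not taken from the paper and does not reproduce its training method; the function names `simplex_etf` and `dot` are hypothetical, and the construction assumes the embedding dimension equals the number of classes.

```python
import math


def simplex_etf(num_classes):
    """Return K unit-norm vectors in R^K forming a simplex ETF.

    Textbook construction (an assumption, not the paper's code):
    M = sqrt(K / (K - 1)) * (I - (1/K) * ones), one vector per row.
    """
    k = num_classes
    if k < 2:
        raise ValueError("a simplex ETF needs at least 2 classes")
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Example with 4 classes: every vector has unit norm and every
# pair of distinct vectors has inner product -1/(4-1).
vectors = simplex_etf(4)
norms = [math.sqrt(dot(v, v)) for v in vectors]
pairwise = [
    dot(vectors[i], vectors[j])
    for i in range(4)
    for j in range(i + 1, 4)
]
```

In this construction the class vectors are as far apart from each other as K unit vectors can be, which is the geometric property the abstract refers to when it speaks of inducing neural collapse.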

arXiv:2404.01123 [pdf, other]

CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment

Authors: Hyeongmin Lee, Kyoungkook Kang, Jungseul Ok, Sunghyun Cho

Abstract: Recent image tone adjustment (or enhancement) approaches have predominantly adopted supervised learning for learning human-centric perceptual assessment. However, these approaches are constrained by intrinsic challenges of supervised learning. Primarily, the requirement for expertly-curated or retouched images escalates the data acquisition expenses. Moreover, their coverage of target style is confined to stylistic variants inferred from the training data. To surmount the above challenges, we propose an unsupervised learning-based approach for text-based image tone adjustment, CLIPtone, that extends an existing image enhancement method to accommodate natural language descriptions. Specifically, we design a hyper-network to adaptively modulate the pretrained parameters of the backbone model based on text description. To assess whether the adjusted image aligns with the text description without ground truth image, we utilize CLIP, which is trained on a vast set of language-image pairs and thus encompasses knowledge of human perception. The major advantages of our approach are threefold: (i) minimal data collection expenses, (ii) support for a range of adjustments, and (iii) the ability to handle novel text descriptions unseen in training. Our approach's efficacy is demonstrated through comprehensive experiments, including a user study.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.01019 [pdf, other]

Source-Aware Training Enables Knowledge Attribution in Language Models

Authors: Muhammad Khalifa, David Wadden, Emma Strubell, Honglak Lee, Lu Wang, Iz Beltagy, Hao Peng

Abstract: Large language models (LLMs) learn a vast amount of knowledge during pretraining, but they are often oblivious to the source(s) of such knowledge. We investigate the problem of intrinsic source citation, where LLMs are required to cite the pretraining source supporting a generated response. Intrinsic source citation can enhance LLM transparency, interpretability, and verifiability. To give LLMs such ability, we explore source-aware training -- a post pretraining recipe that involves (i) training the LLM to associate unique source document identifiers with the knowledge in each document, followed by (ii) an instruction-tuning to teach the LLM to cite a supporting pretraining source when prompted. Source-aware training can easily be applied to pretrained LLMs off the shelf, and diverges minimally from existing pretraining/fine-tuning frameworks. Through experiments on carefully curated data, we demonstrate that our training recipe can enable faithful attribution to the pretraining data without a substantial impact on the model's quality compared to standard pretraining. Our results also highlight the importance of data augmentation in achieving attribution. Code and data available here: https://github.com/mukhal/intrinsic-source-citation

Submitted 11 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00963 [pdf, other]

doi 10.1039/D4CP00517A

Inversion and Tunability of Van Hove Singularities in $A$V$_{3}$Sb$_{5}$ ($A$ = K, Rb, and Cs) kagome metals

Authors: Sangjun Sim, Min Yong Jeong, Hyunggeun Lee, Dong Hyun David Lee, Myung Joon Han

Abstract: To understand the alkali-metal-dependent material properties of recently discovered $A$V$_{3}$Sb$_{5}$ ($A$ = K, Rb, and Cs), we conducted a detailed electronic structure analysis based on first-principles density functional theory calculations. Contrary to the case of $A$ = K and Rb, the energetic positions of the low-lying Van Hove singularities are reversed in CsV$_{3}$Sb$_{5}$, and the characteristic higher-order Van Hove point gets closer to the Fermi level. We found that this notable difference can be attributed to the chemical effect, apart from structural differences. Due to their different orbital compositions, Van Hove points show qualitatively different responses to the structure changes. A previously unnoticed highest lying point can be lowered, locating close to or even below the other ones in response to a reasonable range of bi- and uni-axial strain. Our results can be useful in better understanding the material-dependent features reported in this family and in realizing experimental control of exotic quantum phases.

Submitted 1 April, 2024; originally announced April 2024.

Comments: Physical Chemistry Chemical Physics (PCCP) in press

arXiv:2404.00931 [pdf, other]

GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields

Authors: Yunsong Wang, Hanlin Chen, Gim Hee Lee

Abstract: Recent advancements in vision-language foundation models have significantly enhanced open-vocabulary 3D scene understanding. However, the generalizability of existing methods is constrained due to their framework designs and their reliance on 3D data. We address this limitation by introducing Generalizable Open-Vocabulary Neural Semantic Fields (GOV-NeSF), a novel approach offering a generalizable implicit representation of 3D scenes with open-vocabulary semantics. We aggregate the geometry-aware features using a cost volume, and propose a Multi-view Joint Fusion module to aggregate multi-view features through a cross-view attention mechanism, which effectively predicts view-specific blending weights for both colors and open-vocabulary features. Remarkably, our GOV-NeSF exhibits state-of-the-art performance in both 2D and 3D open-vocabulary semantic segmentation, eliminates the need for ground truth semantic labels or depth priors, and effectively generalizes across scenes and datasets without fine-tuning.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00874 [pdf, other]

DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF

Authors: Jie Long Lee, Chen Li, Gim Hee Lee

Abstract: We present DiSR-NeRF, a diffusion-guided framework for view-consistent super-resolution (SR) NeRF. Unlike prior works, we circumvent the requirement for high-resolution (HR) reference images by leveraging existing powerful 2D super-resolution models. Nonetheless, independent SR 2D images are often inconsistent across different views. We thus propose Iterative 3D Synchronization (I3DS) to mitigate the inconsistency problem via the inherent multi-view consistency property of NeRF. Specifically, our I3DS alternates between upscaling low-resolution (LR) rendered images with diffusion models, and updating the underlying 3D representation with standard NeRF training. We further introduce Renoised Score Distillation (RSD), a novel score-distillation objective for 2D image resolution. Our RSD combines features from ancestral sampling and Score Distillation Sampling (SDS) to generate sharp images that are also LR-consistent. Qualitative and quantitative results on both synthetic and real-world datasets demonstrate that our DiSR-NeRF can achieve better results on NeRF super-resolution compared with existing works. Code and video results available at the project website.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2404.00733 [pdf, other]

Smooth Information Gathering in Two-Player Noncooperative Games

Authors: Fernando Palafox, Jesse Milzman, Dong Ho Lee, Ryan Park, David Fridovich-Keil

Abstract: We present a mathematical framework for modeling two-player noncooperative games in which one player (the defender) is uncertain of the costs of the game and the second player's (the attacker's) intention but can preemptively allocate information-gathering resources to reduce this uncertainty. We obtain the defender's decisions by solving a two-stage problem. In Stage 1, the defender allocates information-gathering resources, and in Stage 2, the information-gathering resources output a signal that informs the defender about the costs of the game and the attacker's intent, and then both players play a noncooperative game. We provide a gradient-based algorithm to solve the two-stage game and apply this framework to a tower-defense game which can be interpreted as a variant of a Colonel Blotto game with smooth payoff functions and uncertainty over battlefield valuations. Finally, we analyze how optimal decisions shift with changes in information-gathering allocations and perturbations in the cost functions.

Submitted 31 March, 2024; originally announced April 2024.

Comments: https://github.com/CLeARoboticsLab/GamesVoI.jl

arXiv:2404.00559 [pdf, other]

Hierarchical Climate Control Strategy for Electric Vehicles with Door-Opening Consideration

Authors: Sanghyeon Nam, Hyejin Lee, Youngki Kim, Kyoung hyun Kwak, Kyoungseok Han

Abstract: This study proposes a novel climate control strategy for electric vehicles (EVs) by addressing door-opening interruptions, an overlooked aspect in EV thermal management. We create and validate an EV simulation model that incorporates door-opening scenarios. Three controllers are compared using the simulation model: (i) a hierarchical non-linear model predictive control (NMPC) with a unique coolant dividing layer and a component for cabin air inflow regulation based on door-opening signals; (ii) a single MPC controller; and (iii) a rule-based controller. The hierarchical controller outperforms the other two, reducing door-opening temperature drops in the relevant section by 46.96% and 51.33% compared to the single-layer MPC and rule-based methods, respectively. Additionally, our strategy reduces the maximum temperature gaps between the sections during recovery by 86.4% and 78.7% relative to the single-layer MPC and rule-based approaches, respectively. We believe that this result opens up future possibilities for incorporating the thermal comfort of passengers across all sections within the vehicle.

Submitted 31 March, 2024; originally announced April 2024.

Comments: This paper, intended for presentation at the IEEE Intelligent Vehicles Symposium (IV) 2024, comprises six pages and includes eight figures

arXiv:2403.19813 [pdf, ps, other]

Zaremba problem with degenerate weights

Authors: Anna Kh. Balci, Ho-Sik Lee

Abstract: We establish the Zaremba problem for the Laplacian and $p$-Laplacian with degenerate weights when the Dirichlet condition is only imposed in a set of positive weighted capacity. We prove a weighted Sobolev-Poincaré inequality with sharp scaling-invariant constants involving weighted capacity. Then we show higher integrability of the gradient of the solution (Meyers estimate) with minimal conditions on the part of the boundary where the Dirichlet condition is assumed. Our results are new both for the linear case $p=2$ and the nonlinear case, and include problems with the weight not only as a measure but also as a multiplier of the gradient of the solution.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19427 [pdf]

Dynamic Phase Enabled Topological Mode Steering in Composite Su-Schrieffer-Heeger Waveguide Arrays

Authors: Min Tang, Chi Pang, Christian N. Saggau, Haiyun Dong, Ching Hua Lee, Ronny Thomale, Sebastian Klembt, Ion Cosma Fulga, Jeroen Van Den Brink, Yana Vaynzof, Oliver G. Schmidt, Jiawei Wang, Libo Ma

Abstract: Topological boundary states localize at interfaces whenever the interface implies a change of the associated topological invariant encoded in the geometric phase. The generically present dynamic phase, however, which is energy and time dependent, has been known to be non-universal, and hence not to intertwine with any topological geometric phase. Using the example of topological zero modes in composite Su-Schrieffer-Heeger (c-SSH) waveguide arrays with a central defect, we report on the selective excitation and transition of topological boundary mode based on dynamic phase-steered interferences. Our work thus provides a new knob for the control and manipulation of topological states in composite photonic devices, indicating promising applications where topological modes and their bandwidth can be jointly controlled by the dynamic phase, geometric phase, and wavelength in on-chip topological devices.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.18644 [pdf, other]

Analysis of the monotonicity method for an anisotropic scatterer with a conductive boundary

Authors: Victor Hughes, Isaac Harris, Heejin Lee

Abstract: In this paper, we consider the inverse scattering problem associated with an anisotropic medium with a conductive boundary. We will assume that the corresponding far-field pattern is known/measured and we consider two inverse problems. First, we show that the far-field data uniquely determines the boundary coefficient. Next, since it is known that anisotropic coefficients are not uniquely determined by this data, we will develop a qualitative method to recover the scatterer. To this end, we study the so-called monotonicity method applied to this inverse shape problem. This method has recently been applied to some inverse scattering problems, but this is the first time it has been applied to an anisotropic scatterer. This method allows one to recover the scatterer by considering the eigenvalues of an operator associated with the far-field operator. We present some simple numerical reconstructions to illustrate our theory in two dimensions. For our reconstructions, we need to compute the adjoint of the Herglotz wave function as an operator mapping into $H^1$ of a small ball.

Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.18288 [pdf, other]

Identifying the transverse and longitudinal modes of the $K^*$ and $K_{1}$ mesons through their angular dependent decay modes

Authors: In Woo Park, Hiroyuki Sako, Kazuya Aoki, Philipp Gubler, Su Houng Lee

Abstract: Observing the mass shifts of chiral partners will provide invaluable insight into the role of chiral symmetry breaking in the generation of hadron masses. Because both the $K^*$ and $K_1$ mesons have vacuum widths smaller than 100 MeV, they are ideal candidates for realizing mass shift measurements. On the other hand, the different momentum dependence of the longitudinal and transverse modes smears the peak positions. In this work, we analyze the angular dependence of the two-body decays of both the $K^*$ and $K_1$. It is found that the longitudinal and transverse modes of the $K^*$ can be isolated by observing the pseudoscalar decay in either the forward or perpendicular directions, respectively. For the $K_1$ decaying into a vector meson and a pseudoscalar meson, one can accomplish the same goal by further observing the polarization of the vector meson through its angular dependence on the two pseudoscalar meson decay.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 11 pages, 6 figures

arXiv:2403.17771 [pdf, other]

Giant planet formation in the solar system

Authors: Anuja Raorane, Ramon Brasser, Soko Matsumura, Tommy Chi Ho Lau, Man Hoi Lee, Audrey Bouvier

Abstract: The formation history of Jupiter has been of interest due to its ability to shape the solar system's history. Yet little attention has been paid to the formation and growth of Saturn and the other giant planets. Here, we explore the implications of the simplest disc and pebble accretion model with steady-state accretion on the formation of giant planets in the solar system through N-body simulations. We conducted a statistical survey of different disc parameters and initial conditions of the protoplanetary disc to establish which combination best reproduces the present outer solar system. We examined the effect of the initial planetesimal disc mass, the number of planetesimals and their size-frequency distribution slope, pebble accretion prescription, and sticking efficiency on the likelihood of forming gas giants and their orbital distribution. The results reveal that the accretion sticking efficiency is the most sensitive parameter for controlling the final masses and number of giant planets. We have been unable to replicate the formation of all three types of giant planets in the solar system in a single simulation. The probability distribution of the final location of the giant planets is approximately constant in $\log r$, suggesting there is a slight preference for formation closer to the Sun but no preference for more massive planets to form closer. The eccentricity distribution has a higher mean for more massive planets, indicating that systems with more massive planets are more violent. The formation timescales of the cores of the gas giants are distinct, suggesting that they formed sequentially.

Submitted 26 March, 2024; originally announced March 2024.

Comments: 17 pages, 18 figures, 1 table

arXiv:2403.17329 [pdf, other]

Deep Support Vectors

Authors: Junhoo Lee, Hyunho Lee, Kyomin Hwang, Nojun Kwak

Abstract: Deep learning has achieved tremendous success. However, unlike SVMs, which provide direct decision criteria and can be trained with a small dataset, it still has significant weaknesses due to its requirement for massive datasets during training and the black-box characteristics on decision criteria. This paper addresses these issues by identifying support vectors in deep learning models. To this end, we propose the DeepKKT condition, an adaptation of the traditional Karush-Kuhn-Tucker (KKT) condition for deep learning models, and confirm that generated Deep Support Vectors (DSVs) using this condition exhibit properties similar to traditional support vectors. This allows us to apply our method to few-shot dataset distillation problems and alleviate the black-box characteristics of deep learning models. Additionally, we demonstrate that the DeepKKT condition can transform conventional classification models into generative models with high fidelity, particularly as latent generative models using class labels as latent variables. We validate the effectiveness of DSVs using common datasets (ImageNet, CIFAR10 and CIFAR100) on the general architectures (ResNet and ConvNet), proving their practical applicability.

Submitted 27 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16011 [pdf, other]

Uncovering the Ghostly Remains of an Extremely Diffuse Satellite in the Remote Halo of NGC 253

Authors: Sakurako Okamoto, Annette M. N. Ferguson, Nobuo Arimoto, Itsuki Ogami, Rokas Zemaitis, Masashi Chiba, Mike J. Irwin, In Sung Jang, Jin Koda, Yutaka Komiyama, Myung Gyoon Lee, Jeong Hwan Lee, Michael Rich, Masayuki Tanaka, Mikito Tanaka

Abstract: We present the discovery of NGC253-SNFC-dw1, a new satellite galaxy in the remote stellar halo of the Sculptor Group spiral, NGC 253. The system was revealed using deep resolved star photometry obtained as part of the Subaru Near-Field Cosmology Survey that uses the Hyper Suprime-Cam on the Subaru Telescope. Although rather luminous ($\rm{M_{V}} = -11.7 \pm 0.2$) and massive ($M_* \sim 1.25\times 10^7~\rm{M}_{\odot}$), the system is one of the most diffuse satellites yet known, with a half-light radius of $\rm{R_{h}} = 3.37 \pm 0.36$ kpc and an average surface brightness of $\sim 30.1$ mag arcmin$^{-2}$ within the $\rm{R_{h}}$. The colour-magnitude diagram shows a dominant old ($\sim 10$ Gyr) and metal-poor ($\rm{[M/H]}=-1.5 \pm 0.1$ dex) stellar population, as well as several candidate thermally-pulsing asymptotic giant branch stars. The distribution of red giant branch stars is asymmetrical and displays two elongated tidal extensions pointing towards NGC 253, suggestive of a highly disrupted system being observed at apocenter. NGC253-SNFC-dw1 has a size comparable to that of the puzzling Local Group dwarfs Andromeda XIX and Antlia 2 but is two magnitudes brighter. While unambiguous evidence of tidal disruption in these systems has not yet been demonstrated, the morphology of NGC253-SNFC-dw1 clearly shows that this is a natural path to produce such diffuse and extended galaxies. The surprising discovery of this system in a previously well-searched region of the sky emphasizes the importance of surface brightness limiting depth in satellite searches.

Submitted 26 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

Comments: 10 pages, 4 figures, 1 table. Accepted for publication in ApJL

arXiv:2403.15730 [pdf, other]

Incorporating Heterogeneous Interactions for Ecological Biodiversity

Authors: Jong Il Park, Deok-Sun Lee, Sang Hoon Lee, Hye Jin Park

Abstract: Understanding the behaviors of ecological systems is challenging given their multi-faceted complexity. To proceed, theoretical models such as Lotka-Volterra dynamics with random interactions have been investigated by the dynamical mean-field theory to provide insights into underlying principles such as how biodiversity and stability depend on the randomness in interaction strength. Yet the fully-connected structure assumed in these previous studies is not realistic as revealed by a vast amount of empirical data. We derive a generic formula for the abundance distribution under an arbitrary distribution of degree, the number of interacting neighbors, which leads to degree-dependent abundance patterns of species. Notably, in contrast to the well-mixed system, the number of surviving species can be reduced as the community becomes cooperative in heterogeneous interaction structures. Our study, therefore, demonstrates that properly taking into account heterogeneity in the interspecific interaction structure is indispensable to understanding the diversity in large ecosystems, and our general theoretical framework can apply to a much wider range of interacting many-body systems.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.15672 [pdf, other]

Deciphering the Digital Veil: Exploring the Ecosystem of DNS HTTPS Resource Records

Authors: Hongying Dong, Yizhe Zhang, Hyeonmin Lee, Shumon Huque, Yixin Sun

Abstract: The DNS HTTPS resource record is a new DNS record type designed for the delivery of configuration information and parameters required to initiate connections to HTTPS network services. It provides the ability to perform zone apex redirection to a third-party provider, which the existing CNAME record cannot do. In addition, it is a key enabler for TLS Encrypted ClientHello (ECH) by providing the cryptographic keying material needed to encrypt the initial exchange. To understand the adoption and security of this new DNS HTTPS record, we perform a longitudinal study on the server-side deployment of DNS HTTPS for Tranco top 1 million domains over 8 months, as well as the client-side support for DNS HTTPS from major browsers. To the best of our knowledge, our work is the first longitudinal study on DNS HTTPS server deployment, and the first known study on client-side support for DNS HTTPS. Despite the rapidly growing trend of DNS HTTPS adoption, our study highlights concerns in the deployment by both servers and clients, such as the complexity in properly maintaining HTTPS records and the concerning hardfail mechanisms in browsers when using HTTPS records.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 20 pages
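
As a rough illustration of the record type described in the DNS HTTPS abstract above, the following Python sketch parses the RDATA of an HTTPS record written in zone-file presentation form. The function name `parse_https_rdata` and the example hostnames are hypothetical, and the parser is deliberately simplified (it assumes no parameter value contains spaces). It relies only on the RFC 9460 convention that a priority of 0 marks an alias-mode record, which is what allows redirection at the zone apex, while any other priority marks a service-mode record carrying parameters such as `alpn` or `ech`.

```python
def parse_https_rdata(rdata):
    """Parse simplified HTTPS-record RDATA: '<priority> <target> [key=value ...]'.

    Assumption: parameter values contain no whitespace, so a plain split is enough.
    """
    fields = rdata.split()
    priority = int(fields[0])   # SvcPriority: 0 means AliasMode
    target = fields[1]          # TargetName: "." stands for the owner name in ServiceMode
    params = {}
    for item in fields[2:]:
        key, _, value = item.partition("=")
        params[key] = value.strip('"')
    mode = "AliasMode" if priority == 0 else "ServiceMode"
    return {"priority": priority, "target": target, "mode": mode, "params": params}


if __name__ == "__main__":
    # Hypothetical apex redirection to a third-party provider.
    print(parse_https_rdata("0 cdn.example.net."))
    # Hypothetical service-mode record advertising HTTP/2 and HTTP/3.
    print(parse_https_rdata('1 . alpn="h2,h3" ipv4hint=192.0.2.1'))
```

A production resolver or measurement tool would use a full DNS library instead; the sketch only makes the alias-versus-service distinction concrete.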

arXiv:2403.15456 [pdf, other]

WoLF: Wide-scope Large Language Model Framework for CXR Understanding

Authors: Seil Kang, Donghyun Kim, Junhyeok Kim, Hyo Kyung Lee, Seong Jae Hwang

Abstract: Significant methodological strides have been made toward Chest X-ray (CXR) understanding via modern vision-language models (VLMs), demonstrating impressive Visual Question Answering (VQA) and CXR report generation abilities. However, existing CXR understanding frameworks still possess several procedural caveats. (1) Previous methods solely use CXR reports, which are insufficient for comprehensive Visual Question Answering (VQA), especially when additional health-related data like medication history and prior diagnoses are needed. (2) Previous methods use raw CXR reports, which are often arbitrarily structured. While modern language models can understand various text formats, restructuring reports for clearer, organized anatomy-based information could enhance their usefulness. (3) Current evaluation methods for CXR-VQA primarily emphasize linguistic correctness, lacking the capability to offer nuanced assessments of the generated answers. In this work, to address the aforementioned caveats, we introduce WoLF, a Wide-scope Large Language Model Framework for CXR understanding. To resolve (1), we capture multi-faceted records of patients, which are utilized for accurate diagnoses in real-world clinical scenarios. Specifically, we adopt the Electronic Health Records (EHR) to generate instruction-following data suited for CXR understanding. Regarding (2), we enhance report generation performance by decoupling knowledge in CXR reports based on anatomical structure even within the attention step via masked attention. To address (3), we introduce an AI-evaluation protocol optimized for assessing the capabilities of LLMs. Through extensive experimental validations, WoLF demonstrates superior performance over other models on MIMIC-CXR in the AI-evaluation arena about VQA (up to +9.47%p mean score) and by metrics about report generation (+7.3%p BLEU-1).

Submitted 29 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 11 pages main paper, 2 pages supplementary

arXiv:2403.15266 [pdf]

Graph neural network coarse-grain force field for the molecular crystal RDX

Authors: Brian H. Lee, James P. Larentzos, John K. Brennan, Alejandro Strachan

Abstract: Condensed-phase molecular systems organize in a wide range of distinct molecular configurations, including amorphous melt and glass as well as crystals often exhibiting polymorphism, that originate from their intricate intra- and intermolecular forces. While accurate coarse-grain (CG) models for these materials are critical to understand phenomena beyond the reach of all-atom simulations, current models cannot capture the diversity of molecular structures. We introduce a generally applicable approach to develop CG force fields for molecular crystals combining graph neural networks (GNN) and data from all-atom simulations and apply it to the high-energy density material RDX. We address the challenge of expanding the training data with relevant configurations via an iterative procedure that performs CG molecular dynamics of processes of interest and reconstructs the atomistic configurations using a pre-trained neural network decoder. The multi-site CG model uses a GNN architecture constructed to satisfy translational invariance and rotational covariance for forces. The resulting model captures both crystalline and amorphous states for a wide range of temperatures and densities.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.14269 [pdf, ps, other]

Dirac's theorem for linear hypergraphs

Authors: Seonghyuk Im, Hyunwoo Lee

Abstract: Dirac's theorem states that any $n$-vertex graph $G$ with even integer $n$ satisfying $\delta(G) \geq n/2$ contains a perfect matching. We generalize this to $k$-uniform linear hypergraphs by proving the following. Any $n$-vertex $k$-uniform linear hypergraph $H$ with minimum degree at least $\frac{n}{k} + \Omega(1)$ contains a matching that covers at least $(1-o(1))n$ vertices. This minimum degree condition is asymptotically tight and obtaining a perfect matching is impossible with any degree condition. Furthermore, we show that if $\delta(H) \geq (\frac{1}{k}+o(1))n$, then $H$ contains almost spanning linear cycles, almost spanning hypertrees with $o(n)$ leaves, and "long subdivisions" of any $o(\sqrt{n})$-vertex graphs.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 13 pages
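
The graph case of Dirac's theorem quoted in the abstract above can be checked by brute force on very small graphs. The Python sketch below is only an illustration and not the authors' method: the helper names are hypothetical, and it enumerates every graph on `n` labelled vertices, so it is practical only for tiny even `n`.

```python
from itertools import combinations


def has_perfect_matching(n, edges):
    """Return True if the graph on vertices 0..n-1 has a perfect matching."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def match(unmatched):
        # Always match the smallest unmatched vertex; try each free neighbour.
        if not unmatched:
            return True
        u = min(unmatched)
        rest = unmatched - {u}
        return any(match(rest - {v}) for v in adj[u] if v in rest)

    return match(frozenset(range(n)))


def min_degree(n, edges):
    """Minimum vertex degree of the graph."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return min(deg)


def check_dirac(n):
    """Check every labelled graph on n vertices (n even and small):
    minimum degree >= n/2 must imply a perfect matching."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        edges = [p for i, p in enumerate(pairs) if (mask >> i) & 1]
        if 2 * min_degree(n, edges) >= n and not has_perfect_matching(n, edges):
            return False
    return True


if __name__ == "__main__":
    print(check_dirac(4))
```

The exhaustive loop visits 2^(n(n-1)/2) graphs, so `n = 6` (32768 graphs) is about the practical limit for this sketch; the hypergraph results in the abstract are asymptotic and are not something such an enumeration could confirm.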

arXiv:2403.13680 [pdf, other]

Step-Calibrated Diffusion for Biomedical Optical Image Restoration

Authors: Yiwei Lyu, Sung Jik Cha, Cheng Jiang, Asadur Chowdury, Xinhai Hou, Edward Harake, Akhil Kondepudi, Christian Freudiger, Honglak Lee, Todd C. Hollon

Abstract: High-quality, high-resolution medical imaging is essential for clinical care. Raman-based biomedical optical imaging uses non-ionizing infrared radiation to evaluate human tissues in real time and is used for early cancer detection, brain tumor diagnosis, and intraoperative tissue analysis. Unfortunately, optical imaging is vulnerable to image degradation due to laser scattering and absorption, which can result in diagnostic errors and misguided treatment. Restoration of optical images is a challenging computer vision task because the sources of image degradation are multi-factorial, stochastic, and tissue-dependent, preventing a straightforward method to obtain paired low-quality/high-quality data. Here, we present Restorative Step-Calibrated Diffusion (RSCD), an unpaired image restoration method that views the image restoration problem as completing the finishing steps of a diffusion-based image generation task. RSCD uses a step calibrator model to dynamically determine the severity of image degradation and the number of steps required to complete the reverse diffusion process for image restoration. RSCD outperforms other widely used unpaired image restoration methods on both image quality and perceptual evaluation metrics for restoring optical images. Medical imaging experts consistently prefer images restored using RSCD in blinded comparison experiments and report minimal to no hallucinations. Finally, we show that RSCD improves performance on downstream clinical imaging tasks, including automated brain tumor diagnosis and deep tissue imaging. Our code is available at https://github.com/MLNeurosurg/restorative_step-calibrated_diffusion.

Submitted 16 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13289 [pdf, other]

Text-to-3D Shape Generation

Authors: Han-Hung Lee, Manolis Savva, Angel X. Chang

Abstract: Recent years have seen an explosion of work and interest in text-to-3D shape generation. Much of the progress is driven by advances in 3D representations, large-scale pretraining and representation learning for text and image data enabling generative AI models, and differentiable rendering. Computational systems that can perform text-to-3D shape generation have captivated the popular imagination as they enable non-expert users to easily create 3D content directly from text. However, there are still many limitations and challenges remaining in this problem space. In this state-of-the-art report, we provide a survey of the underlying technology and methods enabling text-to-3D shape generation to summarize the background literature. We then derive a systematic categorization of recent work on text-to-3D shape generation based on the type of supervision data required. Finally, we discuss limitations of the existing categories of methods, and delineate promising directions for future work.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13110 [pdf, other]

Single-Shot Readout and Weak Measurement of a Tin-Vacancy Qubit in Diamond

Authors: Eric I. Rosenthal, Souvik Biswas, Giovanni Scuri, Hope Lee, Abigail J. Stein, Hannah C. Kleidermacher, Jakob Grzesik, Alison E. Rugar, Shahriar Aghaeimeibodi, Daniel Riedel, Michael Titze, Edward S. Bielejec, Joonhee Choi, Christopher P. Anderson, Jelena Vuckovic

Abstract: The negatively charged tin-vacancy center in diamond (SnV$^-$) is an emerging platform for building the next generation of long-distance quantum networks. This is due to the SnV$^-$'s favorable optical and spin properties including bright emission, insensitivity to electronic noise, and long spin coherence times at temperatures above 1 Kelvin. Here, we demonstrate measurement of a single SnV$^-$ electronic spin with a single-shot readout fidelity of $87.4\%$, which can be further improved to $98.5\%$ by conditioning on multiple readouts. We show this performance is compatible with rapid microwave spin control, demonstrating that the trade-off between optical readout and spin control inherent to group-IV centers in diamond can be overcome for the SnV$^-$. Finally, we use weak quantum measurement to study measurement-induced dephasing; this illuminates the fundamental interplay between measurement and decoherence in quantum mechanics, and makes use of the qubit's spin coherence as a metrological tool. Taken together, these results overcome an important hurdle in the development of SnV$^-$-based quantum technologies, and in the process, develop techniques and understanding broadly applicable to the study of solid-state quantum emitters.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.11324 [pdf, other]

GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering

Authors: Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari

Abstract: During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces. This degradation significantly affects the rendering quality of novel views that deviate significantly from the viewpoints in the training data. To mitigate this issue, we propose a novel approach called GeoGaussian. Based on the smoothly connected areas observed from point clouds, this method introduces a novel pipeline to initialize thin Gaussians aligned with the surfaces, where the characteristic can be transferred to new generations through a carefully designed densification strategy. Finally, the pipeline ensures that the scene's geometry and texture are maintained through constrained optimization processes with explicit geometry constraints. Benefiting from the proposed architecture, the generative ability of 3D Gaussians is enhanced, especially in structured regions. Our proposed pipeline achieves state-of-the-art performance in novel view synthesis and geometric reconstruction, as evaluated qualitatively and quantitatively on public datasets.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11140 [pdf]

doi 10.1039/C9RA07700F

Theoretical investigation of the vertical dielectric screening dependence on defects for few-layered van der Waals materials

Authors: Amit Singh, Seunghan Lee, Hyeonhu Bae, Jahyun Koo, Li Yang, Hoonkyung Lee

Abstract: First-principles calculations were employed to analyze the effects induced by vacancies of molybdenum (Mo) and sulfur (S) on the dielectric properties of few-layered MoS2. We explored the combined effects of vacancies and dipole interactions on the dielectric properties of few-layered MoS2. In the presence of dielectric screening, we investigated uniformly distributed Mo and S vacancies, and then considered the case of concentrated vacancies. Our results show that the dielectric screening remarkably depends on the distribution of vacancies owing to the polarization induced by the vacancies and on the interlayer distances. This conclusion was validated for a wide range of wide-gap semiconductors with different positions and distributions of vacancies, providing an effective and reliable method for calculating and predicting electrostatic screening of dimensionally reduced materials. We further provided a method for engineering the dielectric constant by changing the interlayer distance, tuning the number of vacancies and the distribution of vacancies in few-layered van der Waals materials for their application in nanodevices and supercapacitors.

Submitted 17 March, 2024; originally announced March 2024.

Journal ref: RSC Adv., 2019, 9, 40309-40315

arXiv:2403.11094 [pdf, other]

Nonlinear Self-Interference Cancellation With Learnable Orthonormal Polynomials for Full-Duplex Wireless Systems

Authors: Hyowon Lee, Jungyeon Kim, Geon Choi, Ian P. Roberts, Jinseok Choi, Namyoon Lee

Abstract: Nonlinear self-interference cancellation (SIC) is essential for full-duplex communication systems, which can offer twice the spectral efficiency of traditional half-duplex systems. The challenge of nonlinear SIC is similar to the classic problem of system identification in adaptive filter theory, whose crux lies in identifying the optimal nonlinear basis functions for a nonlinear system. This becomes especially difficult when the system input has a non-stationary distribution. In this paper, we propose a novel algorithm for nonlinear digital SIC that adaptively constructs orthonormal polynomial basis functions according to the non-stationary moments of the transmit signal. By combining these basis functions with the least mean squares (LMS) algorithm, we introduce a new SIC technique, called the adaptive orthonormal polynomial LMS (AOP-LMS) algorithm. To reduce computational complexity for practical systems, we augment our approach with a precomputed look-up table, which maps a given modulation and coding scheme to its corresponding basis functions. Numerical simulation indicates that our proposed method surpasses existing state-of-the-art SIC algorithms in terms of convergence speed and mean squared error when the transmit signal is non-stationary, such as with adaptive modulation and coding. Experimental evaluation with a wireless testbed confirms that our proposed approach outperforms existing digital SIC algorithms.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 13 pages, total 16 figures
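
To make the system-identification framing in the abstract above concrete, here is a minimal sketch of a plain LMS canceller over a fixed polynomial basis. It is not the AOP-LMS algorithm: the basis is a pair of fixed monomials rather than adaptively constructed orthonormal polynomials, the signals are real-valued and noiseless, and every name (`lms_polynomial_canceller`, `true_coeffs`, the step size, the test signal) is an assumption made for illustration.

```python
import math


def lms_polynomial_canceller(tx, rx, basis, step):
    """Estimate self-interference as a weighted sum of basis functions of tx.

    tx    : transmitted samples (known to the receiver)
    rx    : received samples containing the self-interference
    basis : list of callables, each mapping a tx sample to one feature
    step  : LMS step size (assumed small enough for stability)
    """
    weights = [0.0] * len(basis)
    residual = []
    for x, d in zip(tx, rx):
        feats = [phi(x) for phi in basis]
        estimate = sum(w * f for w, f in zip(weights, feats))
        err = d - estimate  # what remains after cancellation
        # Standard LMS update: move the weights along err * features.
        weights = [w + step * err * f for w, f in zip(weights, feats)]
        residual.append(err)
    return weights, residual


# Hypothetical toy channel: linear leakage plus a cubic distortion term.
basis = [lambda x: x, lambda x: x ** 3]
true_coeffs = [0.5, 0.1]
tx = [math.sin(0.37 * n) for n in range(2000)]
rx = [sum(c * phi(x) for c, phi in zip(true_coeffs, basis)) for x in tx]

weights, residual = lms_polynomial_canceller(tx, rx, basis, step=0.5)
```

Because `x` and `x ** 3` are strongly correlated, this fixed basis can converge slowly, which is one plausible motivation for orthonormalizing the basis against the statistics of the transmit signal as the abstract describes.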

arXiv:2403.10906 [pdf, other]

HourglassNeRF: Casting an Hourglass as a Bundle of Rays for Few-shot Neural Rendering

Authors: Seunghyeon Seo, Yeonjin Chang, Jayeon Yoo, Seungwoo Lee, Hojun Lee, Nojun Kwak

Abstract: Recent advancements in the Neural Radiance Field (NeRF) have bolstered its capabilities for novel view synthesis, yet its reliance on dense multi-view training images poses a practical challenge. Addressing this, we propose HourglassNeRF, an effective regularization-based approach with a novel hourglass casting strategy. Our proposed hourglass is conceptualized as a bundle of additional rays within the area between the original input ray and its corresponding reflection ray, by featurizing the conical frustum via Integrated Positional Encoding (IPE). This design expands the coverage of unseen views and enables an adaptive high-frequency regularization based on target pixel photo-consistency. Furthermore, we propose luminance consistency regularization based on the Lambertian assumption, which is known to be effective for training a set of augmented rays under the few-shot setting. Leveraging the inherent property of a Lambertian surface, which retains consistent luminance irrespective of the viewing angle, we treat our proposed hourglass as a collection of flipped diffuse reflection rays and enhance the luminance consistency between the original input ray and its corresponding hourglass, resulting in a more physically grounded training framework and performance improvement. Our HourglassNeRF outperforms its baseline and achieves competitive results on multiple benchmarks with sharply rendered fine details. The code will be available.

Submitted 16 March, 2024; originally announced March 2024.

Comments: 21 pages, 11 figures

arXiv:2403.10882 [pdf, other]

Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean

Authors: ChangSu Choi, Yongbin Jeong, Seoyoon Park, InHo Won, HyeonSeok Lim, SangMin Kim, Yejee Kang, Chanhyuk Yoon, Jaewan Park, Yiseul Lee, Hyejin Lee, Younggyun Hahm, Hansaem Kim, KyungTae Lim

Abstract: Large language models (LLMs) use pretraining to predict the subsequent word; however, their expansion requires significant computing resources. Numerous big tech companies and research institutes have developed multilingual LLMs (MLLMs) to meet current demands, overlooking less-resourced languages (LRLs). This study proposed three strategies to enhance the performance of LRLs based on the publicly available MLLMs. First, the MLLM vocabularies of LRLs were expanded to enhance expressiveness. Second, bilingual data were used for pretraining to align the high- and less-resourced languages. Third, a high-quality small-scale instruction dataset was constructed and instruction-tuning was performed to augment the LRL. The experiments employed the Llama2 model and Korean was used as the LRL, which was quantitatively evaluated against other developed LLMs across eight tasks. Furthermore, a qualitative assessment was performed based on human evaluation and GPT4. Experimental results showed that our proposed Bllossom model exhibited superior performance in qualitative analyses compared to previously proposed Korean monolingual models.

Submitted 21 March, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.10391 [pdf, other]

CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning

Authors: Hyuck Lee, Heeyoung Kim

Abstract: Pseudo-label-based semi-supervised learning (SSL) algorithms trained on a class-imbalanced set face two cascading challenges: 1) Classifiers tend to be biased towards majority classes, and 2) Biased pseudo-labels are used for training. It is difficult to appropriately re-balance the classifiers in SSL because the class distribution of an unlabeled set is often unknown and could be mismatched with that of a labeled set. We propose a novel class-imbalanced SSL algorithm called class-distribution-mismatch-aware debiasing (CDMAD). For each iteration of training, CDMAD first assesses the classifier's biased degree towards each class by calculating the logits on an image without any patterns (e.g., solid color image), which can be considered irrelevant to the training set. CDMAD then refines biased pseudo-labels of the base SSL algorithm by ensuring the classifier's neutrality. CDMAD uses these refined pseudo-labels during the training of the base SSL algorithm to improve the quality of the representations. In the test phase, CDMAD similarly refines biased class predictions on test samples. CDMAD can be seen as an extension of post-hoc logit adjustment to address a challenge of incorporating the unknown class distribution of the unlabeled set for re-balancing the biased classifier under class distribution mismatch. CDMAD ensures Fisher consistency for the balanced error. Extensive experiments verify the effectiveness of CDMAD.

Submitted 25 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: CVPR 2024
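
The refinement step described in the CDMAD abstract above can be sketched roughly as follows. This is one plausible reading rather than the authors' implementation: it assumes the classifier's bias is removed by subtracting the logits it produces on a pattern-free image from the logits of each sample, in the spirit of post-hoc logit adjustment. The function names and the numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def debias_logits(sample_logits, blank_logits):
    """Subtract the logits obtained on a pattern-free (e.g. solid color) image.

    Assumption: the blank-image logits measure the classifier's bias
    towards each class, so removing them makes the classifier neutral.
    """
    return [s - b for s, b in zip(sample_logits, blank_logits)]


def refined_pseudo_label(sample_logits, blank_logits):
    """Return the class index chosen after debiasing."""
    probs = softmax(debias_logits(sample_logits, blank_logits))
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical logits for a 3-class problem where class 0 is the majority.
sample_logits = [2.0, 1.5, 0.2]   # raw prediction favours class 0
blank_logits = [1.2, 0.1, -0.3]   # the blank image also favours class 0

raw_label = max(range(3), key=sample_logits.__getitem__)
refined_label = refined_pseudo_label(sample_logits, blank_logits)
```

In this toy case the raw prediction picks the majority class, while the debiased prediction switches to class 1 because most of the evidence for class 0 was already present on the blank image.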

arXiv:2403.10134 [pdf, other]

Measurement of groomed event shape observables in deep-inelastic electron-proton scattering at HERA

Authors: The H1 collaboration, V. Andreev, M. Arratia, A. Baghdasaryan, A. Baty, K. Begzsuren, A. Bolz, V. Boudry, G. Brandt, D. Britzger, A. Buniatyan, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, Z. Chen, J. G. Contreras, J. Cvach, J. B. Dainton, K. Daum, A. Deshpande, C. Diaconu, A. Drees, G. Eckerlin, et al. (123 additional authors not shown)

Abstract: The H1 Collaboration at HERA reports the first measurement of groomed event shape observables in deep inelastic electron-proton scattering (DIS) at $\sqrt{s}=319$ GeV, using data recorded between the years 2003 and 2007 with an integrated luminosity of $351$ pb$^{-1}$. Event shapes provide incisive probes of perturbative and non-perturbative QCD. Grooming techniques have been used for jet measurements in hadronic collisions; this paper presents the first application of grooming to DIS data. The analysis is carried out in the Breit frame, utilizing the novel Centauro jet clustering algorithm that is designed for DIS event topologies. Events are required to have squared momentum-transfer $Q^2 > 150$ GeV$^2$ and inelasticity $0.2 < y < 0.7$. We report measurements of the production cross section of groomed event 1-jettiness and groomed invariant mass for several choices of grooming parameter. Monte Carlo model calculations and analytic calculations based on Soft Collinear Effective Theory are compared to the measurements.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 32 pages, 17 tables, 7 figures, submitted to EPJ C

Report number: DESY-24-036

arXiv:2403.10119 [pdf, other]

URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields

Authors: Bo Xu, Ziao Liu, Mengqi Guo, Jiancheng Li, Gim Hee Lee

Abstract: We propose a novel rolling shutter bundle adjustment method for neural radiance fields (NeRF), which utilizes the unordered rolling shutter (RS) images to obtain the implicit 3D representation. Existing NeRF methods suffer from low-quality images and inaccurate initial camera poses due to the RS effect in the image, whereas the previous method that incorporates the RS into NeRF requires strict sequential data input, limiting its widespread applicability. In contrast, our method recovers the physical formation of RS images by estimating camera poses and velocities, thereby removing the input constraints on sequential data. Moreover, we adopt a coarse-to-fine training strategy, in which the RS epipolar constraints of the pairwise frames in the scene graph are used to detect the camera poses that fall into local minima. The poses detected as outliers are corrected by the interpolation method with neighboring poses. The experimental results validate the effectiveness of our method over state-of-the-art works and demonstrate that the reconstruction of 3D representations is not constrained by the requirement of video sequence input.

Submitted 24 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.10109 [pdf, other]

Measurement of the 1-jettiness event shape observable in deep-inelastic electron-proton scattering at HERA

Authors: The H1 collaboration, V. Andreev, M. Arratia, A. Baghdasaryan, A. Baty, K. Begzsuren, A. Bolz, V. Boudry, G. Brandt, D. Britzger, A. Buniatyan, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, Z. Chen, J. G. Contreras, J. Cvach, J. B. Dainton, K. Daum, A. Deshpande, C. Diaconu, A. Drees, G. Eckerlin, et al. (124 additional authors not shown)

Abstract: The H1 Collaboration reports the first measurement of the 1-jettiness event shape observable $τ_1^b$ in neutral-current deep-inelastic electron-proton scattering (DIS). The observable $τ_1^b$ is equivalent to a thrust observable defined in the Breit frame. The data sample was collected at the HERA $ep$ collider in the years 2003-2007 with center-of-mass energy of $\sqrt{s}=319\,\text{GeV}$, corresponding to an integrated luminosity of $351.1\,\text{pb}^{-1}$. Triple differential cross sections are provided as a function of $τ_1^b$, event virtuality $Q^2$, and inelasticity $y$, in the kinematic region $Q^2>150\,\text{GeV}^{2}$. Single differential cross sections are provided as a function of $τ_1^b$ in a limited kinematic range. Double differential cross sections are measured, in contrast, integrated over $τ_1^b$ and represent the inclusive neutral-current DIS cross section measured as a function of $Q^2$ and $y$. The data are compared to a variety of predictions and include classical and modern Monte Carlo event generators, predictions in fixed-order perturbative QCD where calculations up to $\mathcal{O}(α_s^3)$ are available for $τ_1^b$ or inclusive DIS, and resummed predictions at next-to-leading logarithmic accuracy matched to fixed order predictions at $\mathcal{O}(α_s^2)$. These comparisons reveal sensitivity of the 1-jettiness observable to QCD parton shower and resummation effects, as well as the modeling of hadronization and fragmentation. Within their range of validity, the fixed-order predictions provide a good description of the data. Monte Carlo event generators are predictive over the full measured range and hence their underlying models and parameters can be constrained by comparing to the presented data.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 45 pages, 38 tables, 13 figures

Report number: DESY-24-035

arXiv:2403.09635 [pdf, other]

Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models

Authors: Akhil Kedia, Mohd Abbas Zaidi, Sushil Khyalia, Jungho Jung, Harshith Goka, Haejun Lee

Abstract: In spite of their huge success, transformer models remain difficult to scale in depth. In this work, we develop a unified signal propagation theory and provide formulae that govern the moments of the forward and backward signal through the transformer model. Our framework can be used to understand and mitigate vanishing/exploding gradients, rank collapse, and instability associated with high attention scores. We also propose DeepScaleLM, an initialization and scaling scheme that conserves unit output/gradient moments throughout the model, enabling the training of very deep models with 100s of layers. We find that transformer models could be much deeper: our deep models with fewer parameters outperform shallow models in Language Modeling, Speech Translation, and Image Classification, across Encoder-only, Decoder-only and Encoder-Decoder variants, for both Pre-LN and Post-LN transformers, for multiple datasets and model sizes. These improvements also translate into improved performance on downstream Question Answering tasks and improved robustness for image classification.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Akhil Kedia, Mohd Abbas Zaidi, Sushil Khyalia equal contribution. Source code is available at https://github.com/akhilkedia/TranformersGetStable

ACM Class: I.2.7; I.2.10

arXiv:2403.09502 [pdf, other]

EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning

Authors: Jongsuk Kim, Hyeongkeun Lee, Kyeongha Rho, Junmo Kim, Joon Son Chung

Abstract: Recent advancements in self-supervised audio-visual representation learning have demonstrated its potential to capture rich and comprehensive representations. However, despite the advantages of data augmentation verified in many learning methods, audio-visual learning has struggled to fully harness these benefits, as augmentations can easily disrupt the correspondence between input pairs. To address this limitation, we introduce EquiAV, a novel framework that leverages equivariance for audio-visual contrastive learning. Our approach begins with extending equivariance to audio-visual learning, facilitated by a shared attention-based transformation predictor. It enables the aggregation of features from diverse augmentations into a representative embedding, providing robust supervision. Notably, this is achieved with minimal computational overhead. Extensive ablation studies and qualitative results verify the effectiveness of our method. EquiAV outperforms previous works across various audio-visual benchmarks. The code is available at https://github.com/JongSuk1/EquiAV.

Submitted 20 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: 15 pages, 3 figures; Accepted to ICML 2024 (camera ready version)

arXiv:2403.09168 [pdf, other]

VIVID: Human-AI Collaborative Authoring of Vicarious Dialogues from Lecture Videos

Authors: Seulgi Choi, Hyewon Lee, Yoonjoo Lee, Juho Kim

Abstract: Lengthy monologue-style online lectures cause learners to lose engagement easily. Designing lectures in a "vicarious dialogue" format can foster learners' cognitive activities more than a monologue style. However, designing online lectures in a dialogue style catered to the diverse needs of learners is laborious for instructors. We conducted a design workshop with eight educational experts and seven instructors to present key guidelines and the potential use of large language models (LLM) to transform a monologue lecture script into pedagogically meaningful dialogue. Applying these design guidelines, we created VIVID, which allows instructors to collaborate with LLMs to design, evaluate, and modify pedagogical dialogues. In a within-subjects study with instructors (N=12), we show that VIVID helped instructors select and revise dialogues efficiently, thereby supporting the authoring of quality dialogues. Our findings demonstrate the potential of LLMs to assist instructors with creating high-quality educational dialogues across various learning stages.

Submitted 10 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.09024 [pdf, other]

Semiparametric Token-Sequence Co-Supervision

Authors: Hyunji Lee, Doyoung Kim, Jihoon Jun, Sejune Joo, Joel Jang, Kyoung-Woon On, Minjoon Seo

Abstract: In this work, we introduce a semiparametric token-sequence co-supervision training method. It trains a language model by simultaneously leveraging supervision from the traditional next token prediction loss, which is calculated over the parametric token embedding space, and the next sequence prediction loss, which is calculated over the nonparametric sequence embedding space. The nonparametric sequence embedding space is constructed by a separate language model tasked to condense an input text into a single representative embedding. Our experiments demonstrate that a model trained via both supervisions consistently surpasses models trained via each supervision independently. Analysis suggests that this co-supervision encourages a broader generalization capability across the model. In particular, the robustness of the parametric token space, which is established during the pretraining step, tends to effectively enhance the stability of the nonparametric sequence embedding space, a new space established by another language model.

Submitted 13 March, 2024; originally announced March 2024.

Showing 201–250 of 6,930 results for author: Lee, H