Search | arXiv e-print repository

Understanding the Gains from Repeated Self-Distillation

Authors: Divyansh Pareek, Simon S. Du, Sewoong Oh

Abstract: Self-Distillation is a special type of knowledge distillation where the student model has the same architecture as the teacher model. Despite using the same architecture and the same training data, self-distillation has been empirically observed to improve performance, especially when applied repeatedly. For such a process, there is a fundamental question of interest: How much gain is possible by applying multiple steps of self-distillation? To investigate this relative gain, we propose studying the simple but canonical task of linear regression. Our analysis shows that the excess risk achieved by multi-step self-distillation can significantly improve upon a single step of self-distillation, reducing the excess risk by a factor as large as $d$, where $d$ is the input dimension. Empirical results on regression tasks from the UCI repository show a reduction in the learnt model's risk (MSE) by up to 47%.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 31 pages, 10 figures
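
As an illustration of the repeated self-distillation described in the abstract above, here is a minimal sketch for linear regression. It is not taken from the paper: the ridge regularizer `lam`, the mixing weight `xi`, and the function names are assumptions chosen for illustration. Each student is a ridge regression fitted to a blend of the true labels and the previous model's predictions.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) theta = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def self_distill(X, y, lam, xi, steps):
    """Repeated self-distillation with a ridge student (illustrative assumption).

    The first model is plain ridge regression on the labels. Each later
    student minimizes
        xi * ||X theta - teacher_pred||^2 + (1 - xi) * ||X theta - y||^2
        + lam * ||theta||^2,
    which is the same as ridge regression on the blended targets below.
    """
    theta = ridge_fit(X, y, lam)
    for _ in range(steps):
        targets = xi * (X @ theta) + (1.0 - xi) * y  # blend teacher and labels
        theta = ridge_fit(X, targets, lam)
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=100)
    for k in range(4):
        theta_k = self_distill(X, y, lam=1.0, xi=0.5, steps=k)
        print(k, float(np.mean((X @ theta_k - y) ** 2)))
```

With `xi = 0` every student reduces to the plain ridge estimator, so any change in risk comes only from the teacher-prediction term.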

arXiv:2407.04314 [pdf, ps, other]

Beale--Kato--Majda-type continuation criteria for Hall- and electron-magnetohydrodynamics

Authors: Mimi Dai, Sung-Jin Oh

Abstract: We show that regular solutions to electron-MHD with resistivity can be continued as long as the time integral of the supremum of the current gradient remains finite. This dimensionless continuation criterion is analogous to the celebrated result of Beale--Kato--Majda for the incompressible Euler and Navier--Stokes equations. A similar continuation criterion, formulated in terms of the time integral of the supremum of the vorticity, velocity gradient and current gradient, is established for the Hall-MHD with resistivity as well.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.02447 [pdf, other]

PLeaS -- Merging Models with Permutations and Least Squares

Authors: Anshul Nasery, Jonathan Hayase, Pang Wei Koh, Sewoong Oh

Abstract: The democratization of machine learning systems has made the process of fine-tuning accessible to a large number of practitioners, leading to a wide range of open-source models fine-tuned on specialized tasks and datasets. Recent work has proposed to merge such models to combine their functionalities. However, prior approaches are restricted to models that are fine-tuned from the same base model. Furthermore, the final merged model is typically restricted to be of the same size as the original models. In this work, we propose a new two-step algorithm to merge models, termed PLeaS, which relaxes these constraints. First, leveraging the Permutation symmetries inherent in the two models, PLeaS partially matches nodes in each layer by maximizing alignment. Next, PLeaS computes the weights of the merged model as a layer-wise Least Squares solution to minimize the approximation error between the features of the merged model and the permuted features of the original models. The result is a single model of a desired size, even when the two original models are fine-tuned from different base models. We also present a variant of our method which can merge models without using data from the fine-tuning domains. We demonstrate our method to merge ResNet models trained with shared and different label spaces, and show that we can perform better than the state-of-the-art merging methods by 8 to 15 percentage points for the same target compute while merging models trained on DomainNet and on fine-grained classification tasks.

Submitted 2 July, 2024; originally announced July 2024.
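
The two steps named in the abstract above (permutation matching, then a layer-wise least-squares fit) can be sketched for a single linear layer as follows. This is a simplified illustration, not the authors' implementation: it matches every unit rather than a partial subset, uses feature correlation as the alignment score, and averages the aligned features as the regression target. The function names `match_units` and `merge_layer` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_units(feats_a, feats_b):
    """Return perm so that column perm[i] of feats_b is matched to column i of feats_a.

    Alignment is measured by the correlation between unit activations
    (an assumption made for this sketch).
    """
    a = (feats_a - feats_a.mean(axis=0)) / (feats_a.std(axis=0) + 1e-8)
    b = (feats_b - feats_b.mean(axis=0)) / (feats_b.std(axis=0) + 1e-8)
    corr = a.T @ b / feats_a.shape[0]  # (units_a, units_b) correlation matrix
    _, perm = linear_sum_assignment(corr, maximize=True)
    return perm


def merge_layer(inputs, feats_a, feats_b):
    """Fit merged layer weights by least squares against the aligned features."""
    perm = match_units(feats_a, feats_b)
    target = 0.5 * (feats_a + feats_b[:, perm])  # average after alignment
    weights, *_ = np.linalg.lstsq(inputs, target, rcond=None)
    return weights, perm
```

Here `inputs` holds the layer's input activations (one row per sample) and `feats_a`, `feats_b` hold the corresponding outputs of the two original models. Changing the merged model's width, as the abstract describes, would need a partial matching in place of the full assignment used here.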

arXiv:2407.02245 [pdf, other]

Safe CoR: A Dual-Expert Approach to Integrating Imitation Learning and Safe Reinforcement Learning Using Constraint Rewards

Authors: Hyeokjin Kwon, Gunmin Lee, Junseo Lee, Songhwai Oh

Abstract: In the realm of autonomous agents, ensuring safety and reliability in complex and dynamic environments remains a paramount challenge. Safe reinforcement learning addresses these concerns by introducing safety constraints, but still faces challenges in navigating intricate environments such as complex driving situations. To overcome these challenges, we present the safe constraint reward (Safe CoR) framework, a novel method that utilizes two types of expert demonstrations: reward expert demonstrations focusing on performance optimization and safe expert demonstrations prioritizing safety. By exploiting a constraint reward (CoR), our framework guides the agent to balance performance goals of reward sum with safety constraints. We test the proposed framework in diverse environments, including the safety gym, metadrive, and the real-world Jackal platform. Our proposed framework enhances the performance of algorithms by 39% and reduces constraint violations by 88% on the real-world Jackal platform, demonstrating the framework's efficacy. Through this innovative approach, we expect significant advancements in real-world performance, leading to transformative effects in the realm of safe and reliable autonomous agents.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted to the Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

arXiv:2407.01540 [pdf, other]

Towards a Partial Computation offloading in In-networking Computing-Assisted MEC: A Digital Twin Approach

Authors: Ibrahim Aliyu, Awwal Arigi, Seungmin Oh, Tai-Won Um, Jinsul Kim

Abstract: This paper addresses the problem of minimizing latency with partial computation offloading within Industrial Internet-of-Things (IoT) systems in in-network computing (COIN)-assisted Multiaccess Edge Computing (C-MEC) via ultra-reliable and low latency communications (URLLC) links. We propose a digital twin (DT) scheme for a multiuser scenario, allowing collaborative partial task offloading from user equipment (UE) to COIN-aided nodes or MEC. Specifically, we formulate the problem as joint task offloading decision, ratio and resource allocation. We employ game theory to create a low-complexity distributed offloading scheme in which the task offloading decision problem is modelled as an exact potential game. Double Deep Q-Network (DDQN) is utilized within the game to proactively predict optimal offloading ratio and resource allocation. This approach optimizes resource allocation across the whole system and enhances the robustness of the computing framework, ensuring efficient execution of computation-intensive services. Additionally, it addresses centralized approaches and UE resource contention issues, thus ensuring faster and more reliable communication.

Submitted 8 April, 2024; originally announced July 2024.

Comments: 9 pages, 3 figures

arXiv:2406.15710 [pdf, other]

doi 10.1038/s41566-022-01039-2

A photonic quantum engine driven by superradiance

Authors: Jinuk Kim, Seung-hoon Oh, Daeho Yang, Junki Kim, Moonjoo Lee, Kyungwon An

Abstract: Performance of nano- and micro-scale heat engines can be improved with help from quantum mechanical phenomena. Recently, heat reservoirs with quantum coherence have been proposed to enhance engine performance beyond the Carnot limit even with a single reservoir. However, no physical realizations have been achieved so far. Here, we report the first proof-of-principle experimental demonstration of a photonic quantum engine driven by superradiance employing a single heat reservoir composed of atoms and photonic vacuum. Reservoir atoms prepared in a quantum coherent superposition state underwent superradiance while traversing the cavity. This led to an approximately 40-fold increase of the effective engine temperature, resulting in a near-unity engine efficiency. Moreover, the observed engine output power grew quadratically with respect to the atomic injection rate. Our work can be utilized in quantum mechanical heat transfer as well as in boosting engine powers, opening a pathway to development of photomechanical devices that run on quantum coherence embedded in heat baths.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 8 pages, 3 figures, 1 extended data figure

Journal ref: Nat. Photon. 16, 707 (2022)

arXiv:2406.13924 [pdf, other]

Impact of Internal Dust Correction on the Stellar Populations of Galaxies Estimated Using the Full Spectrum Fitting

Authors: Joon Hyeop Lee, Hyunjin Jeong, Jiwon Chung, Mina Pak, Sree Oh

Abstract: Full spectrum fitting is a powerful tool for estimating the stellar populations of galaxies, but the fitting results are often significantly influenced by internal dust attenuation. To understand how the choice of the internal dust correction method affects the detailed stellar populations estimated from the full spectrum fitting, we analyze the Sydney-Australian Astronomical Observatory Multi-object Integral field spectrograph (SAMI) galaxy survey data using the Penalized PiXel-Fitting (PPXF) package. Three choices are compared: (Choice-1) using the PPXF reddening option, (Choice-2) using the multiplicative Legendre polynomial, and (Choice-3) using none of them (no dust correction). In any case, the total mean stellar populations show reasonable mass-age and mass-metallicity relations (MTR and MZR), although the correlations appear to be strongest for Choice-1 (MTR) and Choice-2 (MZR). When we compare the age-divided mean stellar populations, the MZR of young (< 10^9.5 yr ~ 3.2 Gyr) stellar components in Choice-2 is consistent with the gas-phase MZR, whereas those in the other two choices hardly are. On the other hand, the MTR of old (>= 10^9.5 yr) stellar components in Choice-1 seems to be more reasonable than that in Choice-2, because the old stellar components in low-mass galaxies tend to be relatively younger than those in massive galaxies. Based on the results, we provide empirical guidelines for choosing the optimal options for dust correction.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 10 pages, 8 figures, accepted for publication in Journal of the Korean Astronomical Society

arXiv:2406.13160 [pdf, ps, other]

Global bases for Bosonic extensions of quantum unipotent coordinate rings

Authors: Masaki Kashiwara, Myungho Kim, Se-jin Oh, Euiyong Park

Abstract: In the paper, we establish the global basis theory for the bosonic extension $\widehat{\mathcal{A}}$ associated with an arbitrary generalized Cartan matrix. When $\widehat{\mathcal{A}}$ is of simply-laced finite type, it is isomorphic to the quantum Grothendieck ring of the Hernandez-Leclerc category over a quantum affine algebra. In this case, we show that the $(t,q)$-characters of simple modules in the Hernandez-Leclerc category correspond to the normalized global basis of $\widehat{\mathcal{A}}$.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 37 pages

MSC Class: 05E10; 05E18; 17B37

arXiv:2406.11794 [pdf, other]

DataComp-LM: In search of the next generation of training sets for language models

Authors: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, et al. (34 additional authors not shown)

Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set. The resulting dataset, DCLM-Baseline, enables training a 7B parameter language model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. Compared to MAP-Neo, the previous state-of-the-art in open-data language models, DCLM-Baseline represents a 6.6 percentage point improvement on MMLU while being trained with 40% less compute. Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% & 66%), and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B. Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation.

Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Project page: https://www.datacomp.ai/dclm/

arXiv:2406.08527 [pdf, other]

Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning

Authors: Jaehyun Nam, Kyuyoung Kim, Seunghyuk Oh, Jihoon Tack, Jaehyung Kim, Jinwoo Shin

Abstract: Learning effective representations from raw data is crucial for the success of deep learning methods. However, in the tabular domain, practitioners often prefer augmenting raw column features over using learned representations, as conventional tree-based algorithms frequently outperform competing approaches. As a result, feature engineering methods that automatically generate candidate features have been widely used. While these approaches are often effective, there remains ambiguity in defining the space over which to search for candidate features. Moreover, they often rely solely on validation scores to select good features, neglecting valuable feedback from past experiments that could inform the planning of future experiments. To address the shortcomings, we propose a new tabular learning framework based on large language models (LLMs), coined Optimizing Column feature generator with decision Tree reasoning (OCTree). Our key idea is to leverage LLMs' reasoning capabilities to find good feature generation rules without manually specifying the search space and provide language-based reasoning information highlighting past experiments as feedback for iterative rule improvements. Here, we choose a decision tree as reasoning as it can be interpreted in natural language, effectively conveying knowledge of past experiments (i.e., the prediction models trained with the generated features) to the LLM. Our empirical results demonstrate that this simple framework consistently enhances the performance of various prediction models across diverse tabular benchmarks, outperforming competing automatic feature engineering methods.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 18 pages

arXiv:2406.07899 [pdf, other]

Josephson Parametric Amplifier based Quantum Noise Limited Amplifier Development for Axion Search Experiments in CAPP

Authors: Sergey V. Uchaikin, Jinmyeong Kim, Caglar Kutlu, Boris I. Ivanov, Jinsu Kim, Arjan F. van Loo, Yasunobu Nakamura, Saebyeok Ahn, Seonjeong Oh, Minsu Ko, Yannis K. Semertzidis

Abstract: This paper provides a comprehensive overview of the development of flux-driven Josephson Parametric Amplifiers (JPAs) as Quantum Noise Limited Amplifiers for axion search experiments conducted at the Center for Axion and Precision Physics Research (CAPP) of the Institute for Basic Science. It focuses on the characterization and optimization of JPAs, which are crucial for achieving the highest sensitivity in axion particle detection. We discuss various characterization techniques, methods for improving bandwidth, and the attainment of ultra-low noise temperatures. JPAs have emerged as indispensable tools in CAPP's axion search endeavors, playing a significant role in advancing our understanding of fundamental physics and unraveling the mysteries of the universe.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 29 pages, 15 figures

arXiv:2406.07514 [pdf, other]

Scintillation Light in SBND: Simulation, Reconstruction, and Expected Performance of the Photon Detection System

Authors: SBND Collaboration, P. Abratenko, R. Acciarri, C. Adams, L. Aliaga-Soplin, O. Alterkait, R. Alvarez-Garrote, C. Andreopoulos, A. Antonakis, L. Arellano, J. Asaadi, W. Badgett, S. Balasubramanian, V. Basque, A. Beever, B. Behera, E. Belchior, M. Betancourt, A. Bhat, M. Bishai, A. Blake, B. Bogart, J. Bogenschuetz, D. Brailsford, A. Brandt, et al. (158 additional authors not shown)

Abstract: SBND is the near detector of the Short-Baseline Neutrino program at Fermilab. Its location near the Booster Neutrino Beam source and relatively large mass will allow the study of neutrino interactions on argon with unprecedented statistics. This paper describes the expected performance of the SBND photon detection system, using a simulated sample of beam neutrinos and cosmogenic particles. Its design is a dual readout concept combining a system of 120 photomultiplier tubes, used for triggering, with a system of 192 X-ARAPUCA devices, located behind the anode wire planes. Furthermore, covering the cathode plane with highly-reflective panels coated with a wavelength-shifting compound recovers part of the light emitted towards the cathode, where no optical detectors exist. We show how this new design provides a high light yield and a more uniform detection efficiency, an excellent timing resolution and an independent 3D-position reconstruction using only the scintillation light. Finally, the whole reconstruction chain is applied to recover the temporal structure of the beam spill, which is resolved with a resolution on the order of nanoseconds.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 21 pages, 17 figures

Report number: FERMILAB-PUB-24-0303-PPD

arXiv:2406.06112 [pdf]

doi 10.1021/acsami.4c05656

Resilient Growth of Highly Crystalline Topological Insulator-Superconductor Heterostructure Enabled by Ex-situ Nitride Film

Authors: Renjie Xie, Min Ge, Shaozhu Xiao, Jiahui Zhang, Jiachang Bi, Xiaoyu Yuan, Hee Taek Yi, Baomin Wang, Seongshik Oh, Yanwei Cao, Xiong Yao

Abstract: Highly crystalline and easily feasible topological insulator-superconductor (TI-SC) heterostructures are crucial for the development of practical topological qubit devices. The optimal superconducting layer for TI-SC heterostructures should be highly resilient against external contaminations and structurally compatible with TIs. In this study, we provide a solution to this challenge by showcasing the growth of a highly crystalline TI-SC heterostructure using refractory TiN (111) as the superconducting layer. This approach can eliminate the need for in-situ cleaving or growth. More importantly, the TiN surface shows high resilience against contaminations during air exposure, as demonstrated by the successful recyclable growth of Bi2Se3. Our findings indicate that TI-SC heterostructures based on nitride films are compatible with device fabrication techniques, paving a path to the realization of practical topological qubit devices in the future.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 22 pages, 4 figures, accepted by ACS Applied Materials & Interfaces

arXiv:2406.06009 [pdf]

The Impact of AI on Academic Research and Publishing

Authors: Brady Lund, Manika Lamba, Sang Hoo Oh

Abstract: Generative artificial intelligence (AI) technologies like ChatGPT have significantly impacted academic writing and publishing through their ability to generate content at levels comparable to or surpassing human writers. Through a review of recent interdisciplinary literature, this paper examines ethical considerations surrounding the integration of AI into academia, focusing on the potential for this technology to be used for scholarly misconduct and necessary oversight when using it for writing, editing, and reviewing of scholarly papers. The findings highlight the need for collaborative approaches to AI usage among publishers, editors, reviewers, and authors to ensure that this technology is used ethically and productively.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05293 [pdf, other]

Ubiquitous Flat Bands in a Cr-based Kagome Superconductor

Authors: Yucheng Guo, Zehao Wang, Fang Xie, Yuefei Huang, Bin Gao, Ji Seop Oh, Han Wu, Zhaoyu Liu, Zheng Ren, Yuan Fang, Ananya Biswas, Yichen Zhang, Ziqin Yue, Cheng Hu, Chris Jozwiak, Aaron Bostwick, Eli Rotenberg, Makoto Hashimoto, Donghui Lu, Junichiro Kono, Jiun-Haw Chu, Boris I Yakobson, Robert J Birgeneau, Qimiao Si, Pengcheng Dai, et al. (1 additional author not shown)

Abstract: In the quest for novel quantum states driven by topology and correlation, kagome lattice materials have garnered significant interest due to their distinctive electronic band structures, featuring flat bands (FBs) arising from the quantum destructive interference of the electronic wave function. The tuning of the FBs to the chemical potential would lead to the possibility of liberating electronic instabilities that lead to emergent electronic orders. Despite extensive studies, direct evidence of FBs tuned to the chemical potential and their participation in emergent electronic orders have been lacking in bulk quantum materials. Here using a combination of Angle-Resolved Photoemission Spectroscopy (ARPES) and Density Functional Theory (DFT), we reveal that the low-energy electronic structure of the recently discovered Cr-based kagome metal superconductor CsCr3Sb5 is dominated by a pervasive FB in close proximity to, and below the Fermi level. A comparative analysis with orbital-projected DFT and polarization dependence measurement uncovers that an orbital-selective renormalization mechanism is needed to reconcile the discrepancy with the DFT calculations, which predict the FB to appear 200 meV above the Fermi level. Furthermore, we observe the FB to shift away from the Fermi level by 20 meV in the low-temperature density wave-ordered phase, highlighting the role of the FB in the emergent electronic order. Our results reveal CsCr3Sb5 to stand out as a promising platform for further exploration into the effects of FBs near the Fermi level on kagome lattices, and their role in emergent orders in bulk quantum materials.

Submitted 12 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.03665 [pdf, other]

Towards Dynamic Trend Filtering through Trend Point Detection with Reinforcement Learning

Authors: Jihyeon Seong, Sekwang Oh, Jaesik Choi

Abstract: Trend filtering simplifies complex time series data by applying smoothness to filter out noise while emphasizing proximity to the original data. However, existing trend filtering methods fail to reflect abrupt changes in the trend due to 'approximateness,' resulting in constant smoothness. This approximateness uniformly filters out the tail distribution of time series data, characterized by extreme values, including both abrupt changes and noise. In this paper, we propose Trend Point Detection formulated as a Markov Decision Process (MDP), a novel approach to identifying essential points that should be reflected in the trend, departing from approximations. We term these essential points as Dynamic Trend Points (DTPs) and extract trends by interpolating them. To identify DTPs, we utilize Reinforcement Learning (RL) within a discrete action space and a forecasting sum-of-squares loss function as a reward, referred to as the Dynamic Trend Filtering network (DTF-net). DTF-net integrates flexible noise filtering, preserving critical original subsequences while removing noise as required for other subsequences. We demonstrate that DTF-net excels at capturing abrupt changes compared to other trend filtering algorithms and enhances forecasting performance, as abrupt changes are predicted rather than smoothed out.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 18 pages, 11 figures

Journal ref: IJCAI 2024

arXiv:2405.20627 [pdf, other]

The SAMI Galaxy Survey: impact of star formation and AGN feedback processes on the ionized gas velocity dispersion

Authors: Sree Oh, Matthew Colless, Stefania Barsanti, Henry R. M. Zovaro, Scott M. Croom, Sukyoung K. Yi, Andrei Ristea, Jesse van de Sande, Francesco D'Eugenio, Joss Bland-Hawthorn, Julia J. Bryant, Sarah Casura, Hyunjin Jeong, Sarah M. Sweet, Tayyaba Zafar

Abstract: We investigate the influence of star formation and instantaneous AGN feedback processes on the ionized gas velocity dispersion in a sample of 1285 emission-line galaxies with stellar masses $\log\,(M_*/M_{\odot}) \geq 9$ from the integral-field spectroscopy SAMI Galaxy Survey. We fit both narrow and broad emission line components using aperture spectra integrated within one effective radius, while ensuring the elimination of velocity differences between the spectra of individual spaxels. Our analysis reveals that 386 (30%) galaxies can be adequately described using a single emission component while 356 (28%) galaxies require two (broad and narrow) components. Galaxies characterized by high mass, elevated star formation rate surface density, or type-2 AGN-like emissions tend to feature an additional broad emission-line component, leading to their classification as double-component galaxies. We explore the correlations between $M_*$ and gas velocity dispersions, highlighting that the prominence of the broad component significantly contributes to elevating the gas velocity dispersion. Galaxies displaying AGN-like emission based on optical definitions show enhanced gas velocity dispersions. In star-forming galaxies, both stellar mass and star-formation rate surface density substantially contribute to the velocity dispersion of the narrow component. Increased star-forming activity appears to elevate the velocity dispersion of the narrow component. The broad component exhibits a weaker dependence on stellar mass and is primarily driven by galactic outflows. We suggest that strong star forming activity leads to the formation of a broad emission-line component, but the impact on inflating gas velocity dispersion is moderate. On the other hand, AGN-driven outflows appear to be a more important contributor to the elevated velocity dispersion of the ionized gas.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 17 pages, 11 figures

arXiv:2405.20537 [pdf, other]

doi 10.1088/1361-6382/aa8ab2

Poisson algebra of quasilocal angular momentum and its asymptotic limit

Authors: Jong Hyuk Yoon, Seung Hun Oh

Abstract: We study the previously proposed quasilocal angular momentum of gravitational fields in the absence of isometries. The quasilocal angular momentum $L(ξ)$ has the following attractive properties: ({\it i}) it follows from Einstein's constraint equations, ({\it ii}) it satisfies the Poisson algebra $\{L(ξ), L(η)\}_{\rm P.B.} = (1/16π)\, L([ξ, η]_{\rm L})$, ({\it iii}) its Poisson algebra reduces to the standard $SO(3)$ algebra of angular momentum at null infinity, and ({\it iv}) it reproduces the standard value for the Kerr spacetime at null infinity. It will be argued that our definition is a quasilocal and canonical generalization of A. Rizzi's geometric definition at null infinity. We also propose a new definition of an {\it invariant} quasilocal angular momentum $L^{2}$ such that $\{ L^2, L(ξ) \}_{\rm P.B.} = 0$, which becomes $(ma)^{2}$ at the null infinity of the Kerr spacetime. Therefore, it may be regarded as a quasilocal generalization of the Casimir invariant of ordinary angular momentum in flat spacetime.

Submitted 30 May, 2024; originally announced May 2024.

Journal ref: Classical and Quantum Gravity 35, 015003 (2018)

arXiv:2405.18698 [pdf, other]

Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees

Authors: Dohyeong Kim, Taehyun Cho, Seungyub Han, Hojun Chung, Kyungjae Lee, Songhwai Oh

Abstract: The field of risk-constrained reinforcement learning (RCRL) has been developed to effectively reduce the likelihood of worst-case scenarios by explicitly handling risk-measure-based constraints. However, the nonlinearity of risk measures makes it challenging to achieve convergence and optimality. To overcome the difficulties posed by the nonlinearity, we propose a spectral risk measure-constrained RL algorithm, spectral-risk-constrained policy optimization (SRCPO), a bilevel optimization approach that utilizes the duality of spectral risk measures. In the bilevel optimization structure, the outer problem involves optimizing dual variables derived from the risk measures, while the inner problem involves finding an optimal policy given these dual variables. The proposed method, to the best of our knowledge, is the first to guarantee convergence to an optimum in the tabular setting. Furthermore, the proposed method has been evaluated on continuous control tasks and showed the best performance among other RCRL algorithms satisfying the constraints.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 26 pages

arXiv:2405.17646 [pdf, ps, other]

Gap between the number of facets of the two poset polytopes

Authors: Binaya Bhandari, Debra Cunningham, Grace Morrell, SuHo Oh, Paxton Smith

Abstract: We study the difference between the number of facets of the order polytope and the chain polytope of a poset. Hibi and Li classified posets where the gap is exactly zero. We describe the bounds on this gap using the new notion of crossing numbers, and then use this result to classify the posets where the gap is exactly one.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 12 pages, 5 figures
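
The facet gap studied in the entry above can be computed directly for small posets. The following is a minimal sketch, assuming the standard facet descriptions of the two polytopes (due to Stanley): the order polytope has one facet per cover relation plus one per minimal and one per maximal element, and the chain polytope has one facet per element plus one per maximal chain. The function name `facet_gap`, the cover-relation input format, and the sign convention (chain count minus order count) are assumptions made for illustration and are not taken from the paper.

```python
def facet_gap(elements, covers):
    """Count facets of the order and chain polytopes of a finite poset.

    elements: list of hashable labels.
    covers:   list of pairs (a, b) meaning b covers a (a < b, nothing between).
    Returns (order_facets, chain_facets, chain_facets - order_facets).
    """
    up = {x: [] for x in elements}   # elements covering x
    has_lower = set()                # elements that cover something
    for a, b in covers:
        up[a].append(b)
        has_lower.add(b)

    minimal = [x for x in elements if x not in has_lower]
    maximal = [x for x in elements if not up[x]]

    def chains_from(x):
        # Number of saturated chains from x up to some maximal element.
        return 1 if not up[x] else sum(chains_from(y) for y in up[x])

    max_chains = sum(chains_from(m) for m in minimal)

    # Assumed facet counts (Stanley's descriptions of the two polytopes).
    order_facets = len(covers) + len(minimal) + len(maximal)
    chain_facets = len(elements) + max_chains
    return order_facets, chain_facets, chain_facets - order_facets


# X-shaped poset: a, b < c < d, e
x_poset = facet_gap(list("abcde"),
                    [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")])
# Three-element chain: 1 < 2 < 3
chain3 = facet_gap([1, 2, 3], [(1, 2), (2, 3)])
print(x_poset, chain3)
```

Under these assumptions the X-shaped poset has 8 and 9 facets respectively, a gap of one, while the three-element chain has 4 facets for both polytopes, a gap of zero.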

arXiv:2405.16915 [pdf, other]

Multilingual Diversity Improves Vision-Language Representations

Authors: Thao Nguyen, Matthew Wallingford, Sebastin Santy, Wei-Chiu Ma, Sewoong Oh, Ludwig Schmidt, Pang Wei Koh, Ranjay Krishna

Abstract: Massive web-crawled image-text datasets lay the foundation for recent progress in multimodal learning. These datasets are designed with the goal of training a model to do well on standard computer vision benchmarks, many of which, however, have been shown to be English-centric (e.g., ImageNet). Consequently, existing data curation techniques gravitate towards using predominantly English image-text pairs and discard many potentially useful non-English samples. Our work questions this practice. Multilingual data is inherently enriching not only because it provides a gateway to learn about culturally salient concepts, but also because it depicts common concepts differently from monolingual data. We thus conduct a systematic study to explore the performance benefits of using more samples of non-English origins with respect to English vision tasks. By translating all multilingual image-text pairs from a raw web crawl to English and re-filtering them, we increase the prevalence of (translated) multilingual data in the resulting training set. Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet, ImageNet distribution shifts, image-English-text retrieval and on average across 38 tasks from the DataComp benchmark. On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa. In addition, we quantitatively show that English and non-English data are significantly different in both image and (translated) text space. We hope that our findings motivate future work to be more intentional about including multicultural and multilingual data, not just when non-English or geographically diverse tasks are involved, but to enhance model capabilities at large.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.15640 [pdf, other]

GECKO: Generative Language Model for English, Code and Korean

Authors: Sungwoo Oh, Donggyu Kim

Abstract: We introduce GECKO, a bilingual large language model (LLM) optimized for Korean and English, along with programming languages. GECKO is pretrained on a balanced, high-quality corpus of Korean and English using the LLaMA architecture. In this report, we share our experience from several efforts to build a better data pipeline for the corpus and to train our model. GECKO shows great efficiency in token generation for both Korean and English, despite its small vocabulary size. We measure performance on representative benchmarks for Korean, English and Code; the model exhibits great performance on KMMLU (Korean MMLU) and modest performance in English and Code, even with a smaller number of trained tokens than English-focused LLMs. GECKO is available to the open-source community under a permissive license. We hope our work offers a research baseline and practical insights for Korean LLM research. The model can be found at: https://huggingface.co/kifai/GECKO-7B

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.13065 [pdf, other]

Exploring Teachers' Perception of Artificial Intelligence: The Socio-emotional Deficiency as Opportunities and Challenges in Human-AI Complementarity in K-12 Education

Authors: Soon-young Oh, Yongsu Ahn

Abstract: In schools, teachers play a multitude of roles, serving as educators, counselors, decision-makers, and members of the school community. With recent advances in artificial intelligence (AI), there is increasing discussion about how AI can assist, complement, and collaborate with teachers. To pave the way for better teacher-AI complementary relationships in schools, our study aims to expand the discourse on teacher-AI complementarity by seeking educators' perspectives on the potential strengths and limitations of AI across a spectrum of responsibilities. Through a mixed method using a survey with 100 elementary school teachers in South Korea and in-depth interviews with 12 teachers, our findings indicate that teachers anticipate AI's potential to complement human teachers by automating administrative tasks and enhancing personalized learning through advanced intelligence. Interestingly, the deficit of AI's socio-emotional capabilities has been perceived as both challenges and opportunities. Overall, our study demonstrates the nuanced perception of teachers and different levels of expectations over their roles, challenging the need for decisions about AI adoption tailored to educators' preferences and concerns.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.08530 [pdf, other]

Parameter-Efficient Instance-Adaptive Neural Video Compression

Authors: Hyunmo Yang, Seungjun Oh, Eunbyung Park

Abstract: Learning-based Neural Video Codecs (NVCs) have emerged as a compelling alternative to the standard video codecs, demonstrating promising performance and simple, easily maintainable pipelines. However, NVCs often fall short in compression performance and occasionally exhibit poor generalization capability due to their inference-only compression scheme and their dependence on training data. Instance-adaptive video compression techniques have recently been suggested as a viable solution, fine-tuning the encoder or decoder networks for a particular test instance video. However, fine-tuning all the model parameters incurs high computational costs, increases the bitrates, and often leads to unstable training. In this work, we propose a parameter-efficient instance-adaptive video compression framework. Inspired by the remarkable success of parameter-efficient fine-tuning on large-scale neural network models, we propose to use a lightweight adapter module that can be easily attached to the pretrained NVCs and fine-tuned for test video sequences. The resulting algorithm significantly improves compression performance and reduces the encoding time compared to the existing instance-adaptive video compression algorithms. Furthermore, the suggested fine-tuning method enhances the robustness of the training process, allowing the proposed method to be widely used in many practical settings. We conducted extensive experiments on various standard benchmark datasets, including UVG, MCL-JCV, and HEVC sequences, and the experimental results show a significant improvement in rate-distortion (RD) curves (up to 5 dB PSNR improvements) and BD rates compared to the baseline NVCs. Our code is available at https://github.com/ohsngjun/PEVC.

Submitted 11 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: 23 pages, 13 figures

arXiv:2405.06752 [pdf, other]

Polarization Entanglement with highly non-degenerate photon pairs enhanced by effective walk-off compensation method

Authors: Sungeun Oh, Thomas Jennewein

Abstract: We demonstrate polarization entanglement in highly non-degenerate photon pairs, generated through Type-0 spontaneous parametric down conversion (SPDC) using bulk periodically poled Lithium Niobate (PPLN) crystals. Using both a beam displacer interferometer scheme and a Sagnac interferometer, we ensure high polarization contrast and stable interference of the highly non-degenerate photon pairs; this, however, causes substantial spatial and temporal walk-offs of the photon paths, which pose a formidable challenge. We introduce an effective compensation method using birefringent crystal wedges to eliminate spatial and temporal walk-offs simultaneously. This method is implemented in our entangled photon source (EPS) designed specifically for testing entanglement-based quantum key distribution (EBQKD) between ground and satellite, as part of the Quantum Encryption and Science Satellite (QEYSSat) mission funded by the Canadian Space Agency (CSA). We observed a coincidence rate of N = (33.33 ± 0.05) kHz, a significant improvement over the rate without spatial compensation. We also observed an estimated pair generation rate of (2.92 ± 0.12) MHz and an entanglement visibility of (96.6 ± 0.3)% from only 1.0 mW of pump power, making it a promising source for long-distance quantum communication over ground-to-satellite and fiber optic links.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.05175 [pdf, other]

Air Gap: Protecting Privacy-Conscious Conversational Agents

Authors: Eugene Bagdasaryan, Ren Yi, Sahra Ghalebikesabi, Peter Kairouz, Marco Gruteser, Sewoong Oh, Borja Balle, Daniel Ramage

Abstract: The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns. While these agents excel at understanding and acting on context, this capability can be exploited by malicious actors. We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into revealing private information not relevant to the task at hand. Grounded in the framework of contextual integrity, we introduce AirGapAgent, a privacy-conscious agent designed to prevent unintended data leakage by restricting the agent's access to only the data necessary for a specific task. Extensive experiments using Gemini, GPT, and Mistral models as agents validate our approach's effectiveness in mitigating this form of context hijacking while maintaining core agent functionality. For example, we show that a single-query context hijacking attack on a Gemini Ultra agent reduces its ability to protect user data from 94% to 45%, while an AirGapAgent achieves 97% protection, rendering the same attack ineffective.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.02905 [pdf, other]

Mixture of partially linear experts

Authors: Yeongsan Hwang, Byungtae Seo, Sangkon Oh

Abstract: In the mixture of experts model, a common assumption is the linearity between a response variable and covariates. While this assumption has theoretical and computational benefits, it may lead to suboptimal estimates by overlooking potential nonlinear relationships among the variables. To address this limitation, we propose a partially linear structure that incorporates unspecified functions to capture nonlinear relationships. We establish the identifiability of the proposed model under mild conditions and introduce a practical estimation algorithm. We present the performance of our approach through numerical studies, including simulations and real data analysis.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.02341 [pdf, other]

Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy

Authors: Wei-Ning Chen, Berivan Isik, Peter Kairouz, Albert No, Sewoong Oh, Zheng Xu

Abstract: We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: firstly, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L_2$ geometry, resulting in suboptimal leading constants in mean square errors (MSEs); secondly, schemes achieving order-optimal communication-privacy trade-offs do not extend seamlessly to streaming differential privacy (DP) settings (e.g., tree aggregation or matrix factorization), rendering them incompatible with DP-FTRL type optimizers. In this work, we tackle these issues by introducing a novel privacy accounting method for the sparsified Gaussian mechanism that incorporates the randomness inherent in sparsification into the DP noise. Unlike previous approaches, our accounting algorithm directly operates in $L_2$ geometry, yielding MSEs that converge quickly to those of the uncompressed Gaussian mechanism. Additionally, we extend the sparsification scheme to the matrix factorization framework under streaming DP and provide a precise accountant tailored for DP-FTRL type optimizers. Empirically, our method demonstrates at least a 100x improvement of compression for DP-SGD across various FL tasks.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2405.01846 [pdf]

Imaging thermally fluctuating Néel vectors in van der Waals antiferromagnet NiPS3

Authors: You** Lee, Chaebin Kim, Suhan Son, **gyuan Cui, Giung Park, Kai-Xuan Zhang, Siwon Oh, Hyeonsik Cheong, Armin Kleibert, Je-Geun Park

Abstract: Studying antiferromagnetic domains is essential for fundamental physics and potential spintronics applications. Despite its importance, few systematic studies have been performed on domains of van der Waals (vdW) antiferromagnets (AFMs) with high spatial resolution, and direct probing of the Néel vectors remains challenging. In this work, we found a multidomain in vdW AFM NiPS3, a material extensively investigated for its exotic magnetic exciton. We employed photoemission electron microscopy combined with X-ray magnetic linear dichroism (XMLD-PEEM) to image NiPS3's magnetic structure. The nanometer-spatial resolution of XMLD-PEEM allows us to determine local Néel vector orientations and discover thermally fluctuating Néel vectors that are independent of the crystal symmetry even at 65 K, well below TN of 155 K. We demonstrate that the Ni ions' small in-plane orbital moment anisotropy is responsible for the weak magneto-crystalline anisotropy. The observed multidomain's thermal fluctuations may explain the broadening of magnetic exciton peaks at higher temperatures.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.01620 [pdf, other]

JWST Lensed quasar dark matter survey II: Strongest gravitational lensing limit on the dark matter free streaming length to date

Authors: Ryan E. Keeley, Anna M. Nierenberg, Daniel Gilman, Charles Gannon, Simon Birrer, Tommaso Treu, Andrew J. Benson, Xiaolong Du, K. N. Abazajian, T. Anguita, V. N. Bennert, S. G. Djorgovski, K. K. Gupta, S. F. Hoenig, A. Kusenko, C. Lemon, M. Malkan, V. Motta, L. A. Moustakas, M. S. H. Oh, D. Sluse, D. Stern, R. H. Wechsler

Abstract: This is the second in a series of papers in which we use JWST MIRI multiband imaging to measure the warm dust emission in a sample of 31 multiply imaged quasars, to be used as a probe of the particle nature of dark matter. We present measurements of the relative magnifications of the strongly lensed warm dust emission in a sample of 9 systems. The warm dust region is compact and sensitive to perturbations by populations of halos down to masses $\sim 10^6$ M$_{\odot}$. Using these warm dust flux-ratio measurements in combination with 5 previous narrow-line flux-ratio measurements, we constrain the halo mass function. In our model, we allow for complex deflector macromodels with flexible third and fourth-order multipole deviations from ellipticity, and we introduce an improved model of the tidal evolution of subhalos. We constrain a WDM model and find an upper limit on the half-mode mass of $10^{7.6} M_\odot$ at posterior odds of 10:1. This corresponds to a lower limit on a thermally produced dark matter particle mass of 6.1 keV. This is the strongest gravitational lensing constraint to date, and comparable to those from independent probes such as the Ly$α$ forest and Milky Way satellite galaxies.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.17124 [pdf, other]

Improving flux ratio anomaly precision by measuring gravitational lens multipole moments with extended arcs

Authors: Maverick S. H. Oh, Anna Nierenberg, Daniel Gilman, Simon Birrer

Abstract: In a strong gravitational lens, perturbations by low-mass dark matter halos can be detected by differences between the measured image fluxes relative to the expectation from a smooth model for the mass distribution which contains only the gravitational effects of the main deflector. The abundance of these low-mass structures can be used to constrain the properties of dark matter. Traditionally only the lensed quasar positions have been used to predict the smooth-model flux ratios. We demonstrate that significant additional information can be gained by using the lensed quasar host galaxy, which appears as an extended arc and constrains the smooth model over a much larger angular area. We simulate Hubble Space Telescope-quality mock observations based on the lensing system WGD2038-4008 and we compare the model-predicted flux ratio precision and accuracy for two cases: one in which the inference is based only on the lensed quasar image positions, and the other in which it is based on the extended arcs as well as the lensed quasar image positions. For our mock lens systems we include both elliptical and higher-order m=3 and m=4 multipole terms in the smooth-mass distributions, with amplitudes based on the optically measured shapes of massive elliptical galaxies. We find that the extended arcs improve the precision of the model-predicted flux ratios by a factor of 6-8, depending on the strength of the multipole terms. Furthermore, with the extended arcs, we are also able to accurately recover the m=3, 4 mass multipole strengths and angles $a_3/a$, $a_4/a$, $φ_3-φ_0$, and $φ_4-φ_0$ to a precision of 0.002, 0.002, $3^\circ$ and $3^\circ$, respectively. This work implies that lensed arcs can constrain deviations from ellipticity in strong lens systems, and potentially lead to more robust constraints on substructure properties from flux ratios.

Submitted 16 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: 23 pages, 8 figures

arXiv:2404.16035 [pdf, other]

MaGGIe: Masked Guided Gradual Human Instance Matting

Authors: Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava, Joon-Young Lee

Abstract: Human matting is a foundation task in image and video processing, where human foreground pixels are extracted from the input. Prior works either improve the accuracy by additional guidance or improve the temporal consistency of a single instance across frames. We propose a new framework MaGGIe, Masked Guided Gradual Human Instance Matting, which predicts alpha mattes progressively for each human instance while maintaining the computational cost, precision, and consistency. Our method leverages modern architectures, including transformer attention and sparse convolution, to output all instance mattes simultaneously without exploding memory and latency. While keeping inference costs constant in the multiple-instance scenario, our framework achieves robust and versatile performance on our proposed synthesized benchmarks. With the higher quality image and video matting benchmarks, the novel multi-instance synthesis approach from publicly available sources is introduced to increase the generalization of models in real-world scenarios.

Submitted 24 April, 2024; originally announced April 2024.

Comments: CVPR 2024. Project link: https://maggie-matt.github.io

arXiv:2404.16032 [pdf, other]

Studying Large Language Model Behaviors Under Realistic Knowledge Conflicts

Authors: Evgenii Kortukov, Alexander Rubinstein, Elisa Nguyen, Seong Joon Oh

Abstract: Retrieval-augmented generation (RAG) mitigates many problems of fully parametric language models, such as temporal degradation, hallucinations, and lack of grounding. In RAG, the model's knowledge can be updated from documents provided in context. This leads to cases of conflict between the model's parametric knowledge and the contextual information, where the model may not always update its knowledge. Previous work studied knowledge conflicts by creating synthetic documents that contradict the model's correct parametric answers. We present a framework for studying knowledge conflicts in a realistic setup. We update incorrect parametric knowledge using real conflicting documents. This reflects how knowledge conflicts arise in practice. In this realistic scenario, we find that knowledge updates fail less often than previously reported. In cases where the models still fail to update their answers, we find a parametric bias: the incorrect parametric answer appearing in context makes the knowledge update likelier to fail. These results suggest that the factual parametric knowledge of LLMs can negatively influence their reading abilities and behaviors. Our code is available at https://github.com/kortukov/realistic_knowledge_conflicts/.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15409 [pdf, ps, other]

Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares

Authors: Gavin Brown, Jonathan Hayase, Samuel Hopkins, Weihao Kong, Xiyang Liu, Sewoong Oh, Juan C. Perdomo, Adam Smith

Abstract: We present a sample- and time-efficient differentially private algorithm for ordinary least squares, with error that depends linearly on the dimension and is independent of the condition number of $X^\top X$, where $X$ is the design matrix. All prior private algorithms for this task require either $d^{3/2}$ examples, error growing polynomially with the condition number, or exponential time. Our near-optimal accuracy guarantee holds for any dataset with bounded statistical leverage and bounded residuals. Technically, we build on the approach of Brown et al. (2023) for private mean estimation, adding scaled noise to a carefully designed stable nonprivate estimator of the empirical regression vector.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 42 pages, 3 figures

arXiv:2404.15374 [pdf, other]

Minimum Description Feature Selection for Complexity Reduction in Machine Learning-based Wireless Positioning

Authors: Myeung Suk Oh, Anindya Bijoy Das, Taejoon Kim, David J. Love, Christopher G. Brinton

Abstract: Recently, deep learning approaches have provided solutions to difficult problems in wireless positioning (WP). Although these WP algorithms have attained excellent and consistent performance against complex channel environments, the computational complexity coming from processing high-dimensional features can be prohibitive for mobile applications. In this work, we design a novel positioning neural network (P-NN) that utilizes the minimum description features to substantially reduce the complexity of deep learning-based WP. P-NN's feature selection strategy is based on maximum power measurements and their temporal locations to convey information needed to conduct WP. We improve P-NN's learning ability by intelligently processing two different types of inputs: sparse image and measurement matrices. Specifically, we implement a self-attention layer to reinforce the training ability of our network. We also develop a technique to adapt feature space size, optimizing over the expected information gain and the classification capability quantified with information-theoretic measures on signal bin selection. Numerical results show that P-NN achieves a significant advantage in performance-complexity tradeoff over deep learning baselines that leverage the full power delay profile (PDP). In particular, we find that P-NN achieves a large improvement in performance for low SNR, as unnecessary measurements are discarded in our minimum description features.

Submitted 21 April, 2024; originally announced April 2024.

Comments: This paper has been accepted for publication in the IEEE Journal on Selected Areas in Communications. arXiv admin note: text overlap with arXiv:2402.09580

arXiv:2404.13790 [pdf, ps, other]

On illposedness of the Hall and electron magnetohydrodynamic equations without resistivity on the whole space

Authors: In-Jee Jeong, Sung-Jin Oh

Abstract: It has been shown in our previous work that the incompressible and irresistive Hall- and electron-magnetohydrodynamic (MHD) equations are illposed on flat domains $M = \mathbb{R}^k \times \mathbb{T}^{3-k}$ for $0 \le k \le 2$. The data and solutions therein were assumed to be independent of one coordinate, which not only significantly simplifies the systems but also allows for a large class of steady states. In this work, we remove the assumption of independence and conclude strong illposedness for compactly supported data in $\mathbb{R}^3$. This is achieved by constructing degenerating wave packets for linearized systems around time-dependent axisymmetric magnetic fields. A few main additional ingredients are: a more systematic application of the generalized energy estimate, use of the Bogovskiǐ operator, and a priori estimates for axisymmetric solutions to the Hall- and electron-MHD systems.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 36 pages

arXiv:2404.10486 [pdf, other]

doi 10.1051/0004-6361/202449763

Discovery of a dormant 33 solar-mass black hole in pre-release Gaia astrometry

Authors: Gaia Collaboration, P. Panuzzo, T. Mazeh, F. Arenou, B. Holl, E. Caffau, A. Jorissen, C. Babusiaux, P. Gavras, J. Sahlmann, U. Bastian, Ł. Wyrzykowski, L. Eyer, N. Leclerc, N. Bauchet, A. Bombrun, N. Mowlavi, G. M. Seabroke, D. Teyssier, E. Balbinot, A. Helmi, A. G. A. Brown, A. Vallenari, T. Prusti, J. H. J. de Bruijne, et al. (390 additional authors not shown)

Abstract: Gravitational waves from black-hole merging events have revealed a population of extra-galactic BHs residing in short-period binaries with masses that are higher than expected based on most stellar evolution models - and also higher than known stellar-origin black holes in our Galaxy. It has been proposed that those high-mass BHs are the remnants of massive metal-poor stars. Gaia astrometry is expected to uncover many Galactic wide-binary systems containing dormant BHs, which may not have been detected before. The study of this population will provide new information on the BH-mass distribution in binaries and shed light on their formation mechanisms and progenitors. As part of the validation efforts in preparation for the fourth Gaia data release (DR4), we analysed the preliminary astrometric binary solutions, obtained by the Gaia Non-Single Star pipeline, to verify their significance and to minimise false-detection rates in high-mass-function orbital solutions. The astrometric binary solution of one source, Gaia BH3, implies the presence of a $32.70 \pm 0.82\,M_\odot$ BH in a binary system with a period of 11.6 yr. Gaia radial velocities independently validate the astrometric orbit. Broad-band photometric and spectroscopic data show that the visible component is an old, very metal-poor giant of the Galactic halo, at a distance of 590 pc. The BH in the Gaia BH3 system is more massive than any other Galactic stellar-origin BH known thus far.
The low metallicity of the star companion supports the scenario that metal-poor massive stars are progenitors of the high-mass BHs detected by gravitational-wave telescopes. The Galactic orbit of the system and its metallicity indicate that it might belong to the Sequoia halo substructure. Alternatively, and more plausibly, it could belong to the ED-2 stream, which likely originated from a globular cluster that had been disrupted by the Milky Way.

Submitted 19 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 23 pages, accepted for publication in A&A Letters. New version with small fixes

arXiv:2404.10308 [pdf, other]

Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs

Authors: Woomin Song, Seunghyuk Oh, Sangwoo Mo, Jaehyung Kim, Sukmin Yun, Jung-Woo Ha, Jinwoo Shin

Abstract: Large language models (LLMs) have shown remarkable performance in various natural language processing tasks. However, a primary constraint they face is the context limit, i.e., the maximum number of tokens they can process. Previous works have explored architectural changes and modifications in positional encoding to relax the constraint, but they often require expensive training or do not address the computational demands of self-attention. In this paper, we present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the limitations. HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks. Each chunk is then processed collectively, employing a hierarchical strategy that merges adjacent chunks at progressive transformer layers. A token reduction technique precedes each merging, ensuring memory usage efficiency. We also propose an optimized computational order reducing the memory requirement to logarithmically scale with respect to input length, making it especially favorable for environments with tight memory restrictions. Our experiments demonstrate the proposed method's superior performance and memory efficiency, enabling the broader use of LLMs in contexts requiring extended context. Code is available at https://github.com/alinlab/HOMER.

Submitted 16 April, 2024; originally announced April 2024.

Comments: Accepted to ICLR 2024. The first two authors contributed equally

arXiv:2404.05767 [pdf, other]

CSA-Trans: Code Structure Aware Transformer for AST

Authors: Saeyoon Oh, Shin Yoo

Abstract: When applying the Transformer architecture to source code, designing a good self-attention mechanism is critical as it affects how node relationship is extracted from the Abstract Syntax Trees (ASTs) of the source code. We present Code Structure Aware Transformer (CSA-Trans), which uses Code Structure Embedder (CSE) to generate specific PE for each node in AST. CSE generates node Positional Encoding (PE) using disentangled attention. To further extend the self-attention capability, we adopt Stochastic Block Model (SBM) attention. Our evaluation shows that our PE captures the relationships between AST nodes better than other graph-related PE techniques. We also show through quantitative and qualitative analysis that SBM attention is able to generate more node specific attention coefficients. We demonstrate that CSA-Trans outperforms 14 baselines in code summarization tasks for both Python and Java, while being 41.92% faster and 25.31% memory efficient in Java dataset compared to AST-Trans and SG-Trans respectively.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04913 [pdf, other]

CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis

Authors: Gyeongjin Kang, Younggeun Lee, Seungjun Oh, Eunbyung Park

Abstract: Neural Radiance Fields (NeRF) have achieved huge success in effectively capturing and representing 3D objects and scenes. However, several factors have impeded its further proliferation as next-generation 3D media. To establish a ubiquitous presence in everyday media formats, such as images and videos, it is imperative to devise a solution that effectively fulfills three key objectives: fast encoding and decoding time, compact model sizes, and high-quality renderings. Despite significant advancements, a comprehensive algorithm that adequately addresses all objectives has yet to be fully realized. In this work, we present CodecNeRF, a neural codec for NeRF representations, consisting of a novel encoder and decoder architecture that can generate a NeRF representation in a single forward pass. Furthermore, inspired by the recent parameter-efficient finetuning approaches, we develop a novel finetuning method to efficiently adapt the generated NeRF representations to a new test instance, leading to high-quality image renderings and compact code sizes. The proposed CodecNeRF, a newly suggested encoding-decoding-finetuning pipeline for NeRF, achieved unprecedented compression performance of more than 150x and 20x reduction in encoding time while maintaining (or improving) the image quality on widely used 3D object datasets, such as ShapeNet and Objaverse.

Submitted 28 May, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: Project page: https://gynjn.github.io/Codec-NeRF/

arXiv:2404.04096 [pdf, other]

Machine Learning-Aided Cooperative Localization under Dense Urban Environment

Authors: Hoon Lee, Hong Ki Kim, Seung Hyun Oh, Sang Hyun Lee

Abstract: Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and quality driving experience, can be leveraged if machine learning models guarantee the robustness in performing core functions including localization and controls. Location awareness, in particular, lends itself to the deployment of location-specific services and the improvement of the operation performance. The localization entails direct communication to the network infrastructure, and the resulting centralized positioning solutions readily become intractable as the network scales up. As an alternative to the centralized solutions, this article addresses decentralized principle of vehicular localization reinforced by machine learning techniques in dense urban environments with frequent inaccessibility to reliable measurement. As such, the collaboration of multiple vehicles enhances the positioning performance of machine learning approaches. A virtual testbed is developed to validate this machine learning model for real-map vehicular networks. Numerical results demonstrate universal feasibility of cooperative localization, in particular, for dense urban area configurations.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.02231 [pdf]

Tunability of charge density wave in a magnetic kagome metal

Authors: Ji Seop Oh, Ananya Biswas, Mason Klemm, Hengxin Tan, Makoto Hashimoto, Donghui Lu, Binghai Yan, Pengcheng Dai, Robert J. Birgeneau, Ming Yi

Abstract: The discovery of the charge density wave order (CDW) within a magnetically ordered phase in the kagome lattice FeGe has provided a promising platform to investigate intertwined degrees of freedom in kagome lattices. Recently, a method based on post-annealing has been suggested to manipulate the CDW order in kagome FeGe towards either long-range or suppressed orders. Here, we provide a comprehensive comparison of the experimentally measured electronic structures of FeGe crystals that have undergone different post-annealing procedures and demonstrate the remarkable effectiveness on tuning the CDW gap without strong perturbation on the underlying electronic structure. Moreover, we observe an additional low temperature transition that only appears in crystals with a long-range CDW order, which we associate with a lattice-spin coupled order. Our work indicates a likely strong sensitivity of the CDW order to disorder in FeGe, and provides evidence for strong coupling between the electronic, lattice, and spin degrees of freedom in this kagome magnet.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.02220 [pdf, ps, other]

Late time tail of waves on dynamic asymptotically flat spacetimes of odd space dimensions

Authors: Jonathan Luk, Sung-Jin Oh

Abstract: We introduce a general method for understanding the late time tail for solutions to wave equations on asymptotically flat spacetimes with odd space dimensions. In particular, for a large class of equations, we prove that the precise late time tail is determined by the limits of higher radiation field at future null infinity. In the setting of stationary linear equations, we recover and generalize the Price law decay rates. In particular, in addition to reproving known results on $(3+1)$-dimensional black holes, this allows one to obtain the sharp decay rate for the wave equation on higher dimensional black hole spacetimes, which exhibits an anomalous rate due to subtle cancellations. More interesting, our method goes beyond the stationary linear case and applies to both equations on dynamical background and nonlinear equations. In this case, our results can be used to show that in general there is a correction to the Price law rates.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 180 pages

arXiv:2404.02212 [pdf, other]

Complex Velocity Structure of Nebular Gas in Active Galaxies Centred in Cooling X-ray Atmospheres

Authors: Marie-Joëlle Gingras, Alison L. Coil, B. R. McNamara, Serena Perrotta, Fabrizio Brighenti, H. R. Russell, S. Peng Oh

Abstract: [OII] emission maps obtained with the Keck Cosmic Web Imager (KCWI) are presented for four galaxies lying at the centers of cooling X-ray cluster atmospheres. Nebular emission reaching altitudes of tens of kpc is found in systems covering a broad range of atmospheric cooling rates, cluster masses, and dynamical states. The central galaxy in Abell 262 hosts high angular momentum gas in a kpc-scale disk. The nebular gas in RXJ0820.9+0752 is offset and redshifted with respect to the central galaxy by $10-20$ kpc and $150$ km s$^{-1}$, respectively. The nebular gas in PKS 0745-191 and Abell 1835, both experiencing strong radio-mechanical feedback, is being churned to higher velocity dispersion by the buoyantly rising bubbles and jets. Churning gas flows, likely outflows behind the rising radio bubbles, are likely driven by buoyancy and ram pressure due to the galaxies' motion with respect to the gas. The churned gas is enveloped by larger scale, lower velocity dispersion quiescent nebular emission. The mean radial speeds of the churned gas, quiescent gas, and the central galaxy each differ by up to $\sim 150$ km s$^{-1}$, although speeds upward of $800$ km s$^{-1}$ are found. Nebular gas is dynamically complex due to feedback, motion of the central galaxy, and perhaps relative motion of the hot atmosphere from which it presumably condensed. These motions will affect thermally unstable cooling models, the dispersal of jet energy, and the angular momentum of gas accreting onto the galaxy and its nuclear black hole.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 37 pages, 26 figures, submitted to ApJ

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.01774 [pdf, other]

MHONGOOSE -- A MeerKAT Nearby Galaxy HI Survey

Authors: W. J. G. de Blok, J. Healy, F. M. Maccagni, D. J. Pisano, A. Bosma, J. English, T. Jarrett, A. Marasco, G. R. Meurer, S. Veronese, F. Bigiel, L. Chemin, F. Fraternali, B. W. Holwerda, P. Kamphuis, H. R. Klöckner, D. Kleiner, A. K. Leroy, M. Mogotsi, K. A. Oman, E. Schinnerer, L. Verdes-Montenegro, T. Westmeier, O. I. Wong, N. Zabel, et al. (35 additional authors not shown)

Abstract: The MHONGOOSE (MeerKAT HI Observations of Nearby Galactic Objects: Observing Southern Emitters) survey maps the distribution and kinematics of the neutral atomic hydrogen (HI) gas in and around 30 nearby star-forming spiral and dwarf galaxies to extremely low HI column densities. The HI column density sensitivity (3 sigma over 16 km/s) ranges from ~5 x 10^{17} cm^{-2} at 90'' resolution to ~4 x 10^{19} cm^{-2} at the highest resolution of 7''. The HI mass sensitivity (3 sigma over 50 km/s) is ~5.5 x 10^5 M_sun at a distance of 10 Mpc (the median distance of the sample galaxies). The velocity resolution of the data is 1.4 km/s. One of the main science goals of the survey is the detection of cold, accreting gas in the outskirts of the sample galaxies. The sample was selected to cover a range in HI masses, from 10^7 M_sun to almost 10^{11} M_sun, to optimally sample possible accretion scenarios and environments. The distance to the sample galaxies ranges from 3 to 23 Mpc. In this paper, we present the sample selection, survey design, and observation and reduction procedures. We compare the integrated HI fluxes based on the MeerKAT data with those derived from single-dish measurement and find good agreement, indicating that our MeerKAT observations are recovering all flux.
We present HI moment maps of the entire sample based on the first ten percent of the survey data, and find that a comparison of the zeroth- and second-moment values shows a clear separation between the physical properties of the HI in areas with star formation and areas without, related to the formation of a cold neutral medium. Finally, we give an overview of the HI-detected companion and satellite galaxies in the 30 fields, five of which have not previously been catalogued. We find a clear relation between the number of companion galaxies and the mass of the main target galaxy.

Submitted 6 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted for publication in Astronomy & Astrophysics

arXiv:2404.00060 [pdf, other]

Temporal Graph Networks for Graph Anomaly Detection in Financial Networks

Authors: Yejin Kim, Youngbin Lee, Minyoung Choe, Sungju Oh, Yongjae Lee

Abstract: This paper explores the utilization of Temporal Graph Networks (TGN) for financial anomaly detection, a pressing need in the era of fintech and digitized financial transactions. We present a comprehensive framework that leverages TGN, capable of capturing dynamic changes in edges within financial networks, for fraud detection. Our study compares TGN's performance against static Graph Neural Network (GNN) baselines, as well as cutting-edge hypergraph neural network baselines using DGraph dataset for a realistic financial context. Our results demonstrate that TGN significantly outperforms other models in terms of AUC metrics. This superior performance underlines TGN's potential as an effective tool for detecting financial fraud, showcasing its ability to adapt to the dynamic and complex nature of modern financial systems. We also experimented with various graph embedding modules within the TGN framework and compared the effectiveness of each module. In conclusion, we demonstrated that, even with variations within TGN, it is possible to achieve good performance in the anomaly detection task.

Submitted 27 March, 2024; originally announced April 2024.

Comments: Presented at the AAAI 2024 Workshop on AI in Finance for Social Impact (https://sites.google.com/view/aifin-aaai2024)

arXiv:2403.17977 [pdf, ps, other]

Abelian Chern-Simons term as a Kaluza-Klein dimensional reduction of the Gibbons-Hawking surface term

Authors: Hongsu Kim, Jae Sok Oh

Abstract: It is suggested that the original, minimal Kaluza-Klein theory should be extended by adding a 5-dimensional version of the Gibbons-Hawking gravitational surface term. It is then demonstrated that the usual dimensional reduction of the newly added surface (boundary) term leads to the emergence of the famous Abelian Chern-Simons term. It is stressed that the advent of this Chern-Simons term is not merely a parametrization artefact but a real thing. Finally, the issue of finite-ranged electromagnetic interaction due to massive photons on a plane has been interpreted in terms of the violation of the local gauge invariance of this extended version of Kaluza-Klein theory.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 10 pages

arXiv:2403.16509 [pdf, other]

Human Understanding AI Paper Challenge 2024 -- Dataset Design

Authors: Se Won Oh, Hyuntae Jeong, Jeong Mook Lim, Seungeun Chung, Kyoung Ju Noh

Abstract: In 2024, we will hold a research paper competition (the third Human Understanding AI Paper Challenge) for the research and development of artificial intelligence technologies to understand human daily life. This document introduces the datasets that will be provided to participants in the competition, and summarizes the issues to consider in data processing and learning model development.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 7 pages, 3 figures

ACM Class: J.7; E.m

arXiv:2403.14353 [pdf, other]

DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics

Authors: Yoonsung Kim, Changhun Oh, Jinwoo Hwang, Wonung Kim, Seongryong Oh, Yubin Lee, Hardik Sharma, Amir Yazdanbakhsh, Jongse Park

Abstract: Deep neural network (DNN) video analytics is crucial for autonomous systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and security robots. However, real-world deployment faces challenges due to their limited computational resources and battery power. To tackle these challenges, continuous learning exploits a lightweight "student" model at deployment (inference), leverages a larger "teacher" model for labeling sampled data (labeling), and continuously retrains the student model to adapt to changing scenarios (retraining). This paper highlights the limitations in state-of-the-art continuous learning systems: (1) they focus on computations for retraining, while overlooking the compute needs for inference and labeling, (2) they rely on power-hungry GPUs, unsuitable for battery-operated autonomous systems, and (3) they are located on a remote centralized server, intended for multi-tenant scenarios, again unsuitable for autonomous systems due to privacy, network availability, and latency concerns. We propose a hardware-algorithm co-designed solution for continuous learning, DaCapo, that enables autonomous systems to perform concurrent executions of inference, labeling, and training in a performant and energy-efficient manner.
DaCapo comprises (1) a spatially-partitionable and precision-flexible accelerator enabling parallel execution of kernels on sub-accelerators at their respective precisions, and (2) a spatiotemporal resource allocation algorithm that strategically navigates the resource-accuracy tradeoff space, facilitating optimal decisions for resource allocation to achieve maximal accuracy. Our evaluation shows that DaCapo achieves 6.5% and 5.5% higher accuracy than the state-of-the-art GPU-based continuous learning systems Ekya and EOMU, respectively, while consuming 254x less power.

Submitted 28 April, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Showing 1–50 of 1,764 results for author: OH, S