Search | arXiv e-print repository

Pair Density Waves and Supercurrent Diode Effect in Altermagnets

Abstract: Metallic altermagnets are unusual collinear magnets that feature zero net magnetization with momentum-dependent spin splitting. Here, we show that this spin splitting can induce pair density wave states even in the absence of external magnetic fields. Focusing on BCS-type attractive interactions, we find the stabilization of symmetrically distinct pair density wave states depending on the chemical potential. These states include Fulde-Ferrell and Fulde-Ferrell* states, both of which break inversion symmetry. We investigate the supercurrent properties and discover non-reciprocal supercurrents for both the Fulde-Ferrell and Fulde-Ferrell* states with distinct spatial dependencies. We propose that the supercurrent diode effect can serve as an experimental tool for distinguishing between different pair density waves in metallic altermagnets and discuss the relation to material candidates.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 9 pages, 5 figures

arXiv:2407.01398 [pdf, other]

doi 10.1093/mnras/stae1615

New Radiative and Collisional Atomic Data for Sr {\sc ii} and Y {\sc ii} with application to Kilonova modelling

Authors: Leo Mulholland, Niall McElroy, Fiona McNeill, Stuart Sim, Connor Ballance, Catherine Ramsbottom

Abstract: The spectra of singly ionised Strontium and Yttrium (Sr {\sc ii} and Y {\sc ii}) have been proposed as identifications of certain spectral features in the AT2017gfo spectrum. With the growing demand for NLTE simulations of Kilonovae, there is an increasing need for atomic data for these and other $r$-process elements. Our goal is to expand upon the current set of atomic data for $r$-process elements, by presenting transition probabilities and Maxwellian-averaged effective collision strengths for Sr {\sc ii} and Y {\sc ii}. The Breit-Pauli and DARC $R$-matrix codes are employed to calculate the appropriate collision strengths, which are thermally averaged according to a Maxwellian distribution to calculate excitation and de-excitation rates. The {\sc tardis} and {\sc ColRadPy} packages are subsequently used to perform LTE and NLTE modelling respectively. A complete set of transition probabilities and effective collision strengths involving levels for Sr {\sc ii} and Y {\sc ii} has been calculated for temperature ranges compatible with kilonova plasma conditions. Forbidden transitions were found to disagree heavily with the Axelrod approximation, an approximation which is currently employed by other models within the literature. Theoretically important spectral lines are identified with both LTE and NLTE modelling codes. LTE simulations in {\sc tardis} reveal no new significant changes to the full synthetic spectra. NLTE simulations in {\sc ColRadPy} provide indications of which features are expected to be strong for a range of regimes, and we include luminosity estimates. Synthetic emission spectra over KNe densities and temperatures reveal potentially interesting spectral lines in the NIR.

Submitted 1 July, 2024; originally announced July 2024.
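
For orientation, the abstract's step from Maxwellian-averaged effective collision strengths to excitation and de-excitation rates can be written in the usual textbook form. This is a sketch of the standard relation, not a formula quoted from the paper; the symbols $g_i$, $g_j$ (statistical weights of the lower level $i$ and upper level $j$), $\Delta E_{ij}$ (transition energy), $E_j$ (energy of the scattered electron) and $T_e$ (electron temperature in K) are notation introduced here.

```latex
\Upsilon_{ij}(T_e) = \int_0^\infty \Omega_{ij}(E_j)\, e^{-E_j/kT_e}\,
                     \mathrm{d}\!\left(\frac{E_j}{kT_e}\right)

q_{i \to j} = \frac{8.629 \times 10^{-6}}{g_i\, T_e^{1/2}}\,
              \Upsilon_{ij}(T_e)\, e^{-\Delta E_{ij}/kT_e}
              \quad \mathrm{cm^3\,s^{-1}},
\qquad
q_{j \to i} = \frac{8.629 \times 10^{-6}}{g_j\, T_e^{1/2}}\,
              \Upsilon_{ij}(T_e)
              \quad \mathrm{cm^3\,s^{-1}}
```

Here $\Omega_{ij}$ is the collision strength produced by the $R$-matrix calculation, and the two rate coefficients differ only by the statistical weight and the Boltzmann factor, as detailed balance requires.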

arXiv:2406.19707 [pdf, other]

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management

Authors: Wonbeom Lee, Jungi Lee, Junghwan Seo, Jaewoong Sim

Abstract: Transformer-based large language models (LLMs) demonstrate impressive performance across various natural language processing tasks. Serving LLM inference for generating long content, however, poses a challenge due to the enormous memory footprint of the transient state, known as the key-value (KV) cache, which scales with the sequence length and batch size. In this paper, we present InfiniGen, a novel KV cache management framework tailored for long-text generation, which synergistically works with modern offloading-based inference systems. InfiniGen leverages the key insight that a few important tokens that are essential for computing the subsequent attention layer in the Transformer can be speculated by performing a minimal rehearsal with the inputs of the current layer and part of the query weight and key cache of the subsequent layer. This allows us to prefetch only the essential KV cache entries (without fetching them all), thereby mitigating the fetch overhead from the host memory in offloading-based LLM serving systems. Our evaluation on several representative LLMs shows that InfiniGen improves the overall performance of a modern offloading-based system by up to 3.00x compared to prior KV cache management methods while offering substantially better model accuracy.

Submitted 28 June, 2024; originally announced June 2024.

Comments: OSDI 2024

arXiv:2406.19151 [pdf, other]

Trivariate Bicycle Codes

Authors: Lukas Voss, Sim Jian Xian, Tobias Haug, Kishor Bharti

Abstract: Quantum error correction suppresses noise in quantum systems to allow for high-precision computations. In this work, we introduce Trivariate Bicycle Quantum Low-Density Parity-Check (TB-QLDPC) codes, via an extension of the framework developed by Bravyi et al. [Nature, 627, 778-782 (2024)]. Unlike the weight-6 codes proposed in their study, our approach also offers weight-4 and weight-5 codes, which promise to be more amenable to near-term experimental setups. We show that our TB-QLDPC codes up to weight-6 have a bi-planar structure. Further, most of our new codes can also be arranged in a two-dimensional toric layout, and have substantially better encoding rates than comparable surface codes while offering comparable error suppression capabilities. For example, we can encode 4 logical qubits with distance 5 into 30 physical qubits with weight-5 check measurements, while a surface code with comparable parameters requires 100 physical qubits. The high encoding rate and compact layout make our codes highly suitable candidates for near-term hardware implementations, paving the way for a realizable quantum error correction protocol.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 18 pages, 18 figures
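
As a worked check of the numbers quoted in the abstract above, the encoding rate $R = k/n$ (logical qubits per physical qubit) can be compared directly. The surface-code figure is reconstructed here under an assumption the abstract does not state: four independent distance-5 patches with $d^2 = 25$ data qubits each.

```latex
R_{\mathrm{TB}} = \frac{k}{n} = \frac{4}{30} \approx 0.133,
\qquad
R_{\mathrm{surface}} = \frac{4}{4 \times 5^2} = \frac{4}{100} = 0.04,
\qquad
\frac{R_{\mathrm{TB}}}{R_{\mathrm{surface}}} = \frac{100}{30} \approx 3.3
```

Under that assumption the quoted weight-5 code uses roughly a third of the physical qubits for the same number of logical qubits and the same distance.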

arXiv:2406.16609 [pdf, other]

Evaluating the Robustness of Deep-Learning Algorithm-Selection Models by Evolving Adversarial Instances

Authors: Emma Hart, Quentin Renau, Kevin Sim, Mohamad Alissa

Abstract: Deep neural networks (DNN) are increasingly being used to perform algorithm-selection in combinatorial optimisation domains, particularly as they accommodate input representations which avoid designing and calculating features. Mounting evidence from domains that use images as input shows that deep convolutional networks are vulnerable to adversarial samples, in which a small perturbation of an instance can cause the DNN to misclassify. However, it remains unknown whether deep recurrent networks (DRN), which have recently shown promise as algorithm-selectors in the bin-packing domain, are equally vulnerable. We use an evolutionary algorithm (EA) to find perturbations of instances from two existing benchmarks for online bin packing that cause trained DRNs to misclassify: adversarial samples are successfully generated from up to 56% of the original instances depending on the dataset. Analysis of the new misclassified instances sheds light on the 'fragility' of some training instances, i.e. instances where it is trivial to find a small perturbation that results in a misclassification, and the factors that influence this. Finally, the method generates a large number of new instances misclassified with a wide variation in confidence, providing a rich new source of training data to create more robust models.

Submitted 24 June, 2024; originally announced June 2024.

Comments: To appear in the proceedings of the 18th International Conference on Parallel Problem Solving from Nature (PPSN 2024)

arXiv:2406.14473 [pdf, other]

Data-Centric AI in the Age of Large Language Models

Authors: Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, Jingtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, Lucas Agussurja, Rachael Hwee Ling Sim, Xiaoqiang Lin, Wenyang Hu, Zhongxiang Dai, Pang Wei Koh, Bryan Kian Hsiang Low

Abstract: This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization. In each scenario, we underscore the importance of data, highlight promising research directions, and articulate the potential impacts on the research community and, where applicable, the society as a whole. For instance, we advocate for a suite of data-centric benchmarks tailored to the scale and complexity of data for LLMs. These benchmarks can be used to develop new data curation methods and document research efforts and results, which can help promote openness and transparency in AI and LLM research.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2406.12930 [pdf, other]

Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization

Authors: Jungi Lee, Wonbeom Lee, Jaewoong Sim

Abstract: Large language models (LLMs) demonstrate outstanding performance in various tasks in machine learning and have thus become one of the most important workloads in today's computing landscape. However, deploying LLM inference poses challenges due to the high compute and memory requirements stemming from the enormous model size and the difficulty of running it in the integer pipelines. In this paper, we present Tender, an algorithm-hardware co-design solution that enables efficient deployment of LLM inference at low precision. Based on our analysis of outlier values in LLMs, we propose a decomposed quantization technique in which the scale factors of decomposed matrices are powers of two apart. The proposed scheme allows us to avoid explicit requantization (i.e., dequantization/quantization) when accumulating the partial sums from the decomposed matrices, with a minimal extension to the commodity tensor compute hardware. Our evaluation shows that Tender achieves higher accuracy and inference performance compared to the state-of-the-art methods while also being significantly less intrusive to the existing accelerators.

Submitted 16 June, 2024; originally announced June 2024.

Comments: To appear at the 51st International Symposium on Computer Architecture (ISCA 2024)
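
The idea in the Tender abstract, that scale factors a power of two apart let integer partial sums be combined without explicit requantization, can be illustrated with a toy dot product. This is a minimal sketch under stated assumptions, not the paper's algorithm or hardware design: the function names `quantize` and `decomposed_dot`, the magnitude-based grouping rule, the 8-bit range and the single-vector setting are all hypothetical choices made for illustration.

```python
import math


def quantize(values, scale, qmax=127):
    """Symmetric round-to-nearest quantization, clipped to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def decomposed_dot(x, w, num_groups=4, qmax=127):
    """Approximate sum(x[i] * w[i]) using integer partial sums.

    Assumption (illustrative only): channel i goes to the smallest group g
    whose activation scale base * 2**g can represent x[i] without clipping,
    so neighbouring groups have scales exactly a factor of two apart.
    Inputs are assumed to contain at least one non-zero value each.
    """
    w_scale = max(abs(v) for v in w) / qmax
    w_q = quantize(w, w_scale, qmax)

    x_max = max(abs(v) for v in x)
    base = x_max / qmax / 2 ** (num_groups - 1)

    # One integer partial sum per group.
    partial = [0] * num_groups
    for xi, wq in zip(x, w_q):
        g = 0
        while g < num_groups - 1 and abs(xi) > base * 2 ** g * qmax:
            g += 1
        xq = quantize([xi], base * 2 ** g, qmax)[0]
        partial[g] += xq * wq

    # Combine groups from the largest scale down with a one-bit left shift
    # per step (Horner's rule): acc ends up as sum(partial[g] * 2**g).
    acc = 0
    for p in reversed(partial):
        acc = (acc << 1) + p

    # A single floating-point rescale at the very end.
    return acc * base * w_scale, partial, base, w_scale


x = [0.01, -0.02, 0.5, 40.0, -0.03, 3.0]   # one outlier channel (40.0)
w = [0.3, -0.7, 0.2, 0.05, 0.9, -0.4]
approx, partial, base, w_scale = decomposed_dot(x, w)
exact = sum(a * b for a, b in zip(x, w))
print(approx, exact)
```

The shift-and-add loop gives the same value as dequantizing every group separately and summing in floating point, which is the property the abstract highlights; the grouping policy itself is where a real design would differ.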

arXiv:2406.12208 [pdf, other]

Knowledge Fusion By Evolving Weights of Language Models

Authors: Guodong Du, Jing Li, Hanting Liu, Runhua Jiang, Shuyang Yu, Yifei Guo, Sim Kuan Goh, Ho-Kin Tang

Abstract: Fine-tuning pre-trained language models, particularly large language models, demands extensive computing resources and can result in varying performance outcomes across different domains and datasets. This paper examines the approach of integrating multiple models from diverse training scenarios into a unified model. This unified model excels across various data domains and exhibits the ability to generalize well on out-of-domain data. We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms, which does not need further training or additional training data. Specifically, our method involves aggregating the weights of different language models into a population and subsequently generating offspring models through mutation and crossover operations. These offspring models are then evaluated against their parents, allowing for the preservation of those models that show enhanced performance on development datasets. Importantly, our model evolving strategy can be seamlessly integrated with existing model merging frameworks, offering a versatile tool for model enhancement. Experimental results on mainstream language models (i.e., encoder-only, decoder-only, encoder-decoder) reveal that Evolver outperforms previous state-of-the-art models by large margins. The code is publicly available at https://github.com/duguodong7/model-evolution.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted by ACL2024 Findings

arXiv:2406.11379 [pdf, other]

On existence of Sadovskii vortex patch: A touching pair of symmetric counter-rotating uniform vortex

Authors: Kyudong Choi, In-Jee Jeong, Young-Jin Sim

Abstract: The Sadovskii vortex patch is a traveling wave for the two-dimensional incompressible Euler equations consisting of an odd symmetric pair of vortex patches touching the symmetry axis. Its existence was first suggested by numerical computations of Sadovskii in [J. Appl. Math. Mech., 1971], and has gained significant interest due to its relevance in the inviscid limit of planar flows via Prandtl--Batchelor theory and as the asymptotic state for vortex ring dynamics. In this work, we prove the existence of a Sadovskii vortex patch, by solving the energy maximization problem under the exact impulse condition and an upper bound on the circulation.

Submitted 29 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: 42 pages, 1 figure

arXiv:2406.11135 [pdf, other]

doi 10.1145/3656156.3663694

Towards Understanding Emotions for Engaged Mental Health Conversations

Authors: Kellie Yu Hui Sim, Kohleen Tijing Fortuno, Kenny Tsu Wei Choo

Abstract: Providing timely support and intervention is crucial in mental health settings. As the need to engage youth comfortable with texting increases, mental health providers are exploring and adopting text-based media such as chatbots, community-based forums, online therapies with licensed professionals, and helplines operated by trained responders. To support these text-based media for mental health--particularly for crisis care--we are developing a system to perform passive emotion-sensing using a combination of keystroke dynamics and sentiment analysis. Our early studies of this system posit that the analysis of short text messages and keyboard typing patterns can provide emotion information that may be used to support both clients and responders. We use our preliminary findings to discuss the way forward for applying AI to support mental health providers in providing better care.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 5 pages, 1 figure, to be published in DIS Companion '24

ACM Class: H.5.2; I.2.7

arXiv:2406.08682 [pdf, other]

FIP-GNN: Graph neural networks for scalable prediction of grain-level fatigue indicator parameters

Authors: Gyu-Jang Sim, Myoung-Gyu Lee, Marat I. Latypov

Abstract: High-cycle fatigue is a critical performance metric of structural alloys for many applications. The high cost, time, and labor involved in experimental fatigue testing call for efficient and accurate computer models of fatigue life. We present graph neural networks for polycrystals that, for the first time, can (i) predict fatigue indicator parameters -- grain-level responses to cyclic loading well beyond monotonic elastic and inelastic regimes reported in literature; and (ii) generalize these predictions to large microstructure volume elements with grain populations well beyond those used in training. These advances can make significant contributions to statistically rigorous and computationally efficient modeling of high-cycle fatigue -- a long-standing challenge in the field.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08301 [pdf, other]

Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, J. Alexander, M. Alfred, K. Aoki, N. Apadula, L. Aphecetche, J. Asai, H. Asano, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay, A. Baldisseri, et al. (510 additional authors not shown)

Abstract: High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 534 authors from 83 institutions, 12 pages, 7 figures. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2406.05411 [pdf, other]

Generalized symmetry in non-Hermitian systems

Authors: Karin Sim, Nicolò Defenu, Paolo Molignini, R. Chitra

Abstract: Despite acute interest in the dynamics of non-Hermitian systems, there is a lack of consensus in the mathematical formulation of non-Hermitian quantum mechanics in the community. Different methodologies are used in the literature to study non-Hermitian dynamics. These range from consistent frameworks like biorthogonal quantum mechanics and the metric approach characterized by modified inner products, to normalization by time-dependent norms inspired by open quantum systems. In this work, we systematically explore the similarities and differences among these various methods. Utilizing illustrative models with exact solutions, we demonstrate that these methods produce not only quantitatively different results but also distinct physical interpretations. For dissipative systems where non-Hermiticity arises as an approximation, we find that the normalization method in the $\mathcal{PT}$-broken regime closely aligns with the full master equation solutions. In contrast, for quantum systems where non-Hermiticity can be engineered exactly, incorporating metric dynamics is crucial for the probabilistic interpretation of quantum mechanics, necessitating the generalization of unitary symmetry to non-Hermitian systems. This study lays the groundwork for further exploration of non-Hermitian Hamiltonians, potentially leveraging generalized symmetries for novel physical phenomena.

Submitted 8 June, 2024; originally announced June 2024.

Comments: 11 pages, 1 figure

arXiv:2406.04576 [pdf, other]

Metasurfaces for infrared multi-modal microscopy: phase contrast and bright field

Authors: Shaban B. Sulejman, Lukas Wesemann, Mikkaela McCormack, Jiajun Meng, James A. Hutchison, Niken Priscilla, Gawain McColl, Katrina Read, Wilson Sim, Andrey A. Sukhorukov, Kenneth B. Crozier, Ann Roberts

Abstract: Different imaging modalities are used to extract the diverse information carried in an optical field. Two prominent modalities include bright field and phase contrast microscopy that can visualize the amplitude and phase features of a sample, respectively. However, capturing both of these images on the same camera typically requires interchanging optical components. Metasurfaces are ultra-thin nanostructures that can merge both of these operations into a single miniaturized device. Here, a silicon-based metasurface that supports a Mie resonance is demonstrated to perform near-infrared phase contrast and bright field multi-modal microscopy that can be tuned by changing the polarization of the illumination. We performed experiments using optical fields with phase variations synthesized by a spatial light modulator and introduced by propagation through semi-transparent samples, including C. elegans, unstained human prostate cancer cells and breast tissue. The results demonstrate the potential of metasurfaces for label-free point-of-care testing.

Submitted 9 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: Main text 18 pages, 5 main figures, Supplementary information 19 pages, 19 supplementary figures, 2 supplementary videos

arXiv:2406.02349 [pdf, other]

CADE: Cosine Annealing Differential Evolution for Spiking Neural Network

Authors: Runhua Jiang, Guodong Du, Shuyang Yu, Yifei Guo, Sim Kuan Goh, Ho-Kin Tang

Abstract: Spiking neural networks (SNNs) have gained prominence for their potential in neuromorphic computing and energy-efficient artificial intelligence, yet optimizing them remains a formidable challenge for gradient-based methods due to their discrete, spike-based computation. This paper attempts to tackle the challenges by introducing Cosine Annealing Differential Evolution (CADE), designed to modulate the mutation factor (F) and crossover rate (CR) of differential evolution (DE) for the SNN model, i.e., Spiking Element Wise (SEW) ResNet. Extensive empirical evaluations were conducted to analyze CADE. CADE showed a balance in exploring and exploiting the search space, resulting in accelerated convergence and improved accuracy compared to existing gradient-based and DE-based methods. Moreover, an initialization method based on a transfer learning setting was developed, pretraining on a source dataset (i.e., CIFAR-10) and fine-tuning on the target dataset (i.e., CIFAR-100), to improve population diversity. It was found to further enhance CADE for SNN. Remarkably, CADE elevates the performance of the highest accuracy SEW model by an additional 0.52 percentage points, underscoring its effectiveness in fine-tuning and enhancing SNNs. These findings emphasize the pivotal role of a scheduler for F and CR adjustment, especially for DE-based SNN. Source Code on Github: https://github.com/Tank-Jiang/CADE4SNN.

Submitted 4 June, 2024; originally announced June 2024.
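
The scheduling idea described in the CADE abstract, modulating the mutation factor F and crossover rate CR of differential evolution with cosine annealing, might look roughly like the following. This is a minimal sketch under stated assumptions and is not taken from the linked repository: the half-cosine decay, the ranges in `f_range` and `cr_range`, the DE/rand/1/bin variant, the function names, and the toy `sphere` objective (standing in for the loss of a spiking network) are all hypothetical.

```python
import math
import random


def cosine_anneal(step, total_steps, high, low):
    """Half-cosine decay from `high` at step 0 to `low` at the last step."""
    t = step / max(1, total_steps - 1)
    return low + 0.5 * (high - low) * (1.0 + math.cos(math.pi * t))


def sphere(x):
    """Toy objective (sum of squares) used in place of a network loss."""
    return sum(v * v for v in x)


def de_cosine(objective, dim=5, pop_size=20, generations=100, seed=0,
              f_range=(0.9, 0.3), cr_range=(0.9, 0.2)):
    """DE/rand/1/bin with F and CR annealed once per generation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(p) for p in pop]

    for g in range(generations):
        # Assumed schedule: explore early (large F, CR), exploit late.
        f = cosine_anneal(g, generations, *f_range)
        cr = cosine_anneal(g, generations, *cr_range)
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = [
                pop[a][d] + f * (pop[b][d] - pop[c][d])
                if (rng.random() < cr or d == j_rand) else pop[i][d]
                for d in range(dim)
            ]
            trial_fit = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_fit <= fit[i]:
                pop[i], fit[i] = trial, trial_fit

    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


best_x, best_f = de_cosine(sphere)
print(best_f)
```

Because selection is greedy, the best fitness can never get worse from one generation to the next; what the schedule changes is how aggressively new candidates are proposed as the run progresses.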

arXiv:2405.18832 [pdf, other]

MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models

Authors: Taehyun Kim, Kwanseok Choi, Youngmock Cho, Jaehoon Cho, Hyuk-Jae Lee, Jaewoong Sim

Abstract: Mixture-of-Experts (MoE) large language models (LLM) have memory requirements that often exceed the GPU memory capacity, requiring costly parameter movement from secondary memories to the GPU for expert computation. In this work, we present Mixture of Near-Data Experts (MoNDE), a near-data computing solution that efficiently enables MoE LLM inference. MoNDE reduces the volume of MoE parameter movement by transferring only the $\textit{hot}$ experts to the GPU, while computing the remaining $\textit{cold}$ experts inside the host memory device. By replacing the transfers of massive expert parameters with transfers of small activations, MoNDE enables far more communication-efficient MoE inference, thereby resulting in substantial speedups over the existing parameter offloading frameworks for both encoder and decoder operations.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted to DAC 2024

arXiv:2405.13596 [pdf, other]

SN 2023zaw: the low-energy explosion of an ultra-stripped star, with non-radioactive heating

Authors: Thomas Moore, James Gillanders, Matt Nicholl, Mark Huber, Stephen Smartt, Shubham Srivastav, Heloise Stevance, Ting-Wan Chen, Kenneth Chambers, Joseph Anderson, Michael Fulton, Samantha Oates, Charlotte Angus, Giuliano Pignata, Nicolas Erasmus, Hua Gao, Joanna Bulger, Chien-Cheng Lin, Thomas Lowe, Eugene Magnier, Paloma Minguez, Chow-Choong Ngeow, Xinyue Sheng, Stuart A. Sim, Ken Smith, et al. (4 additional authors not shown)

Abstract: Most stripped envelope supernova progenitors are formed through binary interaction, losing hydrogen and/or helium from their outer layers. An emerging class of supernovae with the highest degree of envelope-stripping are thought to be the product of stripping by a NS companion. However, relatively few examples are known and the outcomes of such systems can be diverse and are poorly understood at present. Here, we present spectroscopic observations and high cadence multi-band photometry of SN 2023zaw, a low ejecta mass and rapidly evolving supernova. SN 2023zaw was discovered in a nearby spiral galaxy at D = 39.7 Mpc, with significant Milky Way extinction, $E(B-V) = 0.21$, and significant (but uncertain) host extinction. Bayesian evidence comparison reveals that nickel is not the only power source and an additional energy source is required to explain our observations. Our models suggest that an ejecta mass of $M_{\rm ej} \sim 0.07\,\rm M_\odot$ and a synthesised nickel mass of $M_{\rm Ni} \sim 0.007\,\rm M_\odot$ are required to explain the explosion. However, additional heating from a magnetar or interaction with circumstellar material is required to power the early light curve.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.11436 [pdf, other]

Quantum sampling algorithms for quantum state preparation and matrix block-encoding

Authors: Jessica Lemieux, Matteo Lostaglio, Sam Pallister, William Pol, Karthik Seetharam, Sukin Sim, Burak Şahinoğlu

Abstract: The problems of quantum state preparation and matrix block-encoding are ubiquitous in quantum computing: they are crucial parts of various quantum algorithms for the purpose of initial state preparation as well as loading problem-relevant data. We first present an algorithm based on quantum rejection sampling (QRS) that prepares a quantum state $|ψ_f\rangle \propto \sum^N_{x=1} f(x)|x\rangle$. When combined with efficient reference states the algorithm reduces the cost of quantum state preparation substantially, if certain criteria on $f$ are met. When the preparation of the reference state is not the dominant cost, and the function $f$ and relevant properties are efficiently computable or provided otherwise with cost $o(N)$, the QRS-based method outperforms the generic state preparation algorithm, which has cost $O(N)$. We demonstrate the detailed performance (in terms of the number of Toffoli gates) of the QRS-based algorithm for quantum states commonly appearing in quantum applications, e.g., those with coefficients that obey power law decay, Gaussian, and hyperbolic tangent, and compare it with other methods. Then, we adapt QRS techniques to the matrix block-encoding problem and introduce a QRS-based algorithm for block-encoding a given matrix $A = \sum_{ij} A_{ij} |i\rangle \langle j|$. We work out rescaling factors for different access models, which encode how the information about the matrix is provided to the quantum computer. We exemplify these results for a particular Toeplitz matrix with elements $A_{\mathbf{ij}}= 1/\|{\mathbf{i}}-{\mathbf{j}}\|^2$, which appears in quantum chemistry and PDE applications, e.g., when the Coulomb interaction is involved. Our work unifies, and in certain ways goes beyond, various quantum state preparation and matrix block-encoding methods in the literature, and gives detailed performance analysis of important examples that appear in quantum applications.

Submitted 18 May, 2024; originally announced May 2024.

Comments: 58 pages, 28 figures, 5 tables
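
The Toeplitz example named in the abstract above can be made concrete with a small classical sketch. This is only an illustration of the matrix elements $A_{ij} = 1/|i-j|^2$ on a one-dimensional index set; it is not the authors' block-encoding algorithm. The function name `coulomb_toeplitz` and the choice to fill the undefined diagonal ($i = j$) with a placeholder value are assumptions made for this sketch.

```python
import numpy as np

def coulomb_toeplitz(n, diagonal=0.0):
    """Build the n x n matrix with entries 1/|i-j|^2 for i != j (1D sketch).

    The diagonal is undefined by the formula; `diagonal` is a hypothetical
    placeholder value chosen by the caller.
    """
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]      # i - j for every pair of indices
    a = np.full((n, n), diagonal, dtype=float)
    mask = diff != 0                        # skip the singular i == j entries
    a[mask] = 1.0 / diff[mask].astype(float) ** 2
    return a

A = coulomb_toeplitz(4)
print(A)
```

Every entry depends only on $i - j$, which is the Toeplitz property: each diagonal of the printed matrix is constant.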

arXiv:2405.07457 [pdf, other]

Magnetoelectric domain engineering from micrometer to Ångstrøm scales

Authors: Marcela Giraldo, Arkadiy Simonov, Hasung Sim, Ahmed Samir Lotfy, Martin Lilienblum, Lea Forster, Elzbieta Gradauskaite, Morgan Trassin, Je-Geun Park, Thomas Lottermoser, Manfred Fiebig

Abstract: The functionality of magnetoelectric multiferroics depends on the formation, size, and coupling of their magnetic and electric domains. Knowing the parameters guiding these criteria is a key effort in the emerging field of magnetoelectric domain engineering. Here we show, using a combination of piezoresponse-force microscopy, non-linear optics, and x-ray scattering, that the correlation length setting the size of the ferroelectric domains in the multiferroic hexagonal manganites can be engineered from the micron range down to a few unit cells under the substitution of Mn$^{3+}$ ions with Al$^{3+}$ ions. The magnetoelectric coupling mechanism between the antiferromagnetic Mn$^{3+}$ order and the distortive-ferroelectric order remains intact even at substantial replacement of Mn$^{3+}$ by Al$^{3+}$. Hence, chemical substitution proves to be an effective tool for domain-size engineering in one of the most studied classes of multiferroics.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 10 pages, 8 figures

arXiv:2405.07414 [pdf, other]

Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains

Authors: Kyungeun Lee, Ye Seul Sim, Hye-Seung Cho, Moonjung Eo, Suhee Yoon, Sanghyu Yoon, Woohyung Lim

Abstract: The ability of deep networks to learn superior representations hinges on leveraging the proper inductive biases, considering the inherent properties of datasets. In tabular domains, it is critical to effectively handle heterogeneous features (both categorical and numerical) in a unified manner and to grasp irregular functions like piecewise constant functions. To address the challenges in the self-supervised learning framework, we propose a novel pretext task based on the classical binning method. The idea is straightforward: reconstructing the bin indices (either orders or classes) rather than the original values. This pretext task provides the encoder with an inductive bias to capture the irregular dependencies, mapping from continuous inputs to discretized bins, and mitigates the feature heterogeneity by setting all features to have category-type targets. Our empirical investigations ascertain several advantages of binning: capturing the irregular function, compatibility with encoder architecture and additional modifications, standardizing all features into equal sets, grouping similar values within a feature, and providing ordering information. Comprehensive evaluations across diverse tabular datasets corroborate that our method consistently improves tabular representation learning performance for a wide range of downstream tasks. The code is available at https://github.com/kyungeun-lee/tabularbinning.

Submitted 13 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

Comments: ICML 2024, 18 pages (including supplementary materials)
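
The core idea in the abstract above, replacing raw values with bin indices as reconstruction targets, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the use of quantile (equal-frequency) bin edges and the function name `bin_targets` are assumptions made for this sketch.

```python
import numpy as np

def bin_targets(column, n_bins):
    """Return the bin index (0 .. n_bins-1) of every value in a numerical column.

    Assumption: bin edges are placed at quantiles of the column, so each
    bin holds roughly the same number of samples.
    """
    # Interior cut points only: n_bins bins need n_bins - 1 edges.
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    edges = np.quantile(column, quantiles)
    # np.digitize maps each value to the index of the interval it falls in.
    return np.digitize(column, edges)

values = np.array([0.1, 0.4, 0.5, 2.0, 3.5, 10.0, 11.0, 50.0])
targets = bin_targets(values, n_bins=4)
print(targets)  # [0 0 1 1 2 2 3 3]
```

An encoder trained to predict `targets` instead of `values` sees the same category-type target for every feature, regardless of the original scale of the column.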

arXiv:2404.19611 [pdf, other]

Radio Resource Management Design for RSMA: Optimization of Beamforming, User Admission, and Discrete/Continuous Rates with Imperfect SIC

Authors: L. F. Abanto-Leon, A. Krishnamoorthy, A. Garcia-Saavedra, G. H. Sim, R. Schober, M. Hollick

Abstract: This paper investigates the radio resource management (RRM) design for multiuser rate-splitting multiple access (RSMA), accounting for various characteristics of practical wireless systems, such as the use of discrete rates, the inability to serve all users, and the imperfect successive interference cancellation (SIC). Specifically, failure to consider these characteristics in RRM design may lead to inefficient use of radio resources. Therefore, we formulate the RRM of RSMA as optimization problems to maximize respectively the weighted sum rate (WSR) and weighted energy efficiency (WEE), and jointly optimize the beamforming, user admission, and discrete/continuous rates, accounting for imperfect SIC, which results in nonconvex mixed-integer nonlinear programs that are challenging to solve. Despite the difficulty of the optimization problems, we develop algorithms that can find high-quality solutions. We show via simulations that carefully accounting for the aforementioned characteristics can lead to significant gains. Precisely, by considering that transmission rates are discrete, the transmit power can be utilized more intelligently, allocating just enough power to guarantee a given discrete rate. Additionally, we reveal that user admission plays a crucial role in RSMA, enabling additional gains compared to random admission by facilitating the servicing of selected users with mutually beneficial channel characteristics. Furthermore, provisioning for possibly imperfect SIC makes RSMA more robust and reliable.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.14618 [pdf, other]

Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing

Authors: Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Ruhle, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah

Abstract: Large language models (LLMs) excel in most NLP tasks but also require expensive cloud servers for deployment due to their size, while smaller models that can be deployed on lower cost (e.g., edge) devices, tend to lag behind in terms of response quality. Therefore in this work we propose a hybrid inference approach which combines their respective strengths to save cost and maintain quality. Our approach uses a router that assigns queries to the small or large model based on the predicted query difficulty and the desired quality level. The desired quality level can be tuned dynamically at test time to seamlessly trade quality for cost as per the scenario requirements. In experiments our approach allows us to make up to 40% fewer calls to the large model, with no drop in response quality.

Submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted to ICLR 2024 (main conference)
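
The routing rule described in the abstract above can be sketched as a threshold on a predicted score. This is a toy illustration under stated assumptions, not the paper's router: `HybridRouter`, `toy_score`, and the word-count heuristic are hypothetical stand-ins for a learned difficulty predictor, and `threshold` plays the role of the tunable quality level.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class HybridRouter:
    # Hypothetical scorer: estimated chance that the small model answers well.
    score_fn: Callable[[str], float]
    # Quality knob: higher values send more queries to the large model.
    threshold: float

    def route(self, query: str) -> str:
        """Return which model should serve the query: 'small' or 'large'."""
        return "small" if self.score_fn(query) >= self.threshold else "large"

def toy_score(query: str) -> float:
    # Stand-in for a learned router: shorter queries are treated as easier.
    n_words = len(query.split())
    return 1.0 / (1.0 + n_words / 10.0)

short_query = "What is 2 + 2?"
long_query = " ".join(["word"] * 40)

router = HybridRouter(score_fn=toy_score, threshold=0.5)
print(router.route(short_query))  # small
print(router.route(long_query))   # large

# Raising the threshold at test time trades cost for quality.
strict_router = HybridRouter(score_fn=toy_score, threshold=0.9)
print(strict_router.route(short_query))  # large
```

Because the threshold is a plain attribute, it can be changed per request without retraining the scorer, which mirrors the test-time tuning the abstract describes.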

arXiv:2404.11955 [pdf]

Electrical control of a Kondo spin screening cloud

Authors: Ngoc Han Tu, Donghoon Kim, Minsoo Kim, Jeongmin Shim, Ryo Ito, David Pomaranski, Ivan V. Borzenets, Arne Ludwig, Andreas D. Wieck, Heung-Sun Sim, Michihisa Yamamoto

Abstract: In metals and semiconductors, an impurity spin is quantum entangled with and thereby screened by surrounding conduction electrons at low temperatures, called the Kondo screening cloud. Quantum confinement of the Kondo screening cloud in a region, called a Kondo box, with a length smaller than the original cloud extension length strongly deforms the screening cloud and provides a way of controlling the entanglement. Here we realize such a Kondo box and develop an approach to controlling and monitoring the entanglement. It is based on a spin localized in a semiconductor quantum dot, which is screened by conduction electrons along a quasi-one-dimensional channel. The box is formed between the dot and a quantum point contact placed on a channel. As the quantum point contact is tuned to make the confinement stronger, electron conductance through the dot as a function of temperature starts to deviate from the known universal function of the single energy scale, the Kondo temperature. Nevertheless, the entanglement is monitored by the measured conductance according to our theoretical development. The dependence of the monitored entanglement on the confinement strength and temperature implies that the Kondo screening is controlled by tuning the quantum point contact. Namely, the Kondo cloud is deformed by the Kondo box in the region across the original cloud length. Our findings offer a way of manipulating and detecting spatially extended quantum many-body entanglement in solids by electrical means.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11099 [pdf]

Interplay between magnetic and lattice excitations and emergent multiple phase transitions in MnPSe3-xSx

Authors: Deepu Kumar, Nguyen The Hoang, Yumin Sim, Youngsu Choi, Kalaivanan Raju, Rajesh Kumar Ulaganathan, Raman Sankar, Maeng-Je Seong, Kwang-Yong Choi

Abstract: The intricate interplay between spin and lattice degrees of freedom in two-dimensional magnetic materials plays a pivotal role in modifying their magnetic characteristics, engendering hybrid quasiparticles, and implementing functional devices. Herein, we present our comprehensive and in-depth investigations on magnetic and lattice excitations of MnPSe3-xSx (x = 0, 0.5, and 1.5) alloys, utilizing temperature- and polarization-dependent Raman scattering. Our experimental results reveal the occurrence of multiple phase transitions, evidenced by notable changes in phonon self-energy and the appearance or splitting of phonon modes. These emergent phases are tied to the development of long- and short-range spin-spin correlations, as well as to spin reorientations or magnetic instabilities. Our analysis of two-magnon excitations as a function of temperature and composition showcases their hybridization with phonons, whose degree weakens with increasing x. Moreover, the suppression of spin-dependent phonon intensity in the most chemically disordered MnPSe3-xSx (x = 1.5) suggests that chalcogen substitution offers a control knob for tuning spin and phonon dynamics by concurrently modulating superexchange pathways and the degree of trigonal distortions.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.09173 [pdf, other]

TransformerFAM: Feedback attention is working memory

Authors: Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar

Abstract: While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. We propose Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations. This design fosters the emergence of working memory within the Transformer, allowing it to process indefinitely long sequences. TransformerFAM requires no additional weights, enabling seamless integration with pre-trained models. Our experiments show that TransformerFAM significantly improves Transformer performance on long-context tasks across various model sizes (1B, 8B, and 24B). These results showcase the potential to empower Large Language Models (LLMs) to process sequences of unlimited length.

Submitted 7 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: 26 pages, 12 figures, 14 tables

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seong** Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.01752 [pdf, other]

Safe Interval RRT* for Scalable Multi-Robot Path Planning in Continuous Space

Authors: Joonyeol Sim, Joonkyung Kim, Changjoo Nam

Abstract: In this paper, we consider the problem of Multi-Robot Path Planning (MRPP) in continuous space to find conflict-free paths. The difficulty of the problem arises from two primary factors. First, the involvement of multiple robots leads to combinatorial decision-making, which escalates the search space exponentially. Second, the continuous space presents potentially infinite states and actions. For this problem, we propose a two-level approach where the low level is a sampling-based planner Safe Interval RRT* (SI-RRT*) that finds a collision-free trajectory for individual robots. The high level can use any method that can resolve inter-robot conflicts where we employ two representative methods that are Prioritized Planning (SI-CPP) and Conflict Based Search (SI-CCBS). Experimental results show that SI-RRT* can find a high-quality solution quickly with a small number of samples. SI-CPP exhibits improved scalability while SI-CCBS produces higher-quality solutions compared to the state-of-the-art planners for continuous space. Compared to the most scalable existing algorithm, SI-CPP achieves a success rate that is up to 94% higher with 100 robots while maintaining solution quality (i.e., flowtime, the sum of travel times of all robots) without significant compromise. SI-CPP also decreases the makespan up to 45%. SI-CCBS decreases the flowtime by 9% compared to the competitor, albeit exhibiting a 14% lower success rate.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01676 [pdf, other]

Incentives in Private Collaborative Machine Learning

Authors: Rachael Hwee Ling Sim, Yehong Zhang, Trong Nghia Hoang, Xinyi Xu, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: Collaborative machine learning involves training models on data from multiple parties but must incentivize their participation. Existing data valuation methods fairly value and reward each party based on shared data or model parameters but neglect the privacy risks involved. To address this, we introduce differential privacy (DP) as an incentive. Each party can select its required DP guarantee and perturb its sufficient statistic (SS) accordingly. The mediator values the perturbed SS by the Bayesian surprise it elicits about the model parameters. As our valuation function enforces a privacy-valuation trade-off, parties are deterred from selecting excessive DP guarantees that reduce the utility of the grand coalition's model. Finally, the mediator rewards each party with different posterior samples of the model parameters. Such rewards still satisfy existing incentives like fairness but additionally preserve DP and a high similarity to the grand coalition's posterior. We empirically demonstrate the effectiveness and practicality of our approach on synthetic and real-world datasets.

Submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted to NeurIPS 2023

arXiv:2404.01396 [pdf, other]

A case study against QSVT: assessment of quantum phase estimation improved by signal processing techniques

Authors: Sean Greenaway, William Pol, Sukin Sim

Abstract: In recent years, quantum algorithms have been proposed which use quantum phase estimation (QPE) coherently as a subroutine without measurement. In order to do this effectively, the routine must be able to distinguish eigenstates with success probability close to unity. In this paper, we provide the first systematic comparison between two approaches towards maximizing this success probability, one using the quantum singular value transform and the other leveraging window functions, which have been previously studied as priors of the phase value distribution. We find that the quantum singular value transform is significantly outclassed by the window function approach, with the latter able to achieve between 3 and 5 orders of magnitude improvement in the success probability with approximately 1/4 the query cost. Our circuit simulation results indicate that QPE is not a domain which benefits from the integration of QSVT and we show that the use of the Kaiser window function is currently the most practical choice for realizing QPE with high success probability.

Submitted 17 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: References fixed and added
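
The abstract above credits the Kaiser window with suppressing failure probability in QPE. A purely classical analogue of that effect is spectral leakage: a rectangular window leaves large sidelobes in the Fourier domain, while a Kaiser window pushes them far down. The sketch below compares the two with NumPy; it is a signal-processing illustration only, not a QPE simulation, and the helper `peak_sidelobe_db`, the window length 64, and the shape parameter `beta = 8.0` are assumptions chosen for the example.

```python
import numpy as np

def peak_sidelobe_db(window, pad=64):
    """Peak sidelobe level of a window's spectrum, in dB relative to the main lobe."""
    # Zero-pad heavily so the sidelobe structure is finely sampled.
    spec = np.abs(np.fft.rfft(window, n=pad * len(window)))
    spec /= spec[0]
    # Walk down the main lobe until the magnitude stops decreasing (first null).
    k = 1
    while k < len(spec) - 1 and spec[k + 1] < spec[k]:
        k += 1
    # Everything past the first null belongs to the sidelobes.
    return 20.0 * np.log10(spec[k:].max())

n = 64
rect_db = peak_sidelobe_db(np.ones(n))
kaiser_db = peak_sidelobe_db(np.kaiser(n, 8.0))
print(f"rectangular: {rect_db:.1f} dB, Kaiser: {kaiser_db:.1f} dB")
```

Lower sidelobes mean less weight leaks into wrong frequency bins, which is the classical counterpart of a phase estimate landing far from the true eigenphase.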

arXiv:2404.00963 [pdf, other]

doi 10.1039/D4CP00517A

Inversion and Tunability of Van Hove Singularities in $A$V$_{3}$Sb$_{5}$ ($A$ = K, Rb, and Cs) kagome metals

Authors: Sangjun Sim, Min Yong Jeong, Hyunggeun Lee, Dong Hyun David Lee, Myung Joon Han

Abstract: To understand the alkali-metal-dependent material properties of recently discovered $A$V$_{3}$Sb$_{5}$ ($A$ = K, Rb, and Cs), we conducted a detailed electronic structure analysis based on first-principles density functional theory calculations. Contrary to the case of $A$ = K and Rb, the energetic positions of the low-lying Van Hove singularities are reversed in CsV$_{3}$Sb$_{5}$, and the characteristic higher-order Van Hove point gets closer to the Fermi level. We found that this notable difference can be attributed to the chemical effect, apart from structural differences. Due to their different orbital compositions, Van Hove points show qualitatively different responses to the structure changes. A previously unnoticed highest lying point can be lowered, locating close to or even below the other ones in response to a reasonable range of bi- and uni-axial strain. Our results can be useful in better understanding the material-dependent features reported in this family and in realizing experimental control of exotic quantum phases.

Submitted 1 April, 2024; originally announced April 2024.

Comments: Physical Chemistry Chemical Physics (PCCP) in press

arXiv:2404.00626 [pdf, other]

Domain Generalizable Person Search Using Unreal Dataset

Authors: Minyoung Oh, Duhyun Kim, Jae-Young Sim

Abstract: Collecting and labeling real datasets to train person search networks not only requires a lot of time and effort, but also raises privacy issues. Weakly-supervised and unsupervised domain adaptation methods have been proposed to alleviate the labeling burden for target datasets; however, their generalization capability is limited. We introduce a novel person search method based on the domain generalization framework, which uses only an automatically labeled unreal dataset for training but is applicable to arbitrary unseen real datasets. To alleviate the domain gaps when transferring the knowledge from the unreal source dataset to the real target datasets, we estimate the fidelity of person instances, which is then used to train the end-to-end network adaptively. Moreover, we devise a domain-invariant feature learning scheme to encourage the network to suppress the domain-related features. Experimental results demonstrate that the proposed method provides competitive performance relative to existing person search methods even though it is applicable to arbitrary unseen datasets without any prior knowledge or re-training burden.

Submitted 31 March, 2024; originally announced April 2024.

Comments: AAAI2024 accepted

arXiv:2403.19709 [pdf, other]

Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models

Authors: Tsendsuren Munkhdalai, Youzheng Chen, Khe Chai Sim, Fadi Biadsy, Tara Sainath, Pedro Moreno Mengibar

Abstract: Parameter efficient adaptation methods have become a key mechanism to train large pre-trained models for downstream tasks. However, their per-task parameter overhead is still considered high when the number of downstream tasks to adapt for is large. We introduce an adapter module that is more efficient in large-scale multi-task adaptation scenarios. Our adapter is hierarchical in terms of how the adapter parameters are allocated. The adapter consists of a single shared controller network and multiple task-level adapter heads to reduce the per-task parameter overhead without performance regression on downstream tasks. The adapter is also recurrent, so the entire set of adapter parameters is reused across different layers of the pre-trained model. Our Hierarchical Recurrent Adapter (HRA) outperforms the previous adapter-based approaches as well as the full model fine-tuning baseline in both single and multi-task adaptation settings when evaluated on automatic speech recognition tasks.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 5 pages, 3 figures, 5 tables

arXiv:2403.15084 [pdf, other]

doi 10.1093/mnras/stae847

Including a Luminous Central Remnant in Radiative Transfer Simulations for Type Iax Supernovae

Authors: F. P. Callan, S. A. Sim, C. E. Collins, L. J. Shingles, F. Lach, F. K. Roepke, R. Pakmor, M. Kromer, S. Srivastav

Abstract: Type Iax supernovae (SNe Iax) are proposed to arise from deflagrations of Chandrasekhar mass white dwarfs (WDs). Previous deflagration simulations have achieved good agreement with the light curves and spectra of intermediate-luminosity and bright SNe Iax. However, the model light curves decline too quickly after peak, particularly in red optical and near-infrared (NIR) bands. Deflagration models with a variety of ignition configurations do not fully unbind the WD, leaving a remnant polluted with $^{56}\mathrm{Ni}$. Emission from such a remnant may contribute to the luminosity of SNe Iax. Here we investigate the impact of adding a central energy source, assuming instantaneous powering by $^{56}\mathrm{Ni}$ decay in the remnant, in radiative transfer calculations of deflagration models. Including the remnant contribution improves agreement with the light curves of SNe Iax, particularly due to the slower post-maximum decline of the models. Spectroscopic agreement is also improved, with intermediate-luminosity and faint models showing greatest improvement. We adopt the full remnant $^{56}\mathrm{Ni}$ mass predicted for bright models, but good agreement with intermediate-luminosity and faint SNe Iax is only possible for remnant $^{56}\mathrm{Ni}$ masses significantly lower than those predicted. This may indicate that some of the $^{56}\mathrm{Ni}$ decay energy in the remnant does not contribute to the radiative luminosity but instead drives mass ejection, or that escape of energy from the remnant is significantly delayed. Future work should investigate the structure of remnants predicted by deflagration models and the potential roles of winds and delayed energy escape, as well as extend radiative transfer simulations to late times.

Submitted 19 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: 17 pages, 6 figures. Lightcurves and spectra available at https://hesma.h-its.org

Journal ref: Monthly Notices of the Royal Astronomical Society, Volume 530, Issue 2, May 2024, Pages 1457 to 1473

arXiv:2403.11793 [pdf, other]

Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus

Authors: Seungpil Lee, Woochang Sim, Donghyeon Shin, Sanha Hwang, Wongyu Seo, Jiwon Park, Seokki Lee, Se** Kim, Sundong Kim

Abstract: The existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been results-centric, making it difficult to assess the inference process. We introduce a new approach using the Abstraction and Reasoning Corpus (ARC) dataset to evaluate the inference and contextual understanding abilities of large language models in a process-centric manner. ARC demands rigorous logical structures for problem-solving, making it a benchmark that facilitates the comparison of model inference abilities with humans. Experimental results confirm that while large language models possess weak inference abilities, they still lag in terms of logical coherence, compositionality, and productivity. Our experiments highlight the reasoning capabilities of LLMs, proposing development paths for achieving human-level reasoning.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 25 pages

arXiv:2403.10948 [pdf, other]

Real-to-Sim Adaptation via High-Fidelity Simulation to Control a Wheeled-Humanoid Robot with Unknown Dynamics

Authors: Donghoon Baek, Youngwoo Sim, Amartya Purushottam, Saurabh Gupta, Joao Ramos

Abstract: Model-based control using a linearized model around the system's equilibrium point is a common approach to controlling a wheeled humanoid because of its lower computational load and ease of stability analysis. However, controlling a wheeled humanoid robot while it lifts an unknown object presents significant challenges, primarily due to the lack of knowledge of the object dynamics. This paper presents a framework designed for predicting the new equilibrium point explicitly to control a wheeled-legged robot with unknown dynamics. We estimated the total mass and center of mass of the system from its response to initially unknown dynamics, then calculated the new equilibrium point accordingly. To avoid using additional sensors (e.g., force torque sensor) and reduce the effort of obtaining expensive real data, a data-driven approach is utilized with a novel real-to-sim adaptation. A more accurate nonlinear dynamics model, offering a closer representation of real-world physics, is injected into a rigid-body simulation for real-to-sim adaptation. The nonlinear dynamics model parameters were optimized using Particle Swarm Optimization. The efficacy of this framework was validated on a physical wheeled inverted pendulum, a simplified model of a wheeled-legged robot. The experimental results indicate that employing a more precise analytical model with optimized parameters significantly reduces the gap between simulation and reality, thus improving the efficiency of a model-based controller in controlling a wheeled robot with unknown dynamics.

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.10022 [pdf, other]

Lifelong Person Re-Identification with Backward-Compatibility

Authors: Minyoung Oh, Jae-Young Sim

Abstract: Lifelong person re-identification (LReID) assumes a practical scenario where the model is sequentially trained on continuously incoming datasets while alleviating the catastrophic forgetting in the old datasets. However, not only the training datasets but also the gallery images are incrementally accumulated, which requires a huge amount of computation and storage space to extract the features at the inference phase. In this paper, we address the above-mentioned problem by incorporating the backward-compatibility to LReID for the first time. We train the model using the continuously incoming datasets while maintaining the model's compatibility toward the previously trained old models without re-computing the features of the old gallery images. To this end, we devise the cross-model compatibility loss based on the contrastive learning with respect to the replay features across all the old datasets. Moreover, we also develop the knowledge consolidation method based on the part classification to learn the shared representation across different datasets for the backward-compatibility. We suggest a more practical methodology for performance evaluation as well where all the gallery and query images are considered together. Experimental results demonstrate that the proposed method achieves a significantly higher performance of the backward-compatibility compared with the existing methods. It is a promising tool for more practical scenarios of LReID.

Submitted 17 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 17 pages, 5 figures, 7 tables

arXiv:2403.06381 [pdf, other]

Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models

Authors: Yang Zhang, Teoh Tze Tzun, Lim Wei Hern, Tiviatis Sim, Kenji Kawaguchi

Abstract: Recent advancements in diffusion models have notably improved the perceptual quality of generated images in text-to-image synthesis tasks. However, diffusion models often struggle to produce images that accurately reflect the intended semantics of the associated text prompts. We examine cross-attention layers in diffusion models and observe a propensity for these layers to disproportionately focus on certain tokens during the generation process, thereby undermining semantic fidelity. To address the issue of dominant attention, we introduce attention regulation, a computation-efficient on-the-fly optimization approach at inference time to align attention maps with the input text prompt. Notably, our method requires no additional training or fine-tuning and serves as a plug-in module on a model. Hence, the generation capacity of the original model is fully preserved. We compare our approach with alternative approaches across various datasets, evaluation metrics, and diffusion models. Experiment results show that our method consistently outperforms other baselines, yielding images that more faithfully reflect the desired concepts with reduced computation overhead. Code is available at https://github.com/YaNgZhAnG-V5/attention_regulation.

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.11010 [pdf, other]

doi 10.1051/0004-6361/202449637

Type Ia supernova explosion models are inherently multidimensional

Authors: R. Pakmor, I. R. Seitenzahl, A. J. Ruiter, S. A. Sim, F. K. Roepke, S. Taubenberger, R. Bieri, S. Blondin

Abstract: Theoretical and observational approaches to settling the important questions surrounding the progenitor systems and the explosion mechanism of normal Type Ia supernovae have thus far failed. With its unique capability to obtain continuous spectra through the near- and mid-infrared, JWST now offers completely new insights into Type Ia supernovae. In particular, observing them in the nebular phase allows us to directly see the central ejecta and thereby constrain the explosion mechanism. We aim to understand and quantify differences in the structure and composition of the central ejecta of various Type Ia supernova explosion models. We examined the currently most popular explosion scenarios using self-consistent multidimensional explosion simulations of delayed-detonation and pulsationally assisted, gravitationally confined delayed detonation Chandrasekhar-mass models and double-detonation sub-Chandrasekhar-mass and violent merger models. We find that the distribution of radioactive and stable nickel in the final ejecta, both observable in nebular spectra, are significantly different between different explosion scenarios. Therefore, comparing synthetic nebular spectra with JWST observations should allow us to distinguish between explosion models. We show that the explosion ejecta are inherently multidimensional for all models, and the Chandrasekhar-mass explosions simulated in spherical symmetry in particular lead to a fundamentally unphysical ejecta structure. Moreover, we show that radioactive and stable nickel cover a significant range of densities at a fixed velocity of the homologously expanding ejecta. Any radiation transfer postprocessing has to take these variations into account to obtain faithful synthetic observables; this will likely require multidimensional radiation transport simulations.

Submitted 26 April, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: 7 pages, 2 figures, accepted by A&A, comments welcome

Journal ref: A&A 686, A227 (2024)

arXiv:2402.10517 [pdf, other]

Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Authors: Yeonhong Park, Jake Hyun, SangLyul Cho, Bonggeun Sim, Jae W. Lee

Abstract: Recently, considerable efforts have been directed towards compressing Large Language Models (LLMs), which showcase groundbreaking capabilities across diverse applications but entail significant deployment costs due to their large sizes. Meanwhile, much less attention has been given to mitigating the costs associated with deploying multiple LLMs of varying sizes despite its practical significance. Thus, this paper introduces \emph{any-precision LLM}, extending the concept of any-precision DNN to LLMs. Addressing challenges in any-precision LLM, we propose a lightweight method for any-precision quantization of LLMs, leveraging a post-training quantization framework, and develop a specialized software engine for its efficient serving. As a result, our solution significantly reduces the high costs of deploying multiple, different-sized LLMs by overlaying LLMs quantized to varying bit-widths, such as 3, 4, ..., $n$ bits, into a memory footprint comparable to a single $n$-bit LLM. All the supported LLMs with varying bit-widths demonstrate state-of-the-art model quality and inference throughput, proving itself to be a compelling option for deployment of multiple, different-sized LLMs. Our code is open-sourced and available online.

Submitted 21 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: To appear at ICML 2024. Code is available at https://github.com/SNU-ARC/any-precision-llm

arXiv:2402.07334 [pdf, other]

Differentially Private Training of Mixture of Experts Models

Authors: Pierre Tholoniat, Huseyin A. Inan, Janardhan Kulkarni, Robert Sim

Abstract: This position paper investigates the integration of Differential Privacy (DP) in the training of Mixture of Experts (MoE) models within the field of natural language processing. As Large Language Models (LLMs) scale to billions of parameters, leveraging expansive datasets, they exhibit enhanced linguistic capabilities and emergent abilities. However, this growth raises significant computational and privacy concerns. Our study addresses these issues by exploring the potential of MoE models, known for their computational efficiency, and the application of DP, a standard for privacy preservation. We present the first known attempt to train MoE models under the constraints of DP, addressing the unique challenges posed by their architecture and the complexities of DP integration. Our initial experimental studies demonstrate that MoE models can be effectively trained with DP, achieving performance that is competitive with their non-private counterparts. This initial study aims to provide valuable insights and ignite further research in the domain of privacy-preserving MoE models, softly laying the groundwork for prospective developments in this evolving field.

Submitted 11 February, 2024; originally announced February 2024.

Comments: Preliminary work presented as a poster at the 5th AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI 24)

arXiv:2402.04850 [pdf, other]

Muon $g-2$ and Proton Lifetime in SUSY SU(5) GUTs with Split Superpartners

Authors: Seong-Sik Kim, Hyun Min Lee, Sung-Bo Sim

Abstract: We consider the interplay of the muon $g-2$ anomaly and the proton decay in the SUSY SU(5) GUTs with generation-independent scalar soft masses. In these scenarios, we introduce a number of $\bf 5+{\bar 5}$ messenger fields with doublet-triplet splitting in general gauge mediation to transmit SUSY breaking to the visible sector by gauge loops. As a result, squarks and sleptons receive generation-independent soft SUSY breaking masses, which are split already at the messenger scale. Taking into account the perturbative unification of gauge couplings as well as the bounds from electroweak precision and vacuum stability, we showed the parameter space in general gauge mediation to explain the muon $g-2$ anomaly with smuon and sneutrino loops while evading the strong bounds on squarks and gluinos from the Large Hadron Collider. We also obtained the dominant Higgsino contributions to the proton decay mode, $p\to K^+{\bar\nu}$, with general generation-independent sparticle masses for squarks and sleptons. Even for split scalar soft masses in our model, however, we found that the bounds from the proton decay are satisfied only if the effective Yukawa couplings of the colored Higgsinos are suppressed further by a factor of order $10^{-4}-10^{-3}$. We illustrated how such a suppression factor is realized in orbifold GUTs in the extra dimension where the colored Higgsinos in the bulk are not coupled to the matter fields localized at the orbifold fixed points at the leading order.

Submitted 29 March, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: 35 pages, 8 figures, v2: typos fixed and reference updated, v3: version to appear in Phys. Rev. D

arXiv:2402.02674 [pdf, ps, other]

Modeling X-ray and gamma-ray emission from redback pulsar binaries

Authors: Minju Sim, Hongjun An, Zorawar Wadiasingh

Abstract: We investigated the multiband emission from the pulsar binaries XSS J12270-4859, PSR J2039-5617, and PSR J2339-0533, which exhibit orbital modulation in the X-ray and gamma-ray bands. We constructed the sources' broadband spectral energy distributions and multiband orbital light curves by supplementing our X-ray measurements with published gamma-ray results, and we modeled the data using intra-binary shock (IBS) scenarios. While the X-ray data were well explained by synchrotron emission from electrons/positrons in the IBS, the gamma-ray data were difficult to explain with the IBS components alone. Therefore, we explored other scenarios that had been suggested for gamma-ray emission from pulsar binaries: (1) inverse-Compton emission in the upstream unshocked wind zone and (2) synchrotron radiation from electrons/positrons interacting with a kilogauss magnetic field of the companion. Scenario (1) requires that the bulk motion of the wind substantially decelerates to ~1000km/s before reaching the IBS for increased residence time, in which case formation of a strong shock is untenable, inconsistent with the X-ray phenomenology. Scenario (2) can explain the data if we assume the presence of electrons/positrons with a Lorentz factor of ~$10^8$ (~0.1 PeV) that pass through the IBS and tap a substantial portion of the pulsar voltage drop. These findings raise the possibility that the orbitally-modulating gamma-ray signals from pulsar binaries can provide insights into the flow structure and energy conversion within pulsar winds and particle acceleration nearing PeV energies in pulsars. These signals may also yield greater understanding of kilogauss magnetic fields potentially hosted by the low-mass stars in these systems.

Submitted 4 February, 2024; originally announced February 2024.

Comments: accepted for the publication of ApJ (20 pages, 11 figures)

arXiv:2401.16559 [pdf, other]

IEEE BigData 2023 Keystroke Verification Challenge (KVC)

Authors: Giuseppe Stragapede, Ruben Vera-Rodriguez, Ruben Tolosana, Aythami Morales, Ivan DeAndres-Tame, Naser Damer, Julian Fierrez, Javier-Ortega Garcia, Nahuel Gonzalez, Andrei Shadrikov, Dmitrii Gordin, Leon Schmitt, Daniel Wimmer, Christoph Grossmann, Joerdis Krieger, Florian Heinz, Ron Krestel, Christoffer Mayer, Simon Haberl, Helena Gschrey, Yosuke Yamagishi, Sanjay Saha, Sanka Rasnayaka, Sandareka Wickramanayake, Terence Sim, et al. (4 additional authors not shown)

Abstract: This paper describes the results of the IEEE BigData 2023 Keystroke Verification Challenge (KVC), which considers the biometric verification performance of Keystroke Dynamics (KD), captured as tweet-long sequences of variable transcript text from over 185,000 subjects. The data are obtained from two of the largest public databases of KD to date, the Aalto Desktop and Mobile Keystroke Databases, guaranteeing a minimum amount of data per subject, age and gender annotations, absence of corrupted data, and avoiding excessively unbalanced subject distributions with respect to the considered demographic attributes. Several neural architectures were proposed by the participants, leading to global Equal Error Rates (EERs) as low as 3.33% and 3.61% achieved by the best team in the desktop and mobile scenarios, respectively, outperforming the current state-of-the-art biometric verification performance for KD. Hosted on CodaLab, the KVC will be made ongoing to represent a useful tool for the research community to compare different approaches under the same experimental conditions and to deepen the knowledge of the field.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 9 pages, 10 pages, 2 figures. arXiv admin note: text overlap with arXiv:2311.06000

arXiv:2401.15481 [pdf, other]

doi 10.1145/3368089.3417943

BugsInPy: A Database of Existing Bugs in Python Programs to Enable Controlled Testing and Debugging Studies

Authors: Ratnadira Widyasari, Sheng Qin Sim, Camellia Lok, Haodi Qi, Jack Phan, Qi** Tay, Constance Tan, Fiona Wee, Jodie Ethelda Tan, Yuheng Yieh, Brian Goh, Ferdian Thung, Hong ** Kang, Thong Hoang, David Lo, Eng Lieh Ouh

Abstract: The 2019 edition of Stack Overflow developer survey highlights that, for the first time, Python outperformed Java in terms of popularity. The gap between Python and Java further widened in the 2020 edition of the survey. Unfortunately, despite the rapid increase in Python's popularity, there are not many testing and debugging tools that are designed for Python. This is in stark contrast with the abundance of testing and debugging tools for Java. Thus, there is a need to push research on tools that can help Python developers. One factor that contributed to the rapid growth of Java testing and debugging tools is the availability of benchmarks. A popular benchmark is the Defects4J benchmark; its initial version contained 357 real bugs from 5 real-world Java programs. Each bug comes with a test suite that can expose the bug. Defects4J has been used by hundreds of testing and debugging studies and has helped to push the frontier of research in these directions. In this project, inspired by Defects4J, we create another benchmark database and tool that contain 493 real bugs from 17 real-world Python programs. We hope our benchmark can help catalyze future work on testing and debugging tools that work on Python programs.

Submitted 27 January, 2024; originally announced January 2024.

Journal ref: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (2020) 1556-1560

arXiv:2401.15313 [pdf, other]

Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization

Authors: Kihoon Shin, Hyunjae Sim, Seungwon Nam, Yonghee Kim, Jae Hu, Kwang-Ki K. Kim

Abstract: In this study, we address multi-robot localization issues, with a specific focus on cooperative localization and observability analysis of relative pose estimation. Cooperative localization involves enhancing each robot's information through a communication network and message passing. If odometry data from a target robot can be transmitted to the ego robot, observability of their relative pose estimation can be achieved through range-only or bearing-only measurements, provided both robots have non-zero linear velocities. In cases where odometry data from a target robot are not directly transmitted but estimated by the ego robot, both range and bearing measurements are necessary to ensure observability of relative pose estimation. For ROS/Gazebo simulations, we explore four sensing and communication structures. We compare extended Kalman filtering (EKF) and pose graph optimization (PGO) estimation using different robust loss functions (filtering and smoothing with varying batch sizes of sliding windows) in terms of estimation accuracy. In hardware experiments, two Turtlebot3 equipped with UWB modules are used for real-world inter-robot relative pose estimation, applying both EKF and PGO and comparing their performance.

Submitted 4 February, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

Comments: 20 pages, 21 figures

MSC Class: 93C85; 93E11; 93E24; 90C26; 93E10; 62M20;

arXiv:2401.11840 [pdf, other]

Learning to Approximate Adaptive Kernel Convolution on Graphs

Authors: Jaeyoon Sim, Sooyeon Jeon, InJun Choi, Guorong Wu, Won Hwa Kim

Abstract: Various Graph Neural Networks (GNNs) have been successful in analyzing data in non-Euclidean spaces; however, they have limitations such as oversmoothing, i.e., information becomes excessively averaged as the number of hidden layers increases. The issue stems from the intrinsic formulation of conventional graph convolution where the nodal features are aggregated from a direct neighborhood per layer across the entire nodes in the graph. As setting a different number of hidden layers per node is infeasible, recent works leverage a diffusion kernel to redefine the graph structure and incorporate information from farther nodes. Unfortunately, such approaches suffer from heavy diagonalization of a graph Laplacian or learning a large transform matrix. In this regard, we propose a diffusion learning framework, where the range of feature aggregation is controlled by the scale of a diffusion kernel. For efficient computation, we derive closed-form derivatives of approximations of the graph convolution with respect to the scale, so that node-wise range can be adaptively learned. With a downstream classifier, the entire framework is made trainable in an end-to-end manner. Our model is tested on various standard datasets for node-wise classification for the state-of-the-art performance, and it is also validated on real-world brain network data for graph classification to demonstrate its practicality for Alzheimer classification.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 15 pages, Accepted to AAAI 2024

arXiv:2312.11413 [pdf, other]

DeRDaVa: Deletion-Robust Data Valuation for Machine Learning

Authors: Xiao Tian, Rachael Hwee Ling Sim, Jue Fan, Bryan Kian Hsiang Low

Abstract: Data valuation is concerned with determining a fair valuation of data from data sources to compensate them or to identify training examples that are the most or least useful for predictions. With the rising interest in personal data ownership and data protection regulations, model owners will likely have to fulfil more data deletion requests. This raises issues that have not been addressed by existing works: Are the data valuation scores still fair with deletions? Must the scores be expensively recomputed? The answer is no. To avoid recomputations, we propose using our data valuation framework DeRDaVa upfront for valuing each data source's contribution to preserving robust model performance after anticipated data deletions. DeRDaVa can be efficiently approximated and will assign higher values to data that are more useful or less likely to be deleted. We further generalize DeRDaVa to Risk-DeRDaVa to cater to risk-averse/seeking model owners who are concerned with the worst-/best-case model utility. We also empirically demonstrate the practicality of our solutions.

Submitted 21 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.07399 [pdf, other]

Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

Authors: Taeyoon Kwon, Kai Tzu-iunn Ong, Dongjin Kang, Seungjun Moon, Jeong Ryong Lee, Dosik Hwang, Yongsik Sim, Beomseok Sohn, Dongha Lee, Jinyoung Yeo

Abstract: Machine reasoning has made great progress in recent years owing to large language models (LLMs). In the clinical domain, however, most NLP-driven projects mainly focus on clinical classification or reading comprehension, and under-explore clinical reasoning for disease diagnosis due to the expense of rationale annotation by clinicians. In this work, we present a "reasoning-aware" diagnosis framework that rationalizes the diagnostic process via prompt-based learning in a time- and labor-efficient manner, and learns to reason over the prompt-generated rationales. Specifically, we address clinical reasoning for disease diagnosis, where the LLM generates diagnostic rationales providing its insight on presented patient data and the reasoning path towards the diagnosis, namely Clinical Chain-of-Thought (Clinical CoT). We empirically demonstrate the clinical reasoning ability of LLMs/LMs via extensive experiments and analyses on both rationale generation and disease diagnosis in various settings. We further propose a novel set of criteria for evaluating machine-generated rationales' potential for real-world clinical settings, facilitating and benefiting future research in this area.

Submitted 10 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024

arXiv:2312.06042 [pdf, other]

State-of-the-art simulations of line-driven accretion disc winds: realistic radiation-hydrodynamics leads to weaker outflows

Authors: Nick Higginbottom, Nicolas Scepi, Christian Knigge, Knox S. Long, James H. Matthews, Stuart A. Sim

Abstract: Disc winds are a common feature in accreting astrophysical systems on all scales. In active galactic nuclei (AGN) and accreting white dwarfs (AWDs), specifically, radiation pressure mediated by spectral lines is a promising mechanism for driving these outflows. Previous hydrodynamical simulations have largely supported this idea, but relied on highly approximate treatments of ionization and radiative transfer. Given the sensitivity of line driving to the ionization state and radiation field in the outflow, here we present a new method for carrying out 2.5D radiation-hydrodynamic simulations that takes full account of the frequency-dependent radiative transfer through the wind, the corresponding ionization state and the resulting radiative accelerations. Applying our method to AWDs, we find that it is much harder to drive a powerful line-driven outflow when the interaction between matter and radiation is treated self-consistently. This conclusion is robust to changes in the adopted system parameters. The fundamental difficulty is that discs luminous enough to drive such a wind are also hot enough to over-ionize it. As a result, the mass-loss rates in our simulations are much lower than those found in earlier, more approximate calculations. We also show that the ultraviolet spectra produced by our simulations do not match those observed in AWDs. We conclude that, unless the over-ionization problem can be mitigated (e.g. by sub-grid clumping or a softer-than-expected radiation field), line driving may not be a promising mechanism for powering the outflows from AWDs. These conclusions are likely to have significant implications for disc winds in AGN also.

Submitted 10 December, 2023; originally announced December 2023.

Comments: Accepted for publication in MNRAS. 14 pages, 10 figures + 3 figures in Appendix

Showing 1–50 of 740 results for author: Sim