
doi 10.1063/5.0150927

Structure of crystalline water ice formed through neon matrix sublimation under cryogenic and vacuum conditions

Authors: Reo Sato, So Taniguchi, Naoki Numadate, Tetsuya Hama

Abstract: Ice I has three forms depending on the stacking arrangements of its layers: hexagonal ice Ih, cubic ice Ic, and stacking disordered ice Isd. Below ~60 K, amorphous water becomes metastable, and the formation of any form of ice I is often implicitly precluded. Using a newly developed low-temperature reflection high-energy electron diffraction (RHEED) technique, we show that crystalline ice with cubic stacking sequences (i.e., ice Ic) formed through Ne sublimation from a solid H2O/Ne (1:1000 ratio) matrix at 13 K. The extent of stacking disorder (disordered cubic and hexagonal stacking sequences) in the ice formed by Ne matrix sublimation is smaller than that in vapor-deposited ice Isd prepared at 143 K and below the limit of detection of low-temperature RHEED. The dependence of the resulting ice structures on the thickness of the H2O/Ne matrix shows that amorphous water first forms in the early stages of Ne sublimation and that the cubic stacking sequence develops subsequently. Because cubic ice Ic formed here at a much lower temperature (13 K) than previously observed (typically above 78 K), Ne matrix sublimation represents a novel route to the formation of cubic ice Ic under low-temperature and low-pressure conditions.

Submitted 10 July, 2024; originally announced July 2024.

Journal ref: The Journal of Chemical Physics 158 (2023) 211101

arXiv:2407.07141

Nuclear Spin Metrology with Nitrogen Vacancy Center in Diamond for Axion Dark Matter Detection

Authors: So Chigusa, Masashi Hazumi, Ernst David Herbschleb, Yuichiro Matsuzaki, Norikazu Mizuochi, Kazunori Nakayama

Abstract: We present a method to directly detect axion dark matter using nitrogen vacancy centers in diamond. In particular, we use metrology based on the nuclear spin of nitrogen to detect axion-nucleus couplings. This is achieved through protocols designed for dark matter searches, which introduce a novel approach to quantum sensing with the nitrogen vacancy center. Although the coupling of magnetic fields to nuclear spins is three orders of magnitude weaker than that to electron spins in conventional magnetometry, the axion interaction strength with nuclear spins is of the same order of magnitude as that with electron spins. Furthermore, using nuclear spins for axion dark matter detection lets us take advantage of their long coherence time. We show that our method is sensitive to a broad frequency range $\lesssim 100\,\mathrm{Hz}$, corresponding to axion masses $m_a \lesssim 4\times 10^{-13}\,\mathrm{eV}$. We present the detection limit of our method for both the axion-neutron and axion-proton couplings and discuss its significance in comparison with other proposed ideas.
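As a quick consistency check on the quoted numbers: an axion field of mass m_a oscillates at the Compton frequency f = m_a c^2 / h. The minimal Python sketch below converts between the two (the helper names are illustrative, not from the paper); a mass of 4×10^-13 eV corresponds to roughly 97 Hz, in line with the stated band below about 100 Hz.

```python
# Convert between axion mass (eV) and field oscillation frequency f = m_a c^2 / h.
PLANCK_EV_S = 4.135667696e-15  # Planck constant h in eV*s (from the exact SI values of h and e)

def axion_frequency_hz(mass_ev: float) -> float:
    """Oscillation frequency (Hz) of an axion field with mass mass_ev (eV)."""
    return mass_ev / PLANCK_EV_S

def axion_mass_ev(freq_hz: float) -> float:
    """Axion mass (eV) whose field oscillates at freq_hz (Hz)."""
    return freq_hz * PLANCK_EV_S

if __name__ == "__main__":
    print(f"{axion_frequency_hz(4e-13):.1f} Hz")  # about 96.7 Hz
    print(f"{axion_mass_ev(100.0):.2e} eV")       # about 4.14e-13 eV
```

Because the relation is linear, the upper end of the frequency band maps directly onto the upper end of the accessible mass range.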

Submitted 9 July, 2024; originally announced July 2024.

Comments: 17 pages, 7 figures

Report number: KEK-QUP-2024-0013, TU-1233

arXiv:2407.05951

Purcell enhancement and spin spectroscopy of silicon vacancy centers in silicon carbide using an ultra-small mode-volume plasmonic cavity

Authors: Jae-Pil So, Jialun Luo, Jaehong Choi, Brendan McCullian, Gregory D. Fuchs

Abstract: Silicon vacancy (V$_{Si}$) centers in 4H-silicon carbide have emerged as a strong candidate for quantum networking applications due to their robust electronic and optical properties, including a long spin coherence lifetime and bright, stable emission. Here, we report the integration of V$_{Si}$ centers with a plasmonic nanocavity to Purcell-enhance their emission, which is critical for scalable quantum networking. Using a simple fabrication process, we demonstrate plasmonic cavities that support a nanoscale mode volume and increase the spontaneous emission rate, with a measured Purcell factor of up to 48. In addition to investigating the optical resonance modes, we demonstrate an improvement in the optical stability of the spin-preserving resonant optical transitions relative to the radiation-limited value. The results highlight the potential of nanophotonic structures for advancing quantum networking technologies and emphasize the importance of optimizing emitter-cavity interactions for efficient quantum photonic applications.
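For reference, the textbook Purcell factor for an emitter that is spectrally, spatially, and polarization-matched to a single cavity mode is F_P = (3/(4π^2)) (λ/n)^3 (Q/V). Mode volumes in plasmonic nanocavities need more careful definitions than this idealized formula assumes, so the Python sketch below only illustrates how F_P scales with Q/V. The value of 48 quoted above is a measured quantity and is not computed this way.

```python
import math

def purcell_factor(wavelength: float, refractive_index: float,
                   quality_factor: float, mode_volume: float) -> float:
    """Idealized Purcell factor F_P = 3/(4 pi^2) * (lambda/n)^3 * Q / V.

    wavelength and mode_volume must use consistent units
    (e.g. wavelength in nm and mode_volume in nm^3).
    Assumes perfect spectral, spatial, and polarization matching.
    """
    reduced_wavelength = wavelength / refractive_index  # wavelength inside the medium
    return (3.0 / (4.0 * math.pi ** 2)) * reduced_wavelength ** 3 * quality_factor / mode_volume
```

Halving V at fixed Q doubles F_P, which is why an ultra-small mode volume can compensate for the modest quality factors typical of plasmonic cavities.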

Submitted 8 July, 2024; originally announced July 2024.

Comments: 27 pages in manuscript format including supplement

arXiv:2407.05618

Improved limit on neutrinoless double beta decay of $^{100}$Mo from AMoRE-I

Authors: A. Agrawal, V. V. Alenkov, P. Aryal, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, Seonho Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Y. M. Gavrilyuk, A. M. Gezhaev, O. Gileva , et al. (83 additional authors not shown)

Abstract: AMoRE searches for the signature of neutrinoless double beta decay of $^{100}$Mo with a 100 kg sample of enriched $^{100}$Mo. Scintillating molybdate crystals coupled with metallic magnetic calorimeters operate at millikelvin temperatures to measure the energy of the electrons emitted in the decay. As a demonstration of the full-scale AMoRE, we conducted AMoRE-I, a pre-experiment with 18 molybdate crystals, at the Yangyang Underground Laboratory for over two years. The exposure was 8.02 kg$\cdot$year (or 3.89 kg$_{\mathrm{^{100}Mo}}\cdot$year), and the total background rate near the Q-value was 0.025 $\pm$ 0.002 counts/keV/kg/year. We observed no indication of $0νββ$ decay and report a new lower limit on the half-life of $^{100}$Mo $0νββ$ decay of $T^{0ν}_{1/2}>3.0\times10^{24}~\mathrm{years}$ at 90\% confidence level. The corresponding limit on the effective Majorana mass is $m_{ββ}<$(210--610) meV, using nuclear matrix elements estimated in the framework of different models, including recent shell model calculations.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 7 pages, 4 figures

arXiv:2407.04171

Holography of Transmission Lines: Insights of Continuous MERA and AdS/CFT

Authors: So Katagiri

Abstract: This study examines the holographic representation of the quantum theory of transmission lines, which play a crucial role in quantum computing and quantum information. Using Yurke and Denker's quantum circuit network theory within the framework of continuous MERA (cMERA) in AdS space, we analyze the quantization and interactions of transmission lines. The metric is found to be described by the inductance of the quantum circuit and becomes AdS space in its 0-limit. These results provide new insights into handling and controlling complex phenomena in quantum circuits, potentially advancing the understanding of quantum computing and quantum communication.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 24 pages, 4 figures

Report number: OUJ-FTC-13

arXiv:2406.19228

Tools Fail: Detecting Silent Errors in Faulty Tools

Authors: Jimin Sun, So Yeon Min, Yingshan Chang, Yonatan Bisk

Abstract: Tools have become a mainstay of LLMs, allowing them to retrieve knowledge not in their weights, to perform tasks on the web, and even to control robots. However, most ontologies and surveys of tool use have assumed that the core challenge for LLMs is choosing the tool. Instead, we introduce a broader framework for tools, which guides us to explore a model's ability to detect "silent" tool errors and reflect on how to plan. This more directly aligns with the increasingly popular use of models as tools. We provide an initial approach to failure recovery with promising results both in a controlled calculator setting and in embodied agent planning.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 18 pages, 12 figures

arXiv:2406.18898

360 in the Wild: Dataset for Depth Prediction and View Synthesis

Authors: Kibaek Park, Francois Rameau, Jaesik Park, In So Kweon

Abstract: The abundance of perspective camera datasets has facilitated the emergence of novel learning-based strategies for various tasks, such as camera localization, single image depth estimation, and view synthesis. However, panoramic or omnidirectional image datasets that include essential information, such as pose and depth, are mostly made with synthetic scenes. In this work, we introduce a large-scale dataset of 360$^{\circ}$ videos in the wild. This dataset has been carefully scraped from the Internet and was captured at various locations worldwide. Hence, it exhibits highly diverse environments (e.g., indoor and outdoor) and contexts (e.g., with and without moving objects). Each of the 25K images constituting our dataset is provided with its respective camera pose and depth map. We illustrate the relevance of our dataset for two main tasks, namely single image depth estimation and view synthesis.

Submitted 4 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.17792

Applications of interpretable deep learning in neuroimaging: a comprehensive review

Authors: Lindsay Munroe, Mariana da Silva, Faezeh Heidari, Irina Grigorescu, Simon Dahan, Emma C. Robinson, Maria Deprez, Po-Wah So

Abstract: Clinical adoption of deep learning models has been hindered, in part, because the black-box nature of neural networks leads to concerns regarding their trustworthiness and reliability. These concerns are particularly relevant in the field of neuroimaging due to the complex brain phenotypes and inter-subject heterogeneity often encountered. The challenge can be addressed by interpretable deep learning (iDL) methods that enable the visualisation and interpretation of the inner workings of deep learning models. This study systematically reviewed the literature on neuroimaging applications of iDL methods and critically analysed how iDL explanation properties were evaluated. Seventy-five studies were included, and ten categories of iDL methods were identified. We also reviewed five properties of iDL explanations that were analysed in the included studies: biological validity, robustness, continuity, selectivity, and downstream task performance. We found that the most popular iDL approaches used in the literature may be sub-optimal for neuroimaging data, and we discussed possible future directions for the field.

Submitted 30 May, 2024; originally announced June 2024.

arXiv:2406.17310

High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

Authors: Joun Yeop Lee, Myeonghun Jeong, Minchan Kim, Ji-Hyun Lee, Hoon-Young Cho, Nam Soo Kim

Abstract: We propose a novel two-stage text-to-speech (TTS) framework with two types of discrete tokens, i.e., semantic and acoustic tokens, for high-fidelity speech synthesis. It features two core components: the Interpreting module, which processes text and a speech prompt into semantic tokens focusing on linguistic content and alignment, and the Speaking module, which captures the timbre of the target voice to generate acoustic tokens from semantic tokens, enriching speech reconstruction. The Interpreting stage employs a transducer for its robustness in aligning text to speech. In contrast, the Speaking stage uses a Conformer-based architecture integrated with a Grouped Masked Language Model (G-MLM) to boost computational efficiency. Our experiments show that this structure outperforms conventional models in the zero-shot scenario in terms of speech quality and speaker similarity.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Accepted to Interspeech 2024

arXiv:2406.16206

Zero-Inflated Tweedie Boosted Trees with CatBoost for Insurance Loss Analytics

Authors: Banghee So, Emiliano A. Valdez

Abstract: In this paper, we explore advanced modifications to the Tweedie regression model to address its limitations in modeling aggregate claims for various types of insurance, such as automobile, health, and liability. Traditional Tweedie models, while effective in capturing the probability and magnitude of claims, usually fall short in accurately representing the large incidence of zero claims. Our recommended approach involves a refined modeling of the zero-claim process, together with the integration of boosting methods, which leverage an iterative process to enhance predictive accuracy. Despite the inherent slowdown of learning algorithms due to this iteration, several efficient implementations that also support precise parameter tuning, such as XGBoost, LightGBM, and CatBoost, have emerged. Among these, we chose CatBoost, an efficient boosting approach that effectively handles categorical and other special types of data. The core contribution of our paper is the combination of separate modeling for zero claims with tree-based boosting ensemble methods within a CatBoost framework, assuming that the inflated probability of zero is a function of the mean parameter. The efficacy of our enhanced Tweedie model is demonstrated through application to an insurance telematics dataset, which presents the additional complexity of compositional feature variables.
Our modeling results reveal a marked improvement in model performance, showcasing its potential to deliver more accurate predictions suitable for insurance claim analytics.
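To make the zero-inflation idea concrete: a Tweedie variable with power 1 < p < 2 is compound Poisson-gamma with claim-count rate λ = μ^(2-p) / (φ(2-p)), so P(Y = 0) = exp(-λ). A zero-inflated version adds a point mass π at zero, giving P(Y = 0) = π + (1 - π) exp(-λ) and mean (1 - π) μ. The Python sketch below evaluates these quantities. The logistic link tying π to μ is a hypothetical choice, since the abstract states only that π is a function of the mean parameter, and no CatBoost fitting is shown.

```python
import math

def tweedie_zero_prob(mu: float, phi: float, p: float) -> float:
    """P(Y = 0) for a Tweedie (compound Poisson-gamma) variable with 1 < p < 2."""
    if not 1.0 < p < 2.0:
        raise ValueError("the compound Poisson-gamma case requires 1 < p < 2")
    rate = mu ** (2.0 - p) / (phi * (2.0 - p))  # Poisson rate of the claim count
    return math.exp(-rate)

def hypothetical_zero_inflation(mu: float, a: float = -1.0, b: float = -0.5) -> float:
    """HYPOTHETICAL link pi = logistic(a + b * log(mu)); not the paper's specification."""
    return 1.0 / (1.0 + math.exp(-(a + b * math.log(mu))))

def zi_tweedie_zero_prob(mu: float, phi: float, p: float, pi: float) -> float:
    """Zero-inflated Tweedie: extra point mass pi at zero."""
    return pi + (1.0 - pi) * tweedie_zero_prob(mu, phi, p)

def zi_tweedie_mean(mu: float, pi: float) -> float:
    """Mean of the zero-inflated variable, (1 - pi) * mu."""
    return (1.0 - pi) * mu

if __name__ == "__main__":
    mu, phi, p = 0.3, 2.0, 1.5  # illustrative parameter values
    pi = hypothetical_zero_inflation(mu)
    print(f"pi={pi:.3f}  P(Y=0)={zi_tweedie_zero_prob(mu, phi, p, pi):.3f}  "
          f"E[Y]={zi_tweedie_mean(mu, pi):.3f}")
```

In a boosted-tree setting, μ (and through the link, π) would be predicted per policy from the features; the sketch only shows how the two parameters combine.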

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.15582

Graphical copula GARCH modeling with dynamic conditional dependence

Authors: Lupe Shun Hin Chan, Amanda Man Ying Chu, Mike Ka Pui So

Abstract: Modeling returns on large portfolios is a challenging problem because the number of parameters in the covariance matrix grows as the square of the portfolio size. Traditional correlation models, for example, the dynamic conditional correlation (DCC)-GARCH model, often ignore the nonlinear dependencies in the tail of the return distribution. In this paper, we aim to develop a framework to model these nonlinear dependencies dynamically, namely the graphical copula GARCH (GC-GARCH) model. Motivated by the capital asset pricing model, and to allow modeling of large portfolios, we greatly reduce the number of parameters by introducing conditional independence among stocks given some risk factors. The joint distribution of the risk factors is factorized using a directed acyclic graph (DAG) with pair-copula construction (PCC) to enhance the modeling of the tails of the return distribution while offering the flexibility of complex dependence structures. The DAG induces topological orders on the risk factors, which can be regarded as a list of directions of the flow of information. The conditional distributions among stock returns are also modeled using PCC. Dynamic conditional dependence structures are incorporated to allow the parameters in the copulas to be time-varying. A three-stage procedure is used to estimate the parameters in the marginal distributions, the risk factor copulas, and the stock copulas. The simulation study shows that the proposed estimation procedure can accurately estimate the parameters and the underlying DAG structure.
In the investment experiment of the empirical study, we demonstrate that the GC-GARCH model produces more precise conditional value-at-risk predictions and considerably higher cumulative portfolio returns than the DCC-GARCH model.
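Two points in this abstract can be checked in a few lines of Python: an unrestricted N×N covariance matrix has N(N+1)/2 free parameters, and a DAG over risk factors admits a topological order in which every factor appears after its parents. The factor names and edges below are hypothetical, and the sketch uses the standard-library graphlib module (Python 3.9+); it does not implement the pair-copula construction itself.

```python
from graphlib import TopologicalSorter

def covariance_param_count(n_assets: int) -> int:
    """Free parameters in an unrestricted (symmetric) n x n covariance matrix."""
    return n_assets * (n_assets + 1) // 2

# Hypothetical DAG over risk factors: each factor maps to the set of its parents.
factor_parents = {
    "market": set(),
    "size": {"market"},
    "value": {"market", "size"},
}

# static_order() yields every node only after all of its predecessors (parents).
factor_order = list(TopologicalSorter(factor_parents).static_order())

if __name__ == "__main__":
    print(covariance_param_count(500))  # 125250
    print(factor_order)                 # ['market', 'size', 'value']
```

The quadratic count is what makes direct modeling of a 500-stock portfolio impractical and motivates conditioning stocks on a small number of factors.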

Submitted 21 June, 2024; originally announced June 2024.

MSC Class: 62F15 ACM Class: G.3

arXiv:2406.09716

Speed-up of Data Analysis with Kernel Trick in Encrypted Domain

Authors: Joon Soo Yoo, Baek Kyung Song, Tae Min Ahn, Ji Won Heo, Ji Won Yoon

Abstract: Homomorphic encryption (HE) is pivotal for secure computation on encrypted data and crucial in privacy-preserving data analysis. However, efficiently processing high-dimensional data in HE, especially for machine learning and statistical (ML/STAT) algorithms, poses a challenge. In this paper, we present an effective acceleration method using the kernel method for HE schemes, improving the time performance of ML/STAT algorithms within encrypted domains. This technique, which is independent of the underlying HE mechanisms and complements existing optimizations, notably reduces costly HE multiplications, offering near-constant time complexity with respect to the data dimension. Aimed at accessibility, the method is tailored for data scientists and developers with limited cryptography background, facilitating advanced data analysis in secure environments.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Submitted as a preprint

arXiv:2406.09698

Projected background and sensitivity of AMoRE-II

Authors: A. Agrawal, V. V. Alenkov, P. Aryal, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, Seonho Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Y. M. Gavrilyuk, A. M. Gezhaev, O. Gileva , et al. (81 additional authors not shown)

Abstract: AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in a cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located approximately 1000 meters deep in Jeongseon, Korea. The goal of AMoRE-II is to reach a half-life sensitivity of up to $T^{0νββ}_{1/2}$ $\sim$ 6 $\times$ 10$^{26}$ years, corresponding to an effective Majorana mass of 15--29 meV and covering the entire inverted mass hierarchy region. To achieve this, the background level of the experimental configurations and the possible sources of gamma and beta background events must be well understood. We have performed extensive Monte Carlo simulations using the GEANT4 toolkit for all experimental configurations with potential sources. We report the estimated background level, which meets the 10$^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) requirement for AMoRE-II in the region of interest (ROI), and show the projected half-life sensitivity based on the simulation study.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09388

Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition

Authors: Youngtaek Oh, Pyunghwan Ahn, **hyung Kim, Gwangmo Song, Soonyoung Lee, In So Kweon, Junmo Kim

Abstract: Vision and language models (VLMs) such as CLIP have showcased remarkable zero-shot recognition abilities yet face challenges in visio-linguistic compositionality, particularly in linguistic comprehension and fine-grained image-text alignment. This paper explores the intricate relationship between compositionality and recognition, two pivotal aspects of VLM capability. We conduct a comprehensive evaluation of existing VLMs, covering both pre-training approaches aimed at recognition and fine-tuning methods designed to improve compositionality. Our evaluation employs 12 benchmarks for compositionality, along with 21 zero-shot classification and two retrieval benchmarks for recognition. In our analysis of 274 CLIP model checkpoints, we reveal patterns and trade-offs that emerge between compositional understanding and recognition accuracy. Ultimately, this necessitates strategic efforts towards developing models that improve both capabilities, as well as the meticulous formulation of benchmarks for compositionality. We open-source our evaluation framework at https://github.com/ytaek-oh/vl_compo.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted to CVPRW 2024 on 'What is Next in Multimodal Foundation Models?'. Code: https://github.com/ytaek-oh/vl_compo

arXiv:2406.08465

Nonconvex Federated Learning on Compact Smooth Submanifolds With Heterogeneous Data

Authors: Jiaojiao Zhang, Jiang Hu, Anthony Man-Cho So, Mikael Johansson

Abstract: Many machine learning tasks, such as principal component analysis and low-rank matrix completion, give rise to manifold optimization problems. Although there is a large body of work studying the design and analysis of algorithms for manifold optimization in the centralized setting, there are currently very few works addressing the federated setting. In this paper, we consider nonconvex federated learning over a compact smooth submanifold in the setting of heterogeneous client data. We propose an algorithm that leverages stochastic Riemannian gradients and a manifold projection operator to improve computational efficiency, uses local updates to improve communication efficiency, and avoids client drift. Theoretically, we show that our proposed algorithm converges sub-linearly to a neighborhood of a first-order optimal solution by using a novel analysis that jointly exploits the manifold structure and properties of the loss functions. Numerical experiments demonstrate that our algorithm has significantly smaller computational and communication overhead than existing methods.
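The ingredients named here (Riemannian gradients, a projection back onto the manifold, and local updates between communication rounds) can be illustrated on the unit sphere, where finding the leading eigenvector for PCA is a standard example. The pure-Python sketch below is a plain FedAvg-style loop with projection, written under that assumption. It uses full rather than stochastic gradients and omits the authors' mechanism for avoiding client drift, so it is not their algorithm.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def norm(u: Sequence[float]) -> float:
    return math.sqrt(dot(u, u))

def project_to_sphere(u: Sequence[float]) -> Vector:
    """Projection onto the unit sphere: the nearest unit-norm point to a nonzero u."""
    n = norm(u)
    return [a / n for a in u]

def riemannian_grad(A: Matrix, x: Sequence[float]) -> Vector:
    """Riemannian gradient on the sphere of the local loss f(x) = -x^T A x (A symmetric)."""
    g = [-2.0 * dot(row, x) for row in A]              # Euclidean gradient -2 A x
    radial = dot(x, g)
    return [gi - radial * xi for gi, xi in zip(g, x)]  # drop the normal component

def local_update(A: Matrix, x: Sequence[float], lr: float, steps: int) -> Vector:
    """Several local Riemannian gradient steps, each followed by projection."""
    x = list(x)
    for _ in range(steps):
        rg = riemannian_grad(A, x)
        x = project_to_sphere([xi - lr * gi for xi, gi in zip(x, rg)])
    return x

def federated_round(client_mats: List[Matrix], x: Sequence[float],
                    lr: float = 0.05, local_steps: int = 5) -> Vector:
    """One communication round: local updates, server averaging, then projection."""
    local_iterates = [local_update(A, x, lr, local_steps) for A in client_mats]
    avg = [sum(coords) / len(local_iterates) for coords in zip(*local_iterates)]
    return project_to_sphere(avg)

if __name__ == "__main__":
    # Two heterogeneous clients (made-up data) that share the top eigenvector e1.
    clients = [[[3.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]]]
    x = project_to_sphere([1.0, 1.0])
    for _ in range(20):
        x = federated_round(clients, x)
    print(x)  # close to [1, 0]
```

The server-side projection is needed because the average of points on the sphere generally lies inside it.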

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08140

Functional voxel hierarchy and afferent capacity revealed mental state transition on dynamic correlation resting-state fMRI

Authors: Dong Soo Lee, Hyun Joo Kim, Youngmin Huh, Yeon Koo Kang, Wonseok Whi, Hyekyoung Lee, Hye** Kang

Abstract: Voxel hierarchy on dynamic brain graphs is produced by k-core percolation on the functional dynamic amplitude correlation of resting-state fMRI. Directed graphs and their afferent/efferent capacities are produced by Markov modeling of the universal cover of undirected graphs, simultaneously with the calculation of volume entropy. Positive and unsigned negative brain graphs were analyzed separately in a sliding-window representation to underpin the visualization and quantitation of dynamic mental states and their transitions. Voxel hierarchy animation maps of positive graphs revealed abrupt changes in coreness k and kmaxcore, which we called mental state transitions. Afferent voxel capacities of the positive graphs also revealed transient modules composed of dominating voxels/independent components, and exchanges of these modules represented mental state transitions. Animation and quantification plots of voxel hierarchy and afferent capacity corroborated each other in underpinning mental state transitions and afferent module exchange on the positive directed functional connectivity graphs. We propose using spatiotemporal trajectories of voxels on positive dynamic graphs, with hierarchical structures constructed by k-core percolation and in- and out-flows of voxel information quantified by volume entropy/directed graphs, to capture diverse resting mental state transitions on resting-state fMRI graphs in normal human individuals.
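k-core percolation assigns each node a coreness: the largest k such that the node belongs to a subgraph in which every node has at least k neighbors. The pure-Python sketch below thresholds a correlation matrix into an undirected positive graph (the threshold tau is a hypothetical parameter) and computes coreness by repeatedly peeling off a minimum-degree node; the largest coreness plays the role of kmaxcore. Volume entropy and the Markov modeling of directed graphs are not shown.

```python
from typing import Dict, List, Sequence, Set, Tuple

def threshold_graph(corr: Sequence[Sequence[float]], tau: float) -> List[Tuple[int, int]]:
    """Edges (i, j) with i < j whose correlation is at least tau (a positive graph)."""
    n = len(corr)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if corr[i][j] >= tau]

def core_numbers(n_nodes: int, edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Coreness of every node via minimum-degree peeling (O(n^2), fine for small graphs)."""
    adj: Dict[int, Set[int]] = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core: Dict[int, int] = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda v: degree[v])  # peel a minimum-degree node
        k = max(k, degree[node])  # coreness never decreases along the peeling order
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core

if __name__ == "__main__":
    corr = [[1.0, 0.9, 0.8, 0.7],
            [0.9, 1.0, 0.85, 0.1],
            [0.8, 0.85, 1.0, 0.2],
            [0.7, 0.1, 0.2, 1.0]]
    cores = core_numbers(4, threshold_graph(corr, tau=0.5))
    print(cores, "kmaxcore =", max(cores.values()))
```

Applied to each sliding window in turn, the per-node coreness gives a trajectory whose abrupt jumps correspond to the changes in k and kmaxcore described above.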

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.06117

Exclusion of the Cosmological Triangle in Reactor-Based Search for Axion-Like Particles

Authors: Byung Ju Park, Jae ** Choi, Eunju Jeon, **yu Kim, Kyungwon Kim, Sung Hyun Kim, Sun Kee Kim, Yeongduk Kim, Young Ju Ko, Byoung-Cheol Koh, Chang Hyon Ha, Seo Hyun Lee, In Soo Lee, Hyunseok Lee, Hyun Su Lee, Jaison Lee, Yoomin Oh, Doo** Kim

Abstract: We report new constraints on axion-like particles (ALPs) using data corresponding to a sodium iodide target exposure of 3063 kg$\cdot$days from the neutrino elastic scattering observation with NaI (NEON) experiment. A 16.7 kg thallium-doped sodium iodide target was located 23.7 meters from a nuclear reactor with 2.8 GW thermal power. We searched for ALPs produced by high-flux photons by comparing the energy spectra of data collected during reactor-on (1596 kg$\cdot$days exposure) and reactor-off (1467 kg$\cdot$days exposure) periods. No signal consistent with ALP interaction was identified, allowing us to set exclusion limits at the 95\% confidence level. Our limits cover previously unexplored regions for both photon couplings (${g_{aγ}}$) and electron couplings (${g_{ae}}$) for axion masses around 1 MeV/c$^2$. Notably, the NEON data exclude, for the first time, the region of photon couplings within the "cosmological triangle" that laboratory-based searches had left unconstrained. The observed 95\% confidence level limits reach as low as ${g_{aγ}}$ of 4.33$\times$ 10$^{-8}$ GeV$^{-1}$ and ${g_{ae}}$ of 1.10$\times$ 10$^{-9}$ for axion masses of 1.7 MeV/c$^2$ and 1.0 MeV/c$^2$, respectively.
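The reactor-on/off comparison amounts, at its simplest, to subtracting exposure-normalized spectra bin by bin, with a reactor-produced ALP signal appearing as a positive residual. The Python sketch below uses the exposures quoted above (1596 and 1467 kg·days, which sum to 3063 kg·days) and Poisson errors only; the bin counts in the example are made up, and the published limits come from a fuller statistical treatment that is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

ON_EXPOSURE_KG_DAY = 1596.0   # reactor-on exposure quoted in the abstract
OFF_EXPOSURE_KG_DAY = 1467.0  # reactor-off exposure quoted in the abstract

def on_off_residual(on_counts: Sequence[int], off_counts: Sequence[int],
                    on_exposure: float = ON_EXPOSURE_KG_DAY,
                    off_exposure: float = OFF_EXPOSURE_KG_DAY) -> List[Tuple[float, float]]:
    """Per-bin (rate_on - rate_off, 1-sigma Poisson error), both in counts/(kg*day)."""
    residuals = []
    for n_on, n_off in zip(on_counts, off_counts):
        diff = n_on / on_exposure - n_off / off_exposure
        # Variance of n/E for Poisson n is n/E^2; independent periods add in quadrature.
        err = math.sqrt(n_on / on_exposure ** 2 + n_off / off_exposure ** 2)
        residuals.append((diff, err))
    return residuals

if __name__ == "__main__":
    # Made-up counts in three energy bins, for illustration only.
    for diff, err in on_off_residual([3200, 1610, 805], [2940, 1480, 740]):
        print(f"{diff:+.4f} +/- {err:.4f} counts/(kg*day)")
```

Normalizing by exposure matters because the on and off periods differ in length; comparing raw counts would bias the residual.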

Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05965

MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance

Authors: Semin Kim, Myeonghun Jeong, Hyeonseung Lee, Minchan Kim, Byoung ** Choi, Nam Soo Kim

Abstract: In this paper, we propose MakeSinger, a semi-supervised training method for singing voice synthesis (SVS) via classifier-free diffusion guidance. The challenge in SVS lies in the costly process of gathering aligned sets of text, pitch, and audio data. MakeSinger enables the training of a diffusion-based SVS model from any speech and singing voice data regardless of labeling, thereby enhancing the quality of generated voices with large amounts of unlabeled data. At inference, our novel dual guiding mechanism provides text and pitch guidance in the reverse diffusion steps by estimating the score of masked input. Experimental results show that the model trained in a semi-supervised manner outperforms baselines trained only on labeled data in terms of pronunciation, pitch accuracy, and overall quality. Furthermore, we demonstrate that by adding Text-to-Speech (TTS) data in training, the model can synthesize singing voices for TTS speakers even without any singing recordings of those speakers.

Submitted 9 June, 2024; originally announced June 2024.

Comments: Accepted to Interspeech 2024
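The abstract describes dual text and pitch guidance obtained by estimating the score of masked input, but it does not give the rule for combining the two conditions. The sketch below assumes the common compositional form of classifier-free guidance, in which each condition contributes its own guided difference relative to a fully masked (unconditional) prediction. The `eps` interface, the use of `None` to mark a masked condition, and the default weights are all hypothetical:

```python
from typing import Callable, Optional

import numpy as np

# Hypothetical noise-prediction interface: eps(x_t, t, text, pitch).
# Passing None for a condition means that condition is masked out.
EpsFn = Callable[[np.ndarray, int, Optional[object], Optional[object]], np.ndarray]


def dual_cfg_eps(eps: EpsFn, x_t: np.ndarray, t: int, text, pitch,
                 w_text: float = 2.0, w_pitch: float = 2.0) -> np.ndarray:
    """Compositional classifier-free guidance over two conditions."""
    e_uncond = eps(x_t, t, None, None)   # both conditions masked
    e_text = eps(x_t, t, text, None)     # text only
    e_pitch = eps(x_t, t, None, pitch)   # pitch only
    return (e_uncond
            + w_text * (e_text - e_uncond)
            + w_pitch * (e_pitch - e_uncond))
```

Setting both weights to zero recovers the unconditional prediction, which is what makes one network trained with randomly masked conditions usable on unlabeled data; the guided estimate would then feed an ordinary reverse diffusion update.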

arXiv:2406.05431 [pdf]

MaTableGPT: GPT-based Table Data Extractor from Materials Science Literature

Authors: Gyeong Hoon Yi, Jiwoo Choi, Hyeongyun Song, Olivia Miano, Jaewoong Choi, Kihoon Bang, Byungju Lee, Seok Su Sohn, David Buttler, Anna Hiszpanski, Sang Soo Han, Donghun Kim

Abstract: Efficiently extracting data from tables in the scientific literature is pivotal for building large-scale databases. However, the tables reported in materials science papers exist in highly diverse forms; thus, rule-based extraction is ineffective. To overcome this challenge, we present MaTableGPT, a GPT-based table data extractor for the materials science literature. MaTableGPT features key strategies of table data representation and table splitting for better GPT comprehension, and of filtering hallucinated information through follow-up questions. When applied to a vast volume of water splitting catalysis literature, MaTableGPT achieved an extraction accuracy (total F1 score) of up to 96.8%. Through comprehensive evaluations of the GPT usage cost, labeling cost, and extraction accuracy for the zero-shot, few-shot, and fine-tuning learning methods, we present a Pareto-front map in which the few-shot learning method was found to be the most balanced solution, owing to both its high extraction accuracy (total F1 score > 95%) and low cost (a GPT usage cost of 5.97 US dollars and a labeling cost of 10 I/O paired examples). Statistical analyses conducted on the database generated by MaTableGPT revealed valuable insights into the distribution of the overpotential and elemental utilization across the reported catalysts in the water splitting literature.

Submitted 8 June, 2024; originally announced June 2024.
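The abstract reports extraction accuracy as a "total F1 score" without defining how it is aggregated. A minimal sketch, assuming "total" means pooling true positives, false positives, and false negatives over all extracted values before computing a single F1 (micro-averaging); the per-table count tuples are hypothetical inputs:

```python
from typing import Iterable, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, equal to 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def micro_f1(per_table: Iterable[Tuple[int, int, int]]) -> float:
    """Pool (tp, fp, fn) counts over all tables, then compute one F1."""
    tp = fp = fn = 0
    for t, f_pos, f_neg in per_table:
        tp += t
        fp += f_pos
        fn += f_neg
    return f1_score(tp, fp, fn)
```

Micro-averaging weights large tables more heavily; a macro average (the mean of per-table F1 values) would instead treat every table equally, so the two can differ on a corpus with very uneven table sizes.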

arXiv:2406.04175 [pdf, other]

Confabulation: The Surprising Value of Large Language Model Hallucinations

Authors: Peiqi Sui, Eamon Duede, Sophie Wu, Richard Jean So

Abstract: This paper presents a systematic defense of large language model (LLM) hallucinations, or 'confabulations', as a potential resource rather than a categorically negative pitfall. The standard view is that confabulations are inherently problematic and that AI research should eliminate this flaw. In this paper, we argue and empirically demonstrate that measurable semantic characteristics of LLM confabulations mirror a human propensity to use increased narrativity as a cognitive resource for sense-making and communication. In other words, confabulation has potential value. Specifically, we analyze popular hallucination benchmarks and reveal that hallucinated outputs display increased levels of narrativity and semantic coherence relative to veridical outputs. This finding reveals a tension in our usually dismissive understanding of confabulation. It suggests, counterintuitively, that the tendency of LLMs to confabulate may be intimately associated with a positive capacity for coherent narrative-text generation.

Submitted 25 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: Forthcoming at the ACL 2024 main conference. 1 figure

arXiv:2406.02943 [pdf]

The Task-oriented Queries Benchmark (ToQB)

Authors: Keun Soo Yim

Abstract: Task-oriented queries (e.g., one-shot queries to play videos, order food, or call a taxi) are crucial for assessing the quality of virtual assistants, chatbots, and other large language model (LLM)-based services. However, a standard benchmark for task-oriented queries is not yet available, as existing benchmarks in the relevant NLP (Natural Language Processing) fields have primarily focused on task-oriented dialogues. Thus, we present a new methodology for efficiently generating the Task-oriented Queries Benchmark (ToQB) using existing task-oriented dialogue datasets and an LLM service. Our methodology involves formulating the underlying NLP task as summarizing the original intent of a speaker in each dialogue, detailing the key steps to perform the devised NLP task using an LLM service, and outlining a framework for automating a major part of the benchmark generation process. Through a case study encompassing three domains (i.e., two single-task domains and one multi-task domain), we demonstrate how to customize the LLM prompts (e.g., omitting system utterances or speaker labels) for those three domains and characterize the generated task-oriented queries. The generated ToQB dataset is publicly available. We further discuss new domains that community contributors can add to ToQB, as well as its practical applications.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Data available on GitHub, https://github.com/google/task-oriented-queries
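The prompt customizations mentioned in the abstract (omitting system utterances or speaker labels) amount to rendering each dialogue differently before it is sent to the LLM. A minimal sketch under stated assumptions: the `(speaker, utterance)` turn format, the `"USER"`/`"SYSTEM"` labels, and the instruction wording are hypothetical and are not the prompts used to build ToQB:

```python
from typing import List, Tuple

# (speaker, utterance); speaker is assumed to be "USER" or "SYSTEM".
Turn = Tuple[str, str]

# Hypothetical instruction text for the intent-summarization task.
INSTRUCTION = (
    "Summarize the user's original intent in the dialogue below as a single "
    "one-shot request that the user could have said to a virtual assistant."
)


def build_prompt(dialogue: List[Turn], include_system: bool = True,
                 include_speaker_labels: bool = True) -> str:
    """Render a task-oriented dialogue into an intent-summarization prompt."""
    lines = []
    for speaker, text in dialogue:
        if speaker == "SYSTEM" and not include_system:
            continue  # omit system utterances entirely
        lines.append(f"{speaker}: {text}" if include_speaker_labels else text)
    return INSTRUCTION + "\n\n" + "\n".join(lines)
```

For example, `build_prompt(dialogue, include_system=False)` keeps only the user's turns; the resulting string would then be sent to whichever LLM service generates the one-shot query (that call is not shown).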

arXiv:2406.02541 [pdf, other]

Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

Authors: Inkyu Shin, Qihang Yu, Xiaohui Shen, In So Kweon, Kuk-** Yoon, Liang-Chieh Chen

Abstract: Recent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors. Our approach uses a two-stage 3D Gaussian optimization process tailored for editing dynamic monocular videos. In the first stage, Video-3DGS employs an improved version of COLMAP, referred to as MC-COLMAP, which processes original videos using a Masked and Clipped approach. For each video clip, MC-COLMAP generates point clouds for dynamic foreground objects and complex backgrounds. These point clouds are used to initialize two sets of 3D Gaussians (Frg-3DGS and Bkg-3DGS) that represent the foreground and background views. The foreground and background views are then merged with a 2D learnable parameter map to reconstruct full views. In the second stage, we leverage the reconstruction ability developed in the first stage to impose temporal constraints on the video diffusion model. To demonstrate the efficacy of Video-3DGS in both stages, we conduct extensive experiments on two related tasks: Video Reconstruction and Video Editing. Trained for 3k iterations, Video-3DGS significantly improves video reconstruction quality (+3 and +7 PSNR) and training efficiency (1.9× and 4.5× faster) over NeRF-based and 3DGS-based state-of-the-art methods, respectively, on the DAVIS dataset. Moreover, it enhances video editing by ensuring temporal consistency across 58 dynamic monocular videos.

Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: Project page at https://video-3dgs-project.github.io/

arXiv:2406.02223 [pdf, other]

doi 10.1109/ICASSP49357.2023.10097143

SMCL: Saliency Masked Contrastive Learning for Long-tailed Recognition

Authors: Sanglee Park, Seung-won Hwang, Jungmin So

Abstract: Real-world data often follow a long-tailed distribution with a high imbalance in the number of samples between classes. The problem with training from imbalanced data is that some background features, common to all classes, can be unobserved in classes with scarce samples. As a result, these background features become correlated with biased predictions toward "major" classes. In this paper, we propose saliency masked contrastive learning, a new method that uses saliency masking and contrastive learning to mitigate the problem and improve the generalizability of a model. Our key idea is to mask the important part of an image using saliency detection and use contrastive learning to move the masked image towards minor classes in the feature space, so that background features present in the masked image are no longer correlated with the original class. Experimental results show that our method achieves state-of-the-art level performance on benchmark long-tailed datasets.

Submitted 4 June, 2024; originally announced June 2024.

Comments: accepted at ICASSP 2023

arXiv:2406.02211 [pdf]

Novel pre-emptive control solutions for V2X connected electric vehicles

Authors: Kai Man So, Gaetano Tavolo, Davide Tavernini, Marco Grosso, Sergio Pozzato, Pietro Perlo, Aldo Sorniotti

Abstract: V2X technologies will become widespread in the next generation of passenger cars and will enable the development of novel vehicle control functionalities. Although a wide body of literature describes the energy efficiency benefits of V2X connectivity, e.g., in terms of vehicle speed profiling and platooning, there is a gap in the analysis of the potential of vehicle connectivity to enhance the performance of active safety control systems. To highlight the impact vehicle connectivity could have on future active safety systems, this paper presents two novel control functions for connected vehicles that benefit from precise knowledge of the expected path and tire-road friction conditions ahead, as well as the current position of the ego vehicle. These functions, developed within recent and ongoing European projects, are: i) pre-emptive traction control; and ii) pre-emptive braking control.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 8 pages, 6 figures, Published in the Transport Research Arena (TRA) Conference, Lisbon, Portugal, 2022

arXiv:2406.02206 [pdf]

Nonlinear Model Predictive Control for Preview-Based Traction Control

Authors: Gaetano Tavolo, Kai Man So, Davide Tavernini, Pietro Perlo, Aldo Sorniotti

Abstract: This study presents a nonlinear model predictive control (NMPC) formulation for preview-based traction control, which uses information on the expected tire-road friction coefficient ahead to enhance wheel slip control performance, in the context of connected vehicles with V2X features. Proof-of-concept experiments on an electric vehicle prototype highlight the real-time capability of the controller and the wheel slip control performance improvement brought by the tire-road friction coefficient preview. Finally, an experimentally validated simulation model is used in sensitivity analyses to evaluate the performance benefit of the preview-based controller for different dynamic characteristics (e.g., time constants and pure time delays) of the electric powertrains.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 6 pages, 7 figures, Published in the 15th International Symposium on Advanced Vehicle Control (AVEC'22), Kanagawa, Japan, 2022

arXiv:2406.01801 [pdf, other]

Fearless Stochasticity in Expectation Propagation

Authors: Jonathan So, Richard E. Turner

Abstract: Expectation propagation (EP) is a family of algorithms for performing approximate inference in probabilistic models. The updates of EP involve the evaluation of moments (expectations of certain functions), which can be estimated from Monte Carlo (MC) samples. However, the updates are not robust to MC noise when performed naively, and various prior works have attempted to address this issue in different ways. In this work, we provide a novel perspective on the moment-matching updates of EP; namely, that they perform natural-gradient-based optimisation of a variational objective. We use this insight to motivate two new EP variants, with updates that are particularly well-suited to MC estimation; they remain stable and are most sample-efficient when estimated with just a single sample. These new variants combine the benefits of their predecessors and address key weaknesses. In particular, they are easier to tune, offer an improved speed-accuracy trade-off, and do not rely on debiasing estimators. We demonstrate their efficacy on a variety of probabilistic inference tasks.

Submitted 3 June, 2024; originally announced June 2024.
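For intuition about the MC-noise problem, here is the kind of naive scheme the abstract contrasts against, not the paper's new variants: a one-dimensional Gaussian EP site update in which the tilted-distribution moments are estimated by self-normalized importance sampling with the cavity as the proposal, and the site is then obtained by subtracting cavity natural parameters from the matched moments. The function names and the Gaussian-only setting are assumptions for illustration:

```python
import numpy as np


def mc_tilted_moments(cavity_mean, cavity_var, log_factor, n_samples, rng):
    """Estimate the mean and variance of the tilted distribution
    p(x) ∝ N(x; cavity_mean, cavity_var) * exp(log_factor(x))
    by self-normalized importance sampling from the cavity."""
    x = rng.normal(cavity_mean, np.sqrt(cavity_var), size=n_samples)
    logw = log_factor(x)
    w = np.exp(logw - logw.max())  # subtract max for numerical stability
    w /= w.sum()
    mean = np.sum(w * x)
    var = np.sum(w * (x - mean) ** 2)
    return mean, var


def site_update(cavity_mean, cavity_var, tilted_mean, tilted_var):
    """Naive moment matching for a 1D Gaussian site: site natural parameters
    are (tilted natural parameters) minus (cavity natural parameters).
    Returns (precision, precision * mean) of the new site."""
    prec = 1.0 / tilted_var - 1.0 / cavity_var
    prec_mean = tilted_mean / tilted_var - cavity_mean / cavity_var
    return prec, prec_mean
```

With many samples the estimates are accurate, but with very few samples the weighted variance can collapse toward zero, so `1.0 / tilted_var` blows up; this is one concrete way naive MC moment matching becomes unstable.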

arXiv:2405.18064 [pdf]

Automated Real-World Sustainability Data Generation from Images of Buildings

Authors: Peter J Bentley, Soo Ling Lim, Rajat Mathur, Sid Narang

Abstract: When data on building features are unavailable, determining how to improve a building in terms of carbon emissions becomes infeasible. We show that from only a set of images, a Large Language Model with appropriate prompt engineering and domain knowledge can successfully estimate a range of building features relevant to sustainability calculations. We compare our novel image-to-data method with a ground truth comprising real building data for 47 apartments and achieve accuracy better than that of a human performing the same task. We also demonstrate that the method can generate tailored recommendations to owners on how best to improve their properties, and we discuss methods to scale the approach.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 6 pages

MSC Class: 68T07; 94A08

arXiv:2405.16655 [pdf]

Predicting Likely-Vulnerable Code Changes: Machine Learning-based Vulnerability Protections for Android Open Source Project

Authors: Keun Soo Yim

Abstract: This paper presents a framework that selectively triggers security reviews for incoming source code changes. Functioning as a review bot within a code review service, the framework can automatically request additional security reviews at pre-submit time, before the code changes are submitted to a source code repository. Because performing such secure code reviews adds cost, the framework employs a classifier trained to identify code changes with a high likelihood of vulnerabilities. The online classifier leverages various types of input features to analyze review patterns, track the software engineering process, and mine specific text patterns within given code changes. The classifier and its features are meticulously chosen and optimized using data from submitted code changes and reported vulnerabilities in the Android Open Source Project (AOSP). The evaluation results demonstrate that our Vulnerability Prevention (VP) framework identifies approximately 80% of the vulnerability-inducing code changes in the dataset with a precision of around 98% and a false positive rate of around 1.7%. We discuss the implications of deploying the VP framework in multi-project settings and future directions for Android security research. This paper explores and validates our approach to vulnerability prediction at code-change granularity, offering a preventive technique for software security by detecting vulnerable code changes before submission.

Submitted 26 May, 2024; originally announced May 2024.

Comments: This is a preprint of an article that has been submitted to a journal for publication
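The three figures in the abstract (about 80% of vulnerability-inducing changes identified, about 98% precision, about 1.7% false positive rate) come from different cells of the same confusion matrix. A minimal sketch of how each is computed from per-change labels; the class and function names are hypothetical and not part of the VP framework:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    tp: int  # vulnerability-inducing changes that were flagged
    fp: int  # benign changes that were flagged
    tn: int  # benign changes that were not flagged
    fn: int  # vulnerability-inducing changes that were missed

    def recall(self) -> float:
        """Fraction of vulnerability-inducing changes that were flagged."""
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        """Fraction of flagged changes that were actually vulnerability-inducing."""
        return self.tp / (self.tp + self.fp)

    def false_positive_rate(self) -> float:
        """Fraction of benign changes that were wrongly flagged."""
        return self.fp / (self.fp + self.tn)


def confusion_from_labels(is_vulnerable: Sequence[bool],
                          flagged: Sequence[bool]) -> Confusion:
    pairs = list(zip(is_vulnerable, flagged))
    return Confusion(
        tp=sum(1 for v, f in pairs if v and f),
        fp=sum(1 for v, f in pairs if not v and f),
        tn=sum(1 for v, f in pairs if not v and not f),
        fn=sum(1 for v, f in pairs if v and not f),
    )
```

Recall and false positive rate are each normalized within one true class, whereas precision mixes both classes, so precision depends on the ratio of vulnerable to benign changes in the evaluated dataset.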

arXiv:2405.16088 [pdf, ps, other]

Estimating the normal-inverse-Wishart distribution

Authors: Jonathan So

Abstract: The normal-inverse-Wishart (NIW) distribution is commonly used as a prior distribution for the mean and covariance parameters of a multivariate normal distribution. The family of NIW distributions is also a minimal exponential family. In this short note we describe a convergent procedure for converting from mean parameters to natural parameters in the NIW family, or, equivalently, for performing maximum likelihood estimation of the natural parameters given observed sufficient statistics. This is needed, for example, when using an NIW base family in expectation propagation.

Submitted 3 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15978 [pdf, ps, other]

Exploring Age-of-Information Weighting in Federated Learning under Data Heterogeneity

Authors: Kaidi Wang, Zhiguo Ding, Daniel K. C. So, Zhi Ding

Abstract: This paper investigates federated learning in a wireless communication system, where random device selection is employed with non-independent and identically distributed (non-IID) data. The analysis indicates that when deep learning networks are trained using federated stochastic gradient descent (FedSGD) on non-IID datasets, device selection can generate gradient errors that accumulate, leading to potential weight divergence. To mitigate training divergence, we design an age-weighted FedSGD that scales local gradients according to the previous state of devices. To further improve learning performance by increasing device participation under a maximum time consumption constraint, we formulate an energy consumption minimization problem that includes resource allocation and sub-channel assignment. By transforming the resource allocation problem into a convex one and utilizing the KKT conditions, we derive the optimal resource allocation solution. Moreover, this paper develops a matching-based algorithm to generate an enhanced sub-channel assignment. Simulation results indicate that i) age-weighted FedSGD outperforms conventional FedSGD in terms of convergence rate and achievable accuracy, and ii) the proposed resource allocation and sub-channel assignment strategies can significantly reduce energy consumption and improve learning performance by increasing the number of selected devices.

Submitted 24 May, 2024; originally announced May 2024.
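The abstract says that age-weighted FedSGD scales local gradients "according to the previous state of devices" but does not give the formula. The sketch below is a hypothetical instance of the idea, not the paper's rule: each device carries an age of information (rounds since it last participated), and the selected devices' gradients are averaged with weights proportional to their ages, so devices that have been absent longer count more when they return:

```python
import numpy as np


def update_ages(ages, selected):
    """Age of information per device: a selected device's age resets to 1,
    every other device ages by one round."""
    chosen = set(selected)
    return [1 if k in chosen else a + 1 for k, a in enumerate(ages)]


def age_weighted_aggregate(grads, selected, ages):
    """Average the selected devices' gradients with weights proportional to
    their current age (hypothetical weighting rule)."""
    a = np.array([ages[k] for k in selected], dtype=float)
    weights = a / a.sum()
    return sum(w * grads[k] for w, k in zip(weights, selected))


def fedsgd_round(params, grads, selected, ages, lr=0.1):
    """One global FedSGD step with age-weighted aggregation, then the age update."""
    step = age_weighted_aggregate(grads, selected, ages)
    return params - lr * step, update_ages(ages, selected)
```

Plain FedSGD corresponds to equal weights over the selected devices; the age-based weights are one simple way to compensate for devices whose non-IID data have been missing from recent rounds.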

arXiv:2405.12934 [pdf]

Address-Specific Sustainable Accommodation Choice Through Real-World Data Integration

Authors: Peter J. Bentley, Rajat Mathur, Soo Ling Lim, Sid Narang

Abstract: Consumers wish to choose sustainable accommodation for their travels and, in the case of corporations, may be required to do so. Yet accommodation marketplaces provide no meaningful capability for sustainable choice: typically, the CO2 estimates provided are identical for all accommodation of the same type across an entire country. We propose a decision support system that enables real choice of sustainable accommodation. We develop a data-driven, address-specific metric called EcoGrade, which integrates government-approved datasets and uses interpolation where data are sparse. We validate the metric on 10,000 UK addresses in 10 cities, showing that the match of our interpolations to reality is statistically significant. We show how the metric has been embedded into a decision support system for a global accommodation marketplace and tested by real users over several months, with positive user feedback. In the EU, forty percent of final energy consumption is from buildings. We need to encourage all building owners to make their accommodation more efficient. The rental sector is one area where change can occur rapidly, as rented accommodation is renovated frequently. We anticipate that our decision support system using EcoGrade will encourage this positive change.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 8 pages

MSC Class: 68U35 ACM Class: E.m; H.m

arXiv:2405.11490 [pdf, other]

doi 10.1088/1538-3873/ad3b39

Solar image quality assessment: a proof of concept using Variance of Laplacian method and its application to optical atmospheric condition monitoring

Authors: Chu Wing So, Edwin Lok Hei Yuen, Edgar Heung Fat Leung, Jason Chun Shing Pun

Abstract: Here we present a proof of concept for applying the Variance of Laplacian (VL) method to quantify the sharpness of optical solar images. We conducted a comprehensive study using over 65,000 individual solar images acquired on more than 160 days. Each image was processed with a VL image processing algorithm, which assigns a 'score' based on the sharpness of the solar disk's edges. We studied the scores obtained from images acquired under different conditions. Our findings demonstrate that image sharpness exhibits daily trends closely linked to the altitude of the Sun at the observation site. We observed significant degradation in image quality only below a certain altitude threshold. Furthermore, we compared airmass formulae from the literature with our sharpness observations and concluded that the degradation could be modeled as an Image Sharpness Function (ISF), which exhibits similarities to airmass variations. In addition to assessing image quality, our method has the potential to evaluate optical atmospheric conditions during daytime observations. Moreover, this technique can be easily and cost-effectively applied to archival or real-time images of other celestial bodies, such as the Moon, bright planets, and defocused stars. Given that the ISF is unique to each location and sensitive to sky conditions, developing an ISF is not only beneficial for routine observation preparation but also essential for long-term site monitoring.

Submitted 19 May, 2024; originally announced May 2024.

Comments: 15 pages, 12 figures, 1 table

Journal ref: PASP, 136, 2024, 044504
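A minimal sketch of the two ingredients named in the abstract: a Variance of Laplacian sharpness score and an airmass formula to compare it against. The VL function applies the standard 3x3 Laplacian kernel to interior pixels and returns the variance of the response; it is similar in spirit to the common OpenCV idiom `cv2.Laplacian(gray, cv2.CV_64F).var()` but differs at image borders. The abstract does not say which airmass formulae were compared, so the plane-parallel secant and the Kasten and Young (1989) approximation are shown only as two well-known examples:

```python
import numpy as np


def variance_of_laplacian(image) -> float:
    """Sharpness score: variance of the 3x3 discrete Laplacian response.

    Uses the kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] on interior pixels only
    (no border padding), so an h x w image yields an (h-2) x (w-2) response.
    """
    img = np.asarray(image, dtype=float)
    lap = (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
           - 4.0 * img[1:-1, 1:-1])
    return float(lap.var())


def airmass_plane_parallel(altitude_deg: float) -> float:
    """Secant of the zenith angle; diverges toward the horizon."""
    return 1.0 / np.sin(np.radians(altitude_deg))


def airmass_kasten_young(altitude_deg: float) -> float:
    """Kasten & Young (1989) approximation; finite (about 38) at the horizon."""
    h = altitude_deg
    return 1.0 / (np.sin(np.radians(h)) + 0.50572 * (h + 6.07995) ** -1.6364)
```

Sharp limb edges produce large positive and negative Laplacian values and therefore a high variance, while atmospheric blurring spreads the edge over more pixels and lowers the score; plotting the score against solar altitude alongside an airmass curve is one way to look for the ISF-like trend the abstract describes.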

arXiv:2405.10368 [pdf, other]

Trapped-Ion Quantum Simulation of Electron Transfer Models with Tunable Dissipation

Authors: Visal So, Midhuna Duraisamy Suganthi, Abhishek Menon, Mingjian Zhu, Roman Zhuravel, Han Pu, Peter G. Wolynes, José N. Onuchic, Guido Pagano

Abstract: Electron transfer is at the heart of many fundamental physical, chemical, and biochemical processes essential for life. Exact simulation of reactions in these systems is often hindered by the large number of degrees of freedom and by the essential role of quantum effects. In this work, we experimentally simulate a paradigmatic model of molecular electron transfer using a multi-species trapped-ion crystal, where the donor-acceptor gap, the electronic and vibronic couplings, and the bath relaxation dynamics can all be controlled independently. We employ the ground-state qubit of one ion to simulate the electronic degree of freedom and the optical qubit of another ion to perform reservoir engineering on a collective mode encoding a reaction coordinate. We observe the real-time dynamics of the spin excitation, measuring the transfer rate in several regimes of adiabaticity and relaxation dynamics. The setup allows access to the electron transfer dynamics in the non-perturbative regime, where there is no clear hierarchy among the energy scales in the model, as has been suggested to be optimal for many rate phenomena, including photosynthesis. Our results provide a testing ground for increasingly rich models of molecular excitation transfer processes that are relevant for molecular electronics and light-harvesting systems.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.08424 [pdf, other]

Tackling Prevalent Conditions in Unsupervised Combinatorial Optimization: Cardinality, Minimum, Covering, and More

Authors: Fanchen Bu, Hyeonsoo Jo, Soo Yong Lee, Sungsoo Ahn, Kijung Shin

Abstract: Combinatorial optimization (CO) is naturally discrete, making machine learning based on differentiable optimization inapplicable. Karalias & Loukas (2020) adapted the probabilistic method to incorporate CO into differentiable optimization. Their work ignited research on unsupervised learning for CO, which is composed of two main components: probabilistic objectives and derandomization. However, each component confronts unique challenges. First, deriving objectives under various conditions (e.g., cardinality constraints and minimum) is nontrivial. Second, the derandomization process is underexplored, and existing derandomization methods are either random sampling or naive rounding. In this work, we aim to tackle prevalent (i.e., commonly involved) conditions in unsupervised CO. First, we concretize the targets for objective construction and derandomization with theoretical justification. Then, for various conditions commonly involved in different CO problems, we derive nontrivial objectives and derandomization methods to meet the targets. Finally, we apply the derivations to various CO problems. Via extensive experiments on synthetic and real-world graphs, we validate the correctness of our derivations and show our empirical superiority with respect to both optimization quality and speed.

Submitted 23 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: ICML 2024

arXiv:2405.07543 [pdf]

Accelerating the Evolution of Personalized Automated Lane Change through Lesson Learning

Authors: Jia Hu, Mingyue Lei, Duo Li, Zhenning Li, Jaehyun So, Haoran Wang

Abstract: Personalization is crucial for the widespread adoption of advanced driver assistance systems. To match each user's preferences, online evolution capability is a must. However, conventional evolution methods learn from naturalistic driving data, which requires a lot of computing power and cannot be applied online. To address this challenge, this paper proposes a lesson learning approach: learning from the driver's takeover interventions. By leveraging online takeover data, a driving zone is generated using Gaussian discriminant analysis to ensure perceived safety. Real-time corrections to trajectory planning rewards are enacted through apprenticeship learning. Guided by the objective of optimizing rewards within the constraints of the driving zone, this approach employs model predictive control for trajectory planning. This lesson learning framework is highlighted for its faster evolution capability, adeptness at accumulating experience, assurance of perceived safety, and computational efficiency. Simulation results demonstrate that the proposed system consistently achieves successful customization without further takeover interventions. Accumulated experience yields a 24% enhancement in evolution efficiency. The average number of learning iterations is only 13.8, and the average computation time is 0.08 seconds.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.05581 [pdf, other]

doi 10.1145/3630106.3662681

One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations

Authors: Yoonjoo Lee, Kihoon Son, Tae Soo Kim, Jisu Kim, John Joon Young Chung, Eytan Adar, Juho Kim

Abstract: As Large Language Models (LLMs) are nondeterministic, the same input can generate different outputs, some of which may be incorrect or hallucinated. If run again, the LLM may correct itself and produce the correct answer. Unfortunately, most LLM-powered systems resort to a single result, which users accept whether or not it is correct. Having the LLM produce multiple outputs may help identify disagreements or alternatives. However, it is not obvious how the user will interpret conflicts or inconsistencies. To this end, we investigate how users perceive the AI model and comprehend the generated information when they receive multiple, potentially inconsistent, outputs. Through a preliminary study, we identified five types of output inconsistencies. Based on these categories, we conducted a study (N=252) in which participants were given one or more LLM-generated passages in response to an information-seeking question. We found that inconsistency within multiple LLM-generated outputs lowered the participants' perceived AI capacity, while also increasing their comprehension of the given information. Specifically, we observed that this positive effect of inconsistencies was most significant for participants who read two passages, compared to those who read three. Based on these findings, we present design implications: instead of regarding LLM output inconsistencies as a drawback, we can reveal the potential inconsistencies to transparently indicate the limitations of these models and promote critical LLM usage.

Submitted 9 May, 2024; originally announced May 2024.

Comments: Accepted to FAccT 2024

arXiv:2405.04921 [pdf, other]

The simplest model of a scalarized black hole in the Einstein-Klein-Gordon theory

Authors: Xiao Yan Chew, Yun Soo Myung

Abstract: We investigate scalarized black holes in the Einstein-minimally coupled scalar theory with a negative potential $V(\phi)=-\alpha^2\phi^6$. An analysis of the linearized scalar equation shows that tachyonic instability is absent, so spontaneous scalarization cannot occur. Nevertheless, we obtain black hole solutions with scalar hair by solving the three full equations, because this scalar potential violates the weak energy condition. This shows clearly that scalarized black holes can be obtained without introducing a non-minimal scalar coupling term. We perform a stability analysis of the scalarized black holes under radial perturbations, which shows that all scalarized black holes belonging to a single branch are unstable.

Submitted 19 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: 18 pages with 8 figures

arXiv:2405.04752 [pdf, other]

HILCodec: High Fidelity and Lightweight Neural Audio Codec

Authors: Sunghwan Ahn, Beom Jun Woo, Min Hyun Han, Chanyeong Moon, Nam Soo Kim

Abstract: The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of Wave-U-Net does not increase consistently as the network depth increases. We analyze the root cause of such a phenomenon and suggest a variance-constrained design. Also, we reveal various distortions in previous waveform domain discriminators and propose a novel distortion-free discriminator. The resulting model, HILCodec, is a real-time streaming audio codec that demonstrates state-of-the-art quality across various bitrates and audio types.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.04497 [pdf, other]

Unveiling Disparities in Web Task Handling Between Human and Web Agent

Authors: Kihoon Son, **hyeon Kwon, DaEun Choi, Tae Soo Kim, Young-Ho Kim, Sangdoo Yun, Juho Kim

Abstract: With the advancement of Large Language Models (LLMs) and Large Vision-Language Models (LVMs), agents have shown significant capabilities in various tasks, such as data analysis, gaming, or code generation. Recently, there has been a surge in research on web agents capable of performing tasks within the web environment. However, the web poses unforeseeable scenarios, challenging the generalizability of these agents. This study investigates the disparities between human and web agents' performance in web tasks (e.g., information search) by concentrating on planning, action, and reflection aspects during task execution. We conducted a web task study with a think-aloud protocol, revealing distinct cognitive actions and operations on websites employed by humans. Comparative examination of existing agent structures and human behavior with thought processes highlighted differences in knowledge updating and ambiguity handling when performing the task. Humans demonstrated a propensity for exploring and modifying plans based on additional information and investigating reasons for failure. These findings offer insights into designing planning, reflection, and information discovery modules for web agents and into designing methods for capturing implicit human knowledge in a web task.

Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.00932 [pdf, ps, other]

Homotopy rigidity for quasitoric manifolds over a product of $d$-simplices

Authors: Xin Fu, Tseleung So, Jongbaek Song, Stephen Theriault

Abstract: For a fixed integer $d\geq 1$, we show that two quasitoric manifolds over a product of $d$-simplices are homotopy equivalent after appropriate localization, provided that their integral cohomology rings are isomorphic.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 14 pages

MSC Class: 55P15; 57S12

arXiv:2405.00344 [pdf, other]

Expert Insight-Enhanced Follow-up Chest X-Ray Summary Generation

Authors: Zhichuan Wang, Kinhei Lee, Qiao Deng, Tiffany Y. So, Wan Hang Chiu, Yeung Yu Hui, Bing**g Zhou, Edward S. Hui

Abstract: A chest X-ray radiology report describes abnormal findings not only from the X-ray obtained at the current examination, but also findings on disease progression or change in device placement with reference to the X-ray from a previous examination. The majority of efforts on automatic radiology report generation pertain to reporting the former, but not the latter, type of findings. To the best of the authors' knowledge, there is only one work dedicated to generating a summary of the latter findings, i.e., a follow-up summary. In this study, we therefore propose a transformer-based framework to tackle this task. Motivated by our observations on the significance of medical lexicon for the fidelity of summary generation, we introduce two mechanisms to bestow expert insight on our model, namely expert soft guidance and masked entity modeling loss. The former mechanism employs a pretrained expert disease classifier to guide the presence level of specific abnormalities, while the latter directs the model's attention toward medical lexicon. Extensive experiments demonstrate that the performance of our model is competitive with or exceeds the state of the art.

Submitted 6 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: Accepted by the 22nd International Conference on Artificial Intelligence in Medicine (AIME 2024)

ACM Class: I.2.1

arXiv:2404.19521 [pdf, other]

Nonlinear scalarization of Schwarzschild black holes in Einstein-scalar-Gauss-Bonnet gravity

Authors: Chao-Ming Zhang, Zhen-Hao Yang, Meng-Yun Lai, Yun Soo Myung, De-Cheng Zou

Abstract: In this paper, we propose a fully nonlinear mechanism for obtaining scalarized black holes in Einstein-scalar-Gauss-Bonnet (EsGB) gravity that goes beyond spontaneous scalarization. Introducing three coupling functions $f(\varphi)$ satisfying $f''(0) = 0$, we find that the Schwarzschild black hole is linearly stable against scalar perturbations, whereas it is unstable against nonlinear scalar perturbations if the coupling function includes terms higher than $\varphi^6$. For a specific choice of coupling function, $f(\varphi)=\alpha(\varphi^4-\beta\varphi^6)$, we obtain new black holes with scalar hair in EsGB gravity. In this case, the coupling parameter $\alpha$ plays a major role in producing different nonlinear scalarized black holes, while the other parameter $\beta$ plays a supplementary role. Furthermore, we study thermodynamic aspects of these scalarized black holes and prove the first law of thermodynamics.

Submitted 30 April, 2024; originally announced April 2024.

Comments: 14 pages, 6 figures

arXiv:2404.14279 [pdf, other]

Co-designing a Sub-millisecond Latency Event-based Eye Tracking System with Submanifold Sparse CNN

Authors: Baoheng Zhang, Yizhao Gao, **gyuan Li, Hayden Kwok-Hay So

Abstract: Eye-tracking technology is integral to numerous consumer electronics applications, particularly in the realm of virtual and augmented reality (VR/AR). These applications demand solutions that excel in three crucial aspects: low latency, low power consumption, and precision. Yet, achieving optimal performance across all these fronts presents a formidable challenge, necessitating a balance between sophisticated algorithms and efficient backend hardware implementations. In this study, we tackle this challenge through a synergistic software/hardware co-design of the system with an event camera. Leveraging the inherent sparsity of event-based input data, we integrate a novel sparse FPGA dataflow accelerator customized for submanifold sparse convolutional neural networks (SCNN). The SCNN implemented on the accelerator can efficiently extract the embedding feature vector from each representation of event slices by only processing the non-zero activations. Subsequently, these vectors undergo further processing by a gated recurrent unit (GRU) and a fully connected layer on the host CPU to generate the eye centers. Deployment and evaluation of our system reveal outstanding performance metrics. On the Event-based Eye-Tracking-AIS2024 dataset, our system achieves 81% p5 accuracy, 99.5% p10 accuracy, and 3.71 Mean Euclidean Distance with 0.7 ms latency while only consuming 2.29 mJ per inference. Notably, our solution opens up opportunities for future eye-tracking systems. Code is available at https://github.com/CASR-HKU/ESDA/tree/eye_tracking.

Submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024 workshop, AIS: Vision, Graphics, and AI for Streaming

arXiv:2404.13759 [pdf]

Mapping Phonon Polaritons with Visible Light

Authors: Kiernan E. Arledge, Chase T. Ellis, Nazli Rasouli Sarabi, Vincent R. Whiteside, Chul Soo Kim, Mi** Kim, Daniel C. Ratchford, Michael A Meeker, Binbin Weng, Joseph G. Tischler

Abstract: Phonon polaritons (PhPs) are hybrid photon-phonon waves which enable strong light-matter interactions and subdiffractional confinement, potentially empowering applications in sensing, nonlinear optics and nanoscale energy manipulation. In this work, we use confocal Raman microscopy to investigate the coupling between bulk phonon modes and localized surface phonon polariton (SPhP) modes in indium phosphide (InP) nanopillars and 4H-silicon carbide (4H-SiC) gratings. The Raman intensity within the nanostructures is described in terms of the SPhP eigenmodes and used to reconstruct the field intensity, providing a method to map SPhP eigenmodes using visible and near-IR light. Our results indicate that, contrary to expectation, all Raman-active bulk phonon modes of InP and 4H-SiC couple to the localized SPhP modes. Further, we confirm that polarizability selection rules form the predominant coupling mechanism between phonons and SPhP modes, with electron-phonon coupling playing a role for certain phonon modes (A1(LO) and E1(TO) in 4H-SiC). These observations provide a method for extending Raman studies of PhP modes to achieve full 3D reconstruction of the PhP eigenmodes and visualize light-matter interactions within nanostructures, thus advancing Raman scattering as a technique for understanding PhP modes.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.12968 [pdf, other]

Scalable Data Assimilation with Message Passing

Authors: Oscar Key, So Takao, Daniel Giles, Marc Peter Deisenroth

Abstract: Data assimilation is a core component of numerical weather prediction systems. The large quantity of data processed during assimilation requires the computation to be distributed across increasingly many compute nodes, yet existing approaches suffer from synchronisation overhead in this setting. In this paper, we exploit the formulation of data assimilation as a Bayesian inference problem and apply a message-passing algorithm to solve the spatial inference problem. Since message passing is inherently based on local computations, this approach lends itself to parallel and distributed computation. In combination with a GPU-accelerated implementation, we can scale the algorithm to very large grid sizes while retaining good accuracy and compute and memory requirements.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12490 [pdf, other]

Tuning exciton emission via ferroelectric polarization at a heterogeneous interface between a monolayer transition metal dichalcogenide and a perovskite oxide membrane

Authors: Jaehong Choi, Kevin J. Crust, Lizhong Li, Kihong Lee, Jialun Luo, Jae-Pil So, Kenji Watanabe, Takashi Taniguchi, Harold Y. Hwang, Kin Fai Mak, Jie Shan, Gregory D. Fuchs

Abstract: We demonstrate the integration of a thin BaTiO$_3$ (BTO) membrane with monolayer MoSe$_2$ in a dual gate device that enables in-situ manipulation of the BTO ferroelectric polarization with a voltage pulse. While two-dimensional (2D) transition metal dichalcogenides (TMDs) offer remarkable adaptability, their hybrid integration with other families of functional materials beyond the realm of 2D materials has been challenging. Released functional oxide membranes offer a solution for 2D/3D integration via stacking. 2D TMD excitons can serve as a local probe of the ferroelectric polarization in BTO at a heterogeneous interface. Using photoluminescence (PL) of MoSe$_2$ excitons to optically read out the doping level, we find that the relative population of charge carriers in MoSe$_2$ depends sensitively on the ferroelectric polarization. This finding points to a promising avenue for future generations of versatile sensing devices with high sensitivity, fast readout, and diverse applicability for advanced signal processing.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 14 pages with supplementary information in manuscript format

arXiv:2404.11770 [pdf, other]

Event-Based Eye Tracking. AIS 2024 Challenge Survey

Authors: Zuowen Wang, Chang Gao, Zongwei Wu, Marcos V. Conde, Radu Timofte, Shih-Chii Liu, Qinyu Chen, Zheng-jun Zha, Wei Zhai, Han Han, Bohao Liao, Yuliang Wu, Zengyu Wan, Zhong Wang, Yang Cao, Ganchao Tan, **ze Chen, Yan Ru Pei, Sasskia Brüers, Sébastien Crouzet, Douglas McLelland, Oliver Coenen, Baoheng Zhang, Yizhao Gao, **gyuan Li, et al. (14 additional authors not shown)

Abstract: This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The task of the challenge focuses on processing eye movements recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve a good trade-off between task accuracy and efficiency. During the challenge period, 38 participants registered for the Kaggle competition, and 8 teams submitted a challenge factsheet. The novel and diverse methods from the submitted factsheets are reviewed and analyzed in this survey to advance future event-based eye tracking research.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Qinyu Chen is the corresponding author

arXiv:2404.11483 [pdf, other]

AgentKit: Flow Engineering with Graphs, not Coding

Authors: Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen McAleer, Yonatan Bisk, Ruslan Salakhutdinov, Yuanzhi Li, Tom Mitchell

Abstract: We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents. AgentKit offers a unified framework for explicitly constructing a complex "thought process" from simple natural language prompts. The basic building block in AgentKit is a node, containing a natural language prompt for a specific subtask. The user then puts together chains of nodes, like stacking LEGO pieces. The chains of nodes can be designed to explicitly enforce a naturally structured "thought process". For example, for the task of writing a paper, one may start with the thought process of 1) identifying a core message, 2) identifying prior research gaps, etc. The nodes in AgentKit can be designed and combined in different ways to implement multiple advanced capabilities including on-the-fly hierarchical planning, reflection, and learning from interactions. In addition, due to the modular nature and the intuitive design to simulate explicit human thought process, a basic agent could be implemented as simply as a list of prompts for the subtasks and could therefore be designed and tuned by someone without any programming experience. Quantitatively, we show that agents designed through AgentKit achieve SOTA performance on WebShop and Crafter. These advances underscore AgentKit's potential in making LLM agents effective and accessible for a wider range of applications. https://github.com/holmeswww/AgentKit

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.08716 [pdf, other]

Securing Monolithic Kernels using Compartmentalization

Authors: Soo Yee Lim, Sidhartha Agrawal, Xueyuan Han, David Eyers, Dan O'Keeffe, Thomas Pasquier

Abstract: Monolithic operating systems, where all kernel functionality resides in a single, shared address space, are the foundation of most mainstream computer systems. However, a single flaw, even in a non-essential part of the kernel (e.g., device drivers), can cause the entire operating system to fall under an attacker's control. Kernel hardening techniques might prevent certain types of vulnerabilities, but they fail to address a fundamental weakness: the lack of intra-kernel security that safely isolates different parts of the kernel. We survey kernel compartmentalization techniques that define and enforce intra-kernel boundaries and propose a taxonomy that allows the community to compare and discuss future work. We also identify factors that complicate comparisons among compartmentalized systems, suggest new ways to compare future approaches with existing work meaningfully, and discuss emerging research directions.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 24 pages, 7 figures

arXiv:2404.08073 [pdf, other]

Spurious Stationarity and Hardness Results for Mirror Descent

Authors: He Chen, Jia** Li, Anthony Man-Cho So

Abstract: Despite the considerable success of Bregman proximal-type algorithms, such as mirror descent, in machine learning, a critical question remains: Can existing stationarity measures, often based on Bregman divergence, reliably distinguish between stationary and non-stationary points? In this paper, we present a groundbreaking finding: All existing stationarity measures necessarily imply the existence of spurious stationary points. We further establish an algorithm-independent hardness result: Bregman proximal-type algorithms are unable to escape from a spurious stationary point in finite steps when the initial point is unfavorable, even for convex problems. Our hardness result points out the inherent distinction between Euclidean and Bregman geometries and poses fundamental theoretical and numerical challenges for both the machine learning and optimization communities.

Submitted 11 April, 2024; originally announced April 2024.

Showing 1–50 of 2,095 results for author: So