Search | arXiv e-print repository

Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages

Authors: Max Zuo, Francisco Piedrahita Velez, Xiaochen Li, Michael L. Littman, Stephen H. Bach

Abstract: Many recent works have explored using language models for planning problems. One line of research focuses on translating natural language descriptions of planning tasks into structured planning languages, such as the planning domain definition language (PDDL). While this approach is promising, accurately measuring the quality of generated PDDL code continues to pose significant challenges. First, generated PDDL code is typically evaluated using planning validators that check whether the problem can be solved with a planner. This method is insufficient because a language model might generate valid PDDL code that does not align with the natural language description of the task. Second, existing evaluation sets often have natural language descriptions of the planning task that closely resemble the ground truth PDDL, reducing the challenge of the task. To bridge this gap, we introduce Planetarium, a benchmark designed to evaluate language models' ability to generate PDDL code from natural language descriptions of planning tasks. We begin by creating a PDDL equivalence algorithm that rigorously evaluates the correctness of PDDL code generated by language models by flexibly comparing it against a ground truth PDDL. Then, we present a dataset of $132,037$ text-to-PDDL pairs across 13 different tasks, with varying levels of difficulty. Finally, we evaluate several API-access and open-weight language models that reveal this task's complexity. For example, $87.6\%$ of the PDDL problem descriptions generated by GPT-4o are syntactically parseable, $82.2\%$ are valid, solvable problems, but only $35.1\%$ are semantically correct, highlighting the need for a more rigorous benchmark for this problem.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03320 [pdf, other]

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, et al. (2 additional authors not shown)

Abstract: We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely a 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to excel in tasks requiring extensive input and output contexts. Compared to its previous 2.0 version, InternLM-XComposer-2.5 features three major upgrades in vision-language comprehension: (1) Ultra-High Resolution Understanding, (2) Fine-Grained Video Understanding, and (3) Multi-Turn Multi-Image Dialogue. In addition to comprehension, IXC-2.5 extends to two compelling applications using extra LoRA parameters for text-image composition: (1) Crafting Webpages and (2) Composing High-Quality Text-Image Articles. IXC-2.5 has been evaluated on 28 benchmarks, outperforming existing open-source state-of-the-art models on 16 benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks. InternLM-XComposer-2.5 is publicly available at https://github.com/InternLM/InternLM-XComposer.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Technical Report. https://github.com/InternLM/InternLM-XComposer

arXiv:2407.03316 [pdf, other]

An Upper Limit on the Photoproduction Cross Section of the Spin-Exotic $π_1(1600)$

Authors: F. Afzal, C. S. Akondi, M. Albrecht, M. Amaryan, S. Arrigo, V. Arroyave, A. Asaturyan, A. Austregesilo, Z. Baldwin, F. Barbosa, J. Barlow, E. Barriga, R. Barsotti, D. Barton, V. Baturin, V. V. Berdnikov, T. Black, W. Boeglin, M. Boer, W. J. Briscoe, T. Britton, S. Cao, E. Chudakov, G. Chung, P. L. Cole, et al. (124 additional authors not shown)

Abstract: The spin-exotic hybrid meson $π_{1}(1600)$ is predicted to have a large decay rate to the $ωππ$ final state. Using 76.6~pb$^{-1}$ of data collected with the GlueX detector, we measure the cross sections for the reactions $γp \to ωπ^+ π^- p$, $γp \to ωπ^0 π^0 p$, and $γp\toωπ^-π^0Δ^{++}$ in the range $E_γ=$ 8-10 GeV. Using isospin conservation, we set the first upper limits on the photoproduction cross sections of the $π^{0}_{1}(1600)$ and $π^{-}_{1}(1600)$. We combine these limits with lattice calculations of decay widths and find that photoproduction of $η'π$ is the most sensitive two-body system to search for the $π_1(1600)$.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 6 pages, 3 figures plus supplemental materials

arXiv:2407.03314 [pdf, other]

BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

Authors: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, **yu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

Abstract: This paper presents Bag-of-Concept Graph (BACON), which gives models with limited linguistic abilities access to the capabilities of Vision Language Models (VLMs) and boosts downstream tasks such as detection, visual question answering (VQA), and image generation. Since visual scenes in the physical world are structured with complex relations between objects, BACON breaks down annotations into basic minimum elements and presents them in a graph structure. The element-wise style enables easy understanding, and the structural composition eases difficult localization. Careful prompt design produces the BACON captions with the help of publicly available VLMs and segmentation methods. In this way, we gather a dataset with 100K annotated images, which endows VLMs with remarkable capabilities, such as accurately generating BACON, transforming prompts into BACON format, envisioning scenarios in the style of BACON, dynamically modifying elements within BACON through interactive dialogue, and more. Wide-ranging representative experiments, including detection, VQA, and image generation tasks, show that BACON makes previously out-of-reach tasks achievable or improves on current cutting-edge solutions.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03307 [pdf, other]

HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization

Authors: Yucheng Tang, Yufan He, Vishwesh Nath, Pengfeig Guo, Ruining Deng, Tianyuan Yao, Quan Liu, Can Cui, Mengmeng Yin, Ziyue Xu, Holger Roth, Daguang Xu, Haichun Yang, Yuankai Huo

Abstract: In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process: initially segmenting high-resolution whole slide images (WSI) into smaller patches (e.g., 256x256, 512x512, 1024x1024) and subsequently reconstructing them to their original scale. This method often struggles to capture the complex details and vast scope of WSIs. In this paper, we propose the holistic histopathology (HoloHisto) segmentation method to achieve end-to-end segmentation on gigapixel WSIs, whose maximum resolution is above 80,000$\times$70,000 pixels. HoloHisto fundamentally shifts the paradigm of WSI segmentation to an end-to-end learning fashion with 1) a large (4K) resolution base patch for elevated visual information inclusion and efficient processing, and 2) a novel sequential tokenization mechanism to properly model the contextual relationships and efficiently model the rich information from the 4K input. To our best knowledge, HoloHisto presents the first holistic approach for gigapixel resolution WSI segmentation, supporting direct I/O of complete WSI and their corresponding gigapixel masks. Under the HoloHisto platform, we unveil a random 4K sampler that transcends ultra-high resolution, delivering 31 and 10 times more pixels than standard 2D and 3D patches, respectively, for advancing computational capabilities. To facilitate efficient 4K resolution dense prediction, we leverage sequential tokenization, utilizing a pre-trained image tokenizer to group image features into a discrete token grid. To assess the performance, our team curated a new kidney pathology image segmentation (KPIs) dataset with WSI-level glomeruli segmentation from whole mouse kidneys. From the results, HoloHisto-4K delivers remarkable performance gains over previous state-of-the-art models.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03303 [pdf, other]

doi 10.28924/APJM/11-63

A Graded Mesh Refinement for 2D Poisson's Equation on Non Convex Polygonal Domains

Authors: Charuka D. Wickramasinghe, Priyanka Ahire

Abstract: This work delves into solving the two-dimensional Poisson problem through the Finite Element Method, which is relevant in various physical scenarios including heat conduction, electrostatics, gravity potential, and fluid dynamics. However, finding exact solutions to these problems can be complicated and challenging due to complexities in the domains such as re-entrant corners, cracks, and discontinuities of the solution along the boundaries, and due to the singular source function. Our focus in this work is to solve the Poisson equation in the presence of re-entrant corners at the vertices of the domain where some of the interior angles are greater than 180 degrees. When the domain features a re-entrant corner, the numerical solution can display singular behavior near the corners. To address this, we propose a graded mesh algorithm that helps us to tackle the solution near singular points. We derive H1 and L2 error estimate results, and we use MATLAB to present numerical results that validate our theoretical findings. By exploring these concepts, we hope to provide new insights into the Poisson problem and inspire future research into the application of numerical methods to solve complex physical scenarios.
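The idea of grading a mesh toward a singular point can be illustrated in one dimension. The sketch below is an assumption-laden illustration, not the algorithm from the paper: it uses a simple power-law rule, and the function name `graded_points` and the exponent `beta` are hypothetical. Nodes are placed at `(i/n)**beta` on the unit interval, so that for `beta > 1` they cluster near 0, which stands in for the re-entrant corner.

```python
def graded_points(n, beta):
    """Return n + 1 nodes on [0, 1], clustered toward 0 when beta > 1.

    Illustrative power-law grading only (hypothetical helper, not the
    paper's refinement rule). beta = 1 gives a uniform mesh.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if beta < 1:
        raise ValueError("beta must be at least 1")
    return [(i / n) ** beta for i in range(n + 1)]


if __name__ == "__main__":
    # Compare a uniform mesh with a graded one.
    print(graded_points(4, 1.0))
    print(graded_points(4, 2.0))
```

With `beta = 2` and `n = 8`, the element next to the corner has length 1/64 while the element farthest from it has length 15/64, which is the kind of local refinement a graded mesh aims for.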

Submitted 3 July, 2024; originally announced July 2024.

Comments: 18 pages,10 figures

Journal ref: Asia Pac. J. Math. 2024 11:63

arXiv:2407.03293 [pdf, other]

Microscopic theory for electron-phonon coupling in twisted bilayer graphene

Authors: Ziyan Zhu, Thomas P. Devereaux

Abstract: The origin of superconductivity in twisted bilayer graphene -- whether phonon-driven or electron-driven -- remains unresolved. The answer to this question is hindered by the absence of a quantitative and efficient model for electron-phonon coupling (EPC). In this work, we develop a first-principles-based microscopic theory to calculate EPC in twisted bilayer graphene for arbitrary twist angles without needing a periodic moiré supercell. We adopt a momentum-space model for the electronic and phonon structures and quantify the EPC using generalized Eliashberg-McMillan theory for superconductivity without an adiabatic approximation. Using this framework, we find that the EPC is significantly enhanced near the magic angle, and drops abruptly for larger twist angles. We show that the EPC strength of a phonon corresponds to the modification of the moiré potential. In particular, we identify several $Γ$-phonon branches that contribute most significantly to the EPC, including one layer breathing mode, three layer shearing modes, and one chiral mode. These phonons should be experimentally detectable via Raman spectroscopy.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03289 [pdf, other]

Correlated Privacy Mechanisms for Differentially Private Distributed Mean Estimation

Authors: Sajani Vithana, Viveck R. Cadambe, Flavio P. Calmon, Haewon Jeong

Abstract: Differentially private distributed mean estimation (DP-DME) is a fundamental building block in privacy-preserving federated learning, where a central server estimates the mean of $d$-dimensional vectors held by $n$ users while ensuring $(ε,δ)$-DP. Local differential privacy (LDP) and distributed DP with secure aggregation (SecAgg) are the most common notions of DP used in DP-DME settings with an untrusted server. LDP provides strong resilience to dropouts, colluding users, and malicious server attacks, but suffers from poor utility. In contrast, SecAgg-based DP-DME achieves an $O(n)$ utility gain over LDP in DME, but requires increased communication and computation overheads and complex multi-round protocols to handle dropouts and malicious attacks. In this work, we propose CorDP-DME, a novel DP-DME mechanism that spans the gap between DME with LDP and distributed DP, offering a favorable balance between utility and resilience to dropout and collusion. CorDP-DME is based on correlated Gaussian noise, ensuring DP without the perfect conditional privacy guarantees of SecAgg-based approaches. We provide an information-theoretic analysis of CorDP-DME, and derive theoretical guarantees for utility under any given privacy parameters and dropout/colluding user thresholds. Our results demonstrate that (anti) correlated Gaussian DP mechanisms can significantly improve utility in mean estimation tasks compared to LDP -- even in adversarial settings -- while maintaining better resilience to dropouts and attacks compared to distributed DP.
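The intuition for why anti-correlated noise can help utility can be shown with a toy scalar example. This is a minimal sketch under strong assumptions and is not the CorDP-DME mechanism: it ignores dropouts and collusion, makes no privacy claim, and the function names and the zero-sum construction are hypothetical choices made only for illustration. It compares independent per-user Gaussian noise with noise that is shifted so that it sums to zero across users.

```python
import random


def local_noise_mean(values, sigma, rng):
    """Each user adds independent Gaussian noise; the server averages the reports."""
    noisy = [v + rng.gauss(0.0, sigma) for v in values]
    return sum(noisy) / len(noisy)


def zero_sum_noise_mean(values, sigma, rng):
    """Users add Gaussian noise shifted so the noise terms sum to zero.

    This is the extreme, fully anti-correlated case (illustration only):
    the noise cancels in the average when every user participates.
    """
    raw = [rng.gauss(0.0, sigma) for _ in values]
    shift = sum(raw) / len(raw)
    noisy = [v + (z - shift) for v, z in zip(values, raw)]
    return sum(noisy) / len(noisy)


if __name__ == "__main__":
    values = [float(i) for i in range(10)]  # true mean is 4.5
    print(local_noise_mean(values, 1.0, random.Random(0)))
    print(zero_sum_noise_mean(values, 1.0, random.Random(0)))
```

In this toy setting the zero-sum estimate recovers the true mean up to floating-point error, while the independent-noise estimate carries residual noise. The cancellation depends on every user reporting, which is one reason resilience to dropouts matters in the setting the abstract describes.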

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03282 [pdf, other]

LLM Internal States Reveal Hallucination Risk Faced With a Query

Authors: Ziwei Ji, Delong Chen, Etsuko Ishii, Samuel Cahyawijaya, Yejin Bang, Bryan Wilie, Pascale Fung

Abstract: The hallucination problem of Large Language Models (LLMs) significantly limits their reliability and trustworthiness. Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries. Inspired by this, our paper investigates whether LLMs can estimate their own hallucination risk before response generation. We analyze the internal mechanisms of LLMs broadly both in terms of training data sources and across 15 diverse Natural Language Generation (NLG) tasks, spanning over 700 datasets. Our empirical analysis reveals two key insights: (1) LLM internal states indicate whether they have seen the query in training data or not; and (2) LLM internal states show they are likely to hallucinate or not regarding the query. Our study explores particular neurons, activation layers, and tokens that play a crucial role in the LLM perception of uncertainty and hallucination risk. By a probing estimator, we leverage LLM self-assessment, achieving an average hallucination estimation accuracy of 84.32\% at run time.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03281 [pdf, other]

Direct evidence of hybrid nature of EUV waves and the reflection of the fast-mode wave

Authors: Ramesh Chandra, P. F. Chen, Pooja Devi

Abstract: In the current study, we analyze an extreme ultraviolet (EUV) wave on 2022 March 31. The event originated from NOAA active region (AR) 12975 (location: N13W52) in the Atmospheric Imaging Assembly (AIA) onboard the Solar Dynamics Observatory (SDO) satellite and exactly at the west limb in Solar Terrestrial Relations Observatory-Ahead (STEREO-A) observations. The EUV wave was associated with a GOES medium-class (M9.6) eruptive flare. The event was also well observed by the MLSO and COR1 coronagraphs. For the first time, we found clear simultaneous observations of the two components of an EUV wave in AIA as well as in STEREO-A images, as predicted by the EUV wave hybrid model. These components are the fast-mode wave and the non-wave counterpart. The speed of the fast-mode EUV wave in AIA 193 Å is ~658$\pm$4 km/s, while the non-wave component propagates with a speed of ~157$\pm$3 km/s. The computed speeds in STEREO-A 195 Å for the fast-mode wave and non-wave components are ~590$\pm$3 km/s and ~150$\pm$2 km/s, respectively. The interaction of the EUV wave with the AR shows its reflection above the solar limb. The speeds of the reflected and transmitted wave components are 140 and 180 km/s, which are slower than that of the incident wave. With precise alignment, we found that the fast-mode EUV wave is just ahead of the coronal mass ejection (CME) and the non-wave component is cospatial with the core of the accompanying CME. In addition, the event also shows stationary fronts and reflection from the AR located towards the south of the EUV wave origin site.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 6 figures, 16 pages

arXiv:2407.03278 [pdf, ps, other]

A survey of the Hornich-Hlawka inequality

Authors: Dan-Ştefan Marinescu, Constantin P. Niculescu

Abstract: In this survey, we review the many faces of the Hornich-Hlawka inequality. Several open problems that seem of utmost interest are mentioned.
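For orientation, the classical form of the inequality named in the title can be written out. The statement below is the standard textbook version for an inner product space and is supplied here as background, not quoted from the survey; the symbols $x$, $y$, $z$ are generic vectors.

```latex
% Hornich-Hlawka inequality, classical form (background statement):
% for all vectors x, y, z in an inner product space,
\[
  \|x\| + \|y\| + \|z\| + \|x + y + z\|
  \;\ge\;
  \|x + y\| + \|y + z\| + \|z + x\| .
\]
```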

Submitted 3 July, 2024; originally announced July 2024.

Comments: 17 pages

MSC Class: 46B04; 46B20; 26D15

arXiv:2407.03271 [pdf, ps, other]

Timing of millisecond pulsars in NGC\,6752 -- III. On the presence of non-luminous matter in the cluster's core

Authors: A. Corongiu, A. Ridolfi, F. Abbate, M. Bailes, A. Possenti, M. Geyer, R. N. Manchester, M. Kramer, P. C. C. Freire, M. Burgay, S. Buchner, F. Camilo

Abstract: Millisecond pulsars are subject to accelerations in globular clusters that manifest themselves in both the first and second spin period time derivatives, and can be used to explore the mass distribution of the potentials they inhabit. Here we report on over 20 years of pulsar timing observations of five millisecond radio pulsars in the core of the core-collapse globular cluster NGC\,6752 with the Parkes (Murriyang) and MeerKAT radio telescopes, that have allowed us to measure the proper motions, positions and first and second time derivatives of the pulsars. The pulsar timing parameters indicate that all the pulsars in the core experience accelerations and jerks that can be explained only if an amount of non-luminous mass of at least $2.56\times10^3M_\odot$ is present in the core of NGC\,6752. On the other hand, our studies highly disfavour the presence of an intermediate mass black hole at the center of the cluster, with a mass equal to or greater than $\sim3000M_\odot$.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted for publication in ApJ

arXiv:2407.03268 [pdf, other]

For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives

Authors: Lia Morra, Antonio Santangelo, Pietro Basci, Luca Piano, Fabio Garcea, Fabrizio Lamberti, Massimo Leone

Abstract: Social networks are creating a digital world in which the cognitive, emotional, and pragmatic value of the imagery of human faces and bodies is arguably changing. However, researchers in the digital humanities are often ill-equipped to study these phenomena at scale. This work presents FRESCO (Face Representation in E-Societies through Computational Observation), a framework designed to explore the socio-cultural implications of images on social media platforms at scale. FRESCO deconstructs images into numerical and categorical variables using state-of-the-art computer vision techniques, aligning with the principles of visual semiotics. The framework analyzes images across three levels: the plastic level, encompassing fundamental visual features like lines and colors; the figurative level, representing specific entities or concepts; and the enunciation level, which focuses particularly on constructing the point of view of the spectator and observer. These levels are analyzed to discern deeper narrative layers within the imagery. Experimental validation confirms the reliability and utility of FRESCO, and we assess its consistency and precision across two public datasets. Subsequently, we introduce the FRESCO score, a metric derived from the framework's output that serves as a reliable measure of similarity in image content.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03267 [pdf]

Insulator-to-Metal Transition and Isotropic Gigantic Magnetoresistance in Layered Magnetic Semiconductors

Authors: Gokul Acharya, Bimal Neupane, Chia-Hsiu Hsu, Xian P. Yang, David Graf, Eun Sang Choi, Krishna Pandey, Md Rafique Un Nabi, Santosh Karki Chhetri, Rabindra Basnet, Sumaya Rahman, Jian Wang, Zhengxin Hu, Bo Da, Hugh Churchill, Guoqing Chang, M. Zahid Hasan, Yuanxi Wang, Jin Hu

Abstract: Magnetotransport, the response of electrical conduction to external magnetic field, acts as an important tool to reveal fundamental concepts behind exotic phenomena and plays a key role in enabling spintronic applications. Magnetotransport is generally sensitive to magnetic field orientations. In contrast, efficient and isotropic modulation of electronic transport, which is useful in technology applications such as omnidirectional sensing, is rarely seen, especially for pristine crystals. Here we propose a strategy to realize extremely strong modulation of electron conduction by magnetic field which is independent of field direction. GdPS, a layered antiferromagnetic semiconductor with resistivity anisotropies, supports a field-driven insulator-to-metal transition with a paradoxically isotropic gigantic negative magnetoresistance insensitive to magnetic field orientations. This isotropic magnetoresistance originates from the combined effects of a near-zero spin-orbit coupling of Gd3+-based half-filling f-electron system and the strong on-site f-d exchange coupling in Gd atoms. Our results not only provide a novel material system with extraordinary magnetotransport that offers a missing block for antiferromagnet-based ultrafast and efficient spintronic devices, but also demonstrate the key ingredients for designing magnetic materials with desired transport properties for advanced functionalities.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 44 pages, 18 figures

arXiv:2407.03262 [pdf, ps, other]

Nearly Linear Sparsification of $\ell_p$ Subspace Approximation

Authors: David P. Woodruff, Taisuke Yasuda

Abstract: The $\ell_p$ subspace approximation problem is an NP-hard low rank approximation problem that generalizes the median hyperplane problem ($p = 1$), principal component analysis ($p = 2$), and the center hyperplane problem ($p = \infty$). A popular approach to cope with the NP-hardness of this problem is to compute a strong coreset, which is a small weighted subset of the input points which simultaneously approximates the cost of every $k$-dimensional subspace, typically to $(1+\varepsilon)$ relative error for a small constant $\varepsilon$. We obtain the first algorithm for constructing a strong coreset for $\ell_p$ subspace approximation with a nearly optimal dependence on the rank parameter $k$, obtaining a nearly linear bound of $\tilde O(k)\mathrm{poly}(\varepsilon^{-1})$ for $p<2$ and $\tilde O(k^{p/2})\mathrm{poly}(\varepsilon^{-1})$ for $p>2$. Prior constructions either achieved a similar size bound but produced a coreset with a modification of the original points [SW18, FKW21], or produced a coreset of the original points but lost $\mathrm{poly}(k)$ factors in the coreset size [HV20, WY23]. Our techniques also lead to the first nearly optimal online strong coresets for $\ell_p$ subspace approximation with similar bounds as the offline setting, resolving a problem of [WY23]. All prior approaches lose $\mathrm{poly}(k)$ factors in this setting, even when allowed to modify the original points.
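The objects named in the abstract can be written down explicitly. The formulation below is a standard way to state the problem and the strong coreset guarantee for finite $p$; the notation ($a_i$, $F$, $S$, $w_i$) and the exact normalization are assumptions made for illustration and may differ from the paper's conventions.

```latex
% Input: points a_1, ..., a_n in R^d, rank parameter k, and 1 <= p < infinity.
% \ell_p subspace approximation: minimize over k-dimensional subspaces F
\[
  \min_{F \subseteq \mathbb{R}^d,\ \dim F = k}
  \ \sum_{i=1}^{n} \operatorname{dist}(a_i, F)^p ,
  \qquad
  \operatorname{dist}(a_i, F) = \min_{x \in F} \|a_i - x\|_2 .
\]
% Strong coreset: a subset S of {1, ..., n} with weights w_i >= 0 such that,
% simultaneously for every k-dimensional subspace F,
\[
  (1 - \varepsilon) \sum_{i=1}^{n} \operatorname{dist}(a_i, F)^p
  \ \le\ \sum_{i \in S} w_i \operatorname{dist}(a_i, F)^p
  \ \le\ (1 + \varepsilon) \sum_{i=1}^{n} \operatorname{dist}(a_i, F)^p .
\]
```

Setting $p = 2$ recovers the principal component analysis objective mentioned in the abstract, and $p = 1$ gives the sum-of-distances objective of the median hyperplane problem.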

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03260 [pdf]

Improvement of the perovskite photodiodes performance via advanced interface engineering with polymer dielectric

Authors: A. P. Morozov, L. O. Luchnikov, S. Yu. Yurchuk, A. R. Ishteev, P. A. Gostishchev, S. I. Didenko, N. S. Saratovsky, S. S. Kozlov, D. S. Muratov, Yu. N. Luponosov, D. S. Saranin

Abstract: Halide perovskite-based photodiodes are promising for efficient detection across a broad spectral range. Perovskite absorber thin-films have a microcrystalline morphology, characterized by a high density of surface states and defects at inter-grain interfaces. In this work, we used dielectric-ferroelectric poly(vinylidene-fluoride-trifluoroethylene) (P(VDF-TrFE)) to modify the bulk interfaces and electron transport junction in p-i-n perovskite photodiodes. Our complex work demonstrates that interface engineering with P(VDF-TrFE) induces significant Fermi level pinning, reducing from 4.85 eV for intrinsic perovskite to 4.28 eV for the configuration with dielectric interlayers. The integration of P(VDF-TrFE) into the perovskite film did not affect the morphology and crystal structure, but significantly changed the charge transport and device performance. IV curve analysis and 2-diode model calculations showed enhanced shunt properties, a decreased non-ideality factor, and reduced saturation dark current. We have shown that the complex introduction of P(VDF-TrFE) into the absorber's bulk and on its surface is essential to reduce the impact of the trapping processes. For P(VDF-TrFE) containing devices, we increased the specific detectivity from 10^11 to 10^12 Jones, expanded the linear dynamic range up to 100 dB, and reduced the equivalent noise power to 10^-13 W*Hz^-0.5. Reducing non-radiative recombination contributions significantly enhanced device performance, improving rise/fall times from 6.3/10.9 μs to 4.6/6.5 μs. The cut-off frequency (3dB) increased from 64.8 kHz to 74.8 kHz following the introduction of the dielectric. These results provide new insights into the use of organic dielectrics and an improved understanding of trap-states and ion defect compensation for detectors based on perovskite heterostructures.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03249 [pdf, other]

Quantum coarsening and collective dynamics on a programmable quantum simulator

Authors: Tom Manovitz, Sophie H. Li, Sepehr Ebadi, Rhine Samajdar, Alexandra A. Geim, Simon J. Evered, Dolev Bluvstein, Hengyun Zhou, Nazli Uğur Köylüoğlu, Johannes Feldmeier, Pavel E. Dolgirev, Nishad Maskara, Marcin Kalinowski, Subir Sachdev, David A. Huse, Markus Greiner, Vladan Vuletić, Mikhail D. Lukin

Abstract: Understanding the collective quantum dynamics of nonequilibrium many-body systems is an outstanding challenge in quantum science. In particular, dynamics driven by quantum fluctuations are important for the formation of exotic quantum phases of matter \cite{altman2023quantum}, fundamental high-energy processes \cite{bauer2023highenergy}, quantum metrology \cite{degen2017sensing, li2023scrambling}, and quantum algorithms \cite{ebadi2022quantum}. Here, we use a programmable quantum simulator based on Rydberg atom arrays to experimentally study collective dynamics across a (2+1)D Ising quantum phase transition. After crossing the quantum critical point, we observe a gradual growth of correlations through coarsening of antiferromagnetically ordered domains \cite{Samajdar2024}. By deterministically preparing and following the evolution of ordered domains, we show that the coarsening is driven by the curvature of domain boundaries, and find that the dynamics accelerate with proximity to the quantum critical point. We quantitatively explore these phenomena and further observe long-lived oscillations of the order parameter, corresponding to an amplitude (Higgs) mode \cite{pekker2015amplitude}. These observations offer a unique viewpoint into emergent collective dynamics in strongly correlated quantum systems and nonequilibrium quantum processes.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 25 pages, 14 figures

arXiv:2407.03227 [pdf, other]

Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning

Authors: Zhili Shen, Pavlos Vougiouklis, Chenxin Diao, Kaustubh Vyas, Yuanyi Ji, Jeff Z. Pan

Abstract: We focus on Text-to-SQL semantic parsing from the perspective of Large Language Models. Motivated by challenges related to the size of commercial database schemata and the deployability of business intelligence solutions, we propose an approach that dynamically retrieves input database information and uses abstract syntax trees to select few-shot examples for in-context learning. Furthermore, we investigate the extent to which an in-parallel semantic parser can be leveraged for generating approximated versions of the expected SQL queries, to support our retrieval. We take this approach to the extreme: we adapt a model consisting of fewer than 500M parameters to act as an extremely efficient approximator, enhancing it with the ability to process schemata in a parallelised manner. We apply our approach to monolingual and cross-lingual benchmarks for semantic parsing, showing improvements over state-of-the-art baselines. Comprehensive experiments highlight the contribution of modules involved in this retrieval-augmented generation setting, revealing interesting directions for future work.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03225 [pdf, other]

The large-scale structure around the Fornax-Eridanus Complex

Authors: Maria Angela Raj, Petra Awad, Reynier F. Peletier, Rory Smith, Ulrike Kuchner, Rien van de Weygaert, Noam I. Libeskind, Marco Canducci, Peter Tino, Kerstin Bunte

Abstract: Our objectives are to map the filamentary network around the Fornax-Eridanus Complex and probe the influence of the local environment on galaxy morphology. We employ the novel machine-learning tool, 1-DREAM (1-Dimensional, Recovery, Extraction, and Analysis of Manifolds) to detect and model filaments around the Fornax cluster. We then use the morphology-density relation of galaxies to examine the variation in the galaxies' morphology with respect to their distance from the central axis of the detected filaments. We detect 27 filaments that vary in length and galaxy-number density around the Fornax-Eridanus Complex. These filaments showcase a variety of environments; some filaments encompass groups/clusters, while others are only inhabited by galaxies in pristine filamentary environments. We also reveal a well-known structure, the Fornax Wall, that passes through the Dorado group, Fornax cluster, and Eridanus supergroup. Regarding the morphology of galaxies, we find that early-type galaxies (ETGs) populate high-density filaments and high-density regions of the Fornax Wall. Furthermore, the fraction of ETGs decreases as the distance to the filament spine increases. Of the total galaxy population in filaments, ~7% are ETGs and ~24% are late-type galaxies (LTGs) located in pristine environments of filaments, while ~27% are ETGs and ~42% are LTGs in groups/clusters within filaments. This study reveals the Cosmic Web around the Fornax Cluster and asserts that filamentary environments are heterogeneous in nature. When investigating the role of the environment on galaxy morphology, it is essential to consider both the local number-density and a galaxy's proximity to the filament spine. Within this framework, we ascribe the observed morphological segregation in the Fornax Wall to pre-processing of galaxies within groups embedded in it.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted for publication in A&A. 21 pages with 15 figures

arXiv:2407.03220 [pdf, other]

Impact of planar defects on the reversal time of single magnetic domain nanoparticles

Authors: Hugo Bocquet, Armin Kleibert, Peter M. Derlet

Abstract: Recent experimental investigations of individual magnetic nanoparticles reveal a diverse range of magnetic relaxation times which cannot be explained by considering their size, shape, and surface anisotropy, suggesting other factors associated with the internal microstructure of the particles are at play. In this letter, we apply Langer's theory of thermal activation to single magnetic domain fcc Co nanoparticles, whose experimental microstructures are characterized by planar defects, and derive an analytic expression for the relaxation time. The obtained Arrhenius exponential and its prefactor, which is often assumed to be a constant, are here found to both depend exponentially on system size and the number of defects. Together they provide a quantitative prediction of the experimental findings, and more generally highlight the importance of structural defects when considering magnetic stability.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 8 pages, 5 figures

arXiv:2407.03216 [pdf, other]

Learning Disentangled Representation in Object-Centric Models for Visual Dynamics Prediction via Transformers

Authors: Sanket Gandhi, Atul, Samanyu Mahajan, Vishal Sharma, Rushil Gupta, Arnab Kumar Mondal, Parag Singla

Abstract: Recent work has shown that object-centric representations can greatly help improve the accuracy of learning dynamics while also bringing interpretability. In this work, we take this idea one step further and ask the following question: "can learning disentangled representation further improve the accuracy of visual dynamics prediction in object-centric models?" While there has been some attempt to learn such disentangled representations for the case of static images \citep{nsb}, to the best of our knowledge, ours is the first work which tries to do this in a general setting for video, without making any specific assumptions about the kind of attributes that an object might have. The key building block of our architecture is the notion of a block, where several blocks together constitute an object. Each block is represented as a linear combination of a given number of learnable concept vectors, which is iteratively refined during the learning process. The blocks in our model are discovered in an unsupervised manner, by attending over object masks, in a style similar to discovery of slots \citep{slot_attention}, for learning a dense object-centric representation. We employ self-attention via transformers over the discovered blocks to predict the next state, resulting in discovery of visual dynamics. We perform a series of experiments on several benchmark 2-D and 3-D datasets, demonstrating that our architecture (1) can discover semantically meaningful blocks, (2) helps improve the accuracy of dynamics prediction compared to SOTA object-centric models, and (3) performs significantly better in the OOD setting, where the specific attribute combinations are not seen earlier during training. Our experiments highlight the importance of discovering disentangled representations for visual dynamics prediction.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03215 [pdf, other]

Streaming Large-Scale Electron Microscopy Data to a Supercomputing Facility

Authors: Samuel S. Welborn, Chris Harris, Stephanie M. Ribet, Georgios Varnavides, Colin Ophus, Bjoern Enders, Peter Ercius

Abstract: Data management is a critical component of modern experimental workflows. As data generation rates increase, transferring data from acquisition servers to processing servers via conventional file-based methods is becoming increasingly impractical. The 4D Camera at the National Center for Electron Microscopy (NCEM) generates data at a nominal rate of 480 Gbit/s (87,000 frames/s) producing a 700 GB dataset in fifteen seconds. To address the challenges associated with storing and processing such quantities of data, we developed a streaming workflow that utilizes a high-speed network to connect the 4D Camera's data acquisition (DAQ) system to supercomputing nodes at the National Energy Research Scientific Computing Center (NERSC), bypassing intermediate file storage entirely. In this work, we demonstrate the effectiveness of our streaming pipeline in a production setting through an hour-long experiment that generated over 10 TB of raw data, yielding high-quality datasets suitable for advanced analyses. Additionally, we compare the efficacy of this streaming workflow against the conventional file-transfer workflow by conducting a post-mortem analysis on historical data from experiments performed by real users. Our findings show that the streaming workflow significantly improves data turnaround time, enables real-time decision-making, and minimizes the potential for human error by eliminating manual user interactions.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03207 [pdf, other]

Loss rate of ultracold neutrons due to the absorption by trap walls in large material traps

Authors: Pavel D. Grigoriev, Vladislav D. Kochev, Victor A. Tsyplukhin, Alexander M. Dyugaev, Ilya Ya. Polishchuk

Abstract: The most accurate neutron lifetime measurements now use the material or magnetic traps of ultracold neutrons (UCN). The precision of these experiments is determined by the accuracy of estimating the neutron loss rate. In material UCN traps the main source of neutron losses is the absorption by trap walls. In this paper we analyze the standard methods and their approximations for the calculation of UCN absorption rate by the walls of material traps. We emphasize the approximations used both in the standard analytical formulas and in the numerical Monte-Carlo simulations. For the two simplest trap geometries, rectangular and cylindrical, we obtain precise analytical formulas for this absorption rate and compare them with the standard estimation methods. The difference turned out to be considerable and especially important for the size extrapolation procedure, always used in the standard estimates of UCN losses. Our results may partially resolve the puzzling four-second discrepancy between the magnetic and material-trap measurements of neutron lifetime.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 12 pages, 6 figures

arXiv:2407.03199 [pdf, other]

BOWIE-ALIGN: How formation and migration histories of giant planets impact atmospheric compositions

Authors: Anna B. T. Penzlin, Richard A. Booth, James Kirk, James E. Owen, Eva-Maria Ahrer, Duncan A. Christie, Alastair B. Claringbold, Emma Esparza-Borges, M. López-Morales, N. J. Mayne, Mason McCormack, Annabella Meech, Vatsal Panwar, Diana Powell, Denis E. Sergeev, Jake Taylor, Peter J. Wheatley, Maria Zamyatina

Abstract: Hot Jupiters present a unique opportunity for measuring how planet formation history shapes present-day atmospheric composition. However, due to the myriad pathways influencing composition, a well-constructed sample of planets is needed to determine whether formation history can be accurately traced back from atmospheric composition. To this end, the BOWIE-ALIGN survey will compare the compositions of 8 hot Jupiters around F stars, 4 with orbits aligned with the stellar rotation axis and 4 misaligned. Using the alignment as an indicator for planets that underwent disc migration or high-eccentricity migration, one can determine whether migration history produces notable differences in composition between the two samples of planets. This paper describes the planet formation model that motivates our observing programme. Our model traces the accretion of chemical components from the gas and dust in the disc over a broad parameter space to create a full, unbiased model sample from which we can estimate the range of final atmospheric compositions. For high metallicity atmospheres (O/H > 10 times solar), the C/O ratios of aligned and misaligned planets diverge, with aligned planets having lower C/O (< 0.25) due to the accretion of oxygen-rich silicates from the inner disc. However, silicates may rain out instead of releasing their oxygen into the atmosphere. This would significantly increase the C/O of aligned planets (C/O > 0.6), inverting the trend between the aligned and misaligned planets. Nevertheless, by comparing statistically significant samples of aligned and misaligned planets, we expect atmospheric composition to constrain how planets form.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 11 pages, 10 figures (appendix: 6 pages, 4 figures), submitted to MNRAS

arXiv:2407.03198 [pdf, other]

BOWIE-ALIGN: A JWST comparative survey of aligned vs misaligned hot Jupiters to test the dependence of atmospheric composition on migration history

Authors: James Kirk, Eva-Maria Ahrer, Anna B. T. Penzlin, James E. Owen, Richard A. Booth, Lili Alderson, Duncan A. Christie, Alastair B. Claringbold, Emma Esparza-Borges, Chloe E. Fisher, Mercedes López-Morales, N. J. Mayne, Mason McCormack, Annabella Meech, Vatsal Panwar, Diana Powell, Jake Taylor, Denis E. Sergeev, Daniel Valentine, Hannah R. Wakeford, Peter J. Wheatley, Maria Zamyatina

Abstract: A primary objective of exoplanet atmosphere characterisation is to learn about planet formation and evolution; however, this is challenged by degeneracies. To determine whether differences in atmospheric composition can be reliably traced to differences in evolution, we are undertaking a new survey with JWST to compare the compositions of a sample of hot Jupiters that orbit F stars above the Kraft break with different orbital alignments. Under the assumption that aligned planets migrate through the inner disc, while misaligned planets migrate after disc dispersal, the act of migrating through the inner disc should lead to a measurable difference in the C/O between aligned and misaligned planets. We expect the amplitude and sign of this difference to depend on the amount of planetesimal accretion and whether silicates accreted from the inner disc release their oxygen. Here, we identify all known exoplanets that are suitable for testing this hypothesis, describe our JWST survey, and use noise simulations and atmospheric retrievals to estimate our survey's sensitivity. With the selected sample of four aligned and four misaligned hot Jupiters, we will be sensitive to the predicted differences in C/O between aligned and misaligned hot Jupiters for a wide range of model scenarios.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 13 pages, 8 figures, submitted to RASTI

arXiv:2407.03192 [pdf, other]

CiteAssist: A System for Automated Preprint Citation and BibTeX Generation

Authors: Lars Benedikt Kaesberg, Terry Ruas, Jan Philip Wahle, Bela Gipp

Abstract: We present CiteAssist, a system to automate the generation of BibTeX entries for preprints, streamlining the process of bibliographic annotation. Our system extracts metadata, such as author names, titles, publication dates, and keywords, to create standardized annotations within the document. CiteAssist automatically attaches the BibTeX citation to the end of a PDF and links it on the first page of the document so other researchers gain immediate access to the correct citation of the article. This method promotes platform flexibility by ensuring that annotations remain accessible regardless of the repository used to publish or access the preprint. The annotations remain available even if the preprint is viewed externally to CiteAssist. Additionally, the system adds relevant related papers based on extracted keywords to the preprint, providing researchers with additional publications besides those in related work for further reading. Researchers can enhance their preprint organization and reference management workflows through a free and publicly available web interface.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Published at SDProc @ ACL 2024

arXiv:2407.03191 [pdf, other]

Controlling Plasmonic Catalysis via Strong Coupling with Electromagnetic Resonators

Authors: Jakub Fojt, Paul Erhart, Christian Schäfer

Abstract: Plasmonic excitations decay within femtoseconds, leaving non-thermal (often referred to as "hot") charge carriers behind that can be injected into molecular structures to trigger chemical reactions that are otherwise out of reach -- a process known as plasmonic catalysis. In this Letter, we demonstrate that strong coupling between resonator structures and plasmonic nanoparticles can be used to control the spectral overlap between the plasmonic excitation energy and the charge injection energy into nearby molecules. Our atomistic description couples real-time density-functional theory self-consistently to Maxwell's equations via the radiation-reaction potential. Control over the resonator provides then an additional knob for non-intrusively enhancing plasmonic catalysis and dynamically reacting to deterioration of the catalyst -- a new facet of modern catalysis.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03187 [pdf]

Holistic view of the road transportation system based on real-time data sharing mechanism

Authors: Li Tao, Dong Xiang, Hao Junfeng, Yin Ping, Xu Xiaoxue, Lai Maokai, Li Yuan, Peng Ting

Abstract: Traditional manual driving and single-vehicle-based intelligent driving have limitations in real-time and accurate acquisition of the current driving status and intentions of surrounding vehicles, leading to vehicles typically maintaining appropriate safe distances from each other. Yet, accidents still frequently occur, especially in merging areas; meanwhile, it is difficult to comprehensively obtain the conditions of road infrastructure. These limitations not only restrict the further improvement of road capacity but also result in irreparable losses of life and property. To overcome this bottleneck, this paper constructs a space-time global view of the road traffic system based on a real-time sharing mechanism, enabling both road users and managers to timely access the driving intentions of nearby vehicles and the real-time status of road infrastructure.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03183 [pdf, other]

A Formal Model for Artificial Intelligence Applications in Automation Systems

Authors: Marvin Schieseck, Philip Topalis, Lasse Reinpold, Felix Gehlhoff, Alexander Fay

Abstract: The integration of Artificial Intelligence (AI) into automation systems has the potential to enhance efficiency and to address currently unsolved technical challenges. However, the industry-wide adoption of AI is hindered by the lack of standardized documentation for the complex compositions of automation systems, AI software, production hardware, and their interdependencies. This paper proposes a formal model using standards and ontologies to provide clear and structured documentation of AI applications in automation systems. The proposed information model for artificial intelligence in automation systems (AIAS) utilizes ontology design patterns to map and link various aspects of automation systems and AI software. Validated through a practical example, the model demonstrates its effectiveness in improving documentation practices and aiding the sustainable implementation of AI in industrial settings.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03179 [pdf, other]

Motion meets Attention: Video Motion Prompts

Authors: Qixiang Chen, Lei Wang, Piotr Koniusz, Tom Gedeon

Abstract: Videos contain rich spatio-temporal information. Traditional methods for extracting motion, used in tasks such as action recognition, often rely on visual contents rather than precise motion features. This phenomenon is referred to as 'blind motion extraction' behavior, which proves inefficient in capturing motions of interest due to a lack of motion-guided cues. Recently, attention mechanisms have enhanced many computer vision tasks by effectively highlighting salient visual areas. Inspired by this, we propose using a modified Sigmoid function with learnable slope and shift parameters as an attention mechanism to activate and modulate motion signals derived from frame differencing maps. This approach generates a sequence of attention maps that enhance the processing of motion-related video content. To ensure temporal continuity and smoothness of the attention maps, we apply pair-wise temporal attention variation regularization to remove unwanted motions (e.g., noise) while preserving important ones. We then perform a Hadamard product between each pair of attention maps and the original video frames to highlight the evolving motions of interest over time. These highlighted motions, termed video motion prompts, are subsequently used as inputs to the model instead of the original video frames. We formalize this process as a motion prompt layer and incorporate the regularization term into the loss function to learn better motion prompts. This layer serves as an adapter between the model and the video data, bridging the gap between traditional 'blind motion extraction' and the extraction of relevant motions of interest.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Research report

arXiv:2407.03168 [pdf, other]

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

Authors: Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, Zhizhou Zhong, Yuan Zhang, Pengfei Wan, Di Zhang

Abstract: Portrait Animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation. Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability. Building upon this, we develop a video-driven portrait animation framework named LivePortrait with a focus on better generalization, controllability, and efficiency for practical usage. To enhance the generation quality and generalization ability, we scale up the training data to about 69 million high-quality frames, adopt a mixed image-video training strategy, upgrade the network architecture, and design better motion transformation and optimization objectives. Additionally, we discover that compact implicit keypoints can effectively represent a kind of blendshapes and meticulously propose a stitching and two retargeting modules, which utilize a small MLP with negligible computational overhead, to enhance the controllability. Experimental results demonstrate the efficacy of our framework even compared to diffusion-based methods. The generation speed remarkably reaches 12.8ms on an RTX 4090 GPU with PyTorch. The inference code and models are available at https://github.com/KwaiVGI/LivePortrait

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03146 [pdf, other]

Enhancing Class Fairness in Classification with A Two-Player Game Approach

Authors: Yunpeng Jiang, Paul Weng, Yutong Ban

Abstract: Data augmentation is widely applied and has shown its benefits in different machine learning tasks. However, as recently observed in some downstream tasks, data augmentation may introduce an unfair impact on classifications. While it can improve the performance of some classes, it can actually be detrimental for other classes, which can be problematic in some application domains. In this paper, to counteract this phenomenon, we propose a FAir Classification approach with a Two-player game (FACT). We first formulate the training of a classifier with data augmentation as a fair optimization problem, which can be further written as an adversarial two-player game. Following this formulation, we propose a novel multiplicative weight optimization algorithm, for which we theoretically prove that it can converge to a solution that is fair over classes. Interestingly, our formulation also reveals that this fairness issue over classes is not due to data augmentation only, but is in fact a general phenomenon. Our empirical experiments demonstrate that the performance of our learned classifiers is indeed more fairly distributed over classes in five datasets, with only limited impact on the average accuracy.

Submitted 30 May, 2024; originally announced July 2024.

arXiv:2407.03140 [pdf, other]

Machine Learning Models for Improved Tracking from Range-Doppler Map Images

Authors: Elizabeth Hou, Ross Greenwood, Piyush Kumar

Abstract: Statistical tracking filters depend on accurate target measurements and uncertainty estimates for good tracking performance. In this work, we propose novel machine learning models for target detection and uncertainty estimation in range-Doppler map (RDM) images for Ground Moving Target Indicator (GMTI) radars. We show that by using the outputs of these models, we can significantly improve the performance of a multiple hypothesis tracker for complex multi-target air-to-ground tracking scenarios.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03138 [pdf, ps, other]

Superselection rules and bosonic quantum computational resources

Authors: Eloi Descamps, Nicolas Fabre, Astghik Saharyan, Arne Keller, Pérola Milman

Abstract: We present a method to systematically identify and classify quantum optical non-classical resources based on the computational power they generate in a bosonic quantum computer. To achieve this, we establish a one-to-one correspondence between arbitrary continuous variable states in a multimode Hilbert space and single photons occupying each a single mode, which are used to define a bosonic quantum computer. Starting from a classical state in a representation that explicitly respects particle number super-selection rules, we apply universal gates to create arbitrary superposition of states with the same total particle number. The non-classicality of these states can then be directly related to the computational power they induce in the quantum computer. We also provide a correspondence between the adopted representation and the more conventional one in quantum optics, where superpositions of Fock states describe quantum optical states, and we identify how mode entanglement can lead to quantum advantage. In addition, our work contributes to establish a seamless transition from continuous to discrete properties of quantum optics while laying the grounds for a description of non-classicality and quantum computational advantage that is applicable to spin systems as well.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03137 [pdf, other]

X-Shooting ULLYSES: Massive Stars at low metallicity -- IV. Spectral analysis methods and exemplary results for O stars

Authors: A. A. C. Sander, J. -C. Bouret, M. Bernini-Peron, J. Puls, F. Backs, S. R. Berlanas, J. M. Bestenlehner, S. A. Brands, A. Herrero, F. Martins, O. Maryeva, D. Pauli, V. Ramachandran, P. A. Crowther, V. M. A. Gómez-González, A. C. Gormaz-Matamala, W. -R. Hamann, D. J. Hillier, R. Kuiper, C. J. K. Larkin, R. R. Lefever, A. Mehner, F. Najarro, L. M. Oskinova, E. C. Schösser, et al. (4 additional authors not shown)

Abstract: CONTEXT: The spectral analysis of hot, massive stars is a fundamental astrophysical method to obtain their intrinsic properties and their feedback. Quantitative spectroscopy for hot, massive stars requires detailed numerical modeling of the atmosphere and an iterative treatment to obtain the best solution within a given framework. AIMS: We present an overview of different techniques for the quantitative spectroscopy of hot stars employed within the X-Shooting ULLYSES collaboration, from grid-based approaches to tailored fits. By performing a blind test, we gain an overview about the similarities and differences of the resulting parameters. Our study aims to provide an overview of the parameter spread caused by different approaches. METHODS: For three different stars from the sample (SMC O5 star AzV 377, LMC O7 star Sk -69 50, and LMC O9 star Sk -66 171), we employ different atmosphere codes (CMFGEN, Fastwind, PoWR) and strategies to determine their best-fitting model. For our analyses, UV and optical spectra are used to derive the properties with some methods relying purely on optical data for comparison. To determine the overall spectral energy distribution, we further employ additional photometry from the literature. RESULTS: Effective temperatures for each of three sample stars agree within 3 kK while the differences in log g can be up to 0.2 dex. Luminosity differences of up to 0.1 dex result from different reddening assumptions, which seem to be larger for the methods employing a genetic algorithm. All sample stars are nitrogen-enriched. CONCLUSIONS: We find a reasonable agreement between the different methods. Tailored fitting tends to be able to minimize discrepancies obtained with more coarse or automatized treatments. UV spectral data is essential for the determination of realistic wind parameters. For one target (Sk -69 50), we find clear indications of an evolved status.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 18+15 pages, 21+4 figures, under review at A&A, condensed abstract

arXiv:2407.03132 [pdf, other]

Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech

Authors: Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Paula Andrea Pérez-Toro, Maria Schuster, Elmar Noeth, Bjoern Heismann, Andreas Maier, Seung Hee Yang

Abstract: This paper introduces a novel combination of two tasks, previously treated separately: acoustic-to-articulatory speech inversion (AAI) and phoneme-to-articulatory (PTA) motion estimation. We refer to this joint task as acoustic phoneme-to-articulatory speech inversion (APTAI) and explore two different approaches, both working speaker- and text-independently during inference. We use a multi-task learning setup, with the end-to-end goal of taking raw speech as input and estimating the corresponding articulatory movements, phoneme sequence, and phoneme alignment. While both proposed approaches share these same requirements, they differ in their way of achieving phoneme-related predictions: one is based on frame classification, the other on a two-staged training procedure and forced alignment. We reach competitive performance of 0.73 mean correlation for the AAI task and achieve up to approximately 87% frame overlap compared to a state-of-the-art text-dependent phoneme force aligner.

Submitted 3 July, 2024; originally announced July 2024.

Comments: to be published in Interspeech 2024 proceedings

arXiv:2407.03128 [pdf]

Thorium doped strontium fluoride crystal: a unique candidate for solid nuclear optical clock material

Authors: Qiaorui Gong, Shanming Li, Shulong Zhang, Siliang Tao, Guoliang Deng, Peixiong Zhang, Chengchun Zhao, Yin Hang, Shining Zhu, Longsheng Ma

Abstract: We report a candidate with unique advantages in the cultivation of solid-state nuclear clock material, Th:SrF2 crystal. It not only has a segregation coefficient close to 1, which can achieve highly efficient and uniform doping of Th, but also ensures a high transmittance (~69% at 150 nm) while achieving extremely high doping concentration (232Th > 6*10^20 cm^(-3)). In addition, SrF2 crystal will not be irradiated-colored under strong α radiation like CaF2 crystal; Th:SrF2 crystal is expected to fully unleash its high-concentration doping characteristics while ensuring that its transmission performance in the nuclear transition band is not severely affected by 229Th radiation damage.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03119 [pdf, other]

Entanglement-assisted authenticated BB84 protocol

Authors: Pol Julià Farré, Vladlen Galetsky, Soham Ghosh, Janis Nötzel, Christian Deppe

Abstract: This work delivers a novel user-server authentication procedure exploiting the features of maximally entangled pairs in both an idealistic noiseless scenario and a moderately noisy one. Additionally, we leverage the specific features of our design, which are conveniently suited for inlaying it into the well known BB84 quantum communication protocol. We first define a trivial extension of our initial proposal allowing for such task (symmetric scheme) to then come up with what we denote as asymmetric scheme, better matching practicality. Furthermore, a realistic simulation of the user-server authentication protocol has been achieved by employing a noisy model for both transmission and storage, the latter relying on cavity-enhanced atomic-frequency comb (AFC) memories. While in a noiseless scenario our proposal is ensured to be airtight, considering a certain degree of noise poses a challenge when aiming to actually implement it. We have implemented a deep neural network to distinguish legitimate users from forgery attempts, outperforming a mere statistical approach designed for the same task. Such method achieved a success rate of 0.75 with storage times of $1$ $μs$ and a user-server distance of $10$ km.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 11 pages, 5 figures

arXiv:2407.03110 [pdf, other]

A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection)

Authors: Lam Pham, Phat Lam, Tin Nguyen, Hieu Tang, Alexander Schindler

Abstract: In this paper, we present a toolchain for a comprehensive audio/video analysis by leveraging deep learning based multimodal approach. To this end, different specific tasks of Speech to Text (S2T), Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), Visual Object Detection (VOD), Image Captioning (IC), and Video Captioning (VC) are conducted and integrated into the toolchain. By combining individual tasks and analyzing both audio & visual data extracted from input video, the toolchain offers various audio/video-based applications: two general applications of audio/video clustering, comprehensive audio/video summary and a specific application of riot or violent context detection. Furthermore, the toolchain presents a flexible and adaptable architecture that is effective to integrate new models for further audio/video-based applications.

Submitted 2 May, 2024; originally announced July 2024.

arXiv:2407.03093 [pdf, other]

Revisiting the Performance of Deep Learning-Based Vulnerability Detection on Realistic Datasets

Authors: Partha Chakraborty, Krishna Kanth Arumugam, Mahmoud Alfadel, Meiyappan Nagappan, Shane McIntosh

Abstract: The impact of software vulnerabilities on everyday software systems is significant. Despite deep learning models being proposed for vulnerability detection, their reliability is questionable. Prior evaluations show high recall/F1 scores of up to 99%, but these models underperform in practical scenarios, particularly when assessed on entire codebases rather than just the fixing commit. This paper introduces Real-Vul, a comprehensive dataset representing real-world scenarios for evaluating vulnerability detection models. Evaluating DeepWukong, LineVul, ReVeal, and IVDetect shows a significant drop in performance, with precision decreasing by up to 95 percentage points and F1 scores by up to 91 points. Furthermore, model performance fluctuates based on vulnerability characteristics, with better F1 scores for information leaks or code injection than for path resolution or predictable return values. The results highlight a significant performance gap that needs addressing before deploying deep learning-based vulnerability detection in practical settings. Overfitting is identified as a key issue, and an augmentation technique is proposed, potentially improving performance by up to 30%. Contributions include a dataset creation approach for better model evaluation, Real-Vul dataset, and empirical evidence of deep learning models struggling in real-world settings.

Submitted 3 July, 2024; originally announced July 2024.

ACM Class: D.2; I.2

Journal ref: 10.1109/TSE.2024.3423712

arXiv:2407.03091 [pdf, other]

Performance Comparison of ROS2 Middlewares for Multi-robot Mesh Networks in Planetary Exploration

Authors: Loïck Pierre Chovet, Gabriel Manuel Garcia, Abhishek Bera, Antoine Richard, Kazuya Yoshida, Miguel Angel Olivares-Mendez

Abstract: Recent advancements in Multi-Robot Systems (MRS) and mesh network technologies pave the way for innovative approaches to explore extreme environments. The Artemis Accords, a series of international agreements, have further catalyzed this progress by fostering cooperation in space exploration, emphasizing the use of cutting-edge technologies. In parallel, the widespread adoption of the Robot Operating System 2 (ROS 2) by companies across various sectors underscores its robustness and versatility. This paper evaluates the performances of available ROS 2 MiddleWare (RMW), such as FastRTPS, CycloneDDS and Zenoh, over a mesh network with a dynamic topology. The final choice of RMW is determined by the one that would fit the most the scenario: an exploration of the extreme extra-terrestrial environment using a MRS. The conducted study in a real environment highlights Zenoh as a potential solution for future applications, showing a reduced delay, reachability, and CPU usage while being competitive on data overhead and RAM usage over a dynamic mesh topology.

Submitted 3 July, 2024; originally announced July 2024.

Comments: PrePrint

arXiv:2407.03087 [pdf, other]

Improved finite-size key rates for discrete-modulated continuous variable quantum key distribution under coherent attacks

Authors: Carlos Pascual-García, Stefan Bäuml, Mateus Araújo, Rotem Liss, Antonio Acín

Abstract: Continuous variable quantum key distribution (CVQKD) with discrete modulation combines advantages of CVQKD, such as the implementability using readily available technologies, with advantages of discrete variable quantum key distribution, such as easier error correction procedures. We consider a prepare-and-measure CVQKD protocol, where Alice chooses from a set of four coherent states and Bob performs a heterodyne measurement, the result of which is discretised in both key and test rounds. We provide a security proof against coherent attacks in the finite-size regime, and compute the achievable key rate. To this end, we employ the generalised entropy accumulation theorem, as well as recent advances in conic optimisation, yielding improved key rates compared to previous works. At metropolitan distances, our method can provide positive key rates for the order of $10^8$ rounds.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 24 pages, 5 figures

arXiv:2407.03080 [pdf, other]

Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios

Authors: Patricia A. Apellániz, Ana Jiménez, Borja Arroyo Galende, Juan Parras, Santiago Zazo

Abstract: While synthetic tabular data generation using Deep Generative Models (DGMs) offers a compelling solution to data scarcity and privacy concerns, their effectiveness relies on substantial training data, often unavailable in real-world applications. This paper addresses this challenge by proposing a novel methodology for generating realistic and reliable synthetic tabular data with DGMs in limited real-data environments. Our approach proposes several ways to generate an artificial inductive bias in a DGM through transfer learning and meta-learning techniques. We explore and compare four different methods within this framework, demonstrating that transfer learning strategies like pre-training and model averaging outperform meta-learning approaches, like Model-Agnostic Meta-Learning, and Domain Randomized Search. We validate our approach using two state-of-the-art DGMs, namely, a Variational Autoencoder and a Generative Adversarial Network, to show that our artificial inductive bias fuels superior synthetic data quality, as measured by Jensen-Shannon divergence, achieving relative gains of up to 50% when using our proposed approach. This methodology has broad applicability in various DGMs and machine learning tasks, particularly in areas like healthcare and finance, where data scarcity is often a critical issue.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 19 pages, 6 Figures

MSC Class: I.2.0

arXiv:2407.03076 [pdf, other]

A Case Study on Context-Aware Neural Machine Translation with Multi-Task Learning

Authors: Ramakrishna Appicharla, Baban Gain, Santanu Pal, Asif Ekbal, Pushpak Bhattacharyya

Abstract: In document-level neural machine translation (DocNMT), multi-encoder approaches are common in encoding context and source sentences. Recent studies (Li et al., 2020) have shown that the context encoder generates noise and makes the model robust to the choice of context. This paper further investigates this observation by explicitly modelling context encoding through multi-task learning (MTL) to make the model sensitive to the choice of context. We conduct experiments on cascade MTL architecture, which consists of one encoder and two decoders. Generation of the source from the context is considered an auxiliary task, and generation of the target from the source is the main task. We experimented with German--English language pairs on News, TED, and Europarl corpora. Evaluation results show that the proposed MTL approach performs better than concatenation-based and multi-encoder DocNMT models in low-resource settings and is sensitive to the choice of context. However, we observe that the MTL models are failing to generate the source from the context. These observations align with the previous studies, and this might suggest that the available document-level parallel corpora are not context-aware, and a robust sentence-level model can outperform the context-aware models.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted to EAMT 2024 (poster)

arXiv:2407.03058 [pdf, other]

Tensor Networks for Lattice Gauge Theories beyond one dimension: a Roadmap

Authors: Giuseppe Magnifico, Giovanni Cataldi, Marco Rigobello, Peter Majcen, Daniel Jaschke, Pietro Silvi, Simone Montangero

Abstract: Tensor network methods are a class of numerical tools and algorithms to study many-body quantum systems in and out of equilibrium, based on tailored variational wave functions. They have found significant applications in simulating lattice gauge theories approaching relevant problems in high-energy physics. Compared to Monte Carlo methods, they do not suffer from the sign problem, allowing them to explore challenging regimes such as finite chemical potentials and real-time dynamics. Further development is required to tackle fundamental challenges, such as accessing continuum limits or computations of large-scale quantum chromodynamics. In this work, we review the state-of-the-art of Tensor Network methods and discuss a possible roadmap for algorithmic development and strategies to enhance their capabilities and extend their applicability to open high-energy problems. We provide tailored estimates of the theoretical and computational resource scaling for attacking large-scale lattice gauge theories.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 14 pages, 6 figures

arXiv:2407.03034 [pdf, ps, other]

Attention Incorporated Network for Sharing Low-rank, Image and K-space Information during MR Image Reconstruction to Achieve Single Breath-hold Cardiac Cine Imaging

Authors: Siying Xu, Kerstin Hammernik, Andreas Lingg, Jens Kuebler, Patrick Krumm, Daniel Rueckert, Sergios Gatidis, Thomas Kuestner

Abstract: Cardiac Cine Magnetic Resonance Imaging (MRI) provides an accurate assessment of heart morphology and function in clinical practice. However, MRI requires long acquisition times, with recent deep learning-based methods showing great promise to accelerate imaging and enhance reconstruction quality. Existing networks exhibit some common limitations that constrain further acceleration possibilities, including single-domain learning, reliance on a single regularization term, and equal feature contribution. To address these limitations, we propose to embed information from multiple domains, including low-rank, image, and k-space, in a novel deep learning network for MRI reconstruction, which we denote as A-LIKNet. A-LIKNet adopts a parallel-branch structure, enabling independent learning in the k-space and image domain. Coupled information sharing layers realize the information exchange between domains. Furthermore, we introduce attention mechanisms into the network to assign greater weights to more critical coils or important temporal frames. Training and testing were conducted on an in-house dataset, including 91 cardiovascular patients and 38 healthy subjects scanned with 2D cardiac Cine using retrospective undersampling. Additionally, we evaluated A-LIKNet on the real-time 8x prospectively undersampled data from the OCMR dataset. The results demonstrate that our proposed A-LIKNet outperforms existing methods and provides high-quality reconstructions. The network can effectively reconstruct highly retrospectively undersampled dynamic MR images up to 24x accelerations, indicating its potential for single breath-hold imaging.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03033 [pdf]

ISWSST: Index-space-wave State Superposition Transformers for Multispectral Remotely Sensed Imagery Semantic Segmentation

Authors: Chang Li, Pengfei Zhang, Yu Wang

Abstract: Currently the semantic segmentation task of multispectral remotely sensed imagery (MSRSI) faces the following problems: 1) Usually, only single domain feature (i.e., space domain or frequency domain) is considered; 2) downsampling operation in encoder generally leads to the accuracy loss of edge extraction; 3) multichannel features of MSRSI are not fully considered; and 4) prior knowledge of remote sensing is not fully utilized. To solve the aforementioned issues, an index-space-wave state superposition Transformer (ISWSST) is the first to be proposed for MSRSI semantic segmentation by the inspiration from quantum mechanics, whose superiority is as follows: 1) index, space and wave states are superposed or fused to simulate quantum superposition by adaptively voting decision (i.e., ensemble learning idea) for being a stronger classifier and improving the segmentation accuracy; 2) a lossless wavelet pyramid encoder-decoder module is designed to losslessly reconstruct image and simulate quantum entanglement based on wavelet transform and inverse wavelet transform for avoiding the edge extraction loss; 3) combining multispectral features (i.e. remote sensing index and channel attention mechanism) is proposed to accurately extract ground objects from original resolution images; and 4) quantum mechanics are introduced to interpret the underlying superiority of ISWSST. Experiments show that ISWSST is validated and superior to the state-of-the-art architectures for the MSRSI segmentation task, which improves the segmentation and edge extraction accuracy effectively. Codes will be available publicly after our paper is accepted.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03032 [pdf, other]

Strategies for Arabic Readability Modeling

Authors: Juan Piñeros Liberato, Bashar Alhafni, Muhamed Al Khalil, Nizar Habash

Abstract: Automatic readability assessment is relevant to building NLP applications for education, content analysis, and accessibility. However, Arabic readability assessment is a challenging task due to Arabic's morphological richness and limited readability resources. In this paper, we present a set of experimental results on Arabic readability assessment using a diverse range of approaches, from rule-based methods to Arabic pretrained language models. We report our results on a newly created corpus at different textual granularity levels (words and sentence fragments). Our results show that combining different techniques yields the best results, achieving an overall macro F1 score of 86.7 at the word level and 87.9 at the fragment level on a blind test set. We make our code, data, and pretrained models publicly available.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted to ArabicNLP 2024, ACL

arXiv:2407.03031 [pdf, other]

Entangled pairs in evaporating black holes without event horizons

Authors: Ivan Agullo, Paula Calizaya Cabrera, Beatriz Elizaga Navascués

Abstract: Investigations into Hawking radiation often assume a black hole model featuring an event horizon, despite the growing consensus that such causal structures may not exist in nature. While this assumption is not crucial for deriving the local properties of radiation at future null infinity, it plays a significant role in discussions about Hawking partners -- the field modes that purify Hawking radiation. This article aims to explore the definition and fate of Hawking partners in black hole scenarios where semiclassical mass loss due to Hawking radiation is considered. Our analysis avoids the assumption of event horizons and instead focuses on collapse processes that feature a trapped region bounded by a dynamical horizon. We derive the form of the partners, accounting for the effects of back-scattering. Furthermore, using these results and mild assumptions, we find that Hawking partners cannot "leak" out of the dynamical horizon to partially purify the Hawking radiation in the regime where general relativity coexists semiclassically with quantum field theory. This finding emphasizes the necessity for new physics, such as quantum gravity, to resolve the final fate of information.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 31 pages, 5 figures, 3 appendices

arXiv:2407.03027 [pdf]

doi 10.1007/s11042-024-19734-3

Differentially Processed Optimized Collaborative Rich Text Editor

Authors: Nishtha Jatana, Mansehej Singh, Charu Gupta, Geetika Dhand, Shaily Malik, Pankaj Dadheech, Nagender Aneja, Sandhya Aneja

Abstract: A collaborative real-time text editor is an application that allows multiple users to edit a document simultaneously and merge their contributions automatically. It can be made collaborative by implementing a conflict resolution algorithm either on the client side (in peer-to-peer collaboration) or on the server side (when using web sockets and a central server to monitor state changes). Although web sockets are ideal for real-time text editors, using multiple collaborative editors on one connection can create problems. This is because a single web connection cannot monitor which user is collaborating on which application state, leading to unnecessary network queries and data being delivered to the wrong state. To address this issue, the current solution is to open multiple web socket connections, with one web socket per collaboration application. However, this can add significant overhead proportional to the number of apps utilized. In this study, we demonstrate an algorithm that enables using a single web socket for multiple collaborative applications in a collaborative editor. Our method involves modifying the socket's code to track which application's shared state is being worked on and by whom. This allows for the simultaneous collaboration of multiple states in real-time, with infinite users, without opening a different socket for each application. Our optimized editor showed an efficiency improvement of over 96% in access time duration. This approach can be implemented in other collaborative editors and web applications with similar architecture to improve performance and eliminate issues arising from network overload.

Submitted 3 July, 2024; originally announced July 2024.

Journal ref: Multimedia Tools and Applications (2024)

Showing 1–50 of 417,517 results for author: P.