Search | arXiv e-print repository

Encoding arbitrary Ising Hamiltonians on Spatial Photonic Ising Machines

Authors: Jason Sakellariou, Alexis Askitopoulos, Georgios Pastras, Symeon I. Tsintzos

Abstract: Photonic Ising Machines constitute an emergent new paradigm of computation, geared towards tackling combinatorial optimization problems that can be reduced to the problem of finding the ground state of an Ising model. Spatial Photonic Ising Machines have proven to be advantageous for simulating fully connected large-scale spin systems. However, fine control of a general interaction matrix $J$ has so far only been accomplished through eigenvalue decomposition methods that either limit the scalability or increase the execution time of the optimization process. We introduce and experimentally validate a SPIM instance that enables direct control over the full interaction matrix, enabling the encoding of Ising Hamiltonians with arbitrary couplings and connectivity. We demonstrate the conformity of the experimentally measured Ising energy with the theoretically expected values and then proceed to solve both the unweighted and weighted graph partitioning problems, showcasing a systematic convergence to an optimal solution via simulated annealing. Our approach greatly expands the applicability of SPIMs for real-world applications without sacrificing any of the inherent advantages of the system, and paves the way to encoding the full range of NP problems that are known to be equivalent to Ising models, on SPIM devices.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 7 pages, 4 figures
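
Illustrative note on the entry above: the abstract refers to minimizing an Ising energy with an arbitrary interaction matrix $J$ via simulated annealing. The following is a minimal software sketch of that idea, not the authors' optical implementation. It assumes the convention $H(s) = -\sum_{i<j} J_{ij} s_i s_j$ with a symmetric $J$, single-spin-flip Metropolis updates, and a geometric cooling schedule; the function names `ising_energy` and `anneal` and all parameter defaults are hypothetical.

```python
import math
import random


def ising_energy(J, s):
    """Energy H(s) = -sum_{i<j} J[i][j] * s[i] * s[j] for a symmetric matrix J."""
    n = len(s)
    return -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))


def anneal(J, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Single-spin-flip simulated annealing (assumed schedule: geometric cooling)."""
    rng = random.Random(seed)
    n = len(J)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ising_energy(J, s)
    best_s, best_e = s[:], energy
    for k in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (k / max(1, steps - 1))
        i = rng.randrange(n)
        # Energy change from flipping spin i: dE = 2 * s_i * sum_{j != i} J_ij * s_j
        delta = 2 * s[i] * sum(J[i][j] * s[j] for j in range(n) if j != i)
        # Metropolis rule: always accept downhill moves, sometimes accept uphill ones.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            s[i] = -s[i]
            energy += delta
            if energy < best_e:
                best_s, best_e = s[:], energy
    return best_s, best_e


if __name__ == "__main__":
    # Toy instance: four spins with uniform ferromagnetic couplings.
    J = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
    spins, energy = anneal(J)
    print(spins, energy)
```

In this sketch a graph-partitioning instance would be encoded by choosing the entries of `J` from the graph's edge weights; how the paper maps those couplings onto the optical hardware is not described in the abstract.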

arXiv:2407.09159 [pdf, other]

Weakly-supervised Autism Severity Assessment in Long Videos

Authors: Abid Ali, Mahmoud Ali, Jean-Marc Odobez, Camilla Barbini, Séverine Dubuisson, Francois Bremond, Susanne Thümmler

Abstract: Autism Spectrum Disorder (ASD) is a diverse collection of neurobiological conditions marked by challenges in social communication and reciprocal interactions, as well as repetitive and stereotypical behaviors. Atypical behavior patterns in a long, untrimmed video can serve as biomarkers for children with ASD. In this paper, we propose a video-based weakly-supervised method that takes spatio-temporal features of long videos to learn typical and atypical behaviors for autism detection. On top of that, we propose a shallow TCN-MLP network, which is designed to further categorize the severity score. We evaluate our method on actual evaluation videos of children with autism collected and annotated (for severity score) by clinical professionals. Experimental results demonstrate the effectiveness of behavioral biomarkers that could help clinicians in autism spectrum analysis.

Submitted 12 July, 2024; originally announced July 2024.

Journal ref: https://cbmi2024.org/

arXiv:2407.09158 [pdf, ps, other]

A non-abelian tensor product of algebras with bracket

Authors: José Manuel Casas, Emzar Khmaladze, Manuel Ladra

Abstract: We introduce and study a non-abelian tensor product of two algebras with bracket with compatible actions on each other. We investigate its applications to the universal central extensions and the low-dimensional homology of perfect algebras with bracket.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 17 pages. arXiv admin note: text overlap with arXiv:2307.15636

MSC Class: 16E40; 16E99; 16W99; 16B50

arXiv:2407.09153 [pdf]

doi 10.1038/s41467-024-49841-6

Topological Fermi-arc surface state covered by floating electrons on a two-dimensional electride

Authors: Chan-young Lim, Min-Seok Kim, Dong Cheol Lim, Sunghun Kim, Yeonghoon Lee, Jaehoon Cha, Gyubin Lee, Sang Yong Song, Dinesh Thapa, Jonathan D. Denlinger, Seong-Gon Kim, Sung Wng Kim, Jungpil Seo, Yeongkwan Kim

Abstract: Two-dimensional electrides can acquire topologically non-trivial phases due to intriguing interplay between the cationic atomic layers and anionic electron layers. However, experimental evidence of topological surface states has yet to be verified. Here, via angle-resolved photoemission spectroscopy (ARPES) and scanning tunnelling microscopy (STM), we probe the magnetic Weyl states of the ferromagnetic electride $[\mathrm{Gd}_{2}\mathrm{C}]^{2+}\cdot 2e^{-}$. In particular, the presence of Weyl cones and Fermi-arc states is demonstrated through photon energy-dependent ARPES measurements, agreeing with theoretical band structure calculations. Notably, the STM measurements reveal that the Fermi-arc states exist underneath a floating quantum electron liquid on the top Gd layer, forming double-stacked surface states in a heterostructure. Our work thus not only unveils the non-trivial topology of the $[\mathrm{Gd}_{2}\mathrm{C}]^{2+}\cdot 2e^{-}$ electride but also realizes a surface heterostructure that can host phenomena distinct from the bulk.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 22 pages, 6 figures

Journal ref: Nat. Commun. 15 (2024) 5615

arXiv:2407.09146 [pdf, other]

Directed univalence in simplicial homotopy type theory

Authors: Daniel Gratzer, Jonathan Weinberger, Ulrik Buchholtz

Abstract: Simplicial type theory extends homotopy type theory with a directed path type which internalizes the notion of a homomorphism within a type. This concept has significant applications both within mathematics -- where it allows for synthetic (higher) category theory -- and programming languages -- where it leads to a directed version of the structure identity principle. In this work, we construct the first types in simplicial type theory with non-trivial homomorphisms. We extend simplicial type theory with modalities and new reasoning principles to obtain triangulated type theory in order to construct the universe of discrete types $\mathcal{S}$. We prove that homomorphisms in this type correspond to ordinary functions of types i.e., that $\mathcal{S}$ is directed univalent. The construction of $\mathcal{S}$ is foundational for both of the aforementioned applications of simplicial type theory. We are able to define several crucial examples of categories and to recover important results from category theory. Using $\mathcal{S}$, we are also able to define various types whose usage is guaranteed to be functorial. These provide the first complete examples of the proposed directed structure identity principle.

Submitted 12 July, 2024; originally announced July 2024.

MSC Class: 03B38; 18N60; 18D30; 18B50; 18N45; 55U35; 18N50

ACM Class: F.4.1

arXiv:2407.09142 [pdf, other]

Securing Confidential Data For Distributed Software Development Teams: Encrypted Container File

Authors: Tobias J. Bauer, Andreas Aßmuth

Abstract: In the context of modern software engineering, there is a trend towards Cloud-native software development involving international teams with members from all over the world. Cloud-based version management services like GitHub are commonly used for source code and other files. However, a challenge arises when developers from different companies or organizations share the platform, as sensitive data should be encrypted to restrict access to certain developers only. This paper discusses existing tools addressing this issue, highlighting their shortcomings. The authors propose their own solution, Encrypted Container Files, designed to overcome the deficiencies observed in other tools.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 18 pages, for associated implementation etc., see https://github.com/Hirnmoder/ECF

Journal ref: International Journal On Advances in Security, vol. 17, no. 1 and 2, pp. 11-28, 2024, ISSN 1942-2636

arXiv:2407.09140 [pdf, other]

FlyEye Ground-Based Telescope: Unveiling New Frontiers in Astronomical Science

Authors: Carmelo Arcidiacono, Matteo Simioni, Roberto Ragazzoni, Piero Gregori, Paolo Lorenzi, Francesco Cerutti, Roberto Ziano, Matteo Bisiani, Roberta Pellegrini, Andrea Guazzora, Silvano Pieri, Marco Dima, Silvio Di Rosa, Simone Zaggia, Jacopo Farinato, Demetrio Magrin, Andrea Grazian, Marco Gullieuszik

Abstract: The FlyEye design makes its debut in the ESA's NEOSTEL developed by OHB-Italia. This pioneering FlyEye telescope integrates a monolithic 1-meter class primary mirror feeding 16 CCD cameras for discovering Near-Earth Objects (NEOs) and any class of transient phenomena. OHB-Italia is the prime contractor, receiving extended support from the Italian National Institute for Astrophysics (INAF) in the ESA's NEOSTED program's integration and testing. The distinctive FlyEye design splits the Field of View into 16 channels, creating a unique multi-telescope system with a panoramic 44 square degree Field of View and a seeing-size pixel-scale, enabling NEO detection down to apparent magnitude 21.5 with a 1 m diameter spherical mirror. The scientific products of a similar FlyEye telescope can complement facilities such as Vera Rubin (formerly LSST) and ZTF. The FlyEye's ability to survey two-thirds of the visible sky about three times per night can revolutionize time-domain astronomy, enabling comprehensive studies of transient phenomena and placing FlyEye in a new era of exploration of the dynamic universe. Efforts to develop automated calibration and testing procedures are key to realizing this transformative potential.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 9 pages, 1 figure, SPIE Astronomical Telescopes + Instrumentation, Ground-based and Airborne Instrumentation for Astronomy X, 16-21 June 2024

arXiv:2407.09139 [pdf, other]

Measurement of $CP$ asymmetries in $B^0 \to K^0_S π^0 γ$ decays at Belle II

Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, H. Aihara, N. Akopov, A. Aloisio, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Baur, A. Beaubien, F. Becherer, et al. (414 additional authors not shown)

Abstract: We report measurements of time-dependent $CP$ asymmetries in $B^0 \to K^0_S π^0 γ$ decays based on a data sample of $(388\pm6)\times10^6$ $B\bar{B}$ events collected at the $Υ(4S)$ resonance with the Belle II detector. The Belle II experiment operates at the SuperKEKB asymmetric-energy $e^+e^-$ collider. We measure decay-time distributions to determine $CP$-violating parameters $S$ and $C$. We determine these parameters for two ranges of $K^0_S π^0$ invariant mass: $m(K^0_S π^0)\in (0.8, 1.0)$ $GeV/c^2$, which is dominated by $B^0 \to K^{*0} (\to K^0_S π^0) γ$ decays, and a complementary region $m(K^0_S π^0)\in (0.6, 0.8)\cup(1.0, 1.8)$ $GeV/c^2$. Our results have improved precision as compared to previous measurements and are consistent with theory predictions.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 10 pages, 4 figures

Report number: Belle II Preprint 2024-009, KEK Preprint 2024-1

arXiv:2407.09136 [pdf, other]

Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors

Authors: Nico Daheim, Jakub Macina, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan

Abstract: Large language models (LLMs) present an opportunity to scale high-quality personalized education to all. A promising approach towards this means is to build dialog tutoring models that scaffold students' problem-solving. However, even though existing LLMs perform well in solving reasoning questions, they struggle to precisely detect students' errors and tailor their feedback to these errors. Inspired by real-world teaching practice where teachers identify student errors and customize their response based on them, we focus on verifying student solutions and show how grounding to such verification improves the overall quality of tutor response generation. We collect a dataset of 1K stepwise math reasoning chains with the first error step annotated by teachers. We show empirically that finding the mistake in a student solution is challenging for current models. We propose and evaluate several verifiers for detecting these errors. Using both automatic and human evaluation we show that the student solution verifiers steer the generation model towards highly targeted responses to student errors which are more often correct with fewer hallucinations compared to existing baselines.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Preprint. Nico Daheim and Jakub Macina contributed equally. Code and dataset can be found under: https://github.com/eth-lre/verify-then-generate

arXiv:2407.09132 [pdf, ps, other]

The MICADO first light imager for the ELT: the PSF Reconstruction Software

Authors: Andrea Grazian, Elisa Portaluri, Matteo Simioni, Carmelo Arcidiacono, Marco Gullieuszik, Johanna Hartke, Daniel Jodlbauer, Fernando Pedichini, Roberto Piazzesi, Piero Vaccari, Benedetta Vulcani, Roland Wagner, Anita Zanella

Abstract: MICADO is the first-light camera of the ESO ELT, allowing NIR imaging and long-slit spectroscopy assisted by adaptive optics. MICADO is now entering its construction phase, and the software for data reduction is reaching an adequate maturity level. The PSF Reconstruction (PSF-R) of MICADO is a software tool for the blind derivation of the PSF, only using adaptive optics telemetry data. An update of the status of the PSF-R service is provided here. The PSF-R prototype has been tested on ERIS@VLT data in order to check the reconstruction of on- and off-axis PSFs. The on-axis PSF-R is accurate at a few percent level on Strehl, FWHM, Encircled Energy, and half light radius, while for the off-axis case the match is within 10-15 percent at a distance of half isoplanatic angle. The first version of the workflow for the PSF-R pipeline has been developed and verified using the latest release of the ESO data processing system. A set of simulations has been implemented on the morphological analysis of distant galaxies, showing that the accuracy of the PSF-R matches the goals needed to study their morphology. In summary, the PSF-R team is on the right track towards the ELT first light.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 5 pages, 3 figures, Proceedings for the SPIE Astronomical Telescopes and Instrumentation 2024, Adaptive Optics Systems IX, Paper No.13097-234

arXiv:2407.09130 [pdf, other]

On goodness-of-fit testing for self-exciting point processes

Authors: José C. F. Kling, Mathias Vetter

Abstract: Despite the wide usage of parametric point processes in theory and applications, a sound goodness-of-fit procedure to test whether a given parametric model is appropriate for data coming from a self-exciting point process has been missing in the literature. In this work, we establish a bootstrap-based goodness-of-fit test which empirically works for all kinds of self-exciting point processes (and even beyond). In an infill-asymptotic setting we also prove its asymptotic consistency, albeit only in the particular case that the underlying point process is inhomogeneous Poisson.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09127 [pdf, other]

Robustness of Explainable Artificial Intelligence in Industrial Process Modelling

Authors: Benedikt Kantz, Clemens Staudinger, Christoph Feilmayr, Johannes Wachlmayr, Alexander Haberl, Stefan Schuster, Franz Pernkopf

Abstract: eXplainable Artificial Intelligence (XAI) aims at providing understandable explanations of black box models. In this paper, we evaluate current XAI methods by scoring them based on ground truth simulations and sensitivity analysis. To this end, we used an Electric Arc Furnace (EAF) model to better understand the limits and robustness characteristics of XAI methods such as SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), as well as Averaged Local Effects (ALE) or Smooth Gradients (SG) in a highly topical setting. These XAI methods were applied to various types of black-box models and then scored based on their correctness compared to the ground-truth sensitivity of the data-generating processes using a novel scoring evaluation methodology over a range of simulated additive noise. The resulting evaluation shows that the capability of the Machine Learning (ML) models to capture the process accurately is, indeed, coupled with the correctness of the explainability of the underlying data-generating process. We furthermore show the differences between XAI methods in their ability to correctly predict the true sensitivity of the modeled industrial process.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 11 pages, 3 figures, accepted at the ICML'24 Workshop ML4MS

arXiv:2407.09121 [pdf, other]

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Authors: Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu

Abstract: This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs) by identifying and tackling a refusal position bias within safety tuning data, which compromises the models' ability to appropriately refuse generating unsafe content. We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse compliance to harmful prompts at any response position, significantly enhancing their safety capabilities. DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation (MLE) with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by appending a segment of harmful response to the beginning of a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to safety refusal consistently throughout the harmful response sequence. Our empirical evaluation, conducted using LLaMA3 and Mistral model families across six attack scenarios, demonstrates that our method not only improves model safety without compromising performance but also surpasses well-known models such as GPT-4 in defending against attacks. Importantly, our approach successfully defends against recent advanced attack methods (e.g., CodeAttack) that have jailbroken GPT-4 and LLaMA3-70B-Instruct. Our code and data can be found at https://github.com/RobustNLP/DeRTa.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09120 [pdf, other]

doi 10.1145/3637528.3671887

URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering

Authors: Ge Teng, Ting Mao, Chen Shen, Xiang Tian, Xuesong Liu, Yaowu Chen, Jie** Ye

Abstract: Incomplete multi-view clustering (IMVC) aims to cluster multi-view data that are only partially available. This poses two main challenges: effectively leveraging multi-view information and mitigating the impact of missing views. Prevailing solutions employ cross-view contrastive learning and missing view recovery techniques. However, they either neglect valuable complementary information by focusing only on consensus between views or provide unreliable recovered views due to the absence of supervision. To address these limitations, we propose a novel Unified and Robust Representation Learning for Incomplete Multi-View Clustering (URRL-IMVC). URRL-IMVC directly learns a unified embedding that is robust to view missing conditions by integrating information from multiple views and neighboring samples. Firstly, to overcome the limitations of cross-view contrastive learning, URRL-IMVC incorporates an attention-based auto-encoder framework to fuse multi-view information and generate unified embeddings. Secondly, URRL-IMVC directly enhances the robustness of the unified embedding against view-missing conditions through KNN imputation and data augmentation techniques, eliminating the need for explicit missing view recovery. Finally, incremental improvements are introduced to further enhance the overall performance, such as the Clustering Module and the customization of the Encoder. We extensively evaluate the proposed URRL-IMVC framework on various benchmark datasets, demonstrating its state-of-the-art performance. Furthermore, comprehensive ablation studies are performed to validate the effectiveness of our design.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Accepted by ACM SIGKDD 2024

arXiv:2407.09119 [pdf, other]

Enhanced quantum state transfer via feedforward cancellation of optical phase noise

Authors: Benjamin P. Maddox, Jonathan M. Mortlock, Tom R. Hepworth, Adarsh P. Raghuram, Philip D. Gregory, Alexander Guttridge, Simon L. Cornish

Abstract: Many experimental platforms for quantum science depend on state control via laser fields. Frequently, however, the control fidelity is limited by optical phase noise. This is exacerbated in stabilized laser systems where high-frequency phase noise is an unavoidable consequence of feedback. Here we implement an optical feedforward technique to suppress laser phase noise in the STIRAP state transfer of ultracold RbCs molecules, across 114 THz, from a weakly bound Feshbach state to the rovibrational ground state. By performing over 100 state transfers on single molecules, we measure a significantly enhanced transfer efficiency of 98.7(1)% limited only by available laser intensity.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09114 [pdf]

Spectroscopy of Single CdSe Magic-Sized Nanocrystals

Authors: Gabriel Nagamine, Julian Santen, Juri G. Crimmann, Aniket S. Mule, Andrew B. Pun, David J. Norris

Abstract: Chemical syntheses that provide nanocrystals (NCs) with narrow distributions in size and shape are critical for NC research. This has led to the investigation of magic-sized NCs (MSNCs), a class of semiconductor crystallites that grow in discrete steps, potentially offering a single size and shape (i.e., monodispersity). However, the photoluminescence (PL) spectra of CdSe MSNCs measured at room temperature have been reported to be broader than those of state-of-the-art quantum dots. This difference could be due to the smaller size of MSNCs, which broadens their line widths, or due to their residual size dispersity. To better understand the optical performance of MSNCs, here we perform single-particle spectroscopy. Our results show that, while CdSe MSNCs do exhibit particle-to-particle variations that lead to modest broadening of their ensemble emission spectra, the largest contribution comes from the single-particle line width. By examining MSNCs with different sizes and shells, we conclude that this single-particle broadening is consistent with exciton coupling to acoustic phonons from the NC surface. Because of their small size, this coupling and the role of residual size dispersity have a larger impact on the ensemble emission line widths. Notably, when small (<2.7 nm diameter) MSNCs and quantum dots are compared, the ensemble PL line widths of MSNCs are actually sharper. Due to their small size, MSNCs also exhibit strong anti-bunching $[g^{(2)}(0) \sim 0.05]$ at room temperature. Thus, MSNCs represent a bright, spectrally pure class of quantum emitter, useful for applications in optoelectronic and quantum-information technologies where strong three-dimensional confinement is required.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09111 [pdf, other]

Inference Optimization of Foundation Models on AI Accelerators

Authors: Youngsuk Park, Kailash Budhathoki, Liangfu Chen, Jonas Kübler, Jiaji Huang, Matthäus Kleindessner, Jun Huan, Volkan Cevher, Yida Wang, George Karypis

Abstract: Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI across various industries. Industry and the research community have witnessed a large number of new applications based on those foundation models. Such applications include question and answer, customer services, image and video generation, and code completions, among others. However, as the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios. As a result, the demand for cost-effective and fast inference using AI accelerators is ever higher. To this end, our tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators. Beginning with an overview of basic Transformer architectures and deep learning system frameworks, we deep dive into system optimization techniques for fast and memory-efficient attention computations and discuss how they can be implemented efficiently on AI accelerators. Next, we describe architectural elements that are key for fast transformer inference. Finally, we examine various model compression and fast decoding strategies in the same context.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Tutorial published at KDD 2024. Camera-ready version

arXiv:2407.09104 [pdf, other]

UserBoost: Generating User-specific Synthetic Data for Faster Enrolment into Behavioural Biometric Systems

Authors: George Webber, Jack Sturgess, Ivan Martinovic

Abstract: Behavioural biometric authentication systems entail an enrolment period that is burdensome for the user. In this work, we explore generating synthetic gestures from a few real user gestures with generative deep learning, with the application of training a simple (i.e. non-deep-learned) authentication model. Specifically, we show that utilising synthetic data alongside real data can reduce the number of real datapoints a user must provide to enrol into a biometric system. To validate our methods, we use the publicly available dataset of WatchAuth, a system proposed in 2022 for authenticating smartwatch payments using the physical gesture of reaching towards a payment terminal. We develop a regularised autoencoder model for generating synthetic user-specific wrist motion data representing these physical gestures, and demonstrate the diversity and fidelity of our synthetic gestures. We show that using synthetic gestures in training can improve classification ability for a real-world system. Through this technique we can reduce the number of gestures required to enrol a user into a WatchAuth-like system by more than 40% without negatively impacting its error rates.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09102 [pdf, ps, other]

Quantitative diffusion approximation for the Neutral $r$-Alleles Wright-Fisher Model with Mutations

Authors: Peng Chen, Jie Xiong, Lihu Xu, Jiayu Zheng

Abstract: We apply a Lindeberg principle under the Markov process setting to approximate the Wright-Fisher model with neutral $r$-alleles using a diffusion process, deriving an error rate based on a function class distance involving fourth-order bounded differentiable functions. This error rate consists of a linear combination of the maximum mutation rate and the reciprocal of the population size. Our result improves the error bound in the seminal work [PNAS,1977], where only the special case $r=2$ was studied.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09095 [pdf, other]

TAPFixer: Automatic Detection and Repair of Home Automation Vulnerabilities based on Negated-property Reasoning

Authors: Yinbo Yu, Yuanqi Xu, Kepu Huang, Jiajia Liu

Abstract: Trigger-Action Programming (TAP) is a popular end-user programming framework in home automation (HA) systems, which makes it easy for users to customize home automation and control devices as expected. However, its simplified syntax also introduces new safety threats to HA systems through vulnerable rule interactions. Accurately fixing these vulnerabilities by logically and physically eliminating their root causes is essential before rules are deployed. However, this problem has not been well studied. In this paper, we present TAPFixer, a novel framework to automatically detect and repair rule interaction vulnerabilities in HA systems. It extracts TAP rules from HA profiles, translates them into an automaton model with physical and latency features, and performs model checking with various correctness properties. It then uses a novel negated-property reasoning algorithm to automatically infer a patch via model abstraction and refinement and model checking based on negated-properties. We evaluate TAPFixer on market HA apps (1177 TAP rules and 53 properties) and find that it can achieve an 86.65% success rate in repairing rule interaction vulnerabilities. We additionally recruit 23 HA users to conduct a user study that demonstrates the usefulness of TAPFixer for vulnerability repair in practical HA scenarios.

Submitted 12 July, 2024; originally announced July 2024.

Journal ref: USENIX Security 2024

arXiv:2407.09093 [pdf, ps, other]

On Exact Bit-level Reversible Transformers Without Changing Architectures

Authors: Guoqiang Zhang, J. P. Lewis, W. B. Kleijn

Abstract: In the literature, various reversible deep neural network (DNN) models have been proposed to reduce memory consumption or improve data-throughput in the training process. However, almost all existing reversible DNNs either are constrained to have special structures or are constructed by modifying the original DNN architectures considerably to enable reversibility. In this work, we propose exact bit-level reversible transformers without changing the architectures in the inference procedure. The basic idea is to first treat each transformer block as the Euler integration approximation for solving an ordinary differential equation (ODE) and then incorporate the technique of bidirectional integration approximation (BDIA) (see [26] for BDIA-based diffusion inversion) into the neural architecture together with activation quantization to make it exactly bit-level reversible, referred to as BDIA-transformer. In the training process, we let a hyper-parameter $\gamma$ in BDIA-transformer randomly take one of the two values $\{0.5, -0.5\}$ per transformer block for averaging two consecutive integration approximations, which regularizes the models for improving the validation accuracy. Light-weight side information per transformer block is required to be stored in the forward process to account for binary quantization loss to enable exact bit-level reversibility. In the inference procedure, the expectation $\mathbb{E}(\gamma)=0$ is taken to make the resulting architectures of BDIA-transformer identical to transformers up to activation quantization. Empirical study indicates that BDIA-transformers outperform their original counterparts notably due to the regularization effect of the $\gamma$ parameter.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09091 [pdf, other]

Accurate Prior-centric Monocular Positioning with Offline LiDAR Fusion

Authors: **hao He, Huaiyang Huang, Shuyang Zhang, Jianhao Jiao, Chengju Liu, Ming Liu

Abstract: Unmanned vehicles usually rely on Global Positioning System (GPS) and Light Detection and Ranging (LiDAR) sensors to achieve high-precision localization results for navigation purposes. However, this combination, with its associated costs and infrastructure demands, poses challenges for widespread adoption in mass-market applications. In this paper, we aim to use only a monocular camera to achieve comparable onboard localization performance by tracking deep-learning visual features on a LiDAR-enhanced visual prior map. Experiments show that the proposed algorithm can provide centimeter-level global positioning results with scale, which is effortlessly integrated and favorable for low-cost robot system deployment in real-world applications.

Submitted 12 July, 2024; originally announced July 2024.

Comments: ICRA 2024

arXiv:2407.09089 [pdf]

Lomics: Generation of Pathways and Gene Sets using Large Language Models for Transcriptomic Analysis

Authors: Chun-Ka Wong, Ali Choo, Eugene C. C. Cheng, Wing-Chun San, Kelvin Chak-Kong Cheng, Yee-Man Lau, Minqing Lin, Fei Li, Wei-Hao Liang, Song-Yan Liao, Kwong-Man Ng, Ivan Fan-Ngai Hung, Hung-Fat Tse, Jason Wing-Hon Wong

Abstract: Interrogation of biological pathways is an integral part of omics data analysis. Large language models (LLMs) enable the generation of custom pathways and gene sets tailored to specific scientific questions. These targeted sets are significantly smaller than traditional pathway enrichment analysis libraries, reducing multiple hypothesis testing and potentially enhancing statistical power. Lomics (Large Language Models for Omics Studies) v1.0 is a python-based bioinformatics toolkit that streamlines the generation of pathways and gene sets for transcriptomic analysis. It operates in three steps: 1) deriving relevant pathways based on the researcher's scientific question, 2) generating valid gene sets for each pathway, and 3) outputting the results as .GMX files. Lomics also provides explanations for pathway selections. Consistency and accuracy are ensured through iterative processes, JSON format validation, and HUGO Gene Nomenclature Committee (HGNC) gene symbol verification. Lomics serves as a foundation for integrating LLMs into omics research, potentially improving the specificity and efficiency of pathway analysis.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09082 [pdf, other]

The MeerKAT Fornax Survey. III. Ram-pressure stripping of the tidally interacting galaxy NGC 1427A in the Fornax cluster

Authors: P. Serra, T. A. Oosterloo, P. Kamphuis, G. I. G. Jozsa, W. J. G. de Blok, G. L. Bryan, J. H. van Gorkom, E. Iodice, D. Kleiner, A. Loni, S. I. Loubser, F. M. Maccagni, D. Molnar, R. Peletier, D. J. Pisano, M. Ramatsoku, M. W. L. Smith, M. A. W. Verheijen, N. Zabel

Abstract: We present MeerKAT Fornax Survey HI observations of NGC 1427A, a blue irregular galaxy with a stellar mass of 2e+9 Msun located near the centre of the Fornax galaxy cluster. Thanks to the excellent resolution (1 to 6 kpc spatially, 1.4 km/s in velocity) and HI column density sensitivity (4e+19/cm^2 to 1e+18/cm^2 depending on resolution), our data deliver new insights on the long-debated interaction of this galaxy with the cluster environment. We confirm the presence of a broad, one-sided, starless HI tail stretching from the outer regions of the stellar body and pointing away from the cluster centre. We find the tail to have 50% more HI (4e+8 Msun) and to be 3 times longer (70 kpc) than in previous observations. In fact, we detect scattered HI clouds out to 300 kpc from the galaxy in the direction of the tail -- possibly the most ancient remnant of the passage of NGC 1427A through the intracluster medium of Fornax. Both the velocity gradient along the HI tail and the peculiar kinematics of HI in the outer region of the stellar body are consistent with the effect of ram pressure given the line-of-sight motion of the galaxy within the cluster. However, several properties cannot be explained solely by ram pressure and suggest an ongoing tidal interaction. This includes: the close match between dense HI and stars within the disturbed stellar body; the abundant kinematically-anomalous HI; and the inversion of the HI velocity gradient near the base of the HI tail. We rule out an interaction with the cluster tidal field, and conclude that NGC 1427A is the result of a high-speed galaxy encounter or of a merger started at least 300 Myr ago, where ram pressure shapes the distribution and kinematics of the HI in the perturbed outer stellar body and in the tidal tails.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Astronomy & Astrophysics, accepted. Data available at the MeerKAT Fornax Survey website, https://sites.google.com/inaf.it/meerkatfornaxsurvey

arXiv:2407.09073 [pdf, other]

Open Vocabulary Multi-Label Video Classification

Authors: Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul Chilimbi

Abstract: Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation. Some recent works have focused on extending VLMs to open vocabulary single label action classification in videos. However, previous methods fall short in holistic video understanding, which requires the ability to simultaneously recognize multiple actions and entities (e.g., objects) in the video in an open vocabulary setting. We formulate this problem as open vocabulary multilabel video classification and propose a method to adapt a pre-trained VLM such as CLIP to solve this task. We leverage large language models (LLMs) to provide semantic guidance to the VLM about class labels to improve its open vocabulary performance with two key contributions. First, we propose an end-to-end trainable architecture that learns to prompt an LLM to generate soft attributes for the CLIP text-encoder to enable it to recognize novel classes. Second, we integrate a temporal modeling module into CLIP's vision encoder to effectively model the spatio-temporal dynamics of video concepts, and propose a novel regularized finetuning technique to ensure strong open vocabulary classification performance in the video domain. Our extensive experimentation showcases the efficacy of our approach on multiple benchmark datasets.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Accepted at ECCV 2024

arXiv:2407.09069 [pdf, other]

Holographic Einstein Ring of Deformed AdS-Schwarzschild Black Holes

Authors: **-Yu Gui, Xiao-Xiong Zeng, Ke-Jian He, Huan Ye

Abstract: The Einstein ring of a deformed AdS-Schwarzschild black hole (BH) is investigated under the wave optics framework. When the source is fixed on the AdS boundary, we can obtain the corresponding response function generated on the opposite side of the boundary. Utilizing a specialized optical system equipped with a convex lens enables us to capture an image of the BH's holographic Einstein ring on the screen. The influence of the relevant physical parameters and the observed inclination on the characteristics of the Einstein ring can also be studied. We find that, as the observer's position changes, the image displayed on the screen changes from an axisymmetric ring to an arc, ultimately becoming a solitary point of luminosity. Furthermore, variations in the relevant physical parameters naturally exert an influence on the Einstein ring, resulting in various changes to both its radius and brightness. We also discover the photon ring of the BH from the perspective of geometric optics, and the numerical results demonstrate that the incident angle of the photon ring is consistent with that of the Einstein ring. This supports the validity of studying the Einstein ring by wave optics. The investigation of the Einstein rings formed by deformed AdS-Schwarzschild BHs is anticipated to yield more comprehensive insights into the spacetime geometry, enabling its distinction from the Schwarzschild BH.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 12 pages, 21 figures

arXiv:2407.09064 [pdf, other]

Multi-Modal Dataset Creation for Federated Learning with DICOM Structured Reports

Authors: Malte Tölle, Lukas Burger, Halvar Kelm, Florian André, Peter Bannas, Gerhard Diller, Norbert Frey, Philipp Garthe, Stefan Groß, Anja Hennemuth, Lars Kaderali, Nina Krüger, Andreas Leha, Simon Martin, Alexander Meyer, Eike Nagel, Stefan Orwat, Clemens Scherer, Moritz Seiffert, Jan Moritz Seliger, Stefan Simm, Tim Friede, Tim Seidler, Sandy Engelhardt

Abstract: Purpose: Federated training is often hindered by heterogeneous datasets due to divergent data storage options, inconsistent naming schemes, varied annotation procedures, and disparities in label quality. This is particularly evident in the emerging multi-modal learning paradigms, where dataset harmonization, including a uniform data representation and filtering options, is of paramount importance. Methods: DICOM structured reports enable the standardized linkage of arbitrary information beyond the imaging domain and can be used within Python deep learning pipelines with highdicom. Building on this, we developed an open platform for data integration and interactive filtering capabilities that simplifies the process of assembling multi-modal datasets. Results: In this study, we extend our prior work by showing its applicability to more and divergent data types, as well as streamlining datasets for federated training within an established consortium of eight university hospitals in Germany. We prove its concurrent filtering ability by creating harmonized multi-modal datasets across all locations for predicting the outcome after minimally invasive heart valve replacement. The data includes DICOM data (i.e. computed tomography images, electrocardiography scans) as well as annotations (i.e. calcification segmentations, pointsets and pacemaker dependency), and metadata (i.e. prosthesis and diagnoses). Conclusion: Structured reports bridge the traditional gap between imaging systems and information systems. Utilizing the inherent DICOM reference system, arbitrary data types can be queried concurrently to create meaningful cohorts for clinical studies. The graphical interface as well as example structured report templates will be made publicly available.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09059 [pdf, other]

Domain-adaptive Video Deblurring via Test-time Blurring

Authors: **-Ting He, Fu-Jen Tsai, Jia-Hao Wu, Yan-Tsung Peng, Chung-Chi Tsai, Chia-Wen Lin, Yen-Yu Lin

Abstract: Dynamic scene video deblurring aims to remove undesirable blurry artifacts captured during the exposure process. Although previous video deblurring methods have achieved impressive results, they suffer from significant performance drops due to the domain gap between training and testing videos, especially for those captured in real-world scenarios. To address this issue, we propose a domain adaptation scheme based on a blurring model to achieve test-time fine-tuning for deblurring models in unseen domains. Since blurred and sharp pairs are unavailable for fine-tuning during inference, our scheme can generate domain-adaptive training pairs to calibrate a deblurring model for the target domain. First, a Relative Sharpness Detection Module is proposed to identify relatively sharp regions from the blurry input images and regard them as pseudo-sharp images. Next, we utilize a blurring model to produce blurred images based on the pseudo-sharp images extracted during testing. To synthesize blurred images in compliance with the target data distribution, we propose a Domain-adaptive Blur Condition Generation Module to create domain-specific blur conditions for the blurring model. Finally, the generated pseudo-sharp and blurred pairs are used to fine-tune a deblurring model for better performance. Extensive experimental results demonstrate that our approach can significantly improve state-of-the-art video deblurring methods, providing performance gains of up to 7.54dB on various real-world video deblurring datasets. The source code is available at https://github.com/**-Ting-He/DADeblur.

Submitted 12 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.09055 [pdf, other]

Advanced Graph Clustering Methods: A Comprehensive and In-Depth Analysis

Authors: Timothé Watteau, Aubin Bonnefoy, Simon Illouz-Laurent, Joaquim Jusseau, Serge Iovleff

Abstract: Graph clustering, which aims to divide a graph into several homogeneous groups, is a critical area of study with applications that span various fields such as social network analysis, bioinformatics, and image segmentation. This paper explores both traditional and more recent approaches to graph clustering. Firstly, key concepts and definitions in graph theory are introduced. The background section covers essential topics, including graph Laplacians and the integration of Deep Learning in graph analysis. The paper then delves into traditional clustering methods, including Spectral Clustering and the Leiden algorithm. Following this, state-of-the-art clustering techniques that leverage deep learning are examined. A comprehensive comparison of these methods is made through experiments. The paper concludes with a discussion of the practical applications of graph clustering and potential future research directions.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09053 [pdf, other]

Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing

Authors: Jun Zhu, Zihao Du, Haotian Xu, Fengbo Lan, Zilong Zheng, Bo Ma, Shengjie Wang, Tao Zhang

Abstract: Task-aware navigation continues to be a challenging area of research, especially in scenarios involving open vocabulary. Previous studies primarily focus on finding suitable locations for task completion, often overlooking the importance of the robot's pose. However, the robot's orientation is crucial for successfully completing tasks because of how objects are arranged (e.g., to open a refrigerator door). Humans intuitively navigate to objects with the right orientation using semantics and common sense. For instance, when opening a refrigerator, we naturally stand in front of it rather than to the side. Recent advances suggest that Vision-Language Models (VLMs) can provide robots with similar common sense. Therefore, we develop a VLM-driven method called Navigation-to-Gaze (Navi2Gaze) for efficient navigation and object gazing based on task descriptions. This method uses the VLM to score and select the best pose from numerous candidates automatically. In evaluations on multiple photorealistic simulation benchmarks, Navi2Gaze significantly outperforms existing approaches and precisely determines the optimal orientation relative to target objects.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09045 [pdf, other]

Time-Frequency Analysis of Variable-Length WiFi CSI Signals for Person Re-Identification

Authors: Chen Mao, Chong Tan, **gqi Hu, Min Zheng

Abstract: Person re-identification (ReID), as a crucial technology in the field of security, plays an important role in security detection and people counting. Current security and monitoring systems largely rely on visual information, which may infringe on personal privacy and be susceptible to interference from pedestrian appearances and clothing in certain scenarios. Meanwhile, the widespread use of routers offers new possibilities for ReID. This letter introduces a method using WiFi Channel State Information (CSI), leveraging the multipath propagation characteristics of WiFi signals as a basis for distinguishing different pedestrian features. We propose a two-stream network structure capable of processing variable-length data, which analyzes the amplitude in the time domain and the phase in the frequency domain of WiFi signals, fuses time-frequency information through continuous lateral connections, and employs advanced objective functions for representation and metric learning. Tested on a dataset collected in the real world, our method achieves 93.68% mAP and 98.13% Rank-1.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09043 [pdf, other]

Molecule Language Model with Augmented Pairs and Expertise Transfer

Authors: Namkyeong Lee, Siddhartha Laghuvarapu, Chanyoung Park, Jimeng Sun

Abstract: Understanding molecules and their textual descriptions via molecule language models (MoLM) has recently attracted a surge of interest among researchers. However, unique challenges exist in the field of MoLM due to 1) a limited amount of molecule-text paired data and 2) missing expertise arising from the specialized areas of focus among the experts. To this end, we propose AMOLE, which 1) augments molecule-text pairs with structural similarity preserving loss, and 2) transfers the expertise between the molecules. Extensive experiments on various downstream tasks demonstrate the superiority of AMOLE in comprehending molecules and their descriptions, highlighting its potential for application in real-world drug discovery.

Submitted 12 July, 2024; originally announced July 2024.

Comments: ACL 2024 Workshop on Languages and Molecule

arXiv:2407.09041 [pdf, other]

Optimization of Long-Haul C+L+S Systems by means of a Closed Form EGN Model

Authors: Y. Jiang, J. Sarkis, A. Nespola, F. Forghieri, S. Piciaccia, A. Tanzi, M. Ranjbar Zefreh, P. Poggiolini

Abstract: We investigate C+L+S long-haul systems using a closed-form GN/EGN non-linearity model. We perform accurate launch power and Raman pump optimization. We show a potential 4x throughput increase over legacy C-band systems in 1000 km links, using moderate S-only Raman amplification. We simultaneously achieve extra-flat GSNR, within +/-0.5 dB across the whole C+L+S spectrum.

Submitted 12 July, 2024; originally announced July 2024.

Comments: The paper is identical to a manuscript submitted to PTL in June 2024, except this arXiv version has been updated in the references. Ref. [8] and [10] are about CFM6 and its experimental validation

arXiv:2407.09038 [pdf, other]

High-Resolution Hyperspectral Video Imaging Using A Hexagonal Camera Array

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: Retrieving the reflectance spectrum from objects is an essential task for many classification and detection problems, since many materials and processes have a unique spectral behaviour. In many cases, it is highly desirable to capture hyperspectral images due to the high spectral flexibility. Often, it is even necessary to capture hyperspectral videos or at least to be able to record a hyperspectral image at once, also called snapshot hyperspectral imaging, to avoid spectral smearing. For this task, a high-resolution snapshot hyperspectral camera array using a hexagonal shape is introduced. The hexagonal array for hyperspectral imaging uses off-the-shelf hardware, which enables high flexibility regarding employed cameras, lenses and filters. Hence, the spectral range can be easily varied by mounting a different set of filters. Moreover, the concept of using off-the-shelf hardware enables low prices in comparison to other approaches with highly specialized hardware. Since classical industrial cameras are used in this hyperspectral camera array, the spatial and temporal resolution is very high, while recording 37 hyperspectral channels in the range from 400 nm to 760 nm in 10 nm steps. A registration process is required for near-field imaging, which maps the peripheral camera views to the center view. It is shown that this combination using a hyperspectral camera array and the corresponding image registration pipeline is superior in comparison to other popular snapshot approaches. For this evaluation, a synthetic hyperspectral database is rendered. On the synthetic data, the novel approach outperforms its best competitor by more than 3 dB in reconstruction quality. This synthetic data is also used to show the superiority of the hexagonal shape in comparison to an orthogonal-spaced one. Moreover, a real-world high resolution hyperspectral video database is provided.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09036 [pdf, ps, other]

On the structure of the complement of skeleton

Authors: Morgan Brown, Jiachang Xu, Muyuan Zhang

Abstract: We study the higher dimensional geometry of Berkovich spaces using open fiber disks, which are given by open disks in a relative dimension $1$ fibration. Inspired by birational geometry, we conjecture that the Berkovich skeleton is the complement of the union of all open fiber disks, and prove this conjecture for $\mathcal{X}$ admitting a strictly semistable model with semiample canonical class.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Comments are welcome!

MSC Class: 14G22; 14E30

arXiv:2407.09035 [pdf, other]

GPC: Generative and General Pathology Image Classifier

Authors: Anh Tien Nguyen, ** Tae Kwak

Abstract: Deep learning has been increasingly incorporated into various computational pathology applications to improve its efficiency, accuracy, and robustness. Although successful, most previous approaches for image classification have crucial drawbacks. There exist numerous tasks in pathology, but one needs to build a model per task, i.e., a task-specific model, thereby increasing the number of models, training resources, and cost. Moreover, transferring an arbitrary task-specific model to another task is still a challenging problem. Herein, we propose a task-agnostic generative and general pathology image classifier, so-called GPC, that aims at learning from diverse kinds of pathology images and conducting numerous classification tasks in a unified model. GPC, equipped with a convolutional neural network and a Transformer-based language model, maps pathology images into a high-dimensional feature space and generates pertinent class labels as texts via the image-to-text classification mechanism. We evaluate GPC on six datasets for four different pathology image classification tasks. Experimental results show that GPC holds considerable potential for developing an effective and efficient universal model for pathology image analysis.

Submitted 12 July, 2024; originally announced July 2024.

Comments: MICCAI-MedAGI 2023 (Best Paper Honorable Mention)

arXiv:2407.09032 [pdf, other]

DRM Revisited: A Complete Error Analysis

Authors: Yuling Jiao, Ruoxuan Li, Peiying Wu, Jerry Zhijian Yang, **wen Zhang

Abstract: In this work, we address a foundational question in the theoretical analysis of the Deep Ritz Method (DRM) under the over-parameterization regime: Given a target precision level, how can one determine the appropriate number of training samples, the key architectural parameters of the neural networks, the step size for the projected gradient descent optimization procedure, and the requisite number of iterations, such that the output of the gradient descent process closely approximates the true solution of the underlying partial differential equation to the specified precision?

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09030 [pdf, other]

CAMP: Continuous and Adaptive Learning Model in Pathology

Authors: Anh Tien Nguyen, Keunho Byeon, Kyungeun Kim, Boram Song, Seoung Wan Chae, ** Tae Kwak

Abstract: There exist numerous diagnostic tasks in pathology. Conventional computational pathology formulates and tackles them as independent and individual image classification problems, thereby resulting in computational inefficiency and high costs. To address the challenges, we propose a generic, unified, and universal framework, called a continuous and adaptive learning model in pathology (CAMP), for pathology image classification. CAMP is a generative, efficient, and adaptive classification model that can continuously adapt to any classification task by leveraging pathology-specific prior knowledge and learning task-specific knowledge with minimal computational cost and without forgetting the knowledge from the existing tasks. We evaluated CAMP on 22 datasets, including 1,171,526 patches and 11,811 pathology slides, across 17 classification tasks. CAMP achieves state-of-the-art classification performance on a wide range of datasets and tasks at both patch- and slide-levels and reduces computation time by up to 94% and storage memory by up to 85% in comparison to the conventional classification models. Our results demonstrate that CAMP can offer a fundamental transformation in pathology image classification, paving the way for the fully digitized and computerized pathology practice.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Under review

arXiv:2407.09029 [pdf, other]

Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework

Authors: Haoqin Sun, Shiwan Zhao, Shaokai Li, Xiangyu Kong, Xuechen Wang, Aobo Kong, Jiaming Zhou, Yong Chen, Wenjia Zeng, Yong Qin

Abstract: Multimodal emotion recognition systems rely heavily on the full availability of modalities, suffering significant performance declines when modal data is incomplete. To tackle this issue, we present the Cross-Modal Alignment, Reconstruction, and Refinement (CM-ARR) framework, an innovative approach that sequentially engages in cross-modal alignment, reconstruction, and refinement phases to handle missing modalities and enhance emotion recognition. This framework utilizes unsupervised distribution-based contrastive learning to align heterogeneous modal distributions, reducing discrepancies and modeling semantic uncertainty effectively. The reconstruction phase applies normalizing flow models to transform these aligned distributions and recover missing modalities. The refinement phase employs supervised point-based contrastive learning to disrupt semantic correlations and accentuate emotional traits, thereby enriching the affective content of the reconstructed representations. Extensive experiments on the IEMOCAP and MSP-IMPROV datasets confirm the superior performance of CM-ARR under conditions of both missing and complete modalities. Notably, averaged across six scenarios of missing modalities, CM-ARR achieves absolute improvements of 2.11% in WAR and 2.12% in UAR on the IEMOCAP dataset, and 1.71% and 1.96% in WAR and UAR, respectively, on the MSP-IMPROV dataset.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09027 [pdf, other]

Exploring the role of criticality in the quantum Otto cycle fueled by the anisotropic quantum Rabi-Stark model

Authors: He-Guang Xu, Jiasen **, Norton G. de Almeida, G. D. de Moraes Neto

Abstract: Quantum heat machines, encompassing heat engines, refrigerators, heaters, and accelerators, represent the forefront of quantum thermodynamics, offering a novel paradigm for converting heat energy into useful mechanical work. Leveraging quantum mechanical principles, these machines promise superior efficiency and performance compared to classical counterparts, with potential applications in renewable energy and quantum computing. This paper investigates a quantum Otto engine operating in both ideal and finite-time scenarios, employing a two-level system interacting with a harmonic oscillator within the framework of the anisotropic quantum Rabi-Stark model (AQRSM) as the working medium. This model is notable for exhibiting both first-order and continuous quantum phase transitions. By focusing on quantum heat engines, our study reveals that these phase transitions critically modulate the efficiency and power of AQRSM-based engines, which outperform quantum engines fueled by a working medium with a harmonic spectrum. Additionally, we explore the impacts of quantum friction and conduct limit cycle analysis in finite-time operations, providing insights into optimizing quantum heat engines for practical implementation.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09025 [pdf, other]

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Authors: Yuzhang Tian, Jianbo Zhao, Haoyu Dong, Junyu Xiong, Shiyu Xia, Mengyu Zhou, Yun Lin, José Cambronero, Yeye He, Shi Han, Dongmei Zhang

Abstract: Spreadsheets, with their extensive two-dimensional grids, various layouts, and diverse formatting options, present notable challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method designed to unleash and optimize LLMs' powerful understanding and reasoning capability on spreadsheets. Initially, we propose a vanilla serialization approach that incorporates cell addresses, values, and formats. However, this approach was limited by LLMs' token constraints, making it impractical for most applications. To tackle this challenge, we develop SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs. It comprises three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation. It significantly improves performance in the spreadsheet table detection task, outperforming the vanilla approach by 25.6% in GPT4's in-context learning setting. Moreover, an LLM fine-tuned with SheetCompressor has an average compression ratio of 25 times, but achieves a state-of-the-art 78.9% F1 score, surpassing the best existing models by 12.3%. Finally, we propose Chain of Spreadsheet for downstream tasks of spreadsheet understanding and validate it in a new and demanding spreadsheet QA task. We methodically leverage the inherent layout and structure of spreadsheets, demonstrating that SpreadsheetLLM is highly effective across a variety of spreadsheet tasks.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09024 [pdf, other]

Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control

Authors: Huayu Chen, Kaiwen Zheng, Hang Su, Jun Zhu

Abstract: Drawing upon recent advances in language model alignment, we formulate offline Reinforcement Learning as a two-stage optimization problem: First pretraining expressive generative policies on reward-free behavior datasets, then fine-tuning these policies to align with task-specific annotations like Q-values. This strategy allows us to leverage abundant and diverse behavior data to enhance generalization and enable rapid adaptation to downstream tasks using minimal annotations. In particular, we introduce Efficient Diffusion Alignment (EDA) for solving continuous control problems. EDA utilizes diffusion models for behavior modeling. However, unlike previous approaches, we represent diffusion policies as the derivative of a scalar neural network with respect to action inputs. This representation is critical because it enables direct density calculation for diffusion models, making them compatible with existing LLM alignment theories. During policy fine-tuning, we extend preference-based alignment methods like Direct Preference Optimization (DPO) to align diffusion behaviors with continuous Q-functions. Our evaluation on the D4RL benchmark shows that EDA exceeds all baseline methods in overall performance. Notably, EDA maintains about 95% of performance and still outperforms several baselines given only 1% of Q-labelled data during fine-tuning.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09021 [pdf, other]

Squeeze-and-Excite ResNet-Conformers for Sound Event Localization, Detection, and Distance Estimation for DCASE 2024 Challenge

Authors: Jun Wei Yeow, Ee-Leng Tan, Jisheng Bai, Santi Peksi, Woon-Seng Gan

Abstract: This technical report details our systems submitted for Task 3 of the DCASE 2024 Challenge: Audio and Audiovisual Sound Event Localization and Detection (SELD) with Source Distance Estimation (SDE). We address only the audio-only SELD with SDE (SELDDE) task in this report. We propose to improve the existing ResNet-Conformer architectures with Squeeze-and-Excitation blocks in order to introduce additional forms of channel- and spatial-wise attention. In order to improve SELD performance, we also utilize the Spatial Cue-Augmented Log-Spectrogram (SALSA) features over the commonly used log-mel spectra features for polyphonic SELD. We complement the existing Sony-TAu Realistic Spatial Soundscapes 2023 (STARSS23) dataset with the audio channel swapping technique and synthesize additional data using the SpatialScaper generator. We also perform distance scaling in order to prevent large distance errors from contributing more towards the loss function. Finally, we evaluate our approach on the evaluation subset of the STARSS23 dataset.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Technical report for DCASE 2024 Challenge Task 3

arXiv:2407.09020 [pdf, other]

3M-Health: Multimodal Multi-Teacher Knowledge Distillation for Mental Health Detection

Authors: Rina Carines Cabral, Siwen Luo, Soyeon Caren Han, Josiah Poon

Abstract: The significance of mental health classification is paramount in contemporary society, where digital platforms serve as crucial sources for monitoring individuals' well-being. However, existing social media mental health datasets primarily consist of text-only samples, potentially limiting the efficacy of models trained on such data. Recognising that humans utilise cross-modal information to comprehend complex situations or issues, we present a novel approach to address the limitations of current methodologies. In this work, we introduce a Multimodal and Multi-Teacher Knowledge Distillation model for Mental Health Classification, leveraging insights from cross-modal human understanding. Unlike conventional approaches that often rely on simple concatenation to integrate diverse features, our model addresses the challenge of appropriately representing inputs of varying natures (e.g., texts and sounds). To mitigate the computational complexity associated with integrating all features into a single model, we employ a multimodal and multi-teacher architecture. By distributing the learning process across multiple teachers, each specialising in a particular feature extraction aspect, we enhance the overall mental health classification performance. Through experimental validation, we demonstrate the efficacy of our model in achieving improved performance. All relevant codes will be made available upon publication.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09017 [pdf, other]

AI-Driven Guided Response for Security Operation Centers with Microsoft Copilot for Security

Authors: Scott Freitas, Jovan Kalajdjieski, Amir Gharib, Rob McCann

Abstract: Security operation centers contend with a constant stream of security incidents, ranging from straightforward to highly complex. To address this, we developed Copilot Guided Response (CGR), an industry-scale ML architecture that guides security analysts across three key tasks -- (1) investigation, providing essential historical context by identifying similar incidents; (2) triaging to ascertain the nature of the incident -- whether it is a true positive, false positive, or benign positive; and (3) remediation, recommending tailored containment actions. CGR is integrated into the Microsoft Defender XDR product and deployed worldwide, generating millions of recommendations across thousands of customers. Our extensive evaluation, incorporating internal evaluation, collaboration with security experts, and customer feedback, demonstrates that CGR delivers high-quality recommendations across all three tasks. We provide a comprehensive overview of the CGR architecture, setting a precedent as the first cybersecurity company to openly discuss these capabilities in such depth. Additionally, we introduce GUIDE, the largest public collection of real-world security incidents, spanning 13M evidences across 1M annotated incidents. By enabling researchers and practitioners to conduct research on real-world data, GUIDE advances the state of cybersecurity and supports the development of next-generation machine learning systems.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09016 [pdf, other]

OVExp: Open Vocabulary Exploration for Object-Oriented Navigation

Authors: Meng Wei, Tai Wang, Yilun Chen, Hanqing Wang, Jiangmiao Pang, Xihui Liu

Abstract: Object-oriented embodied navigation aims to locate specific objects, defined by category or depicted in images. Existing methods often struggle to generalize to open vocabulary goals without extensive training data. While recent advances in Vision-Language Models (VLMs) offer a promising solution by extending object recognition beyond predefined categories, efficient goal-oriented exploration becomes more challenging in an open vocabulary setting. We introduce OVExp, a learning-based framework that integrates VLMs for Open-Vocabulary Exploration. OVExp constructs scene representations by encoding observations with VLMs and projecting them onto top-down maps for goal-conditioned exploration. Goals are encoded in the same VLM feature space, and a lightweight transformer-based decoder predicts target locations while maintaining versatile representation abilities. To address the impracticality of fusing dense pixel embeddings with full 3D scene reconstruction for training, we propose constructing maps using low-cost semantic categories and transforming them into CLIP's embedding space via the text encoder. The simple but effective design of OVExp significantly reduces computational costs and demonstrates strong generalization abilities to various navigation settings. Experiments on established benchmarks show OVExp outperforms previous zero-shot methods, can generalize to diverse scenes, and handle different goal modalities.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09014 [pdf, other]

CompAct: Compressing Retrieved Documents Actively for Question Answering

Authors: Chanwoong Yoon, Taewhoo Lee, Hyeon Hwang, Minbyul Jeong, Jaewoo Kang

Abstract: Retrieval-augmented generation supports language models to strengthen their factual groundings by providing external contexts. However, language models often face challenges when given extensive information, diminishing their effectiveness in solving questions. Context compression tackles this issue by filtering out irrelevant information, but current methods still struggle in realistic scenarios where crucial information cannot be captured with a single-step approach. To overcome this limitation, we introduce CompAct, a novel framework that employs an active strategy to condense extensive documents without losing key information. Our experiments demonstrate that CompAct brings significant improvements in both performance and compression rate on multi-hop question-answering (QA) benchmarks. CompAct flexibly operates as a cost-efficient plug-in module with various off-the-shelf retrievers or readers, achieving exceptionally high compression rates (47x).

Submitted 12 July, 2024; originally announced July 2024.

Comments: Code available at https://github.com/dmis-lab/CompAct

arXiv:2407.09012 [pdf, other]

TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models

Authors: Jeongho Kim, Min-Jung Kim, Junsoo Lee, Jaegul Choo

Abstract: Pose-driven human-image animation diffusion models have shown remarkable capabilities in realistic human video synthesis. Despite the promising results achieved by previous approaches, challenges persist in achieving temporally consistent animation and ensuring robustness with off-the-shelf pose detectors. In this paper, we present TCAN, a pose-driven human image animation method that is robust to erroneous poses and consistent over time. In contrast to previous methods, we utilize the pre-trained ControlNet without fine-tuning to leverage its extensive pre-acquired knowledge from numerous pose-image-caption pairs. To keep the ControlNet frozen, we adapt LoRA to the UNet layers, enabling the network to align the latent space between the pose and appearance features. Additionally, by introducing an additional temporal layer to the ControlNet, we enhance robustness against outliers of the pose detector. Through the analysis of attention maps over the temporal axis, we also designed a novel temperature map leveraging pose information, allowing for a more static background. Extensive experiments demonstrate that the proposed method can achieve promising results in video synthesis tasks encompassing various poses, like chibi. Project Page: https://eccv2024tcan.github.io/

Submitted 12 July, 2024; originally announced July 2024.

Comments: The first two authors contributed equally

arXiv:2407.09009 [pdf, other]

Probing Cold-to-Temperate Exoplanetary Atmospheres: The Role of Water Condensation on Surface Identification with JWST

Authors: Ziyu Huang, Xinting Yu, Shang-Min Tsai, Julianne I. Moses, Kazumasa Ohno, Joshua Krissansen-Totton, Xi Zhang, Jonathan Fortney

Abstract: Understanding the surface temperature and interior structure of cold-to-temperate sub-Neptunes is critical for assessing their habitability, yet direct observations are challenging. In this study, we investigate the impact of water condensation on the atmospheric compositions of sub-Neptunes, focusing on the implications for JWST spectroscopic observations. By modeling the atmospheric photochemistry of two canonical sub-Neptunes, K2-18 b and LHS 1140 b, both with and without water condensation and with and without thick atmospheres, we demonstrate that water condensation can significantly affect the predicted atmospheric compositions. This effect is driven by oxygen depletion from the condensation of water vapor and primarily manifests as an increase in the C/O ratio within the photochemically active regions of the atmosphere. This change in composition particularly affects planets with thin H$_2$-dominated atmospheres, leading to a transition in dominant nitrogen and carbon carriers from N$_2$ and oxygen-rich species like CO/CO$_2$ towards heavier hydrocarbons and nitriles. While our models do not fully account for the loss mechanisms of these higher-order species, such molecules can go on to form more refractory molecules or hazes. Planets with thin H$_2$-rich atmospheres undergoing significant water condensation are thus likely to exhibit very hazy atmospheres. The relatively flat JWST spectra observed for LHS 1140 b could be consistent with such a scenario, suggesting a shallow surface with extensive water condensation or a high atmospheric C/O ratio. Conversely, the JWST observations of K2-18 b are better aligned with a volatile-rich mini-Neptune with a thick atmosphere.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 21 pages, 7 figures

arXiv:2407.09005 [pdf, other]

Introducing VaDA: Novel Image Segmentation Model for Maritime Object Segmentation Using New Dataset

Authors: Yong** Kim, **bum Park, Sanha Kang, Hanguen Kim

Abstract: The maritime shipping industry is undergoing rapid evolution driven by advancements in computer vision artificial intelligence (AI). Consequently, research on AI-based object recognition models for maritime transportation is steadily growing, leveraging advancements in sensor technology and computing performance. However, object recognition in maritime environments faces challenges such as light reflection, interference, intense lighting, and various weather conditions. To address these challenges, high-performance deep learning algorithms tailored to maritime imagery and high-quality datasets specialized for maritime scenes are essential. Existing AI recognition models and datasets have limited suitability for composing autonomous navigation systems. Therefore, in this paper, we propose a Vertical and Detail Attention (VaDA) model for maritime object segmentation and a new model evaluation method, the Integrated Figure of Calculation Performance (IFCP), to verify its suitability for the system in real-time. Additionally, we introduce a benchmark maritime dataset, OASIs (Ocean AI Segmentation Initiatives), to standardize model performance evaluation across diverse maritime environments. The OASIs dataset and details are available at our website: https://www.navlue.com/dataset

Submitted 12 July, 2024; originally announced July 2024.

Comments: 11 pages, 9 figures, whitepaper

Showing 101–150 of 709,815 results for author: J.