
Probing the Feasibility of Multilingual Speaker Anonymization

Authors: Sarina Meyer, Florian Lux, Ngoc Thang Vu

Abstract: In speaker anonymization, speech recordings are modified in a way that the identity of the speaker remains hidden. While this technology could help to protect the privacy of individuals around the globe, current research restricts this by focusing almost exclusively on English data. In this study, we extend a state-of-the-art anonymization system to nine languages by transforming language-dependent components to their multilingual counterparts. Experiments testing the robustness of the anonymized speech against privacy attacks and speech deterioration show an overall success of this system for all languages. The results suggest that speaker embeddings trained on English data can be applied across languages, and that the anonymization performance for a language is mainly affected by the quality of the speech synthesis component used for it.

Submitted 3 July, 2024; originally announced July 2024.

Comments: accepted at Interspeech 2024

arXiv:2407.01381

Polaritonic Chemistry using the Density Matrix Renormalization Group Method

Authors: Mikuláš Matoušek, Nam Vu, Niranjan Govind, Jonathan J. Foley IV, Libor Veis

Abstract: The emerging field of polaritonic chemistry explores the behavior of molecules under strong coupling with cavity modes. Despite recent developments in ab initio polaritonic methods for simulating polaritonic chemistry under electronic strong coupling, their capabilities are limited, especially in cases where the molecule also features strong electronic correlation. To bridge this gap, we have developed a novel method for cavity QED calculations utilizing the Density Matrix Renormalization Group (DMRG) algorithm in conjunction with the Pauli-Fierz Hamiltonian. Our approach is applied to investigate the effect of the cavity on the S0-S1 transition of n-oligoacenes, with n ranging from 2 to 5, encompassing 22 fully correlated π orbitals in the largest pentacene molecule. Our findings indicate that the influence of the cavity intensifies with larger acenes. Additionally, we demonstrate that, unlike the full determinantal representation, DMRG efficiently optimizes and eliminates excess photonic degrees of freedom, resulting in an asymptotically constant computational cost as the photonic basis increases.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00145

Co-evolving networks for opinion and social dynamics in agent-based models

Authors: Nataša Djurdjevac Conrad, Nhu Quang Vu, Sören Nagel

Abstract: The rise of digital social media has strengthened the coevolution of public opinions and social interactions, which shape social structures and collective outcomes in increasingly complex ways. Existing literature often explores this interplay as a one-directional influence, focusing on how opinions determine social ties within adaptive networks. However, this perspective overlooks the intrinsic dynamics driving social interactions, which can significantly influence how opinions form and evolve. In this work, we address this gap by introducing co-evolving opinion and social dynamics using stochastic agent-based models. Agents' mobility in a social space is governed by both their social and opinion similarity with others. Similarly, the dynamics of opinion formation is driven by the opinions of agents in their social vicinity. We analyze the underlying social and opinion interaction networks and explore the mechanisms influencing the appearance of emerging phenomena, like echo chambers and opinion consensus. To illustrate the model's potential for real-world analysis, we apply it to General Social Survey data on political identity and public opinion regarding governmental issues. Our findings highlight the model's strength in capturing the coevolution of social connections and individual opinions over time.

Submitted 28 June, 2024; originally announced July 2024.

MSC Class: 91Dxx; 05C82; 37Hxx

arXiv:2406.19038

Binary neutron star mergers using a discontinuous Galerkin-finite difference hybrid method

Authors: Nils Deppe, Francois Foucart, Marceline S. Bonilla, Michael Boyle, Nicholas J. Corso, Matthew D. Duez, Matthew Giesler, François Hébert, Lawrence E. Kidder, Yoonsoo Kim, Prayush Kumar, Isaac Legred, Geoffrey Lovelace, Elias R. Most, Jordan Moxon, Kyle C. Nelli, Harald P. Pfeiffer, Mark A. Scheel, Saul A. Teukolsky, William Throwe, Nils L. Vu

Abstract: We present a discontinuous Galerkin-finite difference hybrid scheme that allows high-order shock capturing with the discontinuous Galerkin method for general relativistic magnetohydrodynamics in dynamical spacetimes. We present several optimizations and stability improvements to our algorithm that allow the hybrid method to successfully simulate single, rotating, and binary neutron stars. The hybrid method achieves the efficiency of discontinuous Galerkin methods throughout almost the entire spacetime during the inspiral phase, while being able to robustly capture shocks and resolve the stellar surfaces. We also use Cauchy-Characteristic evolution to compute the first gravitational waveforms at future null infinity from binary neutron star mergers. The simulations presented here are the first successful binary neutron star inspiral and merger simulations using discontinuous Galerkin methods.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 31 pages, 8 figures, comments welcome!

arXiv:2406.09489

Language-driven Grasp Detection

Authors: An Dinh Vuong, Minh Nhat Vu, Baoru Huang, Nghia Nguyen, Hieu Le, Thieu Vo, Anh Nguyen

Abstract: Grasp detection is a persistent and intricate challenge with various industrial applications. Recently, many methods and datasets have been proposed to tackle the grasp detection problem. However, most of them do not consider using natural language as a condition to detect the grasp poses. In this paper, we introduce Grasp-Anything++, a new language-driven grasp detection dataset featuring 1M samples, over 3M objects, and upwards of 10M grasping instructions. We utilize foundation models to create a large-scale scene corpus with corresponding images and grasp prompts. We approach the language-driven grasp detection task as a conditional generation problem. Drawing on the success of diffusion models in generative tasks and given that language plays a vital role in this task, we propose a new language-driven grasp detection method based on diffusion models. Our key contribution is the contrastive training objective, which explicitly contributes to the denoising process to detect the grasp pose given the language instructions. We illustrate that our approach is theoretically supportive. The intensive experiments show that our method outperforms state-of-the-art approaches and allows real-world robotic grasping. Finally, we demonstrate our large-scale dataset enables zero-shot grasp detection and is a challenging benchmark for future work. Project website: https://airvlab.github.io/grasp-anything/

Submitted 13 June, 2024; originally announced June 2024.

Comments: 19 pages. Accepted to CVPR24

arXiv:2406.09039

Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning

Authors: Huy Hoang Nguyen, Minh Nhat Vu, Florian Beck, Gerald Ebmer, Anh Nguyen, Andreas Kugi

Abstract: Combining a vision module inside a closed-loop control system for a seamless movement of a robot in a manipulation task is challenging due to the inconsistent update rates between utilized modules. This task is even more difficult in a dynamic environment, e.g., objects are moving. This paper presents a modular zero-shot framework for language-driven manipulation of (dynamic) objects through a closed-loop control system with real-time trajectory replanning and an online 6D object pose localization. We segment an object within 0.5 s by leveraging a vision language model via language commands. Then, guided by natural language commands, a closed-loop system, including a unified pose estimation and tracking and online trajectory planning, is utilized to continuously track this object and compute the optimal trajectory in real-time. Our proposed zero-shot framework provides a smooth trajectory that avoids jerky movements and ensures the robot can grasp a non-stationary object. Experiment results exhibit the real-time capability of the proposed zero-shot modular framework for the trajectory optimization module to accurately and efficiently grasp moving objects, i.e., up to 30 Hz update rates for the online 6D pose localization module and 10 Hz update rates for the receding-horizon trajectory optimization. These advantages highlight the modular framework's potential applications in robotics and human-robot interaction; see the video in https://www.acin.tuwien.ac.at/en/6e64/.

Submitted 19 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: 9 pages, 6 figures

arXiv:2406.08410

Quasistationary hair for binary black hole initial data in scalar Gauss-Bonnet gravity

Authors: Peter James Nee, Guillermo Lara, Harald P. Pfeiffer, Nils L. Vu

Abstract: Recent efforts to numerically simulate compact objects in alternative theories of gravity have largely focused on the time-evolution equations. Another critical aspect is the construction of constraint-satisfying initial data with precise control over the properties of the systems under consideration. Here, we augment the extended conformal thin sandwich framework to construct quasistationary initial data for black hole systems in scalar Gauss-Bonnet theory and numerically implement it in the open-source SpECTRE code. Despite the resulting elliptic system being singular at black hole horizons, we demonstrate how to construct numerical solutions that extend smoothly across the horizon. We obtain quasistationary scalar hair configurations in the test-field limit for black holes with linear/angular momentum as well as for black hole binaries. For isolated black holes, we explicitly show that the scalar profile obtained is stationary by evolving the system in time and compare against previous formulations of scalar Gauss-Bonnet initial data. In the case of the binary, we find that the scalar hair near the black holes can be markedly altered by the presence of the other black hole. The initial data constructed here enables targeted simulations in scalar Gauss-Bonnet gravity with reduced initial transients.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 13 pages, 11 figures

arXiv:2406.07124

CHARME: A chain-based reinforcement learning approach for the minor embedding problem

Authors: Hoang M. Ngo, Nguyen H K. Do, Minh N. Vu, Tamer Kahveci, My T. Thai

Abstract: Quantum Annealing (QA) holds great potential for solving combinatorial optimization problems efficiently. However, the effectiveness of QA algorithms heavily relies on the embedding of problem instances, represented as logical graphs, into the quantum processing unit (QPU), whose topology takes the form of a limited-connectivity graph; this is known as the minor embedding problem. Existing methods for the minor embedding problem suffer from scalability issues when confronted with larger problem sizes. In this paper, we propose a novel approach utilizing Reinforcement Learning (RL) techniques to address the minor embedding problem, named CHARME. CHARME includes three key components: a Graph Neural Network (GNN) architecture for policy modeling, a state transition algorithm ensuring solution validity, and an order exploration strategy for effective training. Through comprehensive experiments on synthetic and real-world instances, we demonstrate the efficiency of our proposed order exploration strategy as well as our proposed RL framework, CHARME. In detail, CHARME yields superior solutions compared to fast embedding methods such as Minorminer and ATOM. Moreover, our method surpasses the OCT-based approach, known for its slower runtime but high-quality solutions, in several cases. In addition, our proposed exploration enhances the efficiency of the training of the CHARME framework by providing better solutions compared to the greedy strategy.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06406

Controlling Emotion in Text-to-Speech with Natural Language Prompts

Authors: Thomas Bott, Florian Lux, Ngoc Thang Vu

Abstract: In recent years, prompting has quickly become one of the standard ways of steering the outputs of generative machine learning models, due to its intuitive use of natural language. In this work, we propose a system conditioned on embeddings derived from an emotionally rich text that serves as prompt. Thereby, a joint representation of speaker and prompt embeddings is integrated at several points within a transformer-based architecture. Our approach is trained on merged emotional speech and text datasets and varies prompts in each training iteration to increase the generalization capabilities of the model. Objective and subjective evaluation results demonstrate the ability of the conditioned synthesis system to accurately transfer the emotions present in a prompt to speech. At the same time, precise tractability of speaker identities as well as overall high speech quality and intelligibility are maintained.

Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: accepted at Interspeech 2024

arXiv:2406.06403

Meta Learning Text-to-Speech Synthesis in over 7000 Languages

Authors: Florian Lux, Sarina Meyer, Lyonel Behringer, Frank Zalkow, Phat Do, Matt Coler, Emanuël A. P. Habets, Ngoc Thang Vu

Abstract: In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology.

Submitted 10 June, 2024; originally announced June 2024.

Comments: accepted at Interspeech 2024

arXiv:2405.09335

Prompting-based Synthetic Data Generation for Few-Shot Question Answering

Authors: Maximilian Schmidt, Andrea Bartezzaghi, Ngoc Thang Vu

Abstract: Although language models (LMs) have boosted the performance of Question Answering, they still need plenty of data. Data annotation, in contrast, is a time-consuming process. This especially applies to Question Answering, where possibly large documents have to be parsed and annotated with questions and their corresponding answers. Furthermore, Question Answering models often only work well for the domain they were trained on. Since annotation is costly, we argue that domain-agnostic knowledge from LMs, such as linguistic understanding, is sufficient to create a well-curated dataset. With this motivation, we show that using large language models can improve Question Answering performance on various datasets in the few-shot setting compared to state-of-the-art approaches. For this, we perform data generation leveraging the Prompting framework, suggesting that language models contain valuable task-agnostic knowledge that can be used beyond the common pre-training/fine-tuning scheme. As a result, we consistently outperform previous approaches on few-shot Question Answering.

Submitted 15 May, 2024; originally announced May 2024.

Comments: LREC-COLING 2024

arXiv:2405.08868

A Review of Gravitational Memory and BMS Frame Fixing in Numerical Relativity

Authors: Keefe Mitman, Michael Boyle, Leo C. Stein, Nils Deppe, Lawrence E. Kidder, Jordan Moxon, Harald P. Pfeiffer, Mark A. Scheel, Saul A. Teukolsky, William Throwe, Nils L. Vu

Abstract: Gravitational memory effects and the BMS freedoms exhibited at future null infinity have recently been resolved and utilized in numerical relativity simulations. With this, gravitational wave models and our understanding of the fundamental nature of general relativity have been vastly improved. In this paper, we review the history and intuition behind memory effects and BMS symmetries, how they manifest in gravitational waves, and how controlling the infinite number of BMS freedoms of numerical relativity simulations can crucially improve the waveform models that are used by gravitational wave detectors. We reiterate the fact that, with memory effects and BMS symmetries, not only can these next-generation numerical waveforms be used to observe never-before-seen physics, but they can also be used to test GR and learn new astrophysical information about our universe.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 20 pages, 8 figures. Submitted to CGQ's focus issue: Gravitational-Wave Memory Effects: From Theory to Observation

arXiv:2405.06197

Improved frequency spectra of gravitational waves with memory in a binary-black-hole simulation

Authors: Yitian Chen, Michael Boyle, Nils Deppe, Lawrence E. Kidder, Keefe Mitman, Jordan Moxon, Kyle C. Nelli, Harald P. Pfeiffer, Mark A. Scheel, William Throwe, Nils L. Vu, Saul A. Teukolsky

Abstract: Numerical relativists can now produce gravitational waveforms with memory effects routinely and accurately. The gravitational-wave memory effect contains very low-frequency components, including a persistent offset. The presence of these components violates basic assumptions about time-shift behavior underpinning standard data-analysis techniques in gravitational-wave astronomy. This poses a challenge to the analysis of waveform spectra: How to preserve the low-frequency characteristics when transforming a time-domain waveform to the frequency domain. To tackle this challenge, we revisit the preprocessing procedures applied to the waveforms that contain memory effects. We find inconsistency between the zero-frequency limit of displacement memory and the low-frequency spectrum of the same memory preprocessed using the common scheme in literature. To resolve the inconsistency, we propose a new robust preprocessing scheme that produces the spectra of memory waveforms more faithfully. Using this new scheme, we inspect several characteristics of the spectrum of a memory waveform. In particular, we find a discernible beating pattern formed by the dominant oscillatory mode and the displacement memory. This pattern is absent in the spectrum of a waveform without memory. The difference between the memory and no-memory waveforms is too small to be observed by current-generation detectors in a single binary-black-hole event. Detecting the memory in a single event is likely to occur in the era of next-generation detectors.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 24 pages, 11 figures, 5 tables

arXiv:2405.06120

A discontinuous Galerkin scheme for elliptic equations on extremely stretched grids

Authors: Nils L. Vu

Abstract: Discontinuous Galerkin (DG) methods for solving elliptic equations are gaining popularity in the computational physics community for their high-order spectral convergence and their potential for parallelization on computing clusters. However, problems in numerical relativity with extremely stretched grids, such as initial data problems for binary black holes that impose boundary conditions at large distances from the black holes, have proven challenging for DG methods. To alleviate this problem we have developed a primal DG scheme that is generically applicable to a large class of elliptic equations, including problems on curved and extremely stretched grids. The DG scheme accommodates two widely used initial data formulations in numerical relativity, namely the puncture formulation and the extended conformal thin-sandwich (XCTS) formulation. We find that our DG scheme is able to stretch the grid by a factor of $\sim 10^9$ and hence allows boundary conditions to be imposed at large distances. The scheme converges exponentially with resolution both for the smooth XCTS problem and for the non-smooth puncture problem. With this method we are able to generate high-quality initial data for binary black hole problems using a parallelizable DG scheme. The code is publicly available in the open-source SpECTRE numerical relativity code.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 12 pages, 10 figures. Results are reproducible with the ancillary input files

arXiv:2404.10922

Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training

Authors: Pavel Denisov, Ngoc Thang Vu

Abstract: Recent advancements in language modeling have led to the emergence of Large Language Models (LLMs) capable of various natural language processing tasks. Despite their success in text-based tasks, applying LLMs to the speech domain remains limited and challenging. This paper presents BLOOMZMMS, a novel model that integrates a multilingual LLM with a multilingual speech encoder, aiming to harness the capabilities of LLMs for speech recognition and beyond. Utilizing a multi-instructional training approach, we demonstrate the transferability of linguistic knowledge from the text to the speech modality. Our experiments, conducted on 1900 hours of transcribed data from 139 languages, establish that a multilingual speech representation can be effectively learned and aligned with a multilingual LLM. While this learned representation initially shows limitations in task generalization, we address this issue by generating synthetic targets in a multi-instructional style. Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks, including speech translation and multilingual spoken language understanding, thereby opening new avenues for applying LLMs in the speech domain.

Submitted 16 April, 2024; originally announced April 2024.

Comments: NAACL Findings 2024

arXiv:2404.10222

Simulating electronic structure on bosonic quantum computers

Authors: Rishab Dutta, Nam P. Vu, Ningyi Lyu, Chen Wang, Victor S. Batista

Abstract: Computation with quantum harmonic oscillators, or qumodes, is a promising and rapidly evolving approach towards quantum computing. In contrast to qubits, which are two-level quantum systems, bosonic qumodes can in principle have infinite discrete levels, and can also be represented with continuous variable bases. One of the most promising applications of quantum computing is simulating many-fermion problems such as molecular electronic structure. Although there has been a lot of recent progress on simulating many-fermion systems on qubit-based quantum hardware, these methods cannot be easily extended to bosonic quantum devices due to the fundamental difference in physics represented by qubits and qumodes. In this work, we show how an electronic structure Hamiltonian can be transformed into a system of qumodes with a fermion to boson mapping scheme and apply it to simulate the electronic structure of the dihydrogen molecule as a system of two qumodes. Our work opens the door for simulating many-fermion systems by harnessing the power of bosonic quantum devices.

Submitted 27 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: 47 pages including references, 7 figures, revised

arXiv:2404.10214

Simulating Chemistry on Bosonic Quantum Devices

Authors: Rishab Dutta, Delmar G. A. Cabral, Ningyi Lyu, Nam P. Vu, Yuchen Wang, Brandon Allen, Xiaohan Dan, Rodrigo G. Cortiñas, Pouya Khazaei, Max Schäfer, Alejandro C. C. d. Albornoz, Scott E. Smart, Scott Nie, Michel H. Devoret, David A. Mazziotti, Prineha Narang, Chen Wang, James D. Whitfield, Angela K. Wilson, Heidi P. Hendrickson, Daniel A. Lidar, Francisco Pérez-Bernal, Lea F. Santos, Sabre Kais, Eitan Geva, et al. (1 additional author not shown)

Abstract: Bosonic quantum devices offer a novel approach to realize quantum computations, where the quantum two-level system (qubit) is replaced with the quantum (an)harmonic oscillator (qumode) as the fundamental building block of the quantum simulator. The simulation of chemical structure and dynamics can then be achieved by representing or mapping the system Hamiltonians in terms of bosonic operators. In this perspective, we review recent progress and future potential of using bosonic quantum devices for addressing a wide range of challenging chemical problems, including the calculation of molecular vibronic spectra, the simulation of gas-phase and solution-phase adiabatic and nonadiabatic chemical dynamics, the efficient solution of molecular graph theory problems, and the calculations of electronic structure.

Submitted 5 July, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: 40 pages including references, 13 figures, revised

arXiv:2404.07122 [pdf, other]

Driver Attention Tracking and Analysis

Authors: Dat Viet Thanh Nguyen, Anh Tran, Hoai Nam Vu, Cuong Pham, Minh Hoai

Abstract: We propose a novel method to estimate a driver's points-of-gaze using a pair of ordinary cameras mounted on the windshield and dashboard of a car. This is a challenging problem due to the dynamics of traffic environments with 3D scenes of unknown depths. This problem is further complicated by the volatile distance between the driver and the camera system. To tackle these challenges, we develop a novel convolutional network that simultaneously analyzes the image of the scene and the image of the driver's face. This network has a camera calibration module that can compute an embedding vector that represents the spatial configuration between the driver and the camera system. This calibration module improves the overall network's performance, which can be jointly trained end to end. We also address the lack of annotated data for training and evaluation by introducing a large-scale driving dataset with point-of-gaze annotations. This is an in situ dataset of real driving sessions in an urban city, containing synchronized images of the driving scene as well as the face and gaze of the driver. Experiments on this dataset show that the proposed method outperforms various baseline methods, with a mean prediction error of 29.69 pixels, which is relatively small compared to the $1280{\times}720$ resolution of the scene camera.

Submitted 11 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.04018 [pdf, ps, other]

doi 10.1145/3638529.3654140

Superior Genetic Algorithms for the Target Set Selection Problem Based on Power-Law Parameter Choices and Simple Greedy Heuristics

Authors: Benjamin Doerr, Martin S. Krejca, Nguyen Vu

Abstract: The target set selection problem (TSS) asks for a set of vertices such that an influence spreading process started in these vertices reaches the whole graph. The current state of the art for this NP-hard problem consists of three recently proposed randomized search heuristics, namely a biased random-key genetic algorithm (BRKGA) obtained from extensive parameter tuning, a max-min ant system (MMAS), and an MMAS using Q-learning with a graph convolutional network. We show that the BRKGA with two simple modifications and without the costly parameter tuning obtains significantly better results. Our first modification is to simply choose all parameters of the BRKGA in each iteration randomly from a power-law distribution. The resulting parameterless BRKGA is already competitive with the tuned BRKGA, as our experiments on the previously used benchmarks show. We then add a natural greedy heuristic, namely to repeatedly discard small-degree vertices that are not necessary for reaching the whole graph. The resulting algorithm consistently outperforms all of the state-of-the-art algorithms. Besides providing a superior algorithm for the TSS problem, this work shows that randomized parameter choices and elementary greedy heuristics can give better results than complex algorithms and costly parameter tuning.
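The idea of drawing a parameter afresh in each iteration from a power-law distribution can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_power_law`, the exponent `beta = 1.5`, and the integer range are assumptions chosen only for illustration, and the abstract does not say how the drawn values map onto the BRKGA parameters.

```python
import random
from typing import Optional


def sample_power_law(upper: int, beta: float = 1.5,
                     rng: Optional[random.Random] = None) -> int:
    """Draw an integer k in [1, upper] with probability proportional to k**(-beta).

    The exponent and range are illustrative assumptions, not values from the paper.
    """
    rng = rng or random.Random()
    values = range(1, upper + 1)
    # Unnormalized power-law weights; random.choices normalizes them internally.
    weights = [k ** (-beta) for k in values]
    return rng.choices(values, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # A fresh draw per iteration: mostly small values, occasionally a large one.
    print([sample_power_law(100, rng=rng) for _ in range(10)])
```

The heavy tail is the point of the design: small values dominate, but large values still occur often enough that no single fixed setting has to be tuned in advance.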

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2403.17647 [pdf, other]

Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering

Authors: Pascal Tilli, Ngoc Thang Vu

Abstract: The large success of deep learning based methods in Visual Question Answering (VQA) has concurrently increased the demand for explainable methods. Most methods in Explainable Artificial Intelligence (XAI) focus on generating post-hoc explanations rather than taking an intrinsic approach, the latter characterizing an interpretable model. In this work, we introduce an interpretable approach for graph-based VQA and demonstrate competitive performance on the GQA dataset. This approach bridges the gap between interpretability and performance. Our model is designed to intrinsically produce a subgraph during the question-answering process as its explanation, providing insight into the decision making. To evaluate the quality of these generated subgraphs, we compare them against established post-hoc explainability methods for graph neural networks, and perform a human evaluation. Moreover, we present quantitative metrics that correlate with the evaluations of human assessors, acting as automatic metrics for the generated explanatory subgraphs. Our implementation is available at https://github.com/DigitalPhonetics/Intrinsic-Subgraph-Generation-for-VQA.

Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted at LREC-COLING 2024

arXiv:2403.17582 [pdf, other]

Towards a Zero-Data, Controllable, Adaptive Dialog System

Authors: Dirk Väth, Lindsey Vanderlyn, Ngoc Thang Vu

Abstract: Conversational Tree Search (Väth et al., 2023) is a recent approach to controllable dialog systems, where domain experts shape the behavior of a Reinforcement Learning agent through a dialog tree. The agent learns to efficiently navigate this tree, while adapting to information needs, e.g., domain familiarity, of different users. However, the need for additional training data hinders deployment in new domains. To address this, we explore approaches to generate this data directly from dialog trees. We improve the original approach, and show that agents trained on synthetic data can achieve comparable dialog success to models trained on human data, both when using a commercial Large Language Model for generation, or when using a smaller open-source model, running on a single GPU. We further demonstrate the scalability of our approach by collecting and testing on two new datasets: ONBOARD, a new domain helping foreign residents moving to a new city, and the medical domain DIAGNOSE, a subset of Wikipedia articles related to scalp and head symptoms. Finally, we perform human testing, where no statistically significant differences were found in either objective or subjective measures between models trained on human and generated data.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.08705 [pdf, other]

Scalarization of isolated black holes in scalar Gauss-Bonnet theory in the fixing-the-equations approach

Authors: Guillermo Lara, Harald P. Pfeiffer, Nikolas A. Wittek, Nils L. Vu, Kyle C. Nelli, Alexander Carpenter, Geoffrey Lovelace, Mark A. Scheel, William Throwe

Abstract: One of the most promising avenues to perform numerical evolutions in theories beyond General Relativity is the fixing-the-equations approach, a proposal in which new ``driver'' equations are added to the evolution equations in a way that allows for stable numerical evolutions. In this direction, we extend the numerical relativity code SpECTRE to evolve a ``fixed'' version of scalar Gauss-Bonnet theory in the decoupling limit, a phenomenologically interesting theory that allows for hairy black hole solutions in vacuum. We focus on isolated black hole systems both with and without linear and angular momentum, and propose a new driver equation to improve the recovery of such stationary solutions. We demonstrate the effectiveness of the latter by numerically evolving black holes that undergo spontaneous scalarization using different driver equations. Finally, we evaluate the accuracy of the obtained solutions by comparing with the original unaltered theory.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 16 pages, 12 figures

arXiv:2403.05338 [pdf, other]

Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings

Authors: Wei Zhou, Heike Adel, Hendrik Schuff, Ngoc Thang Vu

Abstract: Attribution scores indicate the importance of different input parts and can, thus, explain model behaviour. Currently, prompt-based models are gaining popularity, i.a., due to their easier adaptability in low-resource settings. However, the quality of attribution scores extracted from prompt-based models has not been investigated yet. In this work, we address this topic by analyzing attribution scores extracted from prompt-based models w.r.t. plausibility and faithfulness and comparing them with attribution scores extracted from fine-tuned models and large language models. In contrast to previous work, we introduce training size as another dimension into the analysis. We find that using the prompting paradigm (with either encoder-based or decoder-based models) yields more plausible explanations than fine-tuning the models in low-resource settings and Shapley Value Sampling consistently outperforms attention and Integrated Gradients in terms of leading to more plausible and faithful explanations.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04784 [pdf, other]

Analysis of Privacy Leakage in Federated Large Language Models

Authors: Minh N. Vu, Truc Nguyen, Tre' R. Jeter, My T. Thai

Abstract: With the rapid adoption of Federated Learning (FL) as the training and tuning protocol for applications utilizing Large Language Models (LLMs), recent research highlights the need for significant modifications to FL to accommodate the large scale of LLMs. While substantial adjustments to the protocol have been introduced as a response, comprehensive privacy analysis for the adapted FL protocol is currently lacking. To address this gap, our work delves into an extensive examination of the privacy analysis of FL when used for training LLMs, both from theoretical and practical perspectives. In particular, we design two active membership inference attacks with guaranteed theoretical success rates to assess the privacy leakages of various adapted FL configurations. Our theoretical findings are translated into practical attacks, revealing substantial privacy vulnerabilities in popular LLMs, including BERT, RoBERTa, DistilBERT, and OpenAI's GPTs, across multiple real-world language datasets. Additionally, we conduct thorough experiments to evaluate the privacy leakage of these models when data is protected by state-of-the-art differential privacy (DP) mechanisms.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.04769 [pdf, other]

Hierarchical Motion Planning and Offline Robust Model Predictive Control for Autonomous Vehicles

Authors: Hung Duy Nguyen, Minh Nhat Vu, Nguyen Ngoc Nam, Kyoungseok Han

Abstract: Driving vehicles in complex scenarios under harsh conditions is the biggest challenge for autonomous vehicles (AVs). To address this issue, we propose a hierarchical motion planning and robust control strategy using the front-active steering system in complex scenarios with various slippery road adhesion coefficients while considering vehicle uncertain parameters. Behaviors of human vehicles (HVs) are considered and modeled in the form of a car-following model via the Intelligent Driver Model (IDM). Then, in the upper layer, the motion planner first generates an optimal trajectory by using the artificial potential field (APF) algorithm to formulate any surrounding objects, e.g., road marks, boundaries, and static/dynamic obstacles. To track the generated optimal trajectory, in the lower layer, an offline-constrained output feedback robust model predictive control (RMPC) is employed for the linear parameter varying (LPV) system by applying a linear matrix inequality (LMI) optimization method that ensures robustness against the model parameter uncertainties. Furthermore, by augmenting the system model, our proposed approach, called offline RMPC, achieves outstanding efficiency compared to three existing RMPC approaches, namely offset-offline RMPC, online RMPC, and offline RMPC without an augmented model (offline RMPC w/o AM), in both improving computing time and reducing input vibrations.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 6 pages, 9 illustrations, Accepted for publication in American Control Conference (ACC) 2024

arXiv:2402.04730 [pdf, other]

Model Predictive Trajectory Optimization With Dynamically Changing Waypoints for Serial Manipulators

Authors: Florian Beck, Minh Nhat Vu, Christian Hartl-Nesic, Andreas Kugi

Abstract: Systematically including dynamically changing waypoints as desired discrete actions, for instance, resulting from superordinate task planning, has been challenging for online model predictive trajectory optimization with short planning horizons. This paper presents a novel waypoint model predictive control (wMPC) concept for online replanning tasks. The main idea is to split the planning horizon at the waypoint when it becomes reachable within the current planning horizon and reduce the horizon length towards the waypoints and goal points. This approach keeps the computational load low and provides flexibility in adapting to changing conditions in real time. The presented approach achieves competitive path lengths and trajectory durations compared to (global) offline RRT-type planners in a multi-waypoint scenario. Moreover, the ability of wMPC to dynamically replan tasks online is experimentally demonstrated on a KUKA LBR iiwa 14 R820 robot in a dynamic pick-and-place scenario.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 8 pages, 6 figures

arXiv:2402.02819 [pdf, other]

doi 10.1103/PhysRevD.109.124030

Striking the right tone: toward a self-consistent framework for measuring black hole ringdowns

Authors: Teagan A. Clarke, Maximiliano Isi, Paul D. Lasky, Eric Thrane, Michael Boyle, Nils Deppe, Lawrence E. Kidder, Keefe Mitman, Jordan Moxon, Kyle C. Nelli, William Throwe, Nils L. Vu

Abstract: The ringdown portion of a binary black hole merger consists of a sum of modes, each containing an infinite number of tones that are exponentially damped sinusoids. In principle, these can be measured as gravitational waves with observatories like LIGO/Virgo/KAGRA; however, in practice it is unclear how many tones can be meaningfully resolved. We investigate the consistency and resolvability of the overtones of the quadrupolar $\ell = m = 2$ mode by starting at late times when the gravitational waveform is expected to be well-approximated by the $\ell m n = 220$ tone alone. We present a Bayesian inference framework to measure the tones in numerical relativity data. We measure tones at different start times, checking for consistency: we classify a tone as stably recovered if and only if the 95\% credible intervals for amplitude and phase at time $t$ overlap with the credible intervals at all subsequent times. We test a set of tones including the first four overtones of the fundamental mode and the 320 tone and find that the 220 and 221 tones can be measured consistently with the inclusion of additional overtones. The 222 tone measurements can be stabilised when we include the 223 tone, but only in a narrow time window, after which it is too weak to measure. The 223 tone recovery appears to be unstable, and does not become stable with the introduction of the 224 tone. We find that $N=3$ tones can be stably recovered simultaneously. However, when analysing $N \geq 4$ tones, the amplitude of one tone is consistent with zero. Thus, within our framework, one can identify only $N=3$ tones with non-zero amplitude that are simultaneously stable.
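The stability criterion stated in this abstract (credible intervals at time $t$ must overlap with those at all subsequent times) can be sketched as a simple interval check. This is an illustrative sketch only, not the authors' code: the function names and the representation of each interval as a `(low, high)` pair are assumptions, and phase wrap-around at $2\pi$ is ignored for simplicity.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper) bound of a credible interval


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """Return True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_stable_from(index: int,
                   amplitude_cis: Sequence[Interval],
                   phase_cis: Sequence[Interval]) -> bool:
    """Check the overlap criterion for the start time at position `index`.

    The tone counts as stably recovered from this start time only if both
    its amplitude and phase intervals overlap with the corresponding
    intervals at every later start time. Phase periodicity is not handled.
    """
    return all(
        intervals_overlap(amplitude_cis[index], amplitude_cis[j])
        and intervals_overlap(phase_cis[index], phase_cis[j])
        for j in range(index + 1, len(amplitude_cis))
    )


if __name__ == "__main__":
    # Hypothetical intervals at three successive start times.
    amps = [(0.0, 1.0), (0.5, 1.5), (0.9, 2.0)]
    phases = [(0.0, 1.0), (0.2, 1.2), (0.4, 1.4)]
    print(is_stable_from(0, amps, phases))
```

Note that the check is deliberately one-directional in time: an interval is compared only against later start times, matching the wording "all subsequent times".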

Submitted 11 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: 14 pages, 8 figures, 2 tables. Published in PRD

arXiv:2401.17676 [pdf, other]

Observer-based Controller Design for Oscillation Damping of a Novel Suspended Underactuated Aerial Platform

Authors: Hemjyoti Das, Minh Nhat Vu, Tobias Egle, Christian Ott

Abstract: In this work, we present a novel actuation strategy for a suspended aerial platform. By utilizing an underactuation approach, we demonstrate the successful oscillation damping of the proposed platform, modeled as a spherical double pendulum. A state estimator is designed in order to obtain the deflection angles of the platform, which uses only onboard IMU measurements. The state estimator is an extended Kalman filter (EKF) with intermittent measurements obtained at different frequencies. An optimal state feedback controller and a PD+ controller are designed in order to dampen the oscillations of the platform in the joint space and task space respectively. The proposed underactuated platform is found to be more energy-efficient than an omnidirectional platform and requires fewer actuators. The effectiveness of our proposed system is validated using both simulations and experimental studies.

Submitted 31 January, 2024; originally announced January 2024.

Comments: 7 pages, 11 figures, Accepted for publication to ICRA 2024

arXiv:2401.09059 [pdf, other]

Autonomous Catheterization with Open-source Simulator and Expert Trajectory

Authors: Tudor Jianu, Baoru Huang, Tuan Vo, Minh Nhat Vu, Jingxuan Kang, Hoan Nguyen, Olatunji Omisore, Pierre Berthet-Rayne, Sebastiano Fichera, Anh Nguyen

Abstract: Endovascular robots have been actively developed in both academia and industry. However, progress toward autonomous catheterization is often hampered by the widespread use of closed-source simulators and physical phantoms. Additionally, the acquisition of large-scale datasets for training machine learning algorithms with endovascular robots is usually infeasible due to expensive medical procedures. In this chapter, we introduce CathSim, the first open-source simulator for endovascular intervention to address these limitations. CathSim emphasizes real-time performance to enable rapid development and testing of learning algorithms. We validate CathSim against the real robot and show that our simulator can successfully mimic the behavior of the real robot. Based on CathSim, we develop a multimodal expert navigation network and demonstrate its effectiveness in downstream endovascular navigation tasks. The intensive experimental results suggest that CathSim has the potential to significantly accelerate research in the autonomous catheterization field. Our project is publicly available at https://github.com/airvlab/cathsim.

Submitted 19 January, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: Code: https://github.com/airvlab/cathsim

arXiv:2401.00805 [pdf, other]

Nonlinear Effects In Black Hole Ringdown From Scattering Experiments I: spin and initial data dependence of quadratic mode coupling

Authors: Hengrui Zhu, Justin L. Ripley, Frans Pretorius, Sizheng Ma, Keefe Mitman, Robert Owen, Michael Boyle, Yitian Chen, Nils Deppe, Lawrence E. Kidder, Jordan Moxon, Kyle C. Nelli, Harald P. Pfeiffer, Mark A. Scheel, William Throwe, Nils L. Vu

Abstract: We investigate quadratic quasinormal mode coupling in black hole spacetime through numerical simulations of single perturbed black holes using both numerical relativity and second-order black hole perturbation theory. Focusing on the dominant $\ell=|m|=2$ quadrupolar modes, we find good agreement (within $\sim10\%$) between these approaches, with discrepancies attributed to truncation error and uncertainties from mode fitting. Our results align with earlier studies extracting the coupling coefficients from select binary black hole merger simulations, showing consistency for the same remnant spins. Notably, the coupling coefficient is insensitive to a diverse range of initial data, including configurations that led to a significant (up to $5\%$) increase in the remnant black hole mass. These findings present opportunities for testing the nonlinear dynamics of general relativity with ground-based gravitational wave observatories. Lastly, we provide evidence of a bifurcation in coupling coefficients between counter-rotating and co-rotating quasinormal modes as black hole spin increases.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2312.08588 [pdf, other]

Black Hole Spectroscopy for Precessing Binary Black Hole Coalescences

Authors: Hengrui Zhu, Harrison Siegel, Keefe Mitman, Maximiliano Isi, Will M. Farr, Michael Boyle, Nils Deppe, Lawrence E. Kidder, Sizheng Ma, Jordan Moxon, Kyle C. Nelli, Harald P. Pfeiffer, Mark A. Scheel, Saul A. Teukolsky, William Throwe, Vijay Varma, Nils L. Vu

Abstract: To accurately perform black hole spectroscopy, it is essential to know which quasinormal modes dominate astrophysical ringdown signals. In this Letter, we present a phenomenological description of the quasinormal modes that are excited in the ringdowns of comparable mass, quasi-circular precessing binary black hole coalescences. By analyzing an exhaustive catalog of numerical relativity simulations, we confirm that the relative fundamental quasinormal mode amplitudes of precessing systems are related to those of non-precessing systems by a simple rotation, and that additional structure in the spectrum is connected to the system's kick velocity and other asymmetries in the orbital dynamics. We find that the ringdowns of precessing systems need not be dominated by the ${(\ell,m)=(2,\pm 2)}$ quasinormal modes. These results build upon previous works on waveform modeling, and are consistent with a recent ringdown analysis of the LIGO-Virgo gravitational wave signal GW190521.

Submitted 13 December, 2023; originally announced December 2023.

Comments: Data Release and Analysis Scripts: https://github.com/HengruiPrinceton/precession_ringdown

arXiv:2311.14465 [pdf, other]

DP-NMT: Scalable Differentially-Private Machine Translation

Authors: Timour Igamberdiev, Doan Nam Long Vu, Felix Künnecke, Zhuo Yu, Jannik Holmer, Ivan Habernal

Abstract: Neural machine translation (NMT) is a widely popular text generation task, yet there is a considerable research gap in the development of privacy-preserving NMT models, despite significant data privacy concerns for NMT systems. Differentially private stochastic gradient descent (DP-SGD) is a popular method for training machine learning models with concrete privacy guarantees; however, the implementation specifics of training a model with DP-SGD are not always clarified in existing models, with differing software libraries used and code bases not always being public, leading to reproducibility issues. To tackle this, we introduce DP-NMT, an open-source framework for carrying out research on privacy-preserving NMT with DP-SGD, bringing together numerous models, datasets, and evaluation metrics in one systematic software package. Our goal is to provide a platform for researchers to advance the development of privacy-preserving NMT systems, keeping the specific details of the DP-SGD algorithm transparent and intuitive to implement. We run a set of experiments on datasets from both general and privacy-related domains to demonstrate our framework in use. We make our framework publicly available and welcome feedback from the community.

Submitted 24 April, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

Comments: Accepted at EACL 2024

arXiv:2310.17502 [pdf, other]

doi 10.21437/Interspeech.2023-858

Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions

Authors: Florian Lux, Pascal Tilli, Sarina Meyer, Ngoc Thang Vu

Abstract: Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available. Furthermore, editing an existing human's voice also comes with ethical concerns. In this paper, we propose a method to generate artificial speaker embeddings that cannot be linked to a real human while offering intuitive and fine-grained control over the voice and speaking style of the embeddings, without requiring any labels for speaker or style. The artificial and controllable embeddings can be fed to a speech synthesis system, conditioned on embeddings of real humans during training, without sacrificing privacy during inference.

Submitted 26 October, 2023; originally announced October 2023.

Comments: Published at ISCA Interspeech 2023 https://www.isca-speech.org/archive/interspeech_2023/lux23_interspeech.html

arXiv:2310.17499 [pdf, other]

The IMS Toucan System for the Blizzard Challenge 2023

Authors: Florian Lux, Julia Koch, Sarina Meyer, Thomas Bott, Nadja Schauffler, Pavel Denisov, Antje Schweitzer, Ngoc Thang Vu

Abstract: For our contribution to the Blizzard Challenge 2023, we improved on the system we submitted to the Blizzard Challenge 2021. Our approach entails a rule-based text-to-phoneme processing system that includes rule-based disambiguation of homographs in the French language. It then transforms the phonemes to spectrograms as intermediate representations using a fast and efficient non-autoregressive synthesis architecture based on Conformer and Glow. A GAN-based neural vocoder that combines recent state-of-the-art approaches converts the spectrogram to the final wave. We carefully designed the data processing, training, and inference procedures for the challenge data. Our system identifier is G. Open source code and demo are available.

Submitted 26 October, 2023; originally announced October 2023.

Comments: Published at the Blizzard Challenge Workshop 2023, colocated with the Speech Synthesis Workshop 2023, a satellite event of Interspeech 2023

arXiv:2310.16618 [pdf, other]

Real-time 6-DoF Pose Estimation by an Event-based Camera using Active LED Markers

Authors: Gerald Ebmer, Adam Loch, Minh Nhat Vu, Germain Haessig, Roberto Mecca, Markus Vincze, Christian Hartl-Nesic, Andreas Kugi

Abstract: Real-time applications for autonomous operations depend largely on fast and robust vision-based localization systems. Since image processing tasks require processing large amounts of data, the computational resources often limit the performance of other processes. To overcome this limitation, traditional marker-based localization systems are widely used since they are easy to integrate and achieve reliable accuracy. However, classical marker-based localization systems significantly depend on standard cameras with low frame rates, which often lack accuracy due to motion blur. In contrast, event-based cameras provide high temporal resolution and a high dynamic range, which can be utilized for fast localization tasks, even under challenging visual conditions. This paper proposes a simple but effective event-based pose estimation system using active LED markers (ALM) for fast and accurate pose estimation. The proposed algorithm is able to operate in real time with a latency below 0.5 ms while maintaining output rates of 3 kHz. Experimental results in static and dynamic scenarios are presented to demonstrate the performance of the proposed approach in terms of computational speed and absolute accuracy, using the OptiTrack system as the basis for measurement.

Submitted 25 October, 2023; originally announced October 2023.

Comments: 14 pages, 12 figures, this paper has been accepted to WACV 2024

arXiv:2310.15948 [pdf, other]

Language-driven Scene Synthesis using Multi-conditional Diffusion Model

Authors: An Vuong, Minh Nhat Vu, Toan Tien Nguyen, Baoru Huang, Dzung Nguyen, Thieu Vo, Anh Nguyen

Abstract: Scene synthesis is a challenging problem with several industrial applications. Recently, substantial efforts have been directed to synthesize the scene using human motions, room layouts, or spatial graphs as the input. However, few studies have addressed this problem from multiple modalities, especially combining text prompts. In this paper, we propose a language-driven scene synthesis task, which is a new task that integrates text prompts, human motion, and existing objects for scene synthesis. Unlike other single-condition synthesis tasks, our problem involves multiple conditions and requires a strategy for processing and encoding them into a unified space. To address the challenge, we present a multi-conditional diffusion model, which differs from the implicit unification approach of other diffusion literature by explicitly predicting the guiding points for the original data distribution. We demonstrate that our approach is theoretically supportive. The intensive experiment results illustrate that our method outperforms state-of-the-art benchmarks and enables natural scene editing applications. The source code and dataset can be accessed at https://lang-scene-synth.github.io/.

Submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2310.15262 [pdf, other]

Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study

Authors: Injy Hamed, Nizar Habash, Ngoc Thang Vu

Abstract: Code-switching (CSW) text generation has been receiving increasing attention as a solution to address data scarcity. In light of this growing interest, we need more comprehensive studies comparing different augmentation approaches. In this work, we compare three popular approaches: lexical replacements, linguistic theories, and back-translation (BT), in the context of Egyptian Arabic-English CSW. We assess the effectiveness of the approaches on machine translation and the quality of augmentations through human evaluation. We show that BT and CSW predictive-based lexical replacement, being trained on CSW parallel data, perform best on both tasks. Linguistic theories and random lexical replacement prove to be effective in the lack of CSW parallel data, where both approaches achieve similar results.

Submitted 23 October, 2023; originally announced October 2023.

Comments: Findings of EMNLP 2023

arXiv:2310.06103 [pdf, other]

Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding

Authors: Pavel Denisov, Ngoc Thang Vu

Abstract: A number of methods have been proposed for End-to-End Spoken Language Understanding (E2E-SLU) using pretrained models, however their evaluation often lacks multilingual setup and tasks that require prediction of lexical fillers, such as slot filling. In this work, we propose a unified method that integrates multilingual pretrained speech and text models and performs E2E-SLU on six datasets in four languages in a generative manner, including the prediction of lexical fillers. We investigate how the proposed method can be improved by pretraining on widely available speech recognition data using several training objectives. Pretraining on 7000 hours of multilingual data allows us to outperform the state-of-the-art ultimately on two SLU datasets and partly on two more SLU datasets. Finally, we examine the cross-lingual capabilities of the proposed model and improve on the best known result on the PortMEDIA-Language dataset by almost half, achieving a Concept/Value Error Rate of 23.65%.

Submitted 9 October, 2023; originally announced October 2023.

Comments: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2023

arXiv:2309.10932 [pdf, other]

Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation

Authors: Tuan Van Vo, Minh Nhat Vu, Baoru Huang, Toan Nguyen, Ngan Le, Thieu Vo, Anh Nguyen

Abstract: Affordance detection presents intricate challenges and has a wide range of robotic applications. Previous works have faced limitations such as the complexities of 3D object shapes, the wide range of potential affordances on real-world objects, and the lack of open-vocabulary support for affordance understanding. In this paper, we introduce a new open-vocabulary affordance detection method in 3D point clouds, leveraging knowledge distillation and text-point correlation. Our approach employs pre-trained 3D models through knowledge distillation to enhance feature extraction and semantic understanding in 3D point clouds. We further introduce a new text-point correlation method to learn the semantic links between point cloud features and open-vocabulary labels. The intensive experiments show that our approach outperforms previous works and adapts to new affordance labels and unseen objects. Notably, our method achieves an improvement of 7.96% mIOU score compared to the baselines. Furthermore, it offers real-time inference, which is well suited for robotic manipulation applications.

Submitted 19 September, 2023; originally announced September 2023.

Comments: 8 pages

arXiv:2309.10911 [pdf, other]

Language-Conditioned Affordance-Pose Detection in 3D Point Clouds

Authors: Toan Nguyen, Minh Nhat Vu, Baoru Huang, Tuan Van Vo, Vy Truong, Ngan Le, Thieu Vo, Bac Le, Anh Nguyen

Abstract: Affordance detection and pose estimation are of great importance in many robotic applications. Their combination helps the robot gain an enhanced manipulation capability, in which the generated pose can facilitate the corresponding affordance task. Previous methods for affordance-pose joint learning are limited to a predefined set of affordances, thus limiting the adaptability of robots in real-world environments. In this paper, we propose a new method for language-conditioned affordance-pose joint learning in 3D point clouds. Given a 3D point cloud object, our method detects the affordance region and generates appropriate 6-DoF poses for any unconstrained affordance label. Our method consists of an open-vocabulary affordance detection branch and a language-guided diffusion model that generates 6-DoF poses based on the affordance text. We also introduce a new high-quality dataset for the task of language-driven affordance-pose joint learning. Intensive experimental results demonstrate that our proposed method works effectively on a wide range of open-vocabulary affordances and outperforms other baselines by a large margin. In addition, we illustrate the usefulness of our method in real-world robotic applications. Our code and dataset are publicly available at https://3DAPNet.github.io

Submitted 19 September, 2023; originally announced September 2023.

Comments: Project page: https://3DAPNet.github.io

arXiv:2309.09818 [pdf, other]

Grasp-Anything: Large-scale Grasp Dataset from Foundation Models

Authors: An Dinh Vuong, Minh Nhat Vu, Hieu Le, Baoru Huang, Binh Huynh, Thieu Vo, Andreas Kugi, Anh Nguyen

Abstract: Foundation models such as ChatGPT have made significant strides in robotic tasks due to their universal representation of real-world domains. In this paper, we leverage foundation models to tackle grasp detection, a persistent challenge in robotics with broad industrial applications. Despite numerous grasp datasets, their object diversity remains limited compared to real-world figures. Fortunately, foundation models possess an extensive repository of real-world knowledge, including objects we encounter in our daily lives. As a consequence, a promising solution to the limited representation in previous grasp datasets is to harness the universal knowledge embedded in these foundation models. We present Grasp-Anything, a new large-scale grasp dataset synthesized from foundation models to implement this solution. Grasp-Anything excels in diversity and magnitude, boasting 1M samples with text descriptions and more than 3M objects, surpassing prior datasets. Empirically, we show that Grasp-Anything successfully facilitates zero-shot grasp detection on vision-based tasks and real-world robotic experiments. Our dataset and code are available at https://grasp-anything-2023.github.io.

Submitted 18 September, 2023; originally announced September 2023.

Comments: Project page: https://grasp-anything-2023.github.io

arXiv:2309.08049 [pdf, other]

doi 10.1109/OJSP.2023.3344375

VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy Research

Authors: Sarina Meyer, Xiaoxiao Miao, Ngoc Thang Vu

Abstract: Speaker anonymization is the task of modifying a speech recording such that the original speaker cannot be identified anymore. Since the first Voice Privacy Challenge in 2020, along with the release of a framework, the popularity of this research topic is continually increasing. However, the comparison and combination of different anonymization approaches remains challenging due to the complexity of evaluation and the absence of user-friendly research frameworks. We therefore propose an efficient speaker anonymization and evaluation framework based on a modular and easily extendable structure, almost fully in Python. The framework facilitates the orchestration of several anonymization approaches in parallel and allows for interfacing between different techniques. Furthermore, we propose modifications to common evaluation methods which improve the quality of the evaluation and reduce their computation time by 65 to 95%, depending on the metric. Our code is fully open source.

Submitted 21 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: Accepted by OJSP-ICASSP 2024 https://ieeexplore.ieee.org/document/10365329

arXiv:2308.15005 [pdf, other]

Few-Shot Object Detection via Synthetic Features with Optimal Transport

Authors: Anh-Khoa Nguyen Vu, Thanh-Toan Do, Vinh-Tiep Nguyen, Tam Le, Minh-Triet Tran, Tam V. Nguyen

Abstract: Few-shot object detection aims to simultaneously localize and classify the objects in an image with limited training samples. However, most existing few-shot object detection methods focus on extracting the features of a few samples of novel classes that lack diversity. Hence, they may not be sufficient to capture the data distribution. To address that limitation, in this paper, we propose a novel approach in which we train a generator to generate synthetic data for novel classes. Still, directly training a generator on the novel class is not effective due to the lack of novel data. To overcome that issue, we leverage the large-scale dataset of base classes. Our overarching goal is to train a generator that captures the data variations of the base dataset. We then transform the captured variations into novel classes by generating synthetic data with the trained generator. To encourage the generator to capture data variations on base classes, we propose to train the generator with an optimal transport loss that minimizes the optimal transport distance between the distributions of real and synthetic data. Extensive experiments on two benchmark datasets demonstrate that the proposed method outperforms the state of the art. Source code will be available.

Submitted 29 August, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.10361 [pdf, other]

doi 10.1103/PhysRevD.109.124027

Fully relativistic three-dimensional Cauchy-characteristic matching for physical degrees of freedom

Authors: Sizheng Ma, Jordan Moxon, Mark A. Scheel, Kyle C. Nelli, Nils Deppe, Marceline S. Bonilla, Lawrence E. Kidder, Prayush Kumar, Geoffrey Lovelace, William Throwe, Nils L. Vu

Abstract: A fully relativistic three-dimensional Cauchy-characteristic matching (CCM) algorithm is implemented for physical degrees of freedom in a numerical relativity code SpECTRE. The method is free of approximations and can be applied to any physical system. We test the algorithm with various scenarios involving smooth data, including the propagation of Teukolsky waves within a flat background, the perturbation of a Kerr black hole with a Teukolsky wave, and the injection of a gravitational-wave pulse from the characteristic grid. Our investigations reveal no numerical instabilities in the simulations. In addition, the tests indicate that the CCM algorithm effectively directs characteristic information into the inner Cauchy system, yielding higher precision in waveforms and smaller violations of Bondi-gauge constraints, especially when the outer boundary of the Cauchy evolution is at a smaller radius.

Submitted 11 June, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

Journal ref: Phys. Rev. D 109, 124027 (2024)

arXiv:2308.06420 [pdf, other]

M&M: Tackling False Positives in Mammography with a Multi-view and Multi-instance Learning Sparse Detector

Authors: Yen Nhi Truong Vu, Dan Guo, Ahmed Taha, Jason Su, Thomas Paul Matthews

Abstract: Deep-learning-based object detection methods show promise for improving screening mammography, but high rates of false positives can hinder their effectiveness in clinical practice. To reduce false positives, we identify three challenges: (1) unlike natural images, a malignant mammogram typically contains only one malignant finding; (2) mammography exams contain two views of each breast, and both views ought to be considered to make a correct assessment; (3) most mammograms are negative and do not contain any findings. In this work, we tackle the three aforementioned challenges by: (1) leveraging Sparse R-CNN and showing that sparse detectors are more appropriate than dense detectors for mammography; (2) including a multi-view cross-attention module to synthesize information from different views; (3) incorporating multi-instance learning (MIL) to train with unannotated images and perform breast-level classification. The resulting model, M&M, is a Multi-view and Multi-instance learning system that can both localize malignant findings and provide breast-level predictions. We validate M&M's detection and classification performance using five mammography datasets. In addition, we demonstrate the effectiveness of each proposed component through comprehensive ablation studies.

Submitted 11 August, 2023; originally announced August 2023.

Comments: MICCAI 2023 with supplementary materials

arXiv:2307.03435 [pdf, other]

doi 10.1103/PhysRevD.108.084015

Extending black-hole remnant surrogate models to extreme mass ratios

Authors: Matteo Boschini, Davide Gerosa, Vijay Varma, Cristobal Armaza, Michael Boyle, Marceline S. Bonilla, Andrea Ceja, Yitian Chen, Nils Deppe, Matthew Giesler, Lawrence E. Kidder, Prayush Kumar, Guillermo Lara, Oliver Long, Sizheng Ma, Keefe Mitman, Peter James Nee, Harald P. Pfeiffer, Antoni Ramos-Buades, Mark A. Scheel, Nils L. Vu, Jooheon Yoo

Abstract: Numerical-relativity surrogate models for both black-hole merger waveforms and remnants have emerged as important tools in gravitational-wave astronomy. While producing very accurate predictions, their applicability is limited to the region of the parameter space where numerical-relativity simulations are available and computationally feasible. Notably, this excludes extreme mass ratios. We present a machine-learning approach to extend the validity of existing and future numerical-relativity surrogate models toward the test-particle limit, targeting in particular the mass and spin of post-merger black-hole remnants. Our model is trained on both numerical-relativity simulations at comparable masses and analytical predictions at extreme mass ratios. We extend the gaussian-process-regression model NRSur7dq4Remnant, validate its performance via cross validation, and test its accuracy against additional numerical-relativity runs. Our fit, which we dub NRSur7dq4EmriRemnant, reaches an accuracy that is comparable to or higher than that of existing remnant models while providing robust predictions for arbitrary mass ratios.

Submitted 24 October, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

Comments: 10 pages, 3 figures. Published in PRD. Model publicly available at https://pypi.org/project/surfinBH

Journal ref: Phys.Rev.D 108 (2023) 8, 084015

arXiv:2306.11377 [pdf, other]

HabiCrowd: A High Performance Simulator for Crowd-Aware Visual Navigation

Authors: An Dinh Vuong, Toan Tien Nguyen, Minh Nhat VU, Baoru Huang, Dzung Nguyen, Huynh Thi Thanh Binh, Thieu Vo, Anh Nguyen

Abstract: Visual navigation, a foundational aspect of Embodied AI (E-AI), has been significantly studied in the past few years. While many 3D simulators have been introduced to support visual navigation tasks, few works have been directed towards combining human dynamics, creating the gap between simulation and real-world applications. Furthermore, current 3D simulators incorporating human dynamics have several limitations, particularly in terms of computational efficiency, which is a promise of E-AI simulators. To overcome these shortcomings, we introduce HabiCrowd, the first standard benchmark for crowd-aware visual navigation that integrates a crowd dynamics model with diverse human settings into photorealistic environments. Empirical evaluations demonstrate that our proposed human dynamics model achieves state-of-the-art performance in collision avoidance, while exhibiting superior computational efficiency compared to its counterparts. We leverage HabiCrowd to conduct several comprehensive studies on crowd-aware visual navigation tasks and human-robot interactions. The source code and data can be found at https://habicrowd.github.io/.

Submitted 20 June, 2023; originally announced June 2023.

Comments: 14 pages, 10 figures

arXiv:2306.06804 [pdf, other]

Neural Machine Translation for the Indigenous Languages of the Americas: An Introduction

Authors: Manuel Mager, Rajat Bhatnagar, Graham Neubig, Ngoc Thang Vu, Katharina Kann

Abstract: Neural models have drastically advanced state of the art for machine translation (MT) between high-resource languages. Traditionally, these models rely on large amounts of training data, but many language pairs lack these resources. However, an important part of the languages in the world do not have this amount of data. Most languages from the Americas are among them, having a limited amount of parallel and monolingual data, if any. Here, we present an introduction to the interested reader to the basic challenges, concepts, and techniques that involve the creation of MT systems for these languages. Finally, we discuss the recent advances and findings and open questions, product of an increased interest of the NLP community in these languages.

Submitted 11 June, 2023; originally announced June 2023.

Comments: Accepted to AmericasNLP 2023

arXiv:2306.05653 [pdf]

Rapid, antibiotic incubation-free determination of tuberculosis drug resistance using machine learning and Raman spectroscopy

Authors: Babatunde Ogunlade, Loza F. Tadesse, Hongquan Li, Nhat Vu, Niaz Banaei, Amy K. Barczak, Amr. A. E. Saleh, Manu Prakash, Jennifer A. Dionne

Abstract: Tuberculosis (TB) is the world's deadliest infectious disease, with over 1.5 million deaths annually and 10 million new cases reported each year. The causative organism, Mycobacterium tuberculosis (Mtb), can take nearly 40 days to culture, a required step to determine the pathogen's antibiotic susceptibility. Both rapid identification of Mtb and rapid antibiotic susceptibility testing (AST) are essential for effective patient treatment and combating antimicrobial resistance. Here, we demonstrate a rapid, culture-free, and antibiotic incubation-free drug susceptibility test for TB using Raman spectroscopy and machine learning. We collect few-to-single-cell Raman spectra from over 25,000 cells of the Mtb complex strain Bacillus Calmette Guerin (BCG) resistant to one of the four mainstay anti-TB drugs, isoniazid, rifampicin, moxifloxacin and amikacin, as well as a pan susceptible wildtype strain. By training a neural network on this data, we classify the antibiotic resistance profile of each strain, both on dried samples and in patient sputum samples. On dried samples, we achieve >98% resistant versus susceptible classification accuracy across all 5 BCG strains. In patient sputum samples, we achieve ~79% average classification accuracy. We develop a feature recognition algorithm in order to verify that our machine learning model is using biologically relevant spectral features to assess the resistance profiles of our mycobacterial strains.
Finally, we demonstrate how this approach can be deployed in resource-limited settings by developing a low-cost, portable Raman microscope that costs <$5000. We show how this instrument and our machine learning model enable combined microscopy and spectroscopy for accurate few-to-single-cell drug susceptibility testing of BCG.

Submitted 9 April, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

arXiv:2306.04755 [pdf, other]

A positivity-preserving adaptive-order finite-difference scheme for GRMHD

Authors: Nils Deppe, Lawrence E. Kidder, Saul A. Teukolsky, Marceline S. Bonilla, François Hébert, Yoonsoo Kim, Mark A. Scheel, William Throwe, Nils L. Vu

Abstract: We present an adaptive-order positivity-preserving conservative finite-difference scheme that allows a high-order solution away from shocks and discontinuities while guaranteeing positivity and robustness at discontinuities. This is achieved by monitoring the relative power in the highest mode of the reconstructed polynomial and reducing the order when the polynomial series no longer converges. Our approach is similar to the multidimensional optimal order detection (MOOD) strategy, but differs in several ways. The approach is a priori and so does not require retaking a time step. It can also readily be combined with positivity-preserving flux limiters that have gained significant traction in computational astrophysics and numerical relativity. This combination ultimately guarantees a physical solution both during reconstruction and time stepping. We demonstrate the capabilities of the method using a standard suite of very challenging 1d, 2d, and 3d general relativistic magnetohydrodynamics test problems.

Submitted 18 January, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: 48 pages, 17 figures. Matches published version, minor changes only

Showing 1–50 of 178 results for author: Vu, N