Showing 1–50 of 133 results for author: Vu, N

  1. arXiv:2407.02937  [pdf, other]

    cs.CL cs.SD eess.AS

    Probing the Feasibility of Multilingual Speaker Anonymization

    Authors: Sarina Meyer, Florian Lux, Ngoc Thang Vu

    Abstract: In speaker anonymization, speech recordings are modified in a way that the identity of the speaker remains hidden. While this technology could help to protect the privacy of individuals around the globe, current research restricts this by focusing almost exclusively on English data. In this study, we extend a state-of-the-art anonymization system to nine languages by transforming language-dependen…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: accepted at Interspeech 2024

  2. arXiv:2406.09489  [pdf, other]

    cs.CV

    Language-driven Grasp Detection

    Authors: An Dinh Vuong, Minh Nhat Vu, Baoru Huang, Nghia Nguyen, Hieu Le, Thieu Vo, Anh Nguyen

    Abstract: Grasp detection is a persistent and intricate challenge with various industrial applications. Recently, many methods and datasets have been proposed to tackle the grasp detection problem. However, most of them do not consider using natural language as a condition to detect the grasp poses. In this paper, we introduce Grasp-Anything++, a new language-driven grasp detection dataset featuring 1M samp…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 19 pages. Accepted to CVPR24

  3. arXiv:2406.09039  [pdf, other]

    cs.RO

    Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning

    Authors: Huy Hoang Nguyen, Minh Nhat Vu, Florian Beck, Gerald Ebmer, Anh Nguyen, Andreas Kugi

    Abstract: Combining a vision module inside a closed-loop control system for a seamless movement of a robot in a manipulation task is challenging due to the inconsistent update rates between utilized modules. This task is even more difficult in a dynamic environment, e.g., objects are moving. This paper presents a modular zero-shot framework for language-driven manipulation of (dynamic) objects…

    Submitted 19 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures

  4. arXiv:2406.07124  [pdf, other]

    cs.AI cs.LG

    CHARME: A chain-based reinforcement learning approach for the minor embedding problem

    Authors: Hoang M. Ngo, Nguyen H K. Do, Minh N. Vu, Tamer Kahveci, My T. Thai

    Abstract: Quantum Annealing (QA) holds great potential for solving combinatorial optimization problems efficiently. However, the effectiveness of QA algorithms heavily relies on the embedding of problem instances, represented as logical graphs, into the quantum processing unit (QPU), whose topology is in the form of a limited-connectivity graph, a task known as the minor embedding problem. Existing methods for the mino…

    Submitted 11 June, 2024; originally announced June 2024.

  5. arXiv:2406.06406  [pdf, other]

    cs.CL cs.SD eess.AS

    Controlling Emotion in Text-to-Speech with Natural Language Prompts

    Authors: Thomas Bott, Florian Lux, Ngoc Thang Vu

    Abstract: In recent years, prompting has quickly become one of the standard ways of steering the outputs of generative machine learning models, due to its intuitive use of natural language. In this work, we propose a system conditioned on embeddings derived from an emotionally rich text that serves as prompt. Thereby, a joint representation of speaker and prompt embeddings is integrated at several points wi…

    Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: accepted at Interspeech 2024

  6. arXiv:2406.06403  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Meta Learning Text-to-Speech Synthesis in over 7000 Languages

    Authors: Florian Lux, Sarina Meyer, Lyonel Behringer, Frank Zalkow, Phat Do, Matt Coler, Emanuël A. P. Habets, Ngoc Thang Vu

    Abstract: In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech syn…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: accepted at Interspeech 2024

  7. arXiv:2405.09335  [pdf, other]

    cs.CL

    Prompting-based Synthetic Data Generation for Few-Shot Question Answering

    Authors: Maximilian Schmidt, Andrea Bartezzaghi, Ngoc Thang Vu

    Abstract: Although language models (LMs) have boosted the performance of Question Answering, they still need plenty of data. Data annotation, in contrast, is a time-consuming process. This especially applies to Question Answering, where possibly large documents have to be parsed and annotated with questions and their corresponding answers. Furthermore, Question Answering models often only work well for the…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: LREC-COLING 2024

  8. arXiv:2404.10922  [pdf, other]

    cs.CL cs.SD eess.AS

    Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training

    Authors: Pavel Denisov, Ngoc Thang Vu

    Abstract: Recent advancements in language modeling have led to the emergence of Large Language Models (LLMs) capable of various natural language processing tasks. Despite their success in text-based tasks, applying LLMs to the speech domain remains limited and challenging. This paper presents BLOOMZMMS, a novel model that integrates a multilingual LLM with a multilingual speech encoder, aiming to harness th…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: NAACL Findings 2024

  9. arXiv:2404.07122  [pdf, other]

    cs.CV

    Driver Attention Tracking and Analysis

    Authors: Dat Viet Thanh Nguyen, Anh Tran, Hoai Nam Vu, Cuong Pham, Minh Hoai

    Abstract: We propose a novel method to estimate a driver's points-of-gaze using a pair of ordinary cameras mounted on the windshield and dashboard of a car. This is a challenging problem due to the dynamics of traffic environments with 3D scenes of unknown depths. This problem is further complicated by the volatile distance between the driver and the camera system. To tackle these challenges, we develop a n…

    Submitted 11 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  10. Superior Genetic Algorithms for the Target Set Selection Problem Based on Power-Law Parameter Choices and Simple Greedy Heuristics

    Authors: Benjamin Doerr, Martin S. Krejca, Nguyen Vu

    Abstract: The target set selection problem (TSS) asks for a set of vertices such that an influence spreading process started in these vertices reaches the whole graph. The current state of the art for this NP-hard problem are three recently proposed randomized search heuristics, namely a biased random-key genetic algorithm (BRKGA) obtained from extensive parameter tuning, a max-min ant system (MMAS), and a…

    Submitted 5 April, 2024; originally announced April 2024.

  11. arXiv:2403.17647  [pdf, other]

    cs.CL

    Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering

    Authors: Pascal Tilli, Ngoc Thang Vu

    Abstract: The large success of deep learning based methods in Visual Question Answering (VQA) has concurrently increased the demand for explainable methods. Most methods in Explainable Artificial Intelligence (XAI) focus on generating post-hoc explanations rather than taking an intrinsic approach, the latter characterizing an interpretable model. In this work, we introduce an interpretable approach for grap…

    Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  12. arXiv:2403.17582  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards a Zero-Data, Controllable, Adaptive Dialog System

    Authors: Dirk Väth, Lindsey Vanderlyn, Ngoc Thang Vu

    Abstract: Conversational Tree Search (Väth et al., 2023) is a recent approach to controllable dialog systems, where domain experts shape the behavior of a Reinforcement Learning agent through a dialog tree. The agent learns to efficiently navigate this tree, while adapting to information needs, e.g., domain familiarity, of different users. However, the need for additional training data hinders deployment in…

    Submitted 26 March, 2024; originally announced March 2024.

  13. arXiv:2403.05338  [pdf, other]

    cs.CL

    Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings

    Authors: Wei Zhou, Heike Adel, Hendrik Schuff, Ngoc Thang Vu

    Abstract: Attribution scores indicate the importance of different input parts and can, thus, explain model behaviour. Currently, prompt-based models are gaining popularity, i.a., due to their easier adaptability in low-resource settings. However, the quality of attribution scores extracted from prompt-based models has not been investigated yet. In this work, we address this topic by analyzing attribution sc…

    Submitted 8 March, 2024; originally announced March 2024.

  14. arXiv:2403.04784  [pdf, other]

    cs.CR cs.LG

    Analysis of Privacy Leakage in Federated Large Language Models

    Authors: Minh N. Vu, Truc Nguyen, Tre' R. Jeter, My T. Thai

    Abstract: With the rapid adoption of Federated Learning (FL) as the training and tuning protocol for applications utilizing Large Language Models (LLMs), recent research highlights the need for significant modifications to FL to accommodate the large scale of LLMs. While substantial adjustments to the protocol have been introduced as a response, comprehensive privacy analysis for the adapted FL protocol is…

    Submitted 2 March, 2024; originally announced March 2024.

  15. arXiv:2402.04769  [pdf, other]

    cs.RO

    Hierarchical Motion Planning and Offline Robust Model Predictive Control for Autonomous Vehicles

    Authors: Hung Duy Nguyen, Minh Nhat Vu, Nguyen Ngoc Nam, Kyoungseok Han

    Abstract: Driving vehicles in complex scenarios under harsh conditions is the biggest challenge for autonomous vehicles (AVs). To address this issue, we propose hierarchical motion planning and robust control strategy using the front-active steering system in complex scenarios with various slippery road adhesion coefficients while considering vehicle uncertain parameters. Behaviors of human vehicles (HVs) a…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 6 pages, 9 illustrations, Accepted for publication in American Control Conference (ACC) 2024

  16. arXiv:2402.04730  [pdf, other]

    cs.RO

    Model Predictive Trajectory Optimization With Dynamically Changing Waypoints for Serial Manipulators

    Authors: Florian Beck, Minh Nhat Vu, Christian Hartl-Nesic, Andreas Kugi

    Abstract: Systematically including dynamically changing waypoints as desired discrete actions, for instance, resulting from superordinate task planning, has been challenging for online model predictive trajectory optimization with short planning horizons. This paper presents a novel waypoint model predictive control (wMPC) concept for online replanning tasks. The main idea is to split the planning horizon a…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 8 pages, 6 figures

  17. arXiv:2401.17676  [pdf, other]

    cs.RO

    Observer-based Controller Design for Oscillation Damping of a Novel Suspended Underactuated Aerial Platform

    Authors: Hemjyoti Das, Minh Nhat Vu, Tobias Egle, Christian Ott

    Abstract: In this work, we present a novel actuation strategy for a suspended aerial platform. By utilizing an underactuation approach, we demonstrate the successful oscillation damping of the proposed platform, modeled as a spherical double pendulum. A state estimator is designed in order to obtain the deflection angles of the platform, which uses only onboard IMU measurements. The state estimator is an ex…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 7 pages, 11 figures, Accepted for publication to ICRA 2024

  18. arXiv:2401.09059  [pdf, other]

    cs.RO cs.CV

    Autonomous Catheterization with Open-source Simulator and Expert Trajectory

    Authors: Tudor Jianu, Baoru Huang, Tuan Vo, Minh Nhat Vu, Jingxuan Kang, Hoan Nguyen, Olatunji Omisore, Pierre Berthet-Rayne, Sebastiano Fichera, Anh Nguyen

    Abstract: Endovascular robots have been actively developed in both academia and industry. However, progress toward autonomous catheterization is often hampered by the widespread use of closed-source simulators and physical phantoms. Additionally, the acquisition of large-scale datasets for training machine learning algorithms with endovascular robots is usually infeasible due to expensive medical procedures…

    Submitted 19 January, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: Code: https://github.com/airvlab/cathsim

  19. arXiv:2311.14465  [pdf, other]

    cs.CL

    DP-NMT: Scalable Differentially-Private Machine Translation

    Authors: Timour Igamberdiev, Doan Nam Long Vu, Felix Künnecke, Zhuo Yu, Jannik Holmer, Ivan Habernal

    Abstract: Neural machine translation (NMT) is a widely popular text generation task, yet there is a considerable research gap in the development of privacy-preserving NMT models, despite significant data privacy concerns for NMT systems. Differentially private stochastic gradient descent (DP-SGD) is a popular method for training machine learning models with concrete privacy guarantees; however, the implemen…

    Submitted 24 April, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: Accepted at EACL 2024

  20. Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions

    Authors: Florian Lux, Pascal Tilli, Sarina Meyer, Ngoc Thang Vu

    Abstract: Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available. Furthermore, editing an existing human's voice also comes with ethical concerns. In this paper, we propose a method to generate artificial speaker embeddings that cannot be linked to a real human while offering intui…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Published at ISCA Interspeech 2023 https://www.isca-speech.org/archive/interspeech_2023/lux23_interspeech.html

  21. arXiv:2310.17499  [pdf, other]

    cs.CL cs.LG eess.AS

    The IMS Toucan System for the Blizzard Challenge 2023

    Authors: Florian Lux, Julia Koch, Sarina Meyer, Thomas Bott, Nadja Schauffler, Pavel Denisov, Antje Schweitzer, Ngoc Thang Vu

    Abstract: For our contribution to the Blizzard Challenge 2023, we improved on the system we submitted to the Blizzard Challenge 2021. Our approach entails a rule-based text-to-phoneme processing system that includes rule-based disambiguation of homographs in the French language. It then transforms the phonemes to spectrograms as intermediate representations using a fast and efficient non-autoregressive synt…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Published at the Blizzard Challenge Workshop 2023, colocated with the Speech Synthesis Workshop 2023, a satellite event of Interspeech 2023

  22. arXiv:2310.16618  [pdf, other]

    cs.CV cs.RO

    Real-time 6-DoF Pose Estimation by an Event-based Camera using Active LED Markers

    Authors: Gerald Ebmer, Adam Loch, Minh Nhat Vu, Germain Haessig, Roberto Mecca, Markus Vincze, Christian Hartl-Nesic, Andreas Kugi

    Abstract: Real-time applications for autonomous operations depend largely on fast and robust vision-based localization systems. Since image processing tasks require processing large amounts of data, the computational resources often limit the performance of other processes. To overcome this limitation, traditional marker-based localization systems are widely used since they are easy to integrate and achieve…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 14 pages, 12 figures, this paper has been accepted to WACV 2024

  23. arXiv:2310.15948  [pdf, other]

    cs.CV

    Language-driven Scene Synthesis using Multi-conditional Diffusion Model

    Authors: An Vuong, Minh Nhat Vu, Toan Tien Nguyen, Baoru Huang, Dzung Nguyen, Thieu Vo, Anh Nguyen

    Abstract: Scene synthesis is a challenging problem with several industrial applications. Recently, substantial efforts have been directed to synthesize the scene using human motions, room layouts, or spatial graphs as the input. However, few studies have addressed this problem from multiple modalities, especially combining text prompts. In this paper, we propose a language-driven scene synthesis task, which…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  24. arXiv:2310.15262  [pdf, other]

    cs.CL

    Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study

    Authors: Injy Hamed, Nizar Habash, Ngoc Thang Vu

    Abstract: Code-switching (CSW) text generation has been receiving increasing attention as a solution to address data scarcity. In light of this growing interest, we need more comprehensive studies comparing different augmentation approaches. In this work, we compare three popular approaches: lexical replacements, linguistic theories, and back-translation (BT), in the context of Egyptian Arabic-English CSW.…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023

  25. arXiv:2310.06103  [pdf, other]

    cs.CL cs.SD eess.AS

    Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding

    Authors: Pavel Denisov, Ngoc Thang Vu

    Abstract: A number of methods have been proposed for End-to-End Spoken Language Understanding (E2E-SLU) using pretrained models, however their evaluation often lacks multilingual setup and tasks that require prediction of lexical fillers, such as slot filling. In this work, we propose a unified method that integrates multilingual pretrained speech and text models and performs E2E-SLU on six datasets in four…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2023

  26. arXiv:2309.10932  [pdf, other]

    cs.RO

    Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation

    Authors: Tuan Van Vo, Minh Nhat Vu, Baoru Huang, Toan Nguyen, Ngan Le, Thieu Vo, Anh Nguyen

    Abstract: Affordance detection presents intricate challenges and has a wide range of robotic applications. Previous works have faced limitations such as the complexities of 3D object shapes, the wide range of potential affordances on real-world objects, and the lack of open-vocabulary support for affordance understanding. In this paper, we introduce a new open-vocabulary affordance detection method in 3D po…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: 8 pages

  27. arXiv:2309.10911  [pdf, other]

    cs.RO

    Language-Conditioned Affordance-Pose Detection in 3D Point Clouds

    Authors: Toan Nguyen, Minh Nhat Vu, Baoru Huang, Tuan Van Vo, Vy Truong, Ngan Le, Thieu Vo, Bac Le, Anh Nguyen

    Abstract: Affordance detection and pose estimation are of great importance in many robotic applications. Their combination helps the robot gain an enhanced manipulation capability, in which the generated pose can facilitate the corresponding affordance task. Previous methods for affordance-pose joint learning are limited to a predefined set of affordances, thus limiting the adaptability of robots in real-wor…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Project page: https://3DAPNet.github.io

  28. arXiv:2309.09818  [pdf, other]

    cs.RO cs.CV

    Grasp-Anything: Large-scale Grasp Dataset from Foundation Models

    Authors: An Dinh Vuong, Minh Nhat Vu, Hieu Le, Baoru Huang, Binh Huynh, Thieu Vo, Andreas Kugi, Anh Nguyen

    Abstract: Foundation models such as ChatGPT have made significant strides in robotic tasks due to their universal representation of real-world domains. In this paper, we leverage foundation models to tackle grasp detection, a persistent challenge in robotics with broad industrial applications. Despite numerous grasp datasets, their object diversity remains limited compared to real-world figures. Fortunately…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: Project page: https://grasp-anything-2023.github.io

  29. VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy Research

    Authors: Sarina Meyer, Xiaoxiao Miao, Ngoc Thang Vu

    Abstract: Speaker anonymization is the task of modifying a speech recording such that the original speaker cannot be identified anymore. Since the first Voice Privacy Challenge in 2020, along with the release of a framework, the popularity of this research topic is continually increasing. However, the comparison and combination of different anonymization approaches remains challenging due to the complexity…

    Submitted 21 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted by OJSP-ICASSP 2024 https://ieeexplore.ieee.org/document/10365329

  30. arXiv:2308.15005  [pdf, other]

    cs.CV

    Few-Shot Object Detection via Synthetic Features with Optimal Transport

    Authors: Anh-Khoa Nguyen Vu, Thanh-Toan Do, Vinh-Tiep Nguyen, Tam Le, Minh-Triet Tran, Tam V. Nguyen

    Abstract: Few-shot object detection aims to simultaneously localize and classify the objects in an image with limited training samples. However, most existing few-shot object detection methods focus on extracting the features of a few samples of novel classes that lack diversity. Hence, they may not be sufficient to capture the data distribution. To address that limitation, in this paper, we propose a novel…

    Submitted 29 August, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

  31. arXiv:2308.06420  [pdf, other]

    cs.CV

    M&M: Tackling False Positives in Mammography with a Multi-view and Multi-instance Learning Sparse Detector

    Authors: Yen Nhi Truong Vu, Dan Guo, Ahmed Taha, Jason Su, Thomas Paul Matthews

    Abstract: Deep-learning-based object detection methods show promise for improving screening mammography, but high rates of false positives can hinder their effectiveness in clinical practice. To reduce false positives, we identify three challenges: (1) unlike natural images, a malignant mammogram typically contains only one malignant finding; (2) mammography exams contain two views of each breast, and both…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: MICCAI 2023 with supplementary materials

  32. arXiv:2306.11377  [pdf, other]

    cs.CV

    HabiCrowd: A High Performance Simulator for Crowd-Aware Visual Navigation

    Authors: An Dinh Vuong, Toan Tien Nguyen, Minh Nhat VU, Baoru Huang, Dzung Nguyen, Huynh Thi Thanh Binh, Thieu Vo, Anh Nguyen

    Abstract: Visual navigation, a foundational aspect of Embodied AI (E-AI), has been significantly studied in the past few years. While many 3D simulators have been introduced to support visual navigation tasks, scarcely any works have been directed towards combining human dynamics, creating a gap between simulation and real-world applications. Furthermore, current 3D simulators incorporating human dynamics hav…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: 14 pages, 10 figures

  33. arXiv:2306.06804  [pdf, other]

    cs.CL stat.ML

    Neural Machine Translation for the Indigenous Languages of the Americas: An Introduction

    Authors: Manuel Mager, Rajat Bhatnagar, Graham Neubig, Ngoc Thang Vu, Katharina Kann

    Abstract: Neural models have drastically advanced state of the art for machine translation (MT) between high-resource languages. Traditionally, these models rely on large amounts of training data, but many language pairs lack these resources. However, an important part of the languages in the world do not have this amount of data. Most languages from the Americas are among them, having a limited amount of p…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: Accepted to AmericasNLP 2023

  34. arXiv:2305.19474  [pdf, other]

    cs.CL

    Ethical Considerations for Machine Translation of Indigenous Languages: Giving a Voice to the Speakers

    Authors: Manuel Mager, Elisabeth Mager, Katharina Kann, Ngoc Thang Vu

    Abstract: In recent years machine translation has become very successful for high-resource language pairs. This has also sparked new interest in research on the automatic translation of low-resource languages, including Indigenous languages. However, the latter are deeply related to the ethnic and cultural groups that speak (or used to speak) them. The data collection, modeling and deploying machine transla…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL2023 Main Conference

  35. arXiv:2305.07440  [pdf, other]

    cs.PF cs.AI cs.LG

    Optimizing Memory Mapping Using Deep Reinforcement Learning

    Authors: Pengming Wang, Mikita Sazanovich, Berkin Ilbeyi, Phitchaya Mangpo Phothilimthana, Manish Purohit, Han Yang Tay, Ngân Vũ, Miaosen Wang, Cosmin Paduraru, Edouard Leurent, Anton Zhernov, Po-Sen Huang, Julian Schrittwieser, Thomas Hubert, Robert Tung, Paula Kurylowicz, Kieran Milan, Oriol Vinyals, Daniel J. Mankowitz

    Abstract: Resource scheduling and allocation is a critical component of many high impact systems ranging from congestion control to cloud computing. Finding more optimal solutions to these problems often has significant impact on resource and time savings, reducing device wear-and-tear, and even potentially improving carbon emissions. In this paper, we focus on a specific instance of a scheduling problem, n…

    Submitted 17 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  36. arXiv:2305.02679  [pdf, other]

    cs.CL cs.HC

    Neighboring Words Affect Human Interpretation of Saliency Explanations

    Authors: Alon Jacovi, Hendrik Schuff, Heike Adel, Ngoc Thang Vu, Yoav Goldberg

    Abstract: Word-level saliency explanations ("heat maps over words") are often used to communicate feature-attribution in text-based models. Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores. We conduct a user study to investigate how the marking of a word's neighboring words affects the explainee's perception of the word's i…

    Submitted 6 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  37. Instance-level Few-shot Learning with Class Hierarchy Mining

    Authors: Anh-Khoa Nguyen Vu, Thanh-Toan Do, Nhat-Duy Nguyen, Vinh-Tiep Nguyen, Thanh Duc Ngo, Tam V. Nguyen

    Abstract: Few-shot learning is proposed to tackle the problem of scarce training data in novel classes. However, prior works in instance-level few-shot learning have paid less attention to effectively utilizing the relationship between categories. In this paper, we exploit the hierarchical information to leverage discriminative and relevant features of base classes to effectively classify novel objects. The…

    Submitted 14 April, 2023; originally announced April 2023.

    Comments: accepted by IEEE Transactions on Image Processing

  38. arXiv:2304.07444  [pdf, other]

    cs.CV

    The Art of Camouflage: Few-shot Learning for Animal Detection and Segmentation

    Authors: Thanh-Danh Nguyen, Anh-Khoa Nguyen Vu, Nhat-Duy Nguyen, Vinh-Tiep Nguyen, Thanh Duc Ngo, Thanh-Toan Do, Minh-Triet Tran, Tam V. Nguyen

    Abstract: Camouflaged object detection and segmentation is a new and challenging research topic in computer vision. There is a serious issue of lacking data of camouflaged objects such as camouflaged animals in natural scenes. In this paper, we address the problem of few-shot learning for camouflaged object detection and segmentation. To this end, we first collect a new dataset, CAMO-FS, for the benchmark.…

    Submitted 21 January, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

    Comments: Under-review Journal

  39. arXiv:2304.04478  [pdf, other]

    cs.CL cs.SD eess.AS

    Oh, Jeez! or Uh-huh? A Listener-aware Backchannel Predictor on ASR Transcriptions

    Authors: Daniel Ortega, Chia-Yu Li, Ngoc Thang Vu

    Abstract: This paper presents our latest investigation on modeling backchannel in conversations. Motivated by a proactive backchanneling theory, we aim at developing a system which acts as a proactive listener by inserting backchannels, such as continuers and assessment, to influence speakers. Our model takes into account not only lexical and acoustic cues, but also introduces the simple and novel idea of u…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Published in ICASSP 2020

  40. arXiv:2304.04472  [pdf, other]

    cs.CL

    Modeling Speaker-Listener Interaction for Backchannel Prediction

    Authors: Daniel Ortega, Sarina Meyer, Antje Schweitzer, Ngoc Thang Vu

    Abstract: We present our latest findings on backchannel modeling novelly motivated by the canonical use of the minimal responses Yeah and Uh-huh in English and their correspondent tokens in German, and the effect of encoding the speaker-listener interaction. Backchanneling theories emphasize the active and continuous role of the listener in the course of the conversation, their effects on the speaker's subs…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Published in IWSDS 2023

  41. arXiv:2303.16417  [pdf]

    cs.CV cs.LG q-bio.QM

    Problems and shortcuts in deep learning for screening mammography

    Authors: Trevor Tsue, Brent Mombourquette, Ahmed Taha, Thomas Paul Matthews, Yen Nhi Truong Vu, Jason Su

    Abstract: This work reveals undiscovered challenges in the performance and generalizability of deep learning models. We (1) identify spurious shortcuts and evaluation issues that can inflate performance and (2) propose training and analysis methods to address them. We trained an AI model to classify cancer on a retrospective dataset of 120,112 US exams (3,467 cancers) acquired from 2008 to 2017 and 16,693…

    Submitted 28 March, 2023; originally announced March 2023.

  42. arXiv:2303.12180  [pdf, other]

    cs.RO

    Hierarchical control strategy for planar bipedal walking robots based on reduced order model

    Authors: Minh Nhat Vu

    Abstract: In this work, the hierarchical control strategy of template-based control for a bipedal robot is described. The axial force of a compliant leg is redirected to a point, called the virtual pivot point (VPP), of a 2D biped robot, which is located above the CoM of the model, to generate a restoring moment for the trunk motion. The resulting behavior of the model would resemble a virtual pendulum rota…

    Submitted 12 February, 2023; originally announced March 2023.

    Comments: Master's thesis (Korea Institute of Science and Technology, August 2017)

  43. arXiv:2303.10227  [pdf, other]

    cs.CL cs.AI cs.LG

    Conversational Tree Search: A New Hybrid Dialog Task

    Authors: Dirk Väth, Lindsey Vanderlyn, Ngoc Thang Vu

    Abstract: Conversational interfaces provide a flexible and easy way for users to seek information that may otherwise be difficult or inconvenient to obtain. However, existing interfaces generally fall into one of two categories: FAQs, where users must have a concrete question in order to retrieve a general answer, or dialogs, where users must follow a predefined path but may receive a personalized answer. I…

    Submitted 17 March, 2023; originally announced March 2023.

    Comments: EACL 2023

  44. arXiv:2303.02401  [pdf, other]

    cs.RO cs.AI cs.CV

    Open-Vocabulary Affordance Detection in 3D Point Clouds

    Authors: Toan Nguyen, Minh Nhat Vu, An Vuong, Dzung Nguyen, Thieu Vo, Ngan Le, Anh Nguyen

    Abstract: Affordance detection is a challenging problem with a wide variety of robotic applications. Traditional affordance detection methods are limited to a predefined set of affordance labels, hence potentially restricting the adaptability of intelligent robots in complex and dynamic environments. In this paper, we present the Open-Vocabulary Affordance Detection (OpenAD) method, which is capable of dete…

    Submitted 23 July, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

    Comments: Accepted at The 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)

  45. arXiv:2301.11852  [pdf, other]

    cs.CE

    A Sequential Global Programming Approach for Two-scale Optimization of Homogenized Multiphysics Problems with Application to Biot Porous Media

    Authors: Bich Ngoc Vu, Vladimir Lukeš, Michael Stingl, Eduard Rohan

    Abstract: We present a new approach and an algorithm for optimizing the material configuration and behaviour of a fluid saturated porous medium in a two-scale setting. The state problem is governed by the Biot model describing the fluid-structure interaction in homogenized poroelastic structures. However, the approach is widely applicable to multiphysics problems involving several macroscopic fields where h…

    Submitted 27 January, 2023; originally announced January 2023.

    Comments: 26 pages, 19 figures, 2 tables

  46. arXiv:2212.00952  [pdf, other]

    cs.LG

    On the Limit of Explaining Black-box Temporal Graph Neural Networks

    Authors: Minh N. Vu, My T. Thai

    Abstract: Temporal Graph Neural Network (TGNN) has been receiving a lot of attention recently due to its capability in modeling time-evolving graph-related tasks. Similar to Graph Neural Networks, it is also non-trivial to interpret predictions made by a TGNN due to its black-box nature. A major approach tackling this problem in GNNs is by analyzing the model's responses on some perturbations of the model's…

    Submitted 1 December, 2022; originally announced December 2022.

  47. arXiv:2211.12000  [pdf, other]

    cs.CL

    ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English

    Authors: Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu

    Abstract: We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic - English Speech Translation Corpus. This corpus is an extension of the ArzEn speech corpus, which was collected through informal interviews with bilingual speakers. In this work, we collect translations in both directions, monolingual Egyptian Arabic and monolingual English, forming a three-way speech translation corpus.…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted to the Seventh Arabic Natural Language Processing Workshop (WANLP 2022)

  48. arXiv:2211.11296  [pdf, other]

    cs.CV

    SeeABLE: Soft Discrepancies and Bounded Contrastive Learning for Exposing Deepfakes

    Authors: Nicolas Larue, Ngoc-Son Vu, Vitomir Struc, Peter Peer, Vassilis Christophides

    Abstract: Modern deepfake detectors have achieved encouraging results when training and test images are drawn from the same data collection. However, when these detectors are applied to images produced with unknown deepfake-generation techniques, considerable performance degradations are commonly observed. In this paper, we propose a novel deepfake detector, called SeeABLE, that formalizes the detection pr…

    Submitted 1 October, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted at ICCV 2023

    Journal ref: 2023, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 21011-21021

  49. arXiv:2211.09041  [pdf, other]

    cs.CV

    Anomaly Detection via Multi-Scale Contrasted Memory

    Authors: Loic Jezequel, Ngoc-Son Vu, Jean Beaudet, Aymeric Histace

    Abstract: Deep anomaly detection (AD) aims to provide robust and efficient classifiers for one-class and unbalanced settings. However, current AD models still struggle on edge-case normal samples and are often unable to keep high performance over different scales of anomalies. Moreover, there currently does not exist a unified framework efficiently covering both one-class and unbalanced learning. In the lig…

    Submitted 9 March, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

  50. arXiv:2211.06377  [pdf, other]

    cs.RO

    Two-Step Online Trajectory Planning of a Quadcopter in Indoor Environments with Obstacles

    Authors: Martin Zimmermann, Minh Nhat Vu, Florian Beck, Anh Nguyen, Andreas Kugi

    Abstract: This paper presents a two-step algorithm for online trajectory planning in indoor environments with unknown obstacles. In the first step, sampling-based path planning techniques such as the optimal Rapidly exploring Random Tree (RRT*) algorithm and the Line-of-Sight (LOS) algorithm are employed to generate a collision-free path consisting of multiple waypoints. Then, in the second step, constraine…

    Submitted 6 February, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: 8 pages, 9 figures