Showing 1–50 of 352 results for author: Nguyen, T

Searching in archive eess.
  1. arXiv:2406.17376  [pdf, other]

    cs.SD cs.AI eess.AS

    Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection

    Authors: Duc-Tuan Truong, Ruijie Tao, Tuan Nguyen, Hieu-Thi Luong, Kong Aik Lee, Eng Siong Chng

    Abstract: Recent synthetic speech detectors leveraging the Transformer model have superior performance compared to the convolutional neural network counterparts. This improvement could be due to the powerful modeling ability of the multi-head self-attention (MHSA) in the Transformer model, which learns the temporal relationship of each input token. However, artifacts of synthetic speech can be located in sp…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  2. arXiv:2406.15119  [pdf, other]

    cs.SD cs.AI eess.AS

    Speech Emotion Recognition under Resource Constraints with Data Distillation

    Authors: Yi Chang, Zhao Ren, Zhonghao Zhao, Thanh Tam Nguyen, Kun Qian, Tanja Schultz, Björn W. Schuller

    Abstract: Speech emotion recognition (SER) plays a crucial role in human-computer interaction. The emergence of edge devices in the Internet of Things (IoT) presents challenges in constructing intricate deep learning models due to constraints in memory and computational resources. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment…

    Submitted 21 June, 2024; originally announced June 2024.

  3. arXiv:2406.02555  [pdf, ps, other]

    eess.AS cs.CL

    PhoWhisper: Automatic Speech Recognition for Vietnamese

    Authors: Thanh-Thien Le, Linh The Nguyen, Dat Quoc Nguyen

    Abstract: We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition. PhoWhisper's robustness is achieved through fine-tuning the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents. Our experimental study demonstrates state-of-the-art performances of PhoWhisper on benchmark Vietnamese ASR datasets. We have open-sourced PhoWhisper at: https://github.com…

    Submitted 27 March, 2024; originally announced June 2024.

    Comments: Accepted to ICLR 2024 Tiny Papers Track

  4. arXiv:2405.19653  [pdf, other]

    cs.LG cs.CL eess.SY

    SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems

    Authors: Patrick Emami, Zhaonan Li, Saumya Sinha, Truc Nguyen

    Abstract: Data-driven simulation surrogates help computational scientists study complex systems. They can also help inform impactful policy decisions. We introduce a learning framework for surrogate modeling where language is used to interface with the underlying system being simulated. We call a language description of a system a "system caption", or SysCap. To address the lack of datasets of paired natura…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 17 pages. Under review

  5. arXiv:2405.16664  [pdf]

    eess.SP physics.med-ph

    Deep learning improved autofocus for motion artifact reduction and its application in quantitative susceptibility mapping

    Authors: Chao Li, Jinwei Zhang, Hang Zhang, Jiahao Li, Pascal Spincemaille, Thanh D. Nguyen, Yi Wang

    Abstract: Purpose: To develop a pipeline for motion artifact correction in mGRE and quantitative susceptibility mapping (QSM). Methods: Deep learning is integrated with autofocus to improve motion artifact suppression, which is applied QSM of patients with Parkinson's disease (PD). The estimation of affine motion parameters in the autofocus method depends on signal-to-noise ratio and lacks accuracy when dat…

    Submitted 26 May, 2024; originally announced May 2024.

  6. arXiv:2405.00567  [pdf, other]

    eess.IV

    Remote Sensing Data Assimilation with a Chained Hydrologic-hydraulic Model for Flood Forecasting

    Authors: Thanh Huy Nguyen, Andrea Piacentini, Sophie Ricci, Ludovic Cassan, Simon Munier, Quentin Bonassies, Raquel Rodriguez-Suquet

    Abstract: A chained hydrologic-hydraulic model is implemented using predicted runoff from a large-scale hydrologic model (namely ISBA-CTRIP) as inputs to local hydrodynamic models (TELEMAC-2D) to issue forecasts of water level and flood extent. The uncertainties in the hydrological forcing and in friction parameters are reduced by an Ensemble Kalman Filter that jointly assimilates in-situ water levels and f…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 13 pages, 14 figures. Submitted to the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

  7. arXiv:2404.11152  [pdf, other]

    eess.IV cs.CV

    Multi-target and multi-stage liver lesion segmentation and detection in multi-phase computed tomography scans

    Authors: Abdullah F. Al-Battal, Soan T. M. Duong, Van Ha Tang, Quang Duc Tran, Steven Q. H. Truong, Chien Phan, Truong Q. Nguyen, Cheolhong An

    Abstract: Multi-phase computed tomography (CT) scans use contrast agents to highlight different anatomical structures within the body to improve the probability of identifying and detecting anatomical structures of interest and abnormalities such as liver lesions. Yet, detecting these lesions remains a challenging task as these lesions vary significantly in their size, shape, texture, and contrast with resp…

    Submitted 17 April, 2024; originally announced April 2024.

  8. arXiv:2404.09621  [pdf, other]

    eess.SY cs.ET cs.HC cs.RO

    AAM-VDT: Vehicle Digital Twin for Tele-Operations in Advanced Air Mobility

    Authors: Tuan Anh Nguyen, Taeho Kwag, Vinh Pham, Viet Nghia Nguyen, Jeongseok Hyun, Minseok Jang, Jae-Woo Lee

    Abstract: This study advanced tele-operations in Advanced Air Mobility (AAM) through the creation of a Vehicle Digital Twin (VDT) system for eVTOL aircraft, tailored to enhance remote control safety and efficiency, especially for Beyond Visual Line of Sight (BVLOS) operations. By synergizing digital twin technology with immersive Virtual Reality (VR) interfaces, we notably elevate situational awareness and…

    Submitted 15 April, 2024; originally announced April 2024.

  9. arXiv:2403.20184  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context

    Authors: Tuan Nguyen, Corinne Fredouille, Alain Ghio, Mathieu Balaguer, Virginie Woisard

    Abstract: Automatic speech quality assessment has raised more attention as an alternative or support to traditional perceptual clinical evaluation. However, most research so far only gains good results on simple tasks such as binary classification, largely due to data scarcity. To deal with this challenge, current works tend to segment patients' audio files into many samples to augment the datasets. Neverth…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  10. arXiv:2403.16489  [pdf, other]

    cs.RO eess.SY

    Spatially temporally distributed informative path planning for multi-robot systems

    Authors: Binh Nguyen, Linh Nguyen, Truong X. Nghiem, Hung La, Jose Baca, Pablo Rangel, Miguel Cid Montoya, Thang Nguyen

    Abstract: This paper investigates the problem of informative path planning for a mobile robotic sensor network in spatially temporally distributed mapping. The robots are able to gather noisy measurements from an area of interest during their movements to build a Gaussian Process (GP) model of a spatio-temporal field. The model is then utilized to predict the spatio-temporal phenomenon at different points o…

    Submitted 25 March, 2024; originally announced March 2024.

  11. arXiv:2403.15189  [pdf, other]

    eess.SY

    Forecasting the load of Parcel Pickup Points using a Markov Jump Process

    Authors: Thi-Thu-Tam Nguyen, Adnane Cabani, Iyadh Cabani, Koen De Turck, Michel Kieffer

    Abstract: The growth of e-commerce has resulted in a surge in parcel deliveries, increasing transportation costs and pollution issues. Alternatives to home delivery have emerged, such as the delivery to so-called parcel pick-up points (PUPs), which eliminates delivery failure due to customers not being at home. Nevertheless, parcels reaching overloaded PUPs may need to be redirected to alternative PUPs, som…

    Submitted 22 March, 2024; originally announced March 2024.

  12. arXiv:2403.14395  [pdf, other]

    eess.IV physics.ao-ph

    Early Flood Warning Using Satellite-Derived Convective System and Precipitation Data -- A Retrospective Case Study of Central Vietnam

    Authors: Tran-Vu La, Thanh Huy Nguyen, Patrick Matgen, Marco Chini

    Abstract: This paper addresses the challenges of an early flood warning caused by complex convective systems (CSs), by using Low-Earth Orbit and Geostationary satellite data. We focus on a sequence of extreme events that took place in central Vietnam during October 2020, with a specific emphasis on the events leading up to the floods, i.e., those occurring before October 10th, 2020. In this critical phase,…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted for publication in IEEE 2024 International Geoscience & Remote Sensing Symposium (IGARSS 2024)

  13. arXiv:2403.14394  [pdf, other]

    eess.IV

    Assimilation of SWOT Altimetry and Sentinel-1 Flood Extent Observations for Flood Reanalysis -- A Proof-of-Concept

    Authors: Thanh Huy Nguyen, Sophie Ricci, Andrea Piacentini, Charlotte Emery, Raquel Rodriguez Suquet, Santiago Peña Luque

    Abstract: In spite of astonishing advances and developments in remote sensing technologies, meeting the spatio-temporal requirements for flood hydrodynamic modeling remains a great challenge for Earth Observation. The assimilation of multi-source remote sensing data in 2D hydrodynamic models participates to overcome such a challenge. The recently launched Surface Water and Ocean Topography (SWOT) wide-swath…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted for publication in IEEE 2024 International Geoscience & Remote Sensing Symposium (IGARSS 2024)

  14. Multi-IRS-aided Terahertz Networks: Channel Modelling and User Association With Imperfect CSI

    Authors: Muddasir Rahim, Thanh Luan Nguyen, Georges Kaddoum, Tri Nhu Do

    Abstract: Terahertz (THz) communication is envisioned as one of the candidate technologies for future wireless communications to enable achievable data rates of up to several terabits per second (Tbps). However, the high pathloss and molecular absorption in THz band communications often limit the transmission range. To overcome these limitations, this paper proposes intelligent reconfigurable surface (IRS)-…

    Submitted 5 February, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.15028

  15. arXiv:2403.02701  [pdf, other]

    cs.SD cs.AI eess.AS

    Fighting Game Adaptive Background Music for Improved Gameplay

    Authors: Ibrahim Khan, Thai Van Nguyen, Chollakorn Nimpattanavong, Ruck Thawonmas

    Abstract: This paper presents our work to enhance the background music (BGM) in DareFightingICE by adding adaptive features. The adaptive BGM consists of three different categories of instruments playing the BGM of the winner sound design from the 2022 DareFightingICE Competition. The BGM adapts by changing the volume of each category of instruments. Each category is connected to a different element of the…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: This is an updated version of our IEEE CoG 2023 paper (https://ieeexplore.ieee.org/document/10333245). This version has revised the description of the association between the distance between the two players (PD) and the instrument's volume on page 2. arXiv admin note: substantial text overlap with arXiv:2303.15734

    ACM Class: I.2; H.5.2; H.5

  16. arXiv:2403.02687   

    cs.HC cs.AI cs.SD eess.AS

    Enhanced DareFightingICE Competitions: Sound Design and AI Competitions

    Authors: Ibrahim Khan, Chollakorn Nimpattanavong, Thai Van Nguyen, Kantinan Plupattanakit, Ruck Thawonmas

    Abstract: This paper presents a new and improved DareFightingICE platform, a fighting game platform with a focus on visually impaired players (VIPs), in the Unity game engine. It also introduces the separation of the DareFightingICE Competition into two standalone competitions called DareFightingICE Sound Design Competition and DareFightingICE AI Competition--at the 2024 IEEE Conference on Games (CoG)--in w…

    Submitted 27 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: This paper describes a new competition platform using Unity for our competitions at the 2024 IEEE Conference on Games (CoG 2024). It was accepted for presentation at CoG 2024. However, we recently discovered a much more effective way to do this task without using Unity, leading to our decision to withdraw the paper from CoG 2024 and ArXiv

    ACM Class: I.2; H.5.2; H.5.5

  17. Real-time hybrid controls of energy storage and load shedding for integrated power and energy systems of ships

    Authors: Linh Vu, Thai-Thanh Nguyen, Bang Le-Huy Nguyen, Md Isfakul Anam, Tuyen Vu

    Abstract: This paper presents an original energy management methodology to enhance the resilience of ship power systems. The integration of various energy storage systems (ESS), including battery energy storage systems (BESS) and super-capacitor energy storage systems (SCESS), in modern ship power systems poses challenges in designing an efficient energy management system (EMS). The EMS proposed in this pap…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 15 pages, 17 figures

    Journal ref: Electric Power Systems Research, volume 229, pages 110191, year 2024

  18. arXiv:2403.00379  [pdf, other]

    eess.AS cs.SD

    The Impact of Frequency Bands on Acoustic Anomaly Detection of Machines using Deep Learning Based Model

    Authors: Tin Nguyen, Lam Pham, Phat Lam, Dat Ngo, Hieu Tang, Alexander Schindler

    Abstract: In this paper, we propose a deep learning based model for Acoustic Anomaly Detection of Machines, the task for detecting abnormal machines by analysing the machine sound. By conducting extensive experiments, we indicate that multiple techniques of pseudo audios, audio segment, data augmentation, Mahalanobis distance, and narrow frequency bands, which mainly focus on feature engineering, are effect…

    Submitted 1 March, 2024; originally announced March 2024.

  19. arXiv:2402.15989  [pdf, other]

    cs.AI eess.SY

    PIDformer: Transformer Meets Control Theory

    Authors: Tam Nguyen, César A. Uribe, Tan M. Nguyen, Richard G. Baraniuk

    Abstract: In this work, we address two main shortcomings of transformer architectures: input corruption and rank collapse in their output representation. We unveil self-attention as an autonomous state-space model that inherently promotes smoothness in its solutions, leading to lower-rank outputs and diminished representation capacity. Moreover, the steady-state solution of the model is sensitive to input p…

    Submitted 25 February, 2024; originally announced February 2024.

  20. arXiv:2402.13554  [pdf, ps, other]

    cs.IT eess.SP

    Secrecy Performance Analysis of Space-to-Ground Optical Satellite Communications

    Authors: Thang V. Nguyen, Thanh V. Pham, Anh T. Pham, Dang T. Ngoc

    Abstract: Free-space optics (FSO)-based satellite communication systems have recently received considerable attention due to their enhanced capacity compared to their radio frequency (RF) counterparts. This paper analyzes the performance of physical layer security of space-to-ground intensity modulation/direct detection FSO satellite links under the effect of atmospheric loss, misalignment, cloud attenuatio…

    Submitted 21 February, 2024; originally announced February 2024.

  21. arXiv:2402.13549  [pdf, ps, other]

    cs.IT eess.SY

    Q-learning-based Joint Design of Adaptive Modulation and Precoding for Physical Layer Security in Visible Light Communications

    Authors: Duc M. T. Hoang, Thanh V. Pham, Anh T. Pham, Chuyen T Nguyen

    Abstract: There has been an increasing interest in physical layer security (PLS), which, compared with conventional cryptography, offers a unique approach to guaranteeing information confidentiality against eavesdroppers. In this paper, we study a joint design of adaptive $M$-ary pulse amplitude modulation (PAM) and precoding, which aims to optimize wiretap visible-light channels' secrecy capacity and bit e…

    Submitted 21 February, 2024; originally announced February 2024.

  22. arXiv:2402.06695  [pdf, other]

    cs.AI cs.LG eess.SY

    Integrating LLMs for Explainable Fault Diagnosis in Complex Systems

    Authors: Akshay J. Dave, Tat Nghia Nguyen, Richard B. Vilim

    Abstract: This paper introduces an integrated system designed to enhance the explainability of fault diagnostics in complex systems, such as nuclear power plants, where operator understanding is critical for informed decision-making. By combining a physics-based diagnostic tool with a Large Language Model, we offer a novel solution that not only identifies faults but also provides clear, understandable expl…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 4 pages

  23. arXiv:2402.05755  [pdf, other]

    cs.CL cs.SD eess.AS

    SpiRit-LM: Interleaved Spoken and Written Language Model

    Authors: Tu Anh Nguyen, Benjamin Muller, Bokai Yu, Marta R. Costa-jussa, Maha Elbayad, Sravya Popuri, Paul-Ambroise Duquenne, Robin Algayres, Ruslan Mavlyutov, Itai Gat, Gabriel Synnaeve, Juan Pino, Benoit Sagot, Emmanuel Dupoux

    Abstract: We introduce SPIRIT-LM, a foundation multimodal language model that freely mixes text and speech. Our model is based on a pretrained text language model that we extend to the speech modality by continuously training it on text and speech units. Speech and text sequences are concatenated as a single set of tokens, and trained with a word-level interleaving method using a small automatically-curated…

    Submitted 8 February, 2024; originally announced February 2024.

  24. arXiv:2402.03648  [pdf, other]

    eess.SP cs.LG

    Multilinear Kernel Regression and Imputation via Manifold Learning

    Authors: Duc Thien Nguyen, Konstantinos Slavakis

    Abstract: This paper introduces a novel nonparametric framework for data imputation, coined multilinear kernel regression and imputation via the manifold assumption (MultiL-KRIM). Motivated by manifold learning, MultiL-KRIM models data features as a point cloud located in or close to a user-unknown smooth manifold embedded in a reproducing kernel Hilbert space. Unlike typical manifold-learning routes, which…

    Submitted 5 February, 2024; originally announced February 2024.

  25. arXiv:2402.01198  [pdf, other]

    cs.IT eess.SP

    Physical Layer Location Privacy in SIMO Communication Using Fake Paths Injection

    Authors: Trong Duy Tran, Maxime Ferreira Da Costa, Linh Trung Nguyen

    Abstract: Fake path injection is an emerging paradigm for inducing privacy over wireless networks. In this paper, fake paths are injected by the transmitter into a SIMO multipath communication channel to preserve her physical location from an eavesdropper. A novel statistical privacy metric is defined as the ratio between the largest (resp. smallest) eigenvalues of Bob's (resp. Eve's) Cramér-Rao lower bound…

    Submitted 2 February, 2024; originally announced February 2024.

  26. Channel Characterization of UAV-RIS-aided Systems with Adaptive Phase-shift Configuration

    Authors: Thanh Luan Nguyen, Georges Kaddoum, Tri Nhu Do, Zygmunt J. Haas

    Abstract: This letter considers a UAV aiding communication between a ground transmitter and a ground receiver in the presence of co-channel interference. A discrete-time Markov process is adopted to model the complex nature of the Air-to-Ground (A2G) channel, including the occurrence of Line-of-Sight, Non-Line-of-Sight, and blockage events. Moreover, an adaptive phase-shift-enabled Reconfigurable Intelligen…

    Submitted 30 January, 2024; originally announced January 2024.

  27. arXiv:2401.15028  [pdf, other]

    eess.SP

    User Association Optimization for IRS-aided Terahertz Networks: A Matching Theory Approach

    Authors: Muddasir Rahim, Thanh Luan Nguyen, Georges Kaddoum, Tri Nhu Do

    Abstract: Terahertz (THz) communication is a promising technology for future wireless communications, offering data rates of up to several terabits-per-second (Tbps). However, the range of THz band communications is often limited by high pathloss and molecular absorption. To overcome these challenges, this paper proposes intelligent reconfigurable surfaces (IRSs) to enhance THz communication systems. Specif…

    Submitted 26 January, 2024; originally announced January 2024.

  28. arXiv:2401.14203  [pdf, other]

    eess.SP cs.IT

    Statistical Characterization of RIS-assisted UAV Communications in Terrestrial and Non-Terrestrial Networks Under Channel Aging

    Authors: Thanh Luan Nguyen, Georges Kaddoum, Tri Nhu Do, Zygmunt J. Haas

    Abstract: This paper studies the statistical characterization of ground-to-air (G2A) and reconfigurable intelligent surface (RIS)-assisted air-to-ground (A2G) communications with unmanned aerial vehicles (UAVs) in terrestrial and non-terrestrial networks under the impact of channel aging. We first model the G2A and A2G signal-to-noise ratios (SNRs) as non-central complex Gaussian quadratic random variable…

    Submitted 30 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: 6 pages, 3 figures and 7 subfigures, IEEE ICC'24 (Revision)

  29. arXiv:2401.10032  [pdf, other]

    eess.AS cs.AI eess.SP

    FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder

    Authors: Tan Dat Nguyen, Ji-Hoon Kim, Youngjoon Jang, Jaehun Kim, Joon Son Chung

    Abstract: The goal of this paper is to generate realistic audio with a lightweight and fast diffusion-based vocoder, named FreGrad. Our framework consists of the following three key components: (1) We employ discrete wavelet transform that decomposes a complicated waveform into sub-band wavelets, which helps FreGrad to operate on a simple and concise feature space, (2) We design a frequency-aware dilated co…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  30. arXiv:2401.07326  [pdf, other]

    eess.IV cs.CV

    Beyond Traditional Approaches: Multi-Task Network for Breast Ultrasound Diagnosis

    Authors: Dat T. Chung, Minh-Anh Dang, Mai-Anh Vu, Minh T. Nguyen, Thanh-Huy Nguyen, Vinh Q. Dinh

    Abstract: Breast Ultrasound plays a vital role in cancer diagnosis as a non-invasive approach with cost-effective. In recent years, with the development of deep learning, many CNN-based approaches have been widely researched in both tumor localization and cancer classification tasks. Even though previous single models achieved great performance in both tasks, these methods have some limitations in inference…

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: 7 pages, 3 figures

  31. The Smooth Trajectory Estimator for LMB Filters

    Authors: Hoa Van Nguyen, Tran Thien Dat Nguyen, Changbeom Shim, Marzhar Anuar

    Abstract: This paper proposes a smooth-trajectory estimator for the labelled multi-Bernoulli (LMB) filter by exploiting the special structure of the generalised labelled multi-Bernoulli (GLMB) filter. We devise a simple and intuitive approach to store the best association map when approximating the GLMB random finite set (RFS) to the LMB RFS. In particular, we construct a smooth-trajectory estimator (i.e.,…

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: 6 pages, 5 figures. Presented at The 12th IEEE International Conference on Control, Automation and Information Sciences (ICCAIS 2023), Nov 2023, Hanoi, Vietnam

  32. arXiv:2312.16835  [pdf, other]

    eess.IV cs.CV

    RimSet: Quantitatively Identifying and Characterizing Chronic Active Multiple Sclerosis Lesion on Quantitative Susceptibility Maps

    Authors: Hang Zhang, Thanh D. Nguyen, Jinwei Zhang, Renjiu Hu, Susan A. Gauthier, Yi Wang

    Abstract: Background: Rim+ lesions in multiple sclerosis (MS), detectable via Quantitative Susceptibility Mapping (QSM), correlate with increased disability. Existing literature lacks quantitative analysis of these lesions. We introduce RimSet for quantitative identification and characterization of rim+ lesions on QSM. Methods: RimSet combines RimSeg, an unsupervised segmentation method using level-set meth…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 13 pages, 7 figures, 4 tables

  33. arXiv:2312.11825  [pdf, other]

    cs.SD eess.AS

    MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation

    Authors: Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Jiaqi Yip, Dianwen Ng, Bin Ma

    Abstract: Our previously proposed MossFormer has achieved promising performance in monaural speech separation. However, it predominantly adopts a self-attention-based MossFormer module, which tends to emphasize longer-range, coarser-scale dependencies, with a deficiency in effectively modelling finer-scale recurrent patterns. In this paper, we introduce a novel hybrid model that provides the capabilities to…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures, accepted by ICASSP 2024

  34. arXiv:2312.10543  [pdf, other]

    q-bio.NC eess.SP

    Study of cognitive component of auditory attention to natural speech events

    Authors: Nhan D. T. Nguyen, Kaare Mikkelsen, Preben Kidmose

    Abstract: Event-related potentials (ERP) have been used to address a wide range of research questions in neuroscience and cognitive psychology including selective auditory attention. The recent progress in auditory attention decoding (AAD) methods is based on algorithms that find a relation between the audio envelope and the neurophysiological response. The most popular approach is based on the reconstructi…

    Submitted 19 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

    Comments: 15 pages, 11 figures

  35. arXiv:2312.09445  [pdf, other]

    eess.SP cs.CV cs.LG

    IncepSE: Leveraging InceptionTime's performance with Squeeze and Excitation mechanism in ECG analysis

    Authors: Tue Minh Cao, Nhat Hong Tran, Le Phi Nguyen, Hieu Huy Pham, Hung Thanh Nguyen

    Abstract: Our study focuses on the potential for modifications of Inception-like architecture within the electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a novel network characterized by strategic architectural incorporation that leverages the strengths of both InceptionTime and channel attention mechanisms. Furthermore, we propose a training setup that employs stabilization techniques tha…

    Submitted 16 November, 2023; originally announced December 2023.

  36. arXiv:2311.17256  [pdf, other]

    cs.CV eess.IV

    Pattern retrieval of traffic congestion using graph-based associations of traffic domain-specific features

    Authors: Tin T. Nguyen, Simeon C. Calvert, Guopeng Li, Hans van Lint

    Abstract: The fast-growing amount of traffic data brings many opportunities for revealing more insightful information about traffic dynamics. However, it also demands an effective database management system in which information retrieval is arguably an important feature. The ability to locate similar patterns in big datasets potentially paves the way for further valuable analyses in traffic management. This…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 20 pages, 14 figures

  37. arXiv:2311.11096  [pdf, other]

    eess.IV cs.CV

    On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation

    Authors: Duy Minh Ho Nguyen, Tan Ngoc Pham, Nghiem Tuong Diep, Nghi Quoc Phan, Quang Pham, Vinh Tong, Binh T. Nguyen, Ngan Hoang Le, Nhat Ho, Pengtao Xie, Daniel Sonntag, Mathias Niepert

    Abstract: Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. The foundational models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. It showcases impressive learning abilities across different tasks with the need for…

    Submitted 18 November, 2023; originally announced November 2023.

    Comments: Advances in Neural Information Processing Systems (NeurIPS) 2023, Workshop on robustness of zero/few-shot learning in foundation models

  38. Modeling Power Systems Dynamics with Symbolic Physics-Informed Neural Networks

    Authors: Huynh T. T. Tran, Hieu T. Nguyen

    Abstract: In recent years, scientific machine learning, particularly physic-informed neural networks (PINNs), has introduced new innovative methods to understanding the differential equations that describe power system dynamics, providing a more efficient alternative to traditional methods. However, using a single neural network to capture patterns of all variables requires a large enough size of networks,…

    Submitted 11 November, 2023; originally announced November 2023.

    Journal ref: The 2024 Conference on Innovative Smart Grid Technologies, North America (ISGT NA 2024)

  39. arXiv:2310.14506  [pdf, other]

    eess.SP cs.DB

    Label Space Partition Selection for Multi-Object Tracking Using Two-Layer Partitioning

    Authors: Ji Youn Lee, Changbeom Shim, Hoa Van Nguyen, Tran Thien Dat Nguyen, Hyunjin Choi, Youngho Kim

    Abstract: Estimating the trajectories of multi-objects poses a significant challenge due to data association ambiguity, which leads to a substantial increase in computational requirements. To address such problems, a divide-and-conquer manner has been employed with parallel computation. In this strategy, distinguished objects that have unique labels are grouped based on their statistical dependencies, the i…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: 6 pages, 4 figures

  40. arXiv:2310.12574  [pdf]

    eess.IV cs.CV

    A reproducible 3D convolutional neural network with dual attention module (3D-DAM) for Alzheimer's disease classification

    Authors: Thanh Phuong Vu, Tien Nhat Nguyen, N. Minh Nhat Hoang, Gia Minh Hoang

    Abstract: Alzheimer's disease is one of the most common types of neurodegenerative disease, characterized by the accumulation of amyloid-beta plaque and tau tangles. Recently, deep learning approaches have shown promise in Alzheimer's disease diagnosis. In this study, we propose a reproducible model that utilizes a 3D convolutional neural network with a dual attention module for Alzheimer's disease classifi…

    Submitted 4 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

  41. arXiv:2310.11532  [pdf, other]

    cs.CL eess.AS

    Multi-stage Large Language Model Correction for Speech Recognition

    Authors: Jie Pu, Thai-Son Nguyen, Sebastian Stüker

    Abstract: In this paper, we investigate the usage of large language models (LLMs) to improve the performance of competitive speech recognition systems. Different from previous LLM-based ASR error correction methods, we propose a novel multi-stage approach that utilizes uncertainty estimation of ASR outputs and reasoning capability of LLMs. Specifically, the proposed approach has two stages: the first stage…

    Submitted 17 June, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

  42. arXiv:2310.10822  [pdf, other]

    cs.RO cs.CV eess.SY

    Vision and Language Navigation in the Real World via Online Visual Language Mapping

    Authors: Chengguang Xu, Hieu T. Nguyen, Christopher Amato, Lawson L. S. Wong

    Abstract: Navigating in unseen environments is crucial for mobile robots. Enhancing them with the ability to follow instructions in natural language will further improve navigation efficiency in unseen cases. However, state-of-the-art (SOTA) vision-and-language navigation (VLN) methods are mainly evaluated in simulation, neglecting the complex and noisy real world. Directly transferring SOTA navigation poli…

    Submitted 16 October, 2023; originally announced October 2023.

  43. arXiv:2310.04791  [pdf, other]

    eess.AS cs.LG cs.SD

    Conditional Diffusion Model for Target Speaker Extraction

    Authors: Theodor Nguyen, Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C Woodland

    Abstract: We propose DiffSpEx, a generative target speaker extraction method based on score-based generative modelling through stochastic differential equations. DiffSpEx deploys a continuous-time stochastic diffusion process in the complex short-time Fourier transform domain, starting from the target speaker source and converging to a Gaussian distribution centred on the mixture of sources. For the reverse…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: 5 pages, 4 figures, submitted to ICASSP 2024

  44. arXiv:2310.01413  [pdf]

    eess.IV cs.AI cs.CV

    A multi-institutional pediatric dataset of clinical radiology MRIs by the Children's Brain Tumor Network

    Authors: Ariana M. Familiar, Anahita Fathi Kazerooni, Hannah Anderson, Aliaksandr Lubneuski, Karthik Viswanathan, Rocky Breslow, Nastaran Khalili, Sina Bagheri, Debanjan Haldar, Meen Chul Kim, Sherjeel Arif, Rachel Madhogarhia, Thinh Q. Nguyen, Elizabeth A. Frenkel, Zeinab Helili, Jessica Harrison, Keyvan Farahani, Marius George Linguraru, Ulas Bagci, Yury Velichko, Jeffrey Stevens, Sarah Leary, Robert M. Lober, Stephani Campion, Amy A. Smith, et al. (15 additional authors not shown)

    Abstract: Pediatric brain and spinal cancers remain the leading cause of cancer-related death in children. Advancements in clinical decision-support in pediatric neuro-oncology utilizing the wealth of radiology imaging data collected through standard care, however, have significantly lagged behind other domains. Such data is ripe for use with predictive analytics such as artificial intelligence (AI) methods, which…

    Submitted 2 October, 2023; originally announced October 2023.

  45. arXiv:2310.00418  [pdf, other]

    eess.IV cs.CV

    MVC: A Multi-Task Vision Transformer Network for COVID-19 Diagnosis from Chest X-ray Images

    Authors: Huyen Tran, Duc Thanh Nguyen, John Yearwood

    Abstract: Medical image analysis using computer-based algorithms has attracted considerable attention from the research community and achieved tremendous progress in the last decade. With recent advances in computing resources and availability of large-scale medical image datasets, many deep learning models have been developed for disease diagnosis from medical images. However, existing techniques focus on…

    Submitted 30 September, 2023; originally announced October 2023.

  46. arXiv:2309.17020  [pdf, other]

    eess.AS cs.SD

    Low-Resource Self-Supervised Learning with SSL-Enhanced TTS

    Authors: Po-chun Hsu, Ali Elkahky, Wei-Ning Hsu, Yossi Adi, Tu Anh Nguyen, Jade Copet, Emmanuel Dupoux, Hung-yi Lee, Abdelrahman Mohamed

    Abstract: Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper proposes to address this challenge by leveraging synthetic speech to augment a low-resource pre-training corpus. We construct a high-quality text-to-speech (TT…

    Submitted 4 June, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: ASRU 2023 SPARKS Workshop

  47. arXiv:2309.16699  [pdf]

    cs.RO eess.SY

    Circular-Line Trajectory Tracking Controller for Mobile Robot using Multi-Pixy2 Sensors

    Authors: Xuan Quang Ngo, Tri Duc Tran, Huy Hung Nguyen, Van Dong Nguyen, Van Tu Duong, Tan Tien Nguyen

    Abstract: This study suggests a novel tracking method that employs three Pixy2 sensors to identify the desired line trajectories instead of traditional perceiving means. Firstly, the kinematic model of the mobile robot is derived from the information gathered by three Pixy2 sensors. Secondly, the sliding mode controller is implemented to regulate the tracking error. Finally, simulation results are analyzed…

    Submitted 12 August, 2023; originally announced September 2023.

    Comments: 6 pages, 12 figures, the 2023 International Symposium on Electrical and Electronics Engineering, Ho Chi Minh, Viet Nam, 2023

  48. arXiv:2309.15483  [pdf, ps, other]

    cs.IT eess.SY

    Energy-Efficient Precoding Designs for Multi-User Visible Light Communication Systems with Confidential Messages

    Authors: Son T. Duong, Thanh V. Pham, Chuyen T. Nguyen, Anh T. Pham

    Abstract: This paper studies energy-efficient precoding designs for multi-user visible light communication (VLC) systems from the perspective of physical layer security where users' messages must be kept mutually confidential. For such systems, we first derive a lower bound on the achievable secrecy rate of each user. Next, the total power consumption for illumination and data transmission is thoroughly ana…

    Submitted 27 September, 2023; originally announced September 2023.

  49. arXiv:2309.12608  [pdf, other]

    eess.AS cs.SD

    SPGM: Prioritizing Local Features for enhanced speech separation performance

    Authors: Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Dianwen Ng, Eng Siong Chng, Bin Ma

    Abstract: Dual-path is a popular architecture for speech separation models (e.g. Sepformer) which splits long sequences into overlapping chunks for its intra- and inter-blocks that separately model intra-chunk local features and inter-chunk global relationships. However, it has been found that inter-blocks, which comprise half a dual-path model's parameters, contribute minimally to performance. Thus, we pro…

    Submitted 10 March, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: This paper was accepted by ICASSP 2024

  50. arXiv:2309.09413  [pdf, other]

    cs.SD eess.AS

    Are Soft Prompts Good Zero-shot Learners for Speech Recognition?

    Authors: Dianwen Ng, Chong Zhang, Ruixi Zhang, Yukun Ma, Fabian Ritter-Gutierrez, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma

    Abstract: Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple parameter-efficient alternative by utilizing minimal soft prompt guidance, enhancing portability while also maintaining competitive performance. However, not many people understand how and why this is so. In this study, we aim to deepen our understa…

    Submitted 17 September, 2023; originally announced September 2023.