Search | arXiv e-print repository

arXiv:2401.01916 [pdf, other]

AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets

Authors: Ernest Perkowski, Rui Pan, Tuan Dung Nguyen, Yuan-Sen Ting, Sandor Kruk, Tong Zhang, Charlie O'Neill, Maja Jablonska, Zechang Sun, Michael J. Smith, Huiling Liu, Kevin Schawinski, Kartheik Iyer, Ioana Ciucă for UniverseTBD

Abstract: We explore the potential of enhancing LLM performance in astronomy-focused question-answering through targeted, continual pre-training. By employing a compact 7B-parameter LLaMA-2 model and focusing exclusively on a curated set of astronomy corpora -- comprising abstracts, introductions, and conclusions -- we achieve notable improvements in specialized topic comprehension. While general LLMs like GPT-4 excel in broader question-answering scenarios due to superior reasoning capabilities, our findings suggest that continual pre-training with limited resources can still enhance model performance on specialized topics. Additionally, we present an extension of AstroLLaMA: the fine-tuning of the 7B LLaMA model on a domain-specific conversational dataset, culminating in the release of the chat-enabled AstroLLaMA for community use. Comprehensive quantitative benchmarking is currently in progress and will be detailed in an upcoming full paper. The model, AstroLLaMA-Chat, is now available at https://huggingface.co/universeTBD, providing the first open-source conversational AI tool tailored for the astronomy community.

Submitted 5 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

Comments: 4 pages, 1 figure, model is available at https://huggingface.co/universeTBD, published in RNAAS

arXiv:2401.01108 [pdf, other]

Unveiling Comparative Sentiments in Vietnamese Product Reviews: A Sequential Classification Framework

Authors: Ha Le, Bao Tran, Phuong Le, Tan Nguyen, Dac Nguyen, Ngoan Pham, Dang Huynh

Abstract: Comparative opinion mining is a specialized field of sentiment analysis that aims to identify and extract sentiments expressed comparatively. To address this task, we propose an approach that consists of solving three sequential sub-tasks: (i) identifying comparative sentences, i.e., whether a sentence has a comparative meaning, (ii) extracting comparative elements, i.e., the comparison subjects, objects, aspects, and predicates, and (iii) classifying comparison types, which contributes to a deeper comprehension of user sentiments in Vietnamese product reviews. Our method ranked fifth at the Vietnamese Language and Speech Processing (VLSP) 2023 challenge on Comparative Opinion Mining (ComOM) from Vietnamese Product Reviews.

Submitted 2 January, 2024; originally announced January 2024.

Comments: Accepted manuscript at VLSP 2023

arXiv:2401.00953 [pdf, ps, other]

Families of costs with zero and nonnegative MTW tensor in optimal transport

Authors: Du Nguyen

Abstract: We compute explicitly the MTW tensor (or cross curvature) for the optimal transport problem on $\mathbb{R}^n$ with a cost function of form $\mathsf{c}(x, y) = \mathsf{u}(x^{\mathfrak{t}}y)$, where $\mathsf{u}$ is a scalar function with inverse $\mathsf{s}$, $x^{\mathfrak{t}}y$ is a nondegenerate bilinear pairing of vectors $x, y$ belonging to an open subset of $\mathbb{R}^n$. The condition that the MTW-tensor vanishes on null vectors under the Kim-McCann metric is a fourth-order nonlinear ODE, which could be reduced to a linear ODE of the form $\mathsf{s}^{(2)} - S\mathsf{s}^{(1)} + P\mathsf{s} = 0$ with constant coefficients $P$ and $S$. The resulting inverse functions include Lambert and generalized inverse hyperbolic/trigonometric functions. The square Euclidean metric and $\log$-type costs are equivalent to instances of these solutions. The optimal map for the family is also explicit. For cost functions of a similar form on a hyperboloid model of the hyperbolic space and unit sphere, we also express this tensor in terms of algebraic expressions in derivatives of $\mathsf{s}$ using the Gauss-Codazzi equation, obtaining new families of strictly regular costs for these manifolds, including new families of power function costs. We analyze the $\sinh$-type hyperbolic cost, providing examples of $\mathsf{c}$-convex functions and divergence.

Submitted 1 January, 2024; originally announced January 2024.

Comments: 24 pages

MSC Class: 58C05; 49Q22; 53C80; 57Z20; 57Z25; 68T05; 26B25

arXiv:2401.00682 [pdf, other]

doi 10.1109/ICCAIS59597.2023.10382267

The Smooth Trajectory Estimator for LMB Filters

Authors: Hoa Van Nguyen, Tran Thien Dat Nguyen, Changbeom Shim, Marzhar Anuar

Abstract: This paper proposes a smooth-trajectory estimator for the labelled multi-Bernoulli (LMB) filter by exploiting the special structure of the generalised labelled multi-Bernoulli (GLMB) filter. We devise a simple and intuitive approach to store the best association map when approximating the GLMB random finite set (RFS) to the LMB RFS. In particular, we construct a smooth-trajectory estimator (i.e., an estimator over the entire trajectories of labelled estimates) for the LMB filter based on the history of the best association map and all of the measurements up to the current time. Experimental results under two challenging scenarios demonstrate significant tracking accuracy improvements with negligible additional computational time compared to the conventional LMB filter. The source code is publicly available at https://tinyurl.com/ste-lmb, aimed at promoting advancements in MOT algorithms.

Submitted 1 January, 2024; originally announced January 2024.

Comments: 6 pages, 5 figures. Presented at The 12th IEEE International Conference on Control, Automation and Information Sciences (ICCAIS 2023), Nov 2023, Hanoi, Vietnam

arXiv:2401.00024 [pdf]

Mathematical Modeling of the Synergetic Effect between Radiotherapy and Immunotherapy

Authors: Yixun Xing, Casey Moore, Debabrata Saha, Dan Nguyen, MaryLena Bleile, Xun Jia, Robert Timmerman, Hao Peng, Steve Jiang

Abstract: Achieving effective synergy between radiotherapy and immunotherapy is critical for optimizing tumor control and treatment outcomes. To explore the underlying mechanisms of this synergy, we have investigated a novel treatment approach known as personalized ultra-fractionated stereotactic adaptive radiation therapy (PULSAR), which emphasizes the impact of radiation timing on treatment efficacy. However, the precise mechanism remains unclear. Building on insights from small animal PULSAR studies, we developed a mathematical framework consisting of multiple ordinary differential equations to elucidate the temporal dynamics of tumor control resulting from radiation and the adaptive immune response. The model accounts for the migration and infiltration of T-cells within the tumor microenvironment. This proposed model establishes a causal and quantitative link between radiation therapy and immunotherapy, providing a valuable in-silico analysis tool for designing future PULSAR trials.

Submitted 28 December, 2023; originally announced January 2024.

arXiv:2312.17505 [pdf, other]

Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation

Authors: Tuan-Anh Vu, Duc Thanh Nguyen, Qing Guo, Binh-Son Hua, Nhat Minh Chung, Ivor W. Tsang, Sai-Kit Yeung

Abstract: Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions. This indicates that there exists a strong correlation between the visual and textual domains. In addition, text-image discriminative models such as CLIP excel in image labelling from text prompts, thanks to the rich and diverse information available from open concepts. In this paper, we leverage these technical advances to solve a challenging problem in computer vision: camouflaged instance segmentation. Specifically, we propose a method built upon a state-of-the-art diffusion model, empowered by open-vocabulary to learn multi-scale textual-visual features for camouflaged object representations. Such cross-domain representations are desirable in segmenting camouflaged objects where visual cues are subtle to distinguish the objects from the background, especially in segmenting novel objects which are not seen in training. We also develop technically supportive components to effectively fuse cross-domain features and engage relevant features towards respective foreground objects. We validate our method and compare it with existing ones on several benchmark datasets of camouflaged instance segmentation and generic open-vocabulary instance segmentation. Experimental results confirm the advances of our method over existing ones. We will publish our code and pre-trained models to support future research.

Submitted 29 December, 2023; originally announced December 2023.

Comments: This work is under review

arXiv:2312.17330 [pdf, other]

Count What You Want: Exemplar Identification and Few-shot Counting of Human Actions in the Wild

Authors: Yifeng Huang, Duc Duy Nguyen, Lam Nguyen, Cuong Pham, Minh Hoai

Abstract: This paper addresses the task of counting human actions of interest using sensor data from wearable devices. We propose a novel exemplar-based framework, allowing users to provide exemplars of the actions they want to count by vocalizing predefined sounds "one", "two", and "three". Our method first localizes temporal positions of these utterances from the audio sequence. These positions serve as the basis for identifying exemplars representing the action class of interest. A similarity map is then computed between the exemplars and the entire sensor data sequence, which is further fed into a density estimation module to generate a sequence of estimated density values. Summing these density values provides the final count. To develop and evaluate our approach, we introduce a diverse and realistic dataset consisting of real-world data from 37 subjects and 50 action categories, encompassing both sensor and audio data. The experiments on this dataset demonstrate the viability of the proposed method in counting instances of actions from new classes and subjects that were not part of the training data. On average, the discrepancy between the predicted count and the ground truth value is 7.47, significantly lower than the errors of the frequency-based and transformer-based methods. Our project, code and dataset can be found at https://github.com/cvlab-stonybrook/ExRAC.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.16835 [pdf, other]

RimSet: Quantitatively Identifying and Characterizing Chronic Active Multiple Sclerosis Lesion on Quantitative Susceptibility Maps

Authors: Hang Zhang, Thanh D. Nguyen, Jinwei Zhang, Renjiu Hu, Susan A. Gauthier, Yi Wang

Abstract: Background: Rim+ lesions in multiple sclerosis (MS), detectable via Quantitative Susceptibility Mapping (QSM), correlate with increased disability. Existing literature lacks quantitative analysis of these lesions. We introduce RimSet for quantitative identification and characterization of rim+ lesions on QSM. Methods: RimSet combines RimSeg, an unsupervised segmentation method using level-set methodology, and radiomic measurements with Local Binary Pattern texture descriptors. We validated RimSet using simulated QSM images and an in vivo dataset of 172 MS subjects with 177 rim+ and 3986 rim- lesions. Results: RimSeg achieved a 78.7% Dice score against the ground truth, with challenges in partial rim lesions. RimSet detected rim+ lesions with a partial ROC AUC of 0.808 and PR AUC of 0.737, surpassing existing methods. QSMRim-Net showed the lowest mean square error (0.85) and high correlation (0.91; 95% CI: 0.88, 0.93) with expert annotations at the subject level.

Submitted 28 December, 2023; originally announced December 2023.

Comments: 13 pages, 7 figures, 4 tables

arXiv:2312.16299 [pdf, other]

La suspension homologue pour les CW-complexes quotients d'actions libres de $(Z/2)^n$

Authors: Dang Ho Hai Nguyen, Lionel Schwartz

Abstract: This note shows that the $n$-th homology suspension is surjective for certain quotients of finite $(Z/2)^n$-CW-complexes. This holds as soon as the equivariant $(Z/2)^n$-cohomology of the complex is a free $H^*((Z/2)^n)$-module. An application is given to certain Brown-Gitler spectra.

Submitted 26 December, 2023; originally announced December 2023.

Comments: in French language

MSC Class: 55P35; 55P40

arXiv:2312.13970 [pdf, other]

On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods

Authors: Anh Duc Nguyen, Tuan Dung Nguyen, Quang Minh Nguyen, Hoang H. Nguyen, Lam M. Nguyen, Kim-Chuan Toh

Abstract: This paper studies the Partial Optimal Transport (POT) problem between two unbalanced measures with at most $n$ supports and its applications in various AI tasks such as color transfer or domain adaptation. There is hence the need for fast approximations of POT with increasingly large problem sizes in arising applications. We first theoretically and experimentally investigate the infeasibility of the state-of-the-art Sinkhorn algorithm for POT due to its incompatible rounding procedure, which consequently degrades its qualitative performance in real world applications like point-cloud registration. To this end, we propose a novel rounding algorithm for POT, and then provide a feasible Sinkhorn procedure with a revised computation complexity of $\mathcal{\widetilde O}(n^2/\varepsilon^4)$. Our rounding algorithm also permits the development of two first-order methods to approximate the POT problem. The first algorithm, Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD), finds an $\varepsilon$-approximate solution to the POT problem in $\mathcal{\widetilde O}(n^{2.5}/\varepsilon)$, which is better in $\varepsilon$ than revised Sinkhorn. The second method, Dual Extrapolation, achieves the computation complexity of $\mathcal{\widetilde O}(n^2/\varepsilon)$, thereby being the best in the literature. We further demonstrate the flexibility of POT compared to standard OT as well as the practicality of our algorithms on real applications where two marginal distributions are unbalanced.

Submitted 22 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024

arXiv:2312.13906 [pdf]

EfficientPPS: Part-aware Panoptic Segmentation of Transparent Objects for Robotic Manipulation

Authors: Benjamin Alt, Minh Dang Nguyen, Andreas Hermann, Darko Katic, Rainer Jäkel, Rüdiger Dillmann, Eric Sax

Abstract: The use of autonomous robots for assistance tasks in hospitals has the potential to free up qualified staff and improve patient care. However, the ubiquity of deformable and transparent objects in hospital settings poses significant challenges to vision-based perception systems. We present EfficientPPS, a neural architecture for part-aware panoptic segmentation that provides robots with semantically rich visual information for grasping and manipulation tasks. We also present an unsupervised data collection and labelling method to reduce the need for human involvement in the training process. EfficientPPS is evaluated on a dataset containing real-world hospital objects and demonstrated to be robust and efficient in grasping transparent transfusion bags with a collaborative robot arm.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 8 pages, 8 figures, presented at the 56th International Symposium on Robotics (ISR Europe)

MSC Class: 68T45

ACM Class: I.4.6; I.2.10; I.2.9

Journal ref: ISR Europe 2023

arXiv:2312.12175 [pdf, other]

Fast Forward-Backward splitting for monotone inclusions with a convergence rate of the tangent residual of $o(1/k)$

Authors: Radu Ioan Bot, Dang-Khoa Nguyen, Chunxiang Zong

Abstract: We address the problem of finding the zeros of the sum of a maximally monotone operator and a cocoercive operator. Our approach introduces a modification to the forward-backward method by integrating an inertial/momentum term alongside a correction term. We demonstrate that the sequence of iterations thus generated converges weakly towards a solution for the monotone inclusion problem. Furthermore, our analysis reveals an outstanding attribute of our algorithm: it displays rates of convergence of the order $o(1/k)$ for the discrete velocity and the tangent residual approaching zero. These rates for tangent residuals can be extended to fixed-point residuals frequently discussed in the existing literature. Specifically, when applied to minimize a nonsmooth convex function subject to linear constraints, our method evolves into a primal-dual full splitting algorithm. Notably, alongside the convergence of iterates, this algorithm possesses a remarkable characteristic of nonergodic/last iterate $o(1/k)$ convergence rates for both the function value and the feasibility measure. Our algorithm showcases the most advanced convergence and convergence rate outcomes among primal-dual full splitting algorithms when minimizing nonsmooth convex functions with linear constraints.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 36 pages, 8 figures

arXiv:2312.12111 [pdf, other]

Designing and Evaluating General-Purpose User Representations Based on Behavioral Logs from a Measurement Process Perspective: A Case Study with Snapchat

Authors: Qixiang Fang, Zhihan Zhou, Francesco Barbieri, Yozen Liu, Leonardo Neves, Dong Nguyen, Daniel L. Oberski, Maarten W. Bos, Ron Dotsch

Abstract: In human-computer interaction, understanding user behaviors and tailoring systems accordingly is pivotal. To this end, general-purpose user representation learning based on behavior logs is emerging as a powerful tool in user modeling, offering adaptability to various downstream tasks such as item recommendations and ad conversion prediction, without the need to fine-tune the upstream user model. While this methodology has shown promise in contexts like search engines and e-commerce platforms, its fit for instant messaging apps, a cornerstone of modern digital communication, remains largely uncharted. These apps, with their distinct interaction patterns, data structures, and user expectations, necessitate specialized attention. We explore this user modeling approach with Snapchat data as a case study. Furthermore, we introduce a novel design and evaluation framework rooted in the principles of the Measurement Process Framework from social science research methodology. Using this new framework, we design a Transformer-based user model that can produce high-quality general-purpose user representations for instant messaging platforms like Snapchat.

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.11735 [pdf, other]

Multiple Hypothesis Dropout: Estimating the Parameters of Multi-Modal Output Distributions

Authors: David D. Nguyen, David Liebowitz, Surya Nepal, Salil S. Kanhere

Abstract: In many real-world applications, from robotics to pedestrian trajectory prediction, there is a need to predict multiple real-valued outputs to represent several potential scenarios. Current deep learning techniques to address multiple-output problems are based on two main methodologies: (1) mixture density networks, which suffer from poor stability at high dimensions, or (2) multiple choice learning (MCL), an approach that uses $M$ single-output functions, each only producing a point estimate hypothesis. This paper presents a Mixture of Multiple-Output functions (MoM) approach using a novel variant of dropout, Multiple Hypothesis Dropout. Unlike traditional MCL-based approaches, each multiple-output function not only estimates the mean but also the variance for its hypothesis. This is achieved through a novel stochastic winner-take-all loss which allows each multiple-output function to estimate variance through the spread of its subnetwork predictions. Experiments on supervised learning problems illustrate that our approach outperforms existing solutions for reconstructing multimodal output distributions. Additional studies on unsupervised learning problems show that estimating the parameters of latent posterior distributions within a discrete autoencoder significantly improves codebook efficiency, sample quality, precision and recall.

Submitted 18 December, 2023; originally announced December 2023.

Comments: To appear in Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI-24). 13 pages (9 main, 4 appendix)

arXiv:2312.10671 [pdf, other]

Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance

Authors: Phuc D. A. Nguyen, Tuan Duc Ngo, Evangelos Kalogerakis, Chuang Gan, Anh Tran, Cuong Pham, Khoi Nguyen

Abstract: We introduce Open3DIS, a novel solution designed to tackle the problem of Open-Vocabulary Instance Segmentation within 3D scenes. Objects within 3D environments exhibit diverse shapes, scales, and colors, making precise instance-level identification a challenging task. Recent advancements in Open-Vocabulary scene understanding have made significant strides in this area by employing class-agnostic 3D instance proposal networks for object localization and learning queryable features for each 3D mask. While these methods produce high-quality instance proposals, they struggle with identifying small-scale and geometrically ambiguous objects. The key idea of our method is a new module that aggregates 2D instance masks across frames and maps them to geometrically coherent point cloud regions as high-quality object proposals addressing the above limitations. These are then combined with 3D class-agnostic instance proposals to include a wide range of objects in the real world. To validate our approach, we conducted experiments on three prominent datasets, including ScanNet200, S3DIS, and Replica, demonstrating significant performance gains in segmenting objects with diverse categories over the state-of-the-art approaches.

Submitted 5 April, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

Comments: CVPR 2024. Project page: https://open3dis.github.io/

arXiv:2312.10543 [pdf, other]

Study of cognitive component of auditory attention to natural speech events

Authors: Nhan D. T. Nguyen, Kaare Mikkelsen, Preben Kidmose

Abstract: Event-related potentials (ERP) have been used to address a wide range of research questions in neuroscience and cognitive psychology including selective auditory attention. The recent progress in auditory attention decoding (AAD) methods is based on algorithms that find a relation between the audio envelope and the neurophysiological response. The most popular approach is based on the reconstruction of the audio envelope based on EEG signals. However, these methods are mainly based on the neurophysiological entrainment to physical attributes of the sensory stimulus and are generally limited by a long detection window. This study proposes a novel approach to auditory attention decoding by looking at higher-level cognitive responses to natural speech. To investigate if natural speech events elicit cognitive ERP components and how these components are affected by attention mechanisms, we designed a series of four experimental paradigms with increasing complexity: a word category oddball paradigm, a word category oddball paradigm with competing speakers, and competing speech streams with and without specific targets. We recorded the electroencephalogram (EEG) from 32 scalp electrodes and 12 in-ear electrodes (ear-EEG) from 24 participants. A cognitive ERP component, which we believe is related to the well-known P3b component, was observed at parietal electrode sites with a latency of approximately 620 ms. The component is statistically most significant for the simplest paradigm and gradually decreases in strength with increasing complexity of the paradigm. We also show that the component can be observed in the in-ear EEG signals by using spatial filtering. The cognitive component elicited by auditory attention may contribute to decoding auditory attention from electrophysiological recordings and its presence in the ear-EEG signals is promising for future applications within hearing aids.

Submitted 19 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

Comments: 15 pages, 11 figures

arXiv:2312.09633 [pdf, other]

Natural Gradient Variational Bayes without Fisher Matrix Analytic Calculation and Its Inversion

Authors: A. Godichon-Baggioni, D. Nguyen, M-N Tran

Abstract: This paper introduces a method for efficiently approximating the inverse of the Fisher information matrix, a crucial step in achieving effective variational Bayes inference. A notable aspect of our approach is the avoidance of analytically computing the Fisher information matrix and its explicit inversion. Instead, we introduce an iterative procedure for generating a sequence of matrices that converge to the inverse of Fisher information. The natural gradient variational Bayes algorithm without analytic expression of the Fisher matrix and its inversion is provably convergent and achieves a convergence rate of order O(log s/s), with s the number of iterations. We also obtain a central limit theorem for the iterates. Implementation of our method does not require storage of large matrices, and achieves a linear complexity in the number of variational parameters. Our algorithm exhibits versatility, making it applicable across a diverse array of variational Bayes domains, including Gaussian approximation and normalizing flow Variational Bayes. We offer a range of numerical examples to demonstrate the efficiency and reliability of the proposed variational Bayes method.

Submitted 26 April, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: 43 pages

arXiv:2312.08747 [pdf, other]

Dissecting vocabulary biases datasets through statistical testing and automated data augmentation for artifact mitigation in Natural Language Inference

Authors: Dat Thanh Nguyen

Abstract: In recent years, the availability of large-scale annotated datasets, such as the Stanford Natural Language Inference and the Multi-Genre Natural Language Inference, coupled with the advent of pre-trained language models, has significantly contributed to the development of the natural language inference domain. However, these crowdsourced annotated datasets often contain biases or dataset artifacts, leading to overestimated model performance and poor generalization. In this work, we focus on investigating dataset artifacts and developing strategies to address these issues. Through the utilization of a novel statistical testing procedure, we discover a significant association between vocabulary distribution and text entailment classes, emphasizing vocabulary as a notable source of biases. To mitigate these issues, we propose several automatic data augmentation strategies spanning character to word levels. By fine-tuning the ELECTRA pre-trained language model, we compare the performance of boosted models with augmented data against their baseline counterparts. The experiments demonstrate that the proposed approaches effectively enhance model accuracy and reduce biases by up to 0.66% and 1.14%, respectively.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.08737 [pdf, other]

JPIS: A Joint Model for Profile-based Intent Detection and Slot Filling with Slot-to-Intent Attention

Authors: Thinh Pham, Dat Quoc Nguyen

Abstract: Profile-based intent detection and slot filling are important tasks aimed at reducing the ambiguity in user utterances by leveraging user-specific supporting profile information. However, research in these two tasks has not been extensively explored. To fill this gap, we propose a joint model, namely JPIS, designed to enhance profile-based intent detection and slot filling. JPIS incorporates the supporting profile information into its encoder and introduces a slot-to-intent attention mechanism to transfer slot information representations to intent detection. Experimental results show that our JPIS substantially outperforms previous profile-based models, establishing a new state-of-the-art performance in overall accuracy on the Chinese benchmark dataset ProSLU.

Submitted 16 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: To appear in Proceedings of ICASSP 2024 (Camera-ready version)

arXiv:2312.08444 [pdf, other]

doi 10.3847/1538-4357/ad4606

Hot Gas Outflow Properties of the Starburst Galaxy NGC 4945

Authors: Natalia Porraz Barrera, Sebastian Lopez, Laura A. Lopez, Adi Foord, Dustin D. Nguyen, Todd A. Thompson, Smita Mathur, Alberto D. Bolatto

Abstract: We analyze 330 ks of Chandra X-ray imaging and spectra of the nearby, edge-on starburst and Seyfert Type 2 galaxy NGC 4945 to measure the hot gas properties along the galactic outflows. We extract and model spectra from 15 regions extending from -0.55 kpc to +0.85 kpc above and below the galactic disk to determine the best-fit parameters and metal abundances. We find that the hot gas temperatures and number densities peak in the central regions and decrease along the outflows. These profiles are inconsistent with a spherical, adiabatically-expanding wind model, suggesting the need to include mass loading and/or a non-spherical outflow geometry. We estimate the mass outflow rate of the hot wind to be $1.6\,M_{\odot}\,\mathrm{yr}^{-1}$. Emission from charge exchange is detected in the northern outflow, and we estimate it contributes 12% to the emitted, broad-band (0.5-7 keV) X-ray flux.

Submitted 11 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: 11 pages, 7 figures; submitted (13 December 2023), accepted ApJ (11 April 2024)

arXiv:2312.08377 [pdf, other]

doi 10.1145/3628797.3628983

ALGNet: Attention Light Graph Memory Network for Medical Recommendation System

Authors: Minh-Van Nguyen, Duy-Thinh Nguyen, Quoc-Huy Trinh, Bac-Hoai Le

Abstract: Medication recommendation is a vital task for improving patient care and reducing adverse events. However, existing methods often fail to capture the complex and dynamic relationships among patient medical records, drug efficacy and safety, and drug-drug interactions (DDI). In this paper, we propose ALGNet, a novel model that leverages light graph convolutional networks (LGCN) and augmentation memory networks (AMN) to enhance medication recommendation. LGCN can efficiently encode the patient records and the DDI graph into low-dimensional embeddings, while AMN can augment the patient representation with external knowledge from a memory module. We evaluate our model on the MIMIC-III dataset and show that it outperforms several baselines in terms of recommendation accuracy and DDI avoidance. We also conduct an ablation study to analyze the effects of different components of our model. Our results demonstrate that ALGNet can achieve superior performance with less computation and more interpretability. The implementation of this paper can be found at: https://github.com/huyquoctrinh/ALGNet.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.07831 [pdf, other]

doi 10.1145/3628797.3628921

Abusive Span Detection for Vietnamese Narrative Texts

Authors: Nhu-Thanh Nguyen, Khoa Thi-Kim Phan, Duc-Vu Nguyen, Ngan Luu-Thuy Nguyen

Abstract: Abuse in its various forms, including physical, psychological, verbal, sexual, financial, and cultural, has a negative impact on mental health. However, there are limited studies on applying natural language processing (NLP) in this field in Vietnam. Therefore, we aim to contribute by building a human-annotated Vietnamese dataset for detecting abusive content in Vietnamese narrative texts. We sourced these texts from VnExpress, Vietnam's popular online newspaper, where readers often share stories containing abusive content. Identifying and categorizing abusive spans in these texts posed significant challenges during dataset creation, but it also motivated our research. We experimented with lightweight baseline models by freezing PhoBERT and XLM-RoBERTa and using their hidden states in a BiLSTM to assess the complexity of the dataset. According to our experimental results, PhoBERT outperforms other models in both labeled and unlabeled abusive span detection tasks. These results indicate that it has the potential for future improvements.

Submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted at SoICT 2023

arXiv:2312.07011 [pdf, ps, other]

Securing MIMO Wiretap Channel with Learning-Based Friendly Jamming under Imperfect CSI

Authors: Bui Minh Tuan, Diep N. Nguyen, Nguyen Linh Trung, Van-Dinh Nguyen, Nguyen Van Huynh, Dinh Thai Hoang, Marwan Krunz, Eryk Dutkiewicz

Abstract: Wireless communications are particularly vulnerable to eavesdropping attacks due to their broadcast nature. To effectively deal with eavesdroppers, existing security techniques usually require accurate channel state information (CSI), e.g., for friendly jamming (FJ), and/or additional computing resources at transceivers, e.g., cryptography-based solutions, which unfortunately may not be feasible in practice. This challenge is even more acute in low-end IoT devices. We thus introduce a novel deep learning-based FJ framework that can effectively defeat eavesdropping attacks with imperfect CSI and even without CSI of legitimate channels. In particular, we first develop an autoencoder-based communication architecture with FJ, namely AEFJ, to jointly maximize the secrecy rate and minimize the block error rate at the receiver without requiring perfect CSI of the legitimate channels. In addition, to deal with the case without CSI, we leverage the mutual information neural estimation (MINE) concept and design a MINE-based FJ scheme that can achieve comparable security performance to the conventional FJ methods that require perfect CSI. Extensive simulations in a multiple-input multiple-output (MIMO) system demonstrate that our proposed solution can effectively deal with eavesdropping attacks in various settings. Moreover, the proposed framework can seamlessly integrate MIMO security and detection tasks into a unified end-to-end learning process. This integrated approach can significantly maximize the throughput and minimize the block error rate, offering a good solution for enhancing communication security in wireless communication systems.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 12 pages, 15 figures

arXiv:2312.05741 [pdf, other]

MISCA: A Joint Model for Multiple Intent Detection and Slot Filling with Intent-Slot Co-Attention

Authors: Thinh Pham, Chi Tran, Dat Quoc Nguyen

Abstract: The research study of detecting multiple intents and filling slots is becoming more popular because of its relevance to complicated real-world situations. Recent advanced approaches, which are joint models based on graphs, might still face two potential issues: (i) the uncertainty introduced by constructing graphs based on preliminary intents and slots, which may transfer intent-slot correlation information to incorrect label node destinations, and (ii) direct incorporation of multiple intent labels for each token w.r.t. token-level intent voting might potentially lead to incorrect slot predictions, thereby hurting the overall performance. To address these two issues, we propose a joint model named MISCA. Our MISCA introduces an intent-slot co-attention mechanism and an underlying layer of label attention mechanism. These mechanisms enable MISCA to effectively capture correlations between intents and slot labels, eliminating the need for graph construction. They also facilitate the transfer of correlation information in both directions: from intents to slots and from slots to intents, through multiple levels of label-specific representations, without relying on token-level intent information. Experimental results show that MISCA outperforms previous models, achieving new state-of-the-art overall accuracy performances on two benchmark datasets MixATIS and MixSNIPS. This highlights the effectiveness of our attention mechanisms.

Submitted 9 December, 2023; originally announced December 2023.

Comments: Findings of EMNLP 2023 (https://aclanthology.org/2023.findings-emnlp.841.pdf); Long paper - 10 pages; 3 figures and 3 tables

arXiv:2312.05594 [pdf, other]

Generative AI for Physical Layer Communications: A Survey

Authors: Nguyen Van Huynh, Jiacheng Wang, Hongyang Du, Dinh Thai Hoang, Dusit Niyato, Diep N. Nguyen, Dong In Kim, Khaled B. Letaief

Abstract: The recent evolution of generative artificial intelligence (GAI) leads to the emergence of groundbreaking applications such as ChatGPT, which not only enhances the efficiency of digital content production, such as text, audio, video, or even network traffic data, but also enriches its diversity. Beyond digital content creation, GAI's capability in analyzing complex data distributions offers great potential for wireless communications, particularly amidst a rapid expansion of new physical layer communication technologies. For example, the diffusion model can learn input signal distributions and use them to improve the channel estimation accuracy, while the variational autoencoder can model channel distribution and infer latent variables for blind channel equalization. Therefore, this paper presents a comprehensive investigation of GAI's applications for communications at the physical layer, ranging from traditional issues, including signal classification, channel estimation, and equalization, to emerging topics, such as intelligent reflecting surfaces and joint source channel coding. We also compare GAI-enabled physical layer communications with those supported by traditional AI, highlighting GAI's inherent capabilities and unique contributions in these areas. Finally, the paper discusses open issues and proposes several future research directions, laying a foundation for further exploration and advancement of GAI in physical layer communications.

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2312.03559 [pdf, other]

MCAIMem: a Mixed SRAM and eDRAM Cell for Area and Energy-efficient on-chip AI Memory

Authors: Duy-Thanh Nguyen, Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda

Abstract: AI chips commonly employ SRAM memory as buffers for their reliability and speed, which contribute to high performance. However, SRAM is expensive and demands significant area and energy consumption. Previous studies have explored replacing SRAM with emerging technologies like non-volatile memory, which offers fast-read memory access and a small cell area. Despite these advantages, non-volatile memory's slow write memory access and high write energy consumption prevent it from surpassing SRAM performance in AI applications with extensive memory access requirements. Some research has also investigated eDRAM as an area-efficient on-chip memory with similar access times as SRAM. Still, refresh power remains a concern, leaving the trade-off between performance, area, and power consumption unresolved. To address this issue, our paper presents a novel mixed CMOS cell memory design that balances performance, area, and energy efficiency for AI memory by combining SRAM and eDRAM cells. We consider the proportion ratio of one SRAM and seven eDRAM cells in the memory to achieve area reduction using mixed CMOS cell memory. Additionally, we capitalize on the characteristics of DNN data representation and integrate asymmetric eDRAM cells to lower energy consumption. To validate our proposed MCAIMem solution, we conduct extensive simulations and benchmarking against traditional SRAM. Our results demonstrate that MCAIMem significantly outperforms these alternatives in terms of area and energy efficiency. Specifically, our MCAIMem can reduce the area by 48% and energy consumption by 3.4× compared to SRAM designs, without incurring any accuracy loss.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.03093 [pdf, other]

RESIN-EDITOR: A Schema-guided Hierarchical Event Graph Visualizer and Editor

Authors: Khanh Duy Nguyen, Zixuan Zhang, Reece Suchocki, Sha Li, Martha Palmer, Susan Brown, Jiawei Han, Heng Ji

Abstract: In this paper, we present RESIN-EDITOR, an interactive event graph visualizer and editor designed for analyzing complex events. Our RESIN-EDITOR system allows users to render and freely edit hierarchical event graphs extracted from multimedia and multi-document news clusters with guidance from human-curated event schemas. RESIN-EDITOR's unique features include hierarchical graph visualization, comprehensive source tracing, and interactive user editing, which is more powerful and versatile than existing Information Extraction (IE) visualization tools. In our evaluation of RESIN-EDITOR, we demonstrate ways in which our tool is effective in understanding complex events and enhancing system performance. The source code, a video demonstration, and a live website for RESIN-EDITOR have been made publicly available.

Submitted 5 December, 2023; originally announced December 2023.

Comments: The first two authors contribute equally to this paper

arXiv:2312.02490 [pdf, other]

Constrained Twin Variational Auto-Encoder for Intrusion Detection in IoT Systems

Authors: Phai Vu Dinh, Quang Uy Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Son Pham Bao, Eryk Dutkiewicz

Abstract: Intrusion detection systems (IDSs) play a critical role in protecting billions of IoT devices from malicious attacks. However, the IDSs for IoT devices face inherent challenges of IoT systems, including the heterogeneity of IoT data/devices, the high dimensionality of training data, and the imbalanced data. Moreover, the deployment of IDSs on IoT systems is challenging, and sometimes impossible, due to the limited resources such as memory/storage and computing capability of typical IoT devices. To tackle these challenges, this article proposes a novel deep neural network/architecture called Constrained Twin Variational Auto-Encoder (CTVAE) that can feed classifiers of IDSs with more separable/distinguishable and lower-dimensional representation data. Additionally, in comparison to the state-of-the-art neural networks used in IDSs, CTVAE requires less memory/storage and computing power, hence making it more suitable for IoT IDS systems. Extensive experiments with the 11 most popular IoT botnet datasets show that CTVAE can boost around 1% in terms of accuracy and Fscore in detection attack compared to the state-of-the-art machine learning and representation learning methods, whilst the running time for attack detection is lower than 2E-6 seconds and the model size is lower than 1 MB. We also further investigate various characteristics of CTVAE in the latent space and in the reconstruction representation to demonstrate its efficacy compared with current well-known methods.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.02185 [pdf, other]

Virtual Fusion with Contrastive Learning for Single Sensor-based Activity Recognition

Authors: Duc-Anh Nguyen, Cuong Pham, Nhien-An Le-Khac

Abstract: Various types of sensors can be used for Human Activity Recognition (HAR), and each of them has different strengths and weaknesses. Sometimes a single sensor cannot fully observe the user's motions from its perspective, which causes wrong predictions. While sensor fusion provides more information for HAR, it comes with many inherent drawbacks like user privacy and acceptance, costly set-up, operation, and maintenance. To deal with this problem, we propose Virtual Fusion - a new method that takes advantage of unlabeled data from multiple time-synchronized sensors during training, but only needs one sensor for inference. Contrastive learning is adopted to exploit the correlation among sensors. Virtual Fusion gives significantly better accuracy than training with the same single sensor, and in some cases, it even surpasses actual fusion using multiple sensors at test time. We also extend this method to a more general version called Actual Fusion within Virtual Fusion (AFVF), which uses a subset of training sensors during inference. Our method achieves state-of-the-art accuracy and F1-score on UCI-HAR and PAMAP2 benchmark datasets. Implementation is available upon request.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2312.01777 [pdf, ps, other]

Doubly 1-Bit Quantized Massive MIMO

Authors: Italo Atzeni, Antti Tölli, Duy H. N. Nguyen, A. Lee Swindlehurst

Abstract: Enabling communications in the (sub-)THz band will call for massive multiple-input multiple-output (MIMO) arrays at either the transmit- or receive-side, or at both. To scale down the complexity and power consumption when operating across massive frequency and antenna dimensions, a sacrifice in the resolution of the digital-to-analog/analog-to-digital converters (DACs/ADCs) will be inevitable. In this paper, we analyze the extreme scenario where both the transmit- and receive-side are equipped with fully digital massive MIMO arrays and 1-bit DACs/ADCs, which leads to a system with minimum radio-frequency complexity, cost, and power consumption. Building upon the Bussgang decomposition, we derive a tractable approximation of the mean squared error (MSE) between the transmitted data symbols and their soft estimates. Numerical results show that, despite its simplicity, a doubly 1-bit quantized massive MIMO system with very large antenna arrays can deliver an impressive performance in terms of MSE and symbol error rate.

Submitted 4 December, 2023; originally announced December 2023.

Comments: Presented at the IEEE Asilomar Conference on Signals, Systems, and Computers 2023

arXiv:2312.01612 [pdf, other]

xNeuSM: Explainable Neural Subgraph Matching with Graph Learnable Multi-hop Attention Networks

Authors: Duc Q. Nguyen, Thanh Toan Nguyen, Tho quan

Abstract: Subgraph matching is a challenging problem with a wide range of applications in database systems, biochemistry, and cognitive science. It involves determining whether a given query graph is present within a larger target graph. Traditional graph-matching algorithms provide precise results but face challenges in large graph instances due to the NP-complete problem, limiting their practical applicability. In contrast, recent neural network-based approximations offer more scalable solutions, but often lack interpretable node correspondences. To address these limitations, this article presents xNeuSM: Explainable Neural Subgraph Matching which introduces Graph Learnable Multi-hop Attention Networks (GLeMA) that adaptively learns the parameters governing the attention factor decay for each node across hops rather than relying on fixed hyperparameters. We provide a theoretical analysis establishing error bounds for GLeMA's approximation of multi-hop attention as a function of the number of hops. Additionally, we prove that learning distinct attention decay factors for each node leads to a correct approximation of multi-hop attention. Empirical evaluation on real-world datasets shows that xNeuSM achieves substantial improvements in prediction accuracy of up to 34% compared to approximate baselines and, notably, at least a seven-fold faster query time than exact algorithms. The source code of our implementation is available at https://github.com/martinakaduc/xNeuSM.

Submitted 3 December, 2023; originally announced December 2023.

Comments: 33 pages, 8 figures, 6 tables

arXiv:2312.00610 [pdf]

Experiment on Gender and Racial/Ethnic Bias Against Video Game Streamers: Comparing Perceived Gameplay Skill and Viewer Engagement

Authors: David V. Nguyen, Edward F. Melcer, Deanne Adams

Abstract: Research suggests there is a perception that females and underrepresented racial/ethnic minorities have worse gameplay skills and produce less engaging video game streaming content. This bias might impact streamers' audience size, viewers' financial patronage of a streamer, streamers' sponsorship offers, etc. However, few studies on this topic use experimental methods. To fill this gap, we conducted a between-subjects survey experiment to examine if viewers are biased against video game streamers based on the streamer's gender or race/ethnicity. 200 survey participants rated the gameplay skill and viewer engagement of an identical gameplay recording. The only change between experimental conditions was the streamer's name who purportedly created the recording. The Dunnett's test found no statistically significant differences in viewer engagement ratings when comparing White male streamers to either White female (p = 0.37), Latino male (p = 0.66), or Asian male (p = 0.09) streamers. Similarly, there were no statistically significant differences in gameplay skill ratings when comparing White male streamers to either White female (p = 0.10), Latino male (p = 1.00), or Asian male (p = 0.59) streamers. Potential contributors to statistically non-significant results and counter-intuitive results (i.e., White females received non-significantly higher ratings than White males) are discussed.

Submitted 30 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.17154 [pdf, other]

Pragmatic Radiology Report Generation

Authors: Dang Nguyen, Chacha Chen, He He, Chenhao Tan

Abstract: When pneumonia is not found on a chest X-ray, should the report describe this negative observation or omit it? We argue that this question cannot be answered from the X-ray alone and requires a pragmatic perspective, which captures the communicative goal that radiology reports serve between radiologists and patients. However, the standard image-to-text formulation for radiology report generation fails to incorporate such pragmatic intents. Following this pragmatic perspective, we demonstrate that the indication, which describes why a patient comes for an X-ray, drives the mentions of negative observations and introduce indications as additional input to report generation. With respect to the output, we develop a framework to identify uninferable information from the image as a source of model hallucinations, and limit them by cleaning groundtruth reports. Finally, we use indications and cleaned groundtruth reports to develop pragmatic models, and show that they outperform existing methods not only in new pragmatics-inspired metrics (+4.3 Negative F1) but also in standard metrics (+6.3 Positive F1 and +11.0 BLEU-2).

Submitted 28 November, 2023; originally announced November 2023.

Comments: 18 pages, 1 figure, 18 tables. Code at https://github.com/ChicagoHAI/llm_radiology

arXiv:2311.16314 [pdf, other]

Towards Designing Spatial Robots that are Architecturally Motivated

Authors: Binh Vinh Duc Nguyen, Andrew Vande Moere

Abstract: While robots are increasingly integrated into the built environment, little is known about how their qualities can meaningfully influence our spaces to facilitate enjoyable and agreeable interaction, rather than robotic settings that are driven by functional goals. Motivated by the premise that future robots should be aware of architectural sensitivities, we developed a set of exploratory studies that combine methods from both architectural and interaction design. While we empirically discovered that dynamically moving spatial elements, which we coin as spatial robots, can indeed create unique life-sized affordances that encourage or resist human activities, we also encountered many unforeseen design challenges originating from how ordinary users and experts perceived spatial robots. This discussion thus could inform similar design studies in the areas of human-building interaction (HBI) or responsive and interactive architecture.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.15041 [pdf, other]

MPCNN: A Novel Matrix Profile Approach for CNN-based Sleep Apnea Classification

Authors: Hieu X. Nguyen, Duong V. Nguyen, Hieu H. Pham, Cuong D. Do

Abstract: Sleep apnea (SA) is a significant respiratory condition that poses a major global health challenge. Previous studies have investigated several machine and deep learning models for electrocardiogram (ECG)-based SA diagnoses. Despite these advancements, conventional feature extractions derived from ECG signals, such as R-peaks and RR intervals, may fail to capture crucial information encompassed within the complete PQRST segments. In this study, we propose an innovative approach to address this diagnostic gap by delving deeper into the comprehensive segments of the ECG signal. The proposed methodology draws inspiration from Matrix Profile algorithms, which generate a Euclidean distance profile from fixed-length signal subsequences. From this, we derived the Min Distance Profile (MinDP), Max Distance Profile (MaxDP), and Mean Distance Profile (MeanDP) based on the minimum, maximum, and mean of the profile distances, respectively. To validate the effectiveness of our approach, we use the modified LeNet-5 architecture as the primary CNN model, along with two existing lightweight models, BAFNet and SE-MSCNN, for ECG classification tasks. Our extensive experimental results on the PhysioNet Apnea-ECG dataset revealed that with the new feature extraction method, we achieved a per-segment accuracy of up to 92.11% and a per-recording accuracy of 100%. Moreover, it yielded the highest correlation compared to state-of-the-art methods, with a correlation coefficient of 0.989. By introducing a new feature extraction method based on distance relationships, we enhanced the performance of certain lightweight models, showing potential for home sleep apnea test (HSAT) and SA detection in IoT devices. The source code for this work is made publicly available on GitHub: https://github.com/vinuni-vishc/MPCNN-Sleep-Apnea.

Submitted 25 November, 2023; originally announced November 2023.
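
Illustrative note on the MPCNN entry above: the abstract derives MinDP, MaxDP, and MeanDP from the Euclidean distances between fixed-length subsequences of a signal. The following is a minimal sketch of that idea, not the authors' implementation (see their repository for that). The function name `distance_profiles` is hypothetical, and two simplifications are assumed here: the subsequences are not z-normalized, and only the exact self-match is excluded from each profile.

```python
import math


def distance_profiles(signal, m):
    """Return (min_dp, max_dp, mean_dp), one value per length-m subsequence.

    Hypothetical helper: brute-force O(n^2 * m), no z-normalization,
    and only the self-match is excluded (both are assumptions).
    """
    n = len(signal) - m + 1
    if m < 1 or n < 2:
        raise ValueError("need at least two subsequences of length m")

    # All sliding windows of length m.
    subs = [signal[i:i + m] for i in range(n)]

    min_dp, max_dp, mean_dp = [], [], []
    for i in range(n):
        # Euclidean distances from window i to every other window.
        dists = [math.dist(subs[i], subs[j]) for j in range(n) if j != i]
        min_dp.append(min(dists))
        max_dp.append(max(dists))
        mean_dp.append(sum(dists) / len(dists))
    return min_dp, max_dp, mean_dp


if __name__ == "__main__":
    # Toy signal standing in for an ECG segment.
    mins, maxs, means = distance_profiles([0, 0, 3, 4], m=2)
    print(mins, maxs, means)
```

A brute-force loop is used only to keep the sketch short; dedicated Matrix Profile algorithms compute the same kind of profile far more efficiently on long signals.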

arXiv:2311.12916 [pdf, other]

Optimal control of sweeping processes in unmanned surface vehicle and nanoparticle modeling

Authors: Boris S. Mordukhovich, Dao Nguyen, Trang Nguyen

Abstract: This paper addresses novel applications to practical modeling of the newly developed theory of necessary optimality conditions in controlled sweeping/Moreau processes with free time and pointwise control and state constraints. Problems of this type appear, in particular, in dynamical models dealing with unmanned surface vehicles (USVs) and nanoparticles. We formulate optimal control problems for a general class of such dynamical systems and show that the developed necessary optimality conditions for constrained free-time controlled sweeping processes lead us to designing efficient procedures to solve practical models of this class. Moreover, the paper contains numerical calculations of optimal solutions to marine USVs and nanoparticle models in specific situations. Overall, this study contributes to the advancement of optimal control theory for constrained sweeping processes and its practical applications in the fields of marine USVs and nanoparticle modeling.

Submitted 21 November, 2023; originally announced November 2023.

MSC Class: 49J52; 49J53; 49K24; 49M25; 90C30

arXiv:2311.11349 [pdf, other]

Coverage-Validity-Aware Algorithmic Recourse

Authors: Ngoc Bui, Duy Nguyen, Man-Chung Yue, Viet Anh Nguyen

Abstract: Algorithmic recourse emerges as a prominent technique to promote the explainability, transparency and hence ethics of machine learning models. Existing algorithmic recourse approaches often assume an invariant predictive model; however, the predictive model is usually updated upon the arrival of new data. Thus, a recourse that is valid with respect to the present model may become invalid for the future model. To resolve this issue, we propose a novel framework to generate a model-agnostic recourse that exhibits robustness to model shifts. Our framework first builds a coverage-validity-aware linear surrogate of the nonlinear (black-box) model; then, the recourse is generated with respect to the linear surrogate. We establish a theoretical connection between our coverage-validity-aware linear surrogate and the minimax probability machines (MPM). We then prove that by prescribing different covariance robustness, the proposed framework recovers popular regularizations for MPM, including the $\ell_2$-regularization and class-reweighting. Furthermore, we show that our surrogate pushes the approximate hyperplane intuitively, facilitating not only robust but also interpretable recourses. The numerical results demonstrate the usefulness and robustness of our framework.

Submitted 19 November, 2023; originally announced November 2023.

arXiv:2311.11096 [pdf, other]

On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation

Authors: Duy Minh Ho Nguyen, Tan Ngoc Pham, Nghiem Tuong Diep, Nghi Quoc Phan, Quang Pham, Vinh Tong, Binh T. Nguyen, Ngan Hoang Le, Nhat Ho, Pengtao Xie, Daniel Sonntag, Mathias Niepert

Abstract: Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. The foundational models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. They showcase impressive learning abilities across different tasks with the need for only a limited amount of annotated samples. While numerous techniques have focused on developing better fine-tuning strategies to adapt these models for specific domains, we instead examine their robustness to domain shifts in the medical image segmentation task. To this end, we compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset and show that foundation-based models enjoy better robustness than other architectures. From here, we further developed a new Bayesian uncertainty estimation for frozen models and used them as an indicator to characterize the model's performance on out-of-distribution (OOD) data, proving particularly beneficial for real-world applications. Our experiments not only reveal the limitations of current indicators like accuracy on the line or agreement on the line commonly used in natural image applications but also emphasize the promise of the introduced Bayesian uncertainty. Specifically, lower uncertainty predictions usually correspond to higher OOD performance.

Submitted 18 November, 2023; originally announced November 2023.

Comments: Advances in Neural Information Processing Systems (NeurIPS) 2023, Workshop on robustness of zero/few-shot learning in foundation models

arXiv:2311.10640 [pdf]

Multi-delay arterial spin-labeled perfusion estimation with biophysics simulation and deep learning

Authors: Renjiu Hu, Qihao Zhang, Pascal Spincemaille, Thanh D. Nguyen, Yi Wang

Abstract: Purpose: To develop a biophysics-based method for estimating perfusion Q from arterial spin labeling (ASL) images using deep learning. Methods: A 3D U-Net (QTMnet) was trained to estimate perfusion from 4D tracer propagation images. The network was trained and tested on simulated 4D tracer concentration data based on artificial vasculature structure generated by the constrained constructive optimization (CCO) method. The trained network was further tested in a synthetic brain ASL image based on a vasculature network extracted from magnetic resonance (MR) angiography. The estimations from both the trained network and a conventional kinetic model were compared in ASL images acquired from eight healthy volunteers. Results: QTMnet accurately reconstructed perfusion Q from concentration data. Relative error of the synthetic brain ASL image was 7.04% for perfusion Q, lower than the error using the single-delay ASL model (25.15% for Q) and the multi-delay ASL model (12.62% for Q). Conclusion: QTMnet provides accurate estimation of perfusion parameters and is a promising approach as a clinical ASL MRI image processing pipeline.

Submitted 17 November, 2023; originally announced November 2023.

Comments: 32 pages, 5 figures

arXiv:2311.10219 [pdf, other]

Measuring Moral Dimensions in Social Media with Mformer

Authors: Tuan Dung Nguyen, Ziyu Chen, Nicholas George Carroll, Alasdair Tran, Colin Klein, Lexing Xie

Abstract: The ever-growing textual records of contemporary social issues, often discussed online with moral rhetoric, present both an opportunity and a challenge for studying how moral concerns are debated in real life. Moral foundations theory is a taxonomy of intuitions widely used in data-driven analyses of online content, but current computational tools to detect moral foundations suffer from the incompleteness and fragility of their lexicons and from poor generalization across data domains. In this paper, we fine-tune a large language model to measure moral foundations in text based on datasets covering news media and long- and short-form online discussions. The resulting model, called Mformer, outperforms existing approaches on the same domains by 4--12% in AUC and further generalizes well to four commonly used moral text datasets, improving by up to 17% in AUC. We present case studies using Mformer to analyze everyday moral dilemmas on Reddit and controversies on Twitter, showing that moral foundations can meaningfully describe people's stance on social issues and such variations are topic-dependent. Pre-trained model and datasets are released publicly. We posit that Mformer will help the research community quantify moral dimensions for a range of tasks and data domains, and eventually contribute to the understanding of moral situations faced by humans and machines.

Submitted 19 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: To be published in ICWSM 2024

arXiv:2311.09650 [pdf, ps, other]

Levinson's theorem for two-dimensional scattering systems: it was a surprise, it is now topological!

Authors: A. Alexander, D. T. Nguyen, A. Rennie, S. Richard

Abstract: We prove a general Levinson's theorem for Schrödinger operators in two dimensions with threshold obstructions at zero energy. Our results confirm and simplify earlier seminal results of Bollé, Gesztesy et al., while providing an explicit topological interpretation. We also derive explicit formulas for the wave operators, and so show that they are elements of a $C^*$-algebra introduced by Cordes. As a consequence of our approach, we provide an evaluation of the spectral shift function at zero in the presence of $p$-resonances.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 27 pages

arXiv:2311.08070 [pdf, other]

First Principles Prediction Unveils High-T$_c$ Superconductivity in YSc$_2$H$_{24}$ Cage Structures

Authors: Viet-Ha Chu, Truong-Tho Pham, Duc-Long Nguyen

Abstract: The quest for room-temperature superconductivity has been a long-standing aspiration in the field of materials science, driving extensive research efforts. In this work, we present a novel hydride, YSc$_2$H$_{24}$, which is stable at high pressure, found using a crystal structure prediction approach with a fixed composition based on known structures. The discovered material is crystalline in a hexagonal unit cell with space group P6/mmm and has a fascinating structure consisting of two distinct cages: Sc@H$_{24}$ and Y@H$_{30}$. By conducting an extensive numerical investigation of lattice dynamics and electron-phonon coupling, and by solving the isotropic Eliashberg equation, we have revealed a significant value of $λ$ = 2.96 as the underlying factor responsible for the remarkably high critical temperature (T$_c$) of 306-332 K in YSc$_2$H$_{24}$. As pressure increases, the T$_c$ remains above the ambient temperature. Our work has the potential to enhance the existing understanding of high-temperature superconductors, with implications for practical applications. The unique network of these cage-like structures holds great promise for advancing our understanding of high-temperature superconductors, potentially leading to innovative applications.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.07143 [pdf, other]

Learning Symmetrization for Equivariance with Orbit Distance Minimization

Authors: Tien Dat Nguyen, Jinwoo Kim, Hongseok Yang, Seunghoon Hong

Abstract: We present a general framework for symmetrizing an arbitrary neural-network architecture and making it equivariant with respect to a given group. We build upon the proposals of Kim et al. (2023); Kaba et al. (2023) for symmetrization, and improve them by replacing their conversion of neural features into group representations, with an optimization whose loss intuitively measures the distance between group orbits. This change makes our approach applicable to a broader range of matrix groups, such as the Lorentz group O(1, 3), than these two proposals. We experimentally show our method's competitiveness on the SO(2) image classification task, and also its increased generality on the task with O(1, 3). Our implementation will be made accessible at https://github.com/tiendatnguyen-vision/Orbit-symmetrize.

Submitted 13 November, 2023; originally announced November 2023.

Comments: 16 pages, 1 figure

arXiv:2311.06894 [pdf, other]

doi 10.1109/RIVF55975.2022.10013894

An Application of Vector Autoregressive Model for Analyzing the Impact of Weather And Nearby Traffic Flow On The Traffic Volume

Authors: Anh Thi-Hoang Nguyen, Dung Ha Nguyen, Trong-Hop Do

Abstract: This paper aims to predict the traffic flow at one road segment based on nearby traffic volume and weather conditions. Our team also investigates the impact of weather conditions and nearby traffic volume on the traffic flow at a target point. The analysis results will help solve the problem of traffic flow prediction and develop an optimal transport network with efficient traffic movement and minimal traffic congestion. Hourly historical weather and traffic flow data are selected to solve this problem. This paper uses a VAR(36) model with time trend and constant to train the dataset and forecast. With an RMSE of 565.0768111 on average, the model is considered appropriate although some statistical tests imply that the residuals are unstable and non-normal. Also, this paper points out some variables that are not useful in forecasting, which helps simplify the data-collecting process when building the forecasting system.

Submitted 12 November, 2023; originally announced November 2023.

Comments: International Conference on Computing and Communication Technologies (RIVF2022)

Report number: D1-2022-48

arXiv:2311.06851 [pdf, other]

Automatic Textual Normalization for Hate Speech Detection

Authors: Anh Thi-Hoang Nguyen, Dung Ha Nguyen, Nguyet Thi Nguyen, Khanh Thanh-Duy Ho, Kiet Van Nguyen

Abstract: Social media data is a valuable resource for research, yet it contains a wide range of non-standard words (NSW). These irregularities hinder the effective operation of NLP tools. Current state-of-the-art methods for the Vietnamese language address this issue as a problem of lexical normalization, involving the creation of manual rules or the implementation of multi-staged deep learning frameworks, which necessitate extensive efforts to craft intricate rules. In contrast, our approach is straightforward, employing solely a sequence-to-sequence (Seq2Seq) model. In this research, we provide a dataset for textual normalization, comprising 2,181 human-annotated comments with an inter-annotator agreement of 0.9014. By leveraging the Seq2Seq model for textual normalization, our results reveal that the accuracy achieved falls slightly short of 70%. Nevertheless, textual normalization enhances the accuracy of the Hate Speech Detection (HSD) task by approximately 2%, demonstrating its potential to improve the performance of complex NLP tasks. Our dataset is accessible for research purposes.

Submitted 4 December, 2023; v1 submitted 12 November, 2023; originally announced November 2023.

Comments: Accepted to present at 2023 International Conference on Intelligent Systems Design and Applications (ISDA2023)

arXiv:2311.06660 [pdf, ps, other]

On asymptotic properties of solutions to $σ$-evolution equations with general double damping

Authors: Tuan Anh Dao, Dinh Van Duong, Duc Anh Nguyen

Abstract: In this paper, we would like to consider the Cauchy problem for semi-linear $σ$-evolution equations with double structural damping for any $σ\ge 1$. The main purpose of the present work is to not only study the asymptotic profiles of solutions to the corresponding linear equations but also describe large-time behaviors of globally obtained solutions to the semi-linear equations. We want to emphasize that the new contribution is to find out the sharp interplay of ``parabolic like models" corresponding to $σ_1 \in [0,σ/2)$ and ``$σ$-evolution like models" corresponding to $σ_2 \in (σ/2,σ]$, which together appear in an equation. In this connection, we understand clearly how each damping term influences the asymptotic properties of solutions.

Submitted 11 November, 2023; originally announced November 2023.

Comments: 29 pages

MSC Class: 35B40; 35B44; 35L30; 35L56

arXiv:2311.04918 [pdf, other]

Low-Resource Named Entity Recognition: Can One-vs-All AUC Maximization Help?

Authors: Ngoc Dang Nguyen, Wei Tan, Lan Du, Wray Buntine, Richard Beare, Changyou Chen

Abstract: Named entity recognition (NER), a task that identifies and categorizes named entities such as persons or organizations from text, is traditionally framed as a multi-class classification problem. However, this approach often overlooks the issues of imbalanced label distributions, particularly in low-resource settings, which are common in certain NER contexts, like biomedical NER (bioNER). To address these issues, we propose an innovative reformulation of the multi-class problem as a one-vs-all (OVA) learning problem and introduce a loss function based on the area under the receiver operating characteristic curve (AUC). To enhance the efficiency of our OVA-based approach, we propose two training strategies: one groups labels with similar linguistic characteristics, and another employs meta-learning. The superiority of our approach is confirmed by its performance, which surpasses traditional NER learning in varying NER settings.

Submitted 2 November, 2023; originally announced November 2023.

Comments: 6 pages, 3 figures, ICDM 2023

arXiv:2311.02945 [pdf, ps, other]

PhoGPT: Generative Pre-training for Vietnamese

Authors: Dat Quoc Nguyen, Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Dinh Phung, Hung Bui

Abstract: We open-source a state-of-the-art 4B-parameter generative model series for Vietnamese, which includes the base pre-trained monolingual model PhoGPT-4B and its chat variant, PhoGPT-4B-Chat. The base model, PhoGPT-4B, with exactly 3.7B parameters, is pre-trained from scratch on a Vietnamese corpus of 102B tokens, with an 8192 context length, employing a vocabulary of 20480 token types. The chat variant, PhoGPT-4B-Chat, is the modeling output obtained by fine-tuning PhoGPT-4B on a dataset of 70K instructional prompts and their responses, along with an additional 290K conversations. In addition, we also demonstrate its superior performance compared to previous open-source models. Our PhoGPT models are available at: https://github.com/VinAIResearch/PhoGPT

Submitted 22 March, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: PhoGPT-4B Technical Report - 5 pages

arXiv:2311.02735 [pdf, other]

doi 10.1016/j.physa.2023.129452

Remark on the Entropy Production of Adaptive Run-and-Tumble Chemotaxis

Authors: Minh D. N. Nguyen, Phuc H. Pham, Khang V. Ngo, Van H. Do, Shengkai Li, Trung V. Phan

Abstract: Chemotactic active particles, such as bacteria and cells, exhibit an adaptive run-and-tumble motion, giving rise to complex emergent behaviors in response to external chemical fields. This motion is generated by the conversion of internal chemical energy into self-propulsion, allowing each agent to sustain a steady state far from thermal equilibrium and perform work. The rate of entropy production serves as an indicator of how far these agents operate from thermal equilibrium, providing a measure for estimating maximum obtainable power. Here we present the general framework for calculating the entropy production rate created by such a population of agents from first principles, using the minimal model of bacterial adaptive chemotaxis, as they execute the most basic collective action -- the mass transport.

Submitted 27 January, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

Journal ref: Physica A: Statistical Mechanics and its Applications 634 (2024): 129452

Showing 151–200 of 2,441 results for author: Nguyen, D