-
ViMQ: A Vietnamese Medical Question Dataset for Healthcare Dialogue System Development
Authors:
Ta Duc Huy,
Nguyen Anh Tu,
Tran Hoang Vu,
Nguyen Phuc Minh,
Nguyen Phan,
Trung H. Bui,
Steven Q. H. Truong
Abstract:
Existing medical text datasets usually take the form of question-and-answer pairs that support the task of natural language generation but lack composite annotations of the medical terms. In this study, we publish a Vietnamese dataset of medical questions from patients with sentence-level and entity-level annotations for the Intent Classification and Named Entity Recognition tasks. The tag sets for the two tasks are in the medical domain and can facilitate the development of task-oriented healthcare chatbots with better comprehension of queries from patients. We train baseline models for the two tasks and propose a simple self-supervised training strategy with span-noise modelling that substantially improves the performance. Dataset and code will be published at https://github.com/tadeephuy/ViMQ
Submitted 27 April, 2023;
originally announced April 2023.
-
Exact SOHS decompositions of trigonometric univariate polynomials with Gaussian coefficients
Authors:
Victor Magron,
Mohab Safey El Din,
Markus Schweighofer,
Trung Hieu Vu
Abstract:
Certifying the positivity of trigonometric polynomials is of primary importance for design problems in discrete-time signal processing. It is well known from the Riesz-Fejér spectral factorization theorem that any trigonometric univariate polynomial positive on the unit circle can be decomposed as a Hermitian square with complex coefficients. Here we focus on the case of polynomials with Gaussian integer coefficients, i.e., with real and imaginary parts being integers. We design, analyze and compare, theoretically and practically, three hybrid numeric-symbolic algorithms computing weighted sums of Hermitian squares decompositions for trigonometric univariate polynomials positive on the unit circle with Gaussian coefficients. The numerical steps on which the first and second algorithms rely are complex root isolation and semidefinite programming, respectively. An exact sum of Hermitian squares decomposition is obtained thanks to compensation techniques. The third algorithm, also based on complex semidefinite programming, is an adaptation of the rounding and projection algorithm by Peyrl and Parrilo. For all three algorithms, we prove bit complexity and output size estimates that are polynomial in the degree of the input and linear in the maximum bitsize of its coefficients. We compare their performance on randomly chosen benchmarks, and further design a certified finite impulse response filter.
Submitted 4 October, 2023; v1 submitted 14 February, 2022;
originally announced February 2022.
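To make the Hermitian-square statement in the abstract above concrete, here is a small numerical check on a hand-picked example. It is not one of the three algorithms from the paper: the polynomial f(z) = 2/z + 5 + 2z, the factor h(z) = 2 + z, and the helper names `trig_poly` and `max_gap` are all assumptions chosen for illustration. The sketch only confirms that f agrees with |h|^2 at sample points on the unit circle.

```python
import cmath
import math

def trig_poly(coeffs, z):
    """Evaluate sum_k coeffs[k] * z**k for a dict {exponent: coefficient}."""
    return sum(c * z**k for k, c in coeffs.items())

# Illustrative input (assumed, not from the paper): f equals 5 + 4*cos(theta)
# on the unit circle, so it is positive there and has integer coefficients.
f = {-1: 2, 0: 5, 1: 2}
# Candidate factor h(z) = 2 + z, so that f = |h|^2 on the unit circle.
h = {0: 2, 1: 1}

def max_gap(f, h, samples=360):
    """Largest |f(z) - |h(z)|^2| over equally spaced points of the unit circle."""
    gap = 0.0
    for t in range(samples):
        z = cmath.exp(2j * math.pi * t / samples)
        gap = max(gap, abs(trig_poly(f, z) - abs(trig_poly(h, z)) ** 2))
    return gap

print(max_gap(f, h))  # a value at floating-point noise level
```

A sampled check like this is only numerical evidence. The abstract's point is that the decomposition can be certified exactly, which floating-point evaluation cannot do.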
-
Signal Classification under structure sparsity constraints
Authors:
Tiep Huu Vu
Abstract:
Object Classification is a key direction of research in signal and image processing, computer vision and artificial intelligence. The goal is to come up with algorithms that automatically analyze images and put them in predefined categories. This dissertation focuses on the theory and application of sparse signal processing and learning algorithms for image processing and computer vision, especially object classification problems. A key emphasis of this work is to formulate novel optimization problems for learning dictionary and structured sparse representations. Tractable solutions are proposed subsequently for the corresponding optimization problems.
An important goal of this dissertation is to demonstrate the wide applicability of these algorithmic tools to real-world problems. To that end, we explored important problems in the areas of:
1. Medical imaging: histopathological images acquired from mammalian tissues, human breast tissues, and human brain tissues.
2. Low-frequency (UHF to L-band) ultra-wideband (UWB) synthetic aperture radar: detecting bombs and mines buried under rough surfaces.
3. General object classification: face, flowers, objects, dogs, indoor scenes, etc.
Submitted 27 December, 2018;
originally announced December 2018.
-
Adaptive matching pursuit for sparse signal recovery
Authors:
Tiep H. Vu,
Hojjat S. Mousavi,
Vishal Monga
Abstract:
Spike and Slab priors have been of much recent interest in signal processing as a means of inducing sparsity in Bayesian inference. Application domains that benefit from the use of these priors include sparse recovery, regression and classification. It is well known that solving for the sparse coefficient vector to maximize these priors results in a hard non-convex and mixed integer programming problem. Most existing solutions to this optimization problem either involve simplifying assumptions/relaxations or are computationally expensive. We propose a new greedy and adaptive matching pursuit (AMP) algorithm to directly solve this hard problem. Essentially, in each step of the algorithm, the set of active elements is updated by either adding or removing one index, whichever yields the greater improvement. In addition, the intermediate steps of the algorithm are calculated via an inexpensive Cholesky decomposition, which makes the algorithm much faster. Results on simulated data sets as well as real-world image recovery challenges confirm the benefits of the proposed AMP, particularly in providing a superior cost-quality trade-off over existing alternatives.
Submitted 12 September, 2016;
originally announced October 2016.
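The add-or-remove step described in the abstract above can be sketched roughly as follows. This is a minimal illustration and not the authors' AMP implementation. The cost function (squared residual plus a ridge term `lam` plus a per-index penalty `rho`) is an assumed stand-in for the spike-and-slab objective. The Cholesky factorization is recomputed from scratch for every candidate support, where an efficient version would presumably update it incrementally. All function names are hypothetical.

```python
import math

def cholesky_solve(G, b):
    """Solve G x = b for symmetric positive definite G via G = L L^T."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n  # forward substitution: L z = b
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n  # back substitution: L^T x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def ridge_cost(A, y, support, lam, rho):
    """Assumed cost: ||y - A_S x||^2 + lam*||x||^2 + rho*|S|, x fitted on S."""
    S = sorted(support)
    if not S:
        return sum(v * v for v in y), {}
    m = len(y)
    G = [[sum(A[r][i] * A[r][j] for r in range(m)) + (lam if i == j else 0.0)
          for j in S] for i in S]
    b = [sum(A[r][i] * y[r] for r in range(m)) for i in S]
    x = cholesky_solve(G, b)
    resid = [y[r] - sum(A[r][i] * xi for i, xi in zip(S, x)) for r in range(m)]
    value = sum(v * v for v in resid) + lam * sum(v * v for v in x) + rho * len(S)
    return value, dict(zip(S, x))

def greedy_add_remove(A, y, lam=1e-6, rho=0.1, max_iter=50):
    """Toggle one index per step, keeping the change that lowers the cost most."""
    n = len(A[0])
    support = set()
    best, coef = ridge_cost(A, y, support, lam, rho)
    for _ in range(max_iter):
        trials = []
        for j in range(n):
            candidate = support ^ {j}  # add j if absent, remove it if present
            value, x = ridge_cost(A, y, candidate, lam, rho)
            trials.append((value, j, candidate, x))
        value, _j, candidate, x = min(trials, key=lambda t: t[0])
        if value >= best:  # no single add or remove improves the cost
            break
        best, support, coef = value, candidate, x
    return support, coef
```

For example, with the columns of `A = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]` and `y = [2, 0, -1, 1]` (twice the first column minus the third), the loop adds index 0, then index 2, and then stops because no further toggle lowers the cost.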
-
Learning a low-rank shared dictionary for object classification
Authors:
Tiep H. Vu,
Vishal Monga
Abstract:
Although different objects possess distinct class-specific features, they also usually share common patterns. Inspired by this observation, we propose a novel method to explicitly and simultaneously learn a set of common patterns as well as class-specific features for classification. Our dictionary learning framework is hence characterized by both a shared dictionary and particular (class-specific) dictionaries. For the shared dictionary, we enforce a low-rank constraint, i.e., we require that its spanning subspace have low dimension and that the coefficients corresponding to this dictionary be similar. For the particular dictionaries, we impose the well-known constraints of Fisher discrimination dictionary learning (FDDL). Further, we propose a new fast and accurate algorithm to solve the sparse coding problems in the learning step, accelerating its convergence. This algorithm could also be applied to FDDL and its extensions. Experimental results on widely used image databases establish the advantages of our method over state-of-the-art dictionary learning methods.
Submitted 17 May, 2016; v1 submitted 31 January, 2016;
originally announced February 2016.
-
Histopathological Image Classification using Discriminative Feature-oriented Dictionary Learning
Authors:
Tiep Huu Vu,
Hojjat Seyed Mousavi,
Vishal Monga,
Arvind UK Rao,
Ganesh Rao
Abstract:
In histopathological image analysis, feature extraction for classification is a challenging task due to the diversity of histology features suitable for each problem as well as the presence of rich geometrical structures. In this paper, we propose an automatic feature discovery framework via learning class-specific dictionaries and present a low-complexity method for classification and disease grading in histopathology. Essentially, our Discriminative Feature-oriented Dictionary Learning (DFDL) method learns class-specific dictionaries such that, under a sparsity constraint, the learned dictionaries allow representing a new image sample parsimoniously via the dictionary corresponding to the class identity of the sample. At the same time, the dictionary is designed to be poorly capable of representing samples from other classes. Experiments on three challenging real-world image databases: 1) histopathological images of intraductal breast lesions, 2) mammalian kidney, lung and spleen images provided by the Animal Diagnostics Lab (ADL) at Pennsylvania State University, and 3) brain tumor images from The Cancer Genome Atlas (TCGA) database, reveal the merits of our proposal over state-of-the-art alternatives. Moreover, we demonstrate that DFDL exhibits a more graceful decay in classification accuracy against the number of training images, which is highly desirable in practice, where generous training data is often not available.
Submitted 29 March, 2016; v1 submitted 16 June, 2015;
originally announced June 2015.
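The classification idea in the abstract above, where each class dictionary represents its own samples well and other classes poorly, suggests a residual-based decision rule. The sketch below is an assumption-laden illustration rather than the DFDL method itself. It skips dictionary learning entirely, uses hand-written unit-norm atoms, fixes the sparsity level at one atom, and assigns the class whose dictionary gives the smallest reconstruction residual. The names `normalize`, `one_sparse_residual` and `classify` are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def one_sparse_residual(sample, atoms):
    """Squared residual of the best fit of `sample` by a single unit-norm atom."""
    energy = sum(x * x for x in sample)
    best_corr = max(sum(a * s for a, s in zip(atom, sample)) ** 2 for atom in atoms)
    return energy - best_corr

def classify(sample, dictionaries):
    """Return the label whose dictionary reconstructs `sample` with least residual."""
    return min(dictionaries,
               key=lambda label: one_sparse_residual(sample, dictionaries[label]))

# Toy dictionaries (assumed for illustration, not learned from data).
dictionaries = {
    "A": [normalize([1.0, 0.0, 0.0]), normalize([1.0, 1.0, 0.0])],
    "B": [normalize([0.0, 0.0, 1.0])],
}
print(classify([2.0, 2.0, 0.1], dictionaries))
print(classify([0.1, 0.0, 3.0], dictionaries))
```

A learned dictionary would replace the toy atoms, and a sparsity level above one would need a proper sparse coding step such as orthogonal matching pursuit.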
-
DFDL: Discriminative Feature-oriented Dictionary Learning for Histopathological Image Classification
Authors:
Tiep H. Vu,
Hojjat S. Mousavi,
Vishal Monga,
UK Arvind Rao,
Ganesh Rao
Abstract:
In histopathological image analysis, feature extraction for classification is a challenging task due to the diversity of histology features suitable for each problem as well as the presence of rich geometrical structure. In this paper, we propose an automatic feature discovery framework for extracting discriminative class-specific features and present a low-complexity method for classification and disease grading in histopathology. Essentially, our Discriminative Feature-oriented Dictionary Learning (DFDL) method learns class-specific features that are suitable for representing samples from the same class while being poorly capable of representing samples from other classes. Experiments on three challenging real-world image databases: 1) histopathological images of intraductal breast lesions, 2) mammalian lung images provided by the Animal Diagnostics Lab (ADL) at Pennsylvania State University, and 3) brain tumor images from The Cancer Genome Atlas (TCGA) database, show the significance of the DFDL model in a variety of problems over state-of-the-art methods.
Submitted 3 February, 2015;
originally announced February 2015.