Search | arXiv e-print repository

Using Game Engines and Machine Learning to Create Synthetic Satellite Imagery for a Tabletop Verification Exercise

Authors: Johannes Hoster, Sara Al-Sayed, Felix Biessmann, Alexander Glaser, Kristian Hildebrand, Igor Moric, Tuong Vy Nguyen

Abstract: Satellite imagery is regarded as a great opportunity for citizen-based monitoring of activities of interest. Relevant imagery may however not be available at sufficiently high resolution, quality, or cadence -- let alone be uniformly accessible to open-source analysts. This limits an assessment of the true long-term potential of citizen-based monitoring of nuclear activities using publicly available satellite imagery. In this article, we demonstrate how modern game engines combined with advanced machine-learning techniques can be used to generate synthetic imagery of sites of interest with the ability to choose relevant parameters upon request; these include time of day, cloud cover, season, or level of activity onsite. At the same time, resolution and off-nadir angle can be adjusted to simulate different characteristics of the satellite. While there are several possible use-cases for synthetic imagery, here we focus on its usefulness to support tabletop exercises in which simple monitoring scenarios can be examined to better understand verification capabilities enabled by new satellite constellations and very short revisit times.

Submitted 23 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: Annual Meeting of the Institute of Nuclear Materials Management (INMM), Vienna

arXiv:2404.07754 [pdf, other]

Generating Synthetic Satellite Imagery With Deep-Learning Text-to-Image Models -- Technical Challenges and Implications for Monitoring and Verification

Authors: Tuong Vy Nguyen, Alexander Glaser, Felix Biessmann

Abstract: Novel deep-learning (DL) architectures have reached a level where they can generate digital media, including photorealistic images, that are difficult to distinguish from real data. These technologies have already been used to generate training data for Machine Learning (ML) models, and large text-to-image models like DALL-E 2, Imagen, and Stable Diffusion are achieving remarkable results in realistic high-resolution image generation. Given these developments, issues of data authentication in monitoring and verification deserve a careful and systematic analysis: How realistic are synthetic images? How easily can they be generated? How useful are they for ML researchers, and what is their potential for Open Science? In this work, we use novel DL models to explore how synthetic satellite images can be created using conditioning mechanisms. We investigate the challenges of synthetic satellite image generation and evaluate the results based on authenticity and state-of-the-art metrics. Furthermore, we investigate how synthetic data can alleviate the lack of data in the context of ML methods for remote-sensing. Finally, we discuss implications of synthetic satellite imagery in the context of monitoring and verification.

Submitted 11 April, 2024; originally announced April 2024.

Comments: https://resources.inmm.org/annual-meeting-proceedings/generating-synthetic-satellite-imagery-deep-learning-text-image-models

Journal ref: Presented at the Annual Meeting of the Institute of Nuclear Materials Management (INMM), Vienna, 2023

arXiv:2403.08876 [pdf, other]

ARtVista: Gateway To Empower Anyone Into Artist

Authors: Trong-Vu Hoang, Quang-Binh Nguyen, Duy-Nam Ly, Khanh-Duy Le, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le

Abstract: Drawing is an art that enables people to express their imagination and emotions. However, individuals usually face challenges in drawing, especially when translating conceptual ideas into visually coherent representations and bridging the gap between mental visualization and practical execution. In response, we propose ARtVista - a novel system integrating AR and generative AI technologies. ARtVista not only recommends reference images aligned with users' abstract ideas and generates sketches for users to draw but also goes beyond, crafting vibrant paintings in various painting styles. ARtVista also offers users an alternative approach to create striking paintings by simulating the paint-by-number concept on reference images, empowering users to create visually stunning artwork devoid of the necessity for advanced drawing skills. We perform a pilot study and reveal positive feedback on its usability, emphasizing its effectiveness in visualizing user ideas and aiding the painting process to achieve stunning pictures without requiring advanced drawing skills. The source code will be available at https://github.com/htrvu/ARtVista.

Submitted 13 March, 2024; originally announced March 2024.

Comments: CHI 2024

arXiv:2403.08746 [pdf, other]

iCONTRA: Toward Thematic Collection Design Via Interactive Concept Transfer

Authors: Dinh-Khoi Vo, Duy-Nam Ly, Khanh-Duy Le, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le

Abstract: Creating thematic collections in industries demands innovative designs and cohesive concepts. Designers may face challenges in maintaining thematic consistency when drawing inspiration from existing objects, landscapes, or artifacts. While AI-powered graphic design tools offer help, they often fail to generate cohesive sets based on specific thematic concepts. In response, we introduce iCONTRA, an interactive CONcept TRAnsfer system. With a user-friendly interface, iCONTRA enables both experienced designers and novices to effortlessly explore creative design concepts and efficiently generate thematic collections. We also propose a zero-shot image editing algorithm, eliminating the need for fine-tuning models, which gradually integrates information from initial objects, ensuring consistency in the generation process without influencing the background. A pilot study suggests iCONTRA's potential to reduce designers' efforts. Experimental results demonstrate its effectiveness in producing consistent and high-quality object concept transfers. iCONTRA stands as a promising tool for innovation and creative exploration in thematic collection design. The source code will be available at: https://github.com/vdkhoi20/iCONTRA.

Submitted 13 March, 2024; originally announced March 2024.

Comments: CHI 2024

arXiv:2402.13554 [pdf, ps, other]

Secrecy Performance Analysis of Space-to-Ground Optical Satellite Communications

Authors: Thang V. Nguyen, Thanh V. Pham, Anh T. Pham, Dang T. Ngoc

Abstract: Free-space optics (FSO)-based satellite communication systems have recently received considerable attention due to their enhanced capacity compared to their radio frequency (RF) counterparts. This paper analyzes the performance of physical layer security of space-to-ground intensity modulation/direct detection FSO satellite links under the effect of atmospheric loss, misalignment, cloud attenuation, and atmospheric turbulence-induced fading. Specifically, a wiretap channel consisting of a legitimate transmitter Alice (i.e., the satellite), a legitimate user Bob, and an eavesdropper Eve over turbulence channels modeled by the Fisher-Snedecor $\mathcal{F}$ distribution is considered. The secrecy performance in terms of the average secrecy capacity, secrecy outage probability, and strictly positive secrecy capacity are derived in closed-form. Simulation results reveal significant impacts of satellite altitude, zenith angle, and turbulence strength on the secrecy performance.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2308.15660 [pdf, other]

Unveiling Camouflage: A Learnable Fourier-based Augmentation for Camouflaged Object Detection and Instance Segmentation

Authors: Minh-Quan Le, Minh-Triet Tran, Trung-Nghia Le, Tam V. Nguyen, Thanh-Toan Do

Abstract: Camouflaged object detection (COD) and camouflaged instance segmentation (CIS) aim to recognize and segment objects that are blended into their surroundings, respectively. While several deep neural network models have been proposed to tackle those tasks, augmentation methods for COD and CIS have not been thoroughly explored. Augmentation strategies can help improve the performance of models by increasing the size and diversity of the training data and exposing the model to a wider range of variations in the data. Besides, we aim to automatically learn transformations that help to reveal the underlying structure of camouflaged objects and allow the model to learn to better identify and segment camouflaged objects. To achieve this, we propose a learnable augmentation method in the frequency domain for COD and CIS via Fourier transform approach, dubbed CamoFourier. Our method leverages a conditional generative adversarial network and cross-attention mechanism to generate a reference image and an adaptive hybrid swapping with parameters to mix the low-frequency component of the reference image and the high-frequency component of the input image. This approach aims to make camouflaged objects more visible for detection and segmentation models. Without bells and whistles, our proposed augmentation method boosts the performance of camouflaged object detectors and camouflaged instance segmenters by large margins.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.15005 [pdf, other]

Few-Shot Object Detection via Synthetic Features with Optimal Transport

Authors: Anh-Khoa Nguyen Vu, Thanh-Toan Do, Vinh-Tiep Nguyen, Tam Le, Minh-Triet Tran, Tam V. Nguyen

Abstract: Few-shot object detection aims to simultaneously localize and classify the objects in an image with limited training samples. However, most existing few-shot object detection methods focus on extracting the features of a few samples of novel classes that lack diversity. Hence, they may not be sufficient to capture the data distribution. To address that limitation, in this paper, we propose a novel approach in which we train a generator to generate synthetic data for novel classes. Still, directly training a generator on the novel class is not effective due to the lack of novel data. To overcome that issue, we leverage the large-scale dataset of base classes. Our overarching goal is to train a generator that captures the data variations of the base dataset. We then transform the captured variations into novel classes by generating synthetic data with the trained generator. To encourage the generator to capture data variations on base classes, we propose to train the generator with an optimal transport loss that minimizes the optimal transport distance between the distributions of real and synthetic data. Extensive experiments on two benchmark datasets demonstrate that the proposed method outperforms the state of the art. Source code will be available.

Submitted 29 August, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.13798 [pdf, other]

DM-VTON: Distilled Mobile Real-time Virtual Try-On

Authors: Khoi-Nguyen Nguyen-Ngoc, Thanh-Tung Phan-Nguyen, Khanh-Duy Le, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le

Abstract: The fashion e-commerce industry has witnessed significant growth in recent years, prompting exploring image-based virtual try-on techniques to incorporate Augmented Reality (AR) experiences into online shopping platforms. However, existing research has primarily overlooked a crucial aspect - the runtime of the underlying machine-learning model. While existing methods prioritize enhancing output quality, they often disregard the execution time, which restricts their applications on a limited range of devices. To address this gap, we propose Distilled Mobile Real-time Virtual Try-On (DM-VTON), a novel virtual try-on framework designed to achieve simplicity and efficiency. Our approach is based on a knowledge distillation scheme that leverages a strong Teacher network as supervision to guide a Student network without relying on human parsing. Notably, we introduce an efficient Mobile Generative Module within the Student network, significantly reducing the runtime while ensuring high-quality output. Additionally, we propose Virtual Try-on-guided Pose for Data Synthesis to address the limited pose variation observed in training images. Experimental results show that the proposed method can achieve 40 frames per second on a single Nvidia Tesla T4 GPU and only take up 37 MB of memory while producing almost the same output quality as other state-of-the-art methods. DM-VTON stands poised to facilitate the advancement of real-time AR applications, in addition to the generation of lifelike attired human figures tailored for diverse specialized training tasks. https://sites.google.com/view/ltnghia/research/DMVTON

Submitted 26 August, 2023; originally announced August 2023.

Comments: Accepted to ISMAR 2023 (Poster paper)

arXiv:2308.13795 [pdf, other]

VIDES: Virtual Interior Design via Natural Language and Visual Guidance

Authors: Minh-Hien Le, Chi-Bien Chu, Khanh-Duy Le, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le

Abstract: Interior design is crucial in creating aesthetically pleasing and functional indoor spaces. However, developing and editing interior design concepts requires significant time and expertise. We propose Virtual Interior DESign (VIDES) system in response to this challenge. Leveraging cutting-edge technology in generative AI, our system can assist users in generating and editing indoor scene concepts quickly, given user text description and visual guidance. Using both visual guidance and language as the conditional inputs significantly enhances the accuracy and coherence of the generated scenes, resulting in visually appealing designs. Through extensive experimentation, we demonstrate the effectiveness of VIDES in developing new indoor concepts, changing indoor styles, and replacing and removing interior objects. The system successfully captures the essence of users' descriptions while providing flexibility for customization. Consequently, this system can potentially reduce the entry barrier for indoor design, making it more accessible to users with limited technical skills and reducing the time required to create high-quality images. Individuals who have a background in design can now easily communicate their ideas visually and effectively present their design concepts. https://sites.google.com/view/ltnghia/research/VIDES

Submitted 26 August, 2023; originally announced August 2023.

Comments: Accepted to ISMAR 2023 (Poster paper)

arXiv:2304.07459 [pdf, other]

doi 10.1109/TIP.2023.3267621

Instance-level Few-shot Learning with Class Hierarchy Mining

Authors: Anh-Khoa Nguyen Vu, Thanh-Toan Do, Nhat-Duy Nguyen, Vinh-Tiep Nguyen, Thanh Duc Ngo, Tam V. Nguyen

Abstract: Few-shot learning is proposed to tackle the problem of scarce training data in novel classes. However, prior works in instance-level few-shot learning have paid less attention to effectively utilizing the relationship between categories. In this paper, we exploit the hierarchical information to leverage discriminative and relevant features of base classes to effectively classify novel objects. These features are extracted from abundant data of base classes, which could be utilized to reasonably describe classes with scarce data. Specifically, we propose a novel superclass approach that automatically creates a hierarchy considering base and novel classes as fine-grained classes for few-shot instance segmentation (FSIS). Based on the hierarchical information, we design a novel framework called Soft Multiple Superclass (SMS) to extract relevant features or characteristics of classes in the same superclass. A new class assigned to the superclass is easier to classify by leveraging these relevant features. Besides, in order to effectively train the hierarchy-based-detector in FSIS, we apply the label refinement to further describe the associations between fine-grained classes. The extensive experiments demonstrate the effectiveness of our method on FSIS benchmarks. Code is available online.

Submitted 14 April, 2023; originally announced April 2023.

Comments: accepted by IEEE Transactions on Image Processing

arXiv:2304.07444 [pdf, other]

The Art of Camouflage: Few-shot Learning for Animal Detection and Segmentation

Authors: Thanh-Danh Nguyen, Anh-Khoa Nguyen Vu, Nhat-Duy Nguyen, Vinh-Tiep Nguyen, Thanh Duc Ngo, Thanh-Toan Do, Minh-Triet Tran, Tam V. Nguyen

Abstract: Camouflaged object detection and segmentation is a new and challenging research topic in computer vision. There is a serious issue of lacking data of camouflaged objects such as camouflaged animals in natural scenes. In this paper, we address the problem of few-shot learning for camouflaged object detection and segmentation. To this end, we first collect a new dataset, CAMO-FS, for the benchmark. We then propose a novel method to efficiently detect and segment the camouflaged objects in the images. In particular, we introduce the instance triplet loss and the instance memory storage. The extensive experiments demonstrated that our proposed method achieves state-of-the-art performance on the newly collected dataset.

Submitted 21 January, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: Under-review Journal

arXiv:2304.06053 [pdf, other]

TextANIMAR: Text-based 3D Animal Fine-Grained Retrieval

Authors: Trung-Nghia Le, Tam V. Nguyen, Minh-Quan Le, Trong-Thuan Nguyen, Viet-Tham Huynh, Trong-Le Do, Khanh-Duy Le, Mai-Khiem Tran, Nhat Hoang-Xuan, Thang-Long Nguyen-Ho, Vinh-Tiep Nguyen, Tuong-Nghiem Diep, Khanh-Duy Ho, Xuan-Hieu Nguyen, Thien-Phuc Tran, Tuan-Anh Yang, Kim-Phat Tran, Nhu-Vinh Hoang, Minh-Quang Nguyen, E-Ro Nguyen, Minh-Khoi Nguyen-Nhat, Tuan-An To, Trung-Truc Huynh-Le, Nham-Tan Nguyen, Hoang-Chau Luong, et al. (8 additional authors not shown)

Abstract: 3D object retrieval is an important yet challenging task that has drawn more and more attention in recent years. While existing approaches have made strides in addressing this issue, they are often limited to restricted settings such as image and sketch queries, which are often unfriendly interactions for common users. In order to overcome these limitations, this paper presents a novel SHREC challenge track focusing on text-based fine-grained retrieval of 3D animal models. Unlike previous SHREC challenge tracks, the proposed task is considerably more challenging, requiring participants to develop innovative approaches to tackle the problem of text-based retrieval. Despite the increased difficulty, we believe this task can potentially drive useful applications in practice and facilitate more intuitive interactions with 3D objects. Five groups participated in our competition, submitting a total of 114 runs. While the results obtained in our competition are satisfactory, we note that the challenges presented by this task are far from fully solved. As such, we provide insights into potential areas for future research and improvements. We believe we can help push the boundaries of 3D object retrieval and facilitate more user-friendly interactions via vision-language technologies. https://aichallenge.hcmus.edu.vn/textanimar

Submitted 9 August, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

Comments: Accepted to Computers and Graphics (3DOR, Journal Track)

arXiv:2304.05731 [pdf, other]

SketchANIMAR: Sketch-based 3D Animal Fine-Grained Retrieval

Authors: Trung-Nghia Le, Tam V. Nguyen, Minh-Quan Le, Trong-Thuan Nguyen, Viet-Tham Huynh, Trong-Le Do, Khanh-Duy Le, Mai-Khiem Tran, Nhat Hoang-Xuan, Thang-Long Nguyen-Ho, Vinh-Tiep Nguyen, Nhat-Quynh Le-Pham, Huu-Phuc Pham, Trong-Vu Hoang, Quang-Binh Nguyen, Trong-Hieu Nguyen-Mau, Tuan-Luc Huynh, Thanh-Danh Le, Ngoc-Linh Nguyen-Ha, Tuong-Vy Truong-Thuy, Truong Hoai Phong, Tuong-Nghiem Diep, Khanh-Duy Ho, Xuan-Hieu Nguyen, Thien-Phuc Tran, et al. (9 additional authors not shown)

Abstract: The retrieval of 3D objects has gained significant importance in recent years due to its broad range of applications in computer vision, computer graphics, virtual reality, and augmented reality. However, the retrieval of 3D objects presents significant challenges due to the intricate nature of 3D models, which can vary in shape, size, and texture, and have numerous polygons and vertices. To this end, we introduce a novel SHREC challenge track that focuses on retrieving relevant 3D animal models from a dataset using sketch queries and expedites accessing 3D models through available sketches. Furthermore, a new dataset named ANIMAR was constructed in this study, comprising a collection of 711 unique 3D animal models and 140 corresponding sketch queries. Our contest requires participants to retrieve 3D models based on complex and detailed sketches. We receive satisfactory results from eight teams and 204 runs. Although further improvement is necessary, the proposed task has the potential to incentivize additional research in the domain of 3D object retrieval, potentially yielding benefits for a wide range of applications. We also provide insights into potential areas of future research, such as improving techniques for feature extraction and matching and creating more diverse datasets to evaluate retrieval performance. https://aichallenge.hcmus.edu.vn/sketchanimar

Submitted 9 August, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

Comments: Accepted to Computers & Graphics (3DOR 2023, Journal track)

arXiv:2303.05105 [pdf, other]

MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation

Authors: Minh-Quan Le, Tam V. Nguyen, Trung-Nghia Le, Thanh-Toan Do, Minh N. Do, Minh-Triet Tran

Abstract: Few-shot instance segmentation extends the few-shot learning paradigm to the instance segmentation task, which tries to segment instance objects from a query image with a few annotated examples of novel categories. Conventional approaches have attempted to address the task via prototype learning, known as point estimation. However, this mechanism depends on prototypes (e.g., mean of $K$-shot) for prediction, leading to performance instability. To overcome the disadvantage of the point estimation mechanism, we propose a novel approach, dubbed MaskDiff, which models the underlying conditional distribution of a binary mask, which is conditioned on an object region and $K$-shot information. Inspired by augmentation approaches that perturb data with Gaussian noise for populating low data density regions, we model the mask distribution with a diffusion probabilistic model. We also propose to utilize classifier-free guided mask sampling to integrate category information into the binary mask generation process. Without bells and whistles, our proposed method consistently outperforms state-of-the-art methods on both base and novel classes of the COCO dataset while simultaneously being more stable than existing methods. The source code is available at: https://github.com/minhquanlecs/MaskDiff.

Submitted 21 January, 2024; v1 submitted 9 March, 2023; originally announced March 2023.

Comments: Accepted at AAAI 2024 (oral presentation)

arXiv:2301.12540 [pdf, other]

Implicit Regularization for Group Sparsity

Authors: Jiangyuan Li, Thanh V. Nguyen, Chinmay Hegde, Raymond K. W. Wong

Abstract: We study the implicit regularization of gradient descent towards structured sparsity via a novel neural reparameterization, which we call a diagonally grouped linear neural network. We show the following intriguing property of our reparameterization: gradient descent over the squared regression loss, without any explicit regularization, biases towards solutions with a group sparsity structure. In contrast to many existing works in understanding implicit regularization, we prove that our training trajectory cannot be simulated by mirror descent. We analyze the gradient dynamics of the corresponding regression problem in the general noise setting and obtain minimax-optimal error rates. Compared to existing bounds for implicit sparse regularization using diagonal linear networks, our analysis with the new reparameterization shows improved sample complexity. In the degenerate case of size-one groups, our approach gives rise to a new algorithm for sparse linear regression. Finally, we demonstrate the efficacy of our approach with several numerical experiments.

Submitted 29 January, 2023; originally announced January 2023.

Comments: accepted by ICLR 2023

arXiv:2209.04794 [pdf, other]

doi 10.1371/journal.pone.0276545

Learning to diagnose common thorax diseases on chest radiographs from radiology reports in Vietnamese

Authors: Thao T. B. Nguyen, Tam M. Vo, Thang V. Nguyen, Hieu H. Pham, Ha Q. Nguyen

Abstract: We propose a data collecting and annotation pipeline that extracts information from Vietnamese radiology reports to provide accurate labels for chest X-ray (CXR) images. This can benefit Vietnamese radiologists and clinicians by annotating data that closely match their endemic diagnosis categories which may vary from country to country. To assess the efficacy of the proposed labeling technique, we built a CXR dataset containing 9,752 studies and evaluated our pipeline using a subset of this dataset. With an F1-score of at least 0.9923, the evaluation demonstrates that our labeling tool performs precisely and consistently across all classes. After building the dataset, we train deep learning models that leverage knowledge transferred from large public CXR datasets. We employ a variety of loss functions to overcome the curse of imbalanced multi-label datasets and conduct experiments with various model architectures to select the one that delivers the best performance. Our best model (CheXpert-pretrained EfficientNet-B2) yields an F1-score of 0.6989 (95% CI 0.6740, 0.7240), AUC of 0.7912, sensitivity of 0.7064 and specificity of 0.8760 for the abnormal diagnosis in general. Finally, we demonstrate that our coarse classification (based on five specific locations of abnormalities) yields comparable results to fine classification (twelve pathologies) on the benchmark CheXpert dataset for general anomaly detection while delivering better performance in terms of the average performance of all classes.

Submitted 11 September, 2022; originally announced September 2022.

Comments: This work has been provisionally accepted for publication by the PLOS ONE journal

arXiv:2203.11206 [pdf, ps, other]

doi 10.1002/mp.15551

Phase Recognition in Contrast-Enhanced CT Scans based on Deep Learning and Random Sampling

Authors: Binh T. Dao, Thang V. Nguyen, Hieu H. Pham, Ha Q. Nguyen

Abstract: A fully automated system for interpreting abdominal computed tomography (CT) scans with multiple phases of contrast enhancement requires an accurate classification of the phases. This work aims at developing and validating a precise, fast multi-phase classifier to recognize three main types of contrast phases in abdominal CT scans. We propose in this study a novel method that uses a random sampling mechanism on top of deep CNNs for the phase recognition of abdominal CT scans of four different phases: non-contrast, arterial, venous, and others. The CNNs work as a slice-wise phase prediction, while the random sampling selects input slices for the CNN models. Afterward, majority voting synthesizes the slice-wise results of the CNNs, to provide the final prediction at scan level. Our classifier was trained on 271,426 slices from 830 phase-annotated CT scans, and when combined with majority voting on 30% of slices randomly chosen from each scan, achieved a mean F1-score of 92.09% on our internal test set of 358 scans. The proposed method was also evaluated on 2 external test sets: CTPAC-CCRCC (N = 242) and LiTS (N = 131), which were annotated by our experts. Although a drop in performance has been observed, the model performance remained at a high level of accuracy with a mean F1-score of 76.79% and 86.94% on CTPAC-CCRCC and LiTS datasets, respectively. Our experimental results also showed that the proposed method significantly outperformed the state-of-the-art 3D approaches while requiring less computation time for inference.

Submitted 20 March, 2022; originally announced March 2022.

Comments: Accepted for publication by Medical Physics
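
The scan-level decision described in the abstract above (classify a random 30% of slices, then take a majority vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `predict_scan_phase`, the `classify_slice` callback standing in for the CNN, and the tie-breaking behaviour are all assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Callable, Optional, Sequence, TypeVar

Slice = TypeVar("Slice")

# The four phase labels named in the abstract.
PHASES = ("non-contrast", "arterial", "venous", "others")


def predict_scan_phase(
    slices: Sequence[Slice],
    classify_slice: Callable[[Slice], str],  # hypothetical stand-in for the slice-wise CNN
    fraction: float = 0.3,                   # share of slices to sample, as in the abstract
    rng: Optional[random.Random] = None,
) -> str:
    """Predict one phase label for a whole scan by majority vote over sampled slices."""
    if not slices:
        raise ValueError("a scan must contain at least one slice")
    if rng is None:
        rng = random.Random()
    # Sample without replacement; always keep at least one slice.
    k = max(1, round(fraction * len(slices)))
    sampled = rng.sample(list(slices), k)
    # Count slice-wise predictions and return the most frequent label.
    votes = Counter(classify_slice(s) for s in sampled)
    return votes.most_common(1)[0][0]
```

Because only a fraction of the slices is classified, inference cost scales with `fraction`, which is consistent with the abstract's remark about reduced computation time. Ties are resolved here by first occurrence, which is an arbitrary choice of this sketch.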

arXiv:2203.06732 [pdf, other]

BioSimulators: a central registry of simulation engines and services for recommending specific tools

Authors: Bilal Shaikh, Lucian P. Smith, Dan Vasilescu, Gnaneswara Marupilla, Michael Wilson, Eran Agmon, Henry Agnew, Steven S. Andrews, Azraf Anwar, Moritz E. Beber, Frank T. Bergmann, David Brooks, Lutz Brusch, Laurence Calzone, Kiri Choi, Joshua Cooper, John Detloff, Brian Drawert, Michel Dumontier, G. Bard Ermentrout, James R. Faeder, Andrew P. Freiburger, Fabian Fröhlich, Akira Funahashi, Alan Garny, et al. (46 additional authors not shown)

Abstract: Computational models have great potential to accelerate bioscience, bioengineering, and medicine. However, it remains challenging to reproduce and reuse simulations, in part, because the numerous formats and methods for simulating various subsystems and scales remain siloed by different software tools. For example, each tool must be executed through a distinct interface. To help investigators find and use simulation tools, we developed BioSimulators (https://biosimulators.org), a central registry of the capabilities of simulation tools and consistent Python, command-line, and containerized interfaces to each version of each tool. The foundation of BioSimulators is standards, such as CellML, SBML, SED-ML, and the COMBINE archive format, and validation tools for simulation projects and simulation tools that ensure these standards are used consistently. To help modelers find tools for particular projects, we have also used the registry to develop recommendation services. We anticipate that BioSimulators will help modelers exchange, reproduce, and combine simulations.

Submitted 13 March, 2022; originally announced March 2022.

Comments: 6 pages, 2 figures

arXiv:2112.06489 [pdf, other]

Multi-Modal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing

Authors: Tuan Hoang, Thanh-Toan Do, Tam V. Nguyen, Ngai-Man Cheung

Abstract: In this paper, we adopt the maximizing mutual information (MI) approach to tackle the problem of unsupervised learning of binary hash codes for efficient cross-modal retrieval. We proposed a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH). First, to learn informative representations that can preserve both intra- and inter-modal similarities, we leverage the recent advances in estimating variational lower-bound of MI to maximize the MI between the binary representations and input features and between binary representations of different modalities. By jointly maximizing these MIs under the assumption that the binary representations are modelled by multivariate Bernoulli distributions, we can learn binary representations, which can preserve both intra- and inter-modal similarities, effectively in a mini-batch manner with gradient descent. Furthermore, we find out that trying to minimize the modality gap by learning similar binary representations for the same instance from different modalities could result in less informative representations. Hence, balancing between reducing the modality gap and losing modality-private information is important for the cross-modal retrieval tasks. Quantitative evaluations on standard benchmark datasets demonstrate that the proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.

Submitted 13 December, 2021; originally announced December 2021.

arXiv:2112.06193 [pdf, other]

GUNNEL: Guided Mixup Augmentation and Multi-View Fusion for Aquatic Animal Segmentation

Authors: Minh-Quan Le, Trung-Nghia Le, Tam V. Nguyen, Isao Echizen, Minh-Triet Tran

Abstract: Recent years have witnessed great advances in object segmentation research. In addition to generic objects, aquatic animals have attracted research attention. Deep learning-based methods are widely used for aquatic animal segmentation and have achieved promising performance. However, there is a lack of challenging datasets for benchmarking. In this work, we build a new dataset dubbed Aquatic Animal Species. We also devise a novel GUided mixup augmeNtatioN and multi-modEl fusion for aquatic animaL segmentation (GUNNEL) that leverages the advantages of multiple segmentation models to effectively segment aquatic animals and improves the training performance by synthesizing hard samples. Extensive experiments demonstrated the superiority of our proposed framework over existing state-of-the-art instance segmentation methods. The code is available at https://github.com/lmquan2000/mask-mixup. The dataset is available at https://doi.org/10.5281/zenodo.8208877.

Submitted 10 August, 2023; v1 submitted 12 December, 2021; originally announced December 2021.

Comments: The code is available at https://github.com/lmquan2000/mask-mixup . The dataset is available at https://doi.org/10.5281/zenodo.8208877

arXiv:2108.06486 [pdf, other]

Learning to Automatically Diagnose Multiple Diseases in Pediatric Chest Radiographs Using Deep Convolutional Neural Networks

Authors: Thanh T. Tran, Hieu H. Pham, Thang V. Nguyen, Tung T. Le, Hieu T. Nguyen, Ha Q. Nguyen

Abstract: Chest radiograph (CXR) interpretation in pediatric patients is error-prone and requires a high level of radiologic expertise. Recently, deep convolutional neural networks (D-CNNs) have shown remarkable performance in interpreting CXR in adults. However, there is a lack of evidence indicating that D-CNNs can accurately recognize multiple lung pathologies from pediatric CXR scans. In particular, the development of diagnostic models for the detection of pediatric chest diseases faces significant challenges such as (i) lack of physician-annotated datasets and (ii) class imbalance problems. In this paper, we retrospectively collect a large dataset of 5,017 pediatric CXR scans, each of which is manually labeled by an experienced radiologist for the presence of 10 common pathologies. A D-CNN model is then trained on 3,550 annotated scans to classify multiple pediatric lung pathologies automatically. To address the high class imbalance issue, we propose to modify and apply "Distribution-Balanced loss" for training D-CNNs, which reshapes the standard Binary-Cross Entropy loss (BCE) to efficiently learn harder samples by down-weighting the loss assigned to the majority classes. On an independent test set of 777 studies, the proposed approach yields an area under the receiver operating characteristic (AUC) of 0.709 (95% CI, 0.690-0.729). The sensitivity, specificity, and F1-score at the cutoff value are 0.722 (0.694-0.750), 0.579 (0.563-0.595), and 0.389 (0.373-0.405), respectively. These results significantly outperform previous state-of-the-art methods on most of the target diseases. Moreover, our ablation studies validate the effectiveness of the proposed loss function compared to other standard losses, e.g., BCE and Focal Loss, for this learning task. Overall, we demonstrate the potential of D-CNNs in interpreting pediatric CXRs.

Submitted 14 August, 2021; originally announced August 2021.

Comments: This is a preprint of our paper which was accepted for publication to ICCV Workshop 2021

arXiv:2108.05574 [pdf, other]

Implicit Sparse Regularization: The Impact of Depth and Early Stopping

Authors: Jiangyuan Li, Thanh V. Nguyen, Chinmay Hegde, Raymond K. W. Wong

Abstract: In this paper, we study the implicit bias of gradient descent for sparse regression. We extend results on regression with quadratic parametrization, which amounts to depth-2 diagonal linear networks, to more general depth-N networks, under more realistic settings of noise and correlated designs. We show that early stopping is crucial for gradient descent to converge to a sparse model, a phenomenon that we call implicit sparse regularization. This result is in sharp contrast to known results for noiseless and uncorrelated-design cases. We characterize the impact of depth and early stopping and show that for a general depth parameter N, gradient descent with early stopping achieves minimax optimal sparse recovery with sufficiently small initialization and step size. In particular, we show that increasing depth enlarges the scale of working initialization and the early-stopping window so that this implicit sparse regularization effect is more likely to take place.

Submitted 26 October, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

Comments: 32 pages, accepted by NeurIPS 2021. arXiv admin note: text overlap with arXiv:1909.05122 by other authors

arXiv:2107.08704 [pdf, other]

Managing Interference and Leveraging Secondary Reflections Amongst Multiple IRSs

Authors: Tu V. Nguyen, Diep N. Nguyen

Abstract: Intelligent reflecting surface (IRS) has recently been emerging as an enabler for smart radio environment in which passive antenna arrays can be used to actively tailor/control the radio propagation. With multiple IRSs being launched to support various groups of users, it is critical to jointly optimize the phase-shifts of all IRSs to mitigate the interference as well as to leverage the secondary reflections amongst IRSs. This work considers the uplink of multiple users that are grouped and supported by multiple IRSs to a multi-antenna base station. Each IRS with multiple controllable phase-shift elements is intended to serve a group of nearby users. We first formulate the minimum achievable rate maximization problem by jointly optimizing phase-shifts of elements from all IRSs and the received beamformers at the MIMO base station. The problem turns out to be non-convex. We then derive its solution using the alternating optimization mechanism. Our simulations show that by properly managing interference and leveraging the secondary reflections amongst IRSs, there is a great benefit of deploying more IRSs to support different groups of users to achieve a higher rate per user. In contrast, without properly managing the secondary reflections, increasing the number of IRSs can adversely impact the network throughput, especially for higher transmit power.

Submitted 7 December, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

Comments: 16 pages (submitted to IEEE Conference/Journal)

arXiv:2106.03330 [pdf, other]

doi 10.1007/s00138-022-01278-x

Contextual Guided Segmentation Framework for Semi-supervised Video Instance Segmentation

Authors: Trung-Nghia Le, Tam V. Nguyen, Minh-Triet Tran

Abstract: In this paper, we propose Contextual Guided Segmentation (CGS) framework for video instance segmentation in three passes. In the first pass, i.e., preview segmentation, we propose Instance Re-Identification Flow to estimate main properties of each instance (i.e., human/non-human, rigid/deformable, known/unknown category) by propagating its preview mask to other frames. In the second pass, i.e., contextual segmentation, we introduce multiple contextual segmentation schemes. For human instance, we develop skeleton-guided segmentation in a frame along with object flow to correct and refine the result across frames. For non-human instance, if the instance has a wide variation in appearance and belongs to known categories (which can be inferred from the initial mask), we adopt instance segmentation. If the non-human instance is nearly rigid, we train FCNs on synthesized images from the first frame of a video sequence. In the final pass, i.e., guided segmentation, we develop a novel fine-grained segmentation method on non-rectangular regions of interest (ROIs). The natural-shaped ROI is generated by applying guided attention from the neighbor frames of the current one to reduce the ambiguity in the segmentation of different overlapping instances. Forward mask propagation is followed by backward mask propagation to further restore missing instance fragments due to re-appeared instances, fast motion, occlusion, or heavy deformation. Finally, instances in each frame are merged based on their depth values, together with human and non-human object interaction and rare instance priority. Experiments conducted on the DAVIS Test-Challenge dataset demonstrate the effectiveness of our proposed framework. We consistently achieved 3rd place in the DAVIS Challenges 2017-2019 with 75.4%, 72.4%, and 78.4% in terms of global score, region similarity, and contour accuracy, respectively.

Submitted 11 April, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: Project page: https://sites.google.com/view/ltnghia/research/vos

Journal ref: Machine Vision and Applications 2022

arXiv:2105.09451 [pdf, other]

doi 10.1016/j.cviu.2019.04.006

Anabranch Network for Camouflaged Object Segmentation

Authors: Trung-Nghia Le, Tam V. Nguyen, Zhongliang Nie, Minh-Triet Tran, Akihiro Sugimoto

Abstract: Camouflaged objects attempt to conceal their texture into the background and discriminating them from the background is hard even for human beings. The main objective of this paper is to explore the camouflaged object segmentation problem, namely, segmenting the camouflaged object(s) for a given image. This problem has not been well studied in spite of a wide range of potential applications including the preservation of wild animals and the discovery of new species, surveillance systems, search-and-rescue missions in the event of natural disasters such as earthquakes, floods or hurricanes. This paper addresses a new challenging problem of camouflaged object segmentation. To address this problem, we provide a new image dataset of camouflaged objects for benchmarking purposes. In addition, we propose a general end-to-end network, called the Anabranch Network, that leverages both classification and segmentation tasks. Different from existing networks for segmentation, our proposed network possesses the second branch for classification to predict the probability of containing camouflaged object(s) in an image, which is then fused into the main branch for segmentation to boost up the segmentation accuracy. Extensive experiments conducted on the newly built dataset demonstrate the effectiveness of our network using various fully convolutional networks. https://sites.google.com/view/ltnghia/research/camo

Submitted 19 May, 2021; originally announced May 2021.

Comments: Published in CVIU 2019. Project page: https://sites.google.com/view/ltnghia/research/camo

Journal ref: Computer Vision and Image Understanding 184 (2019) 45-56

arXiv:2105.02290 [pdf]

R2U3D: Recurrent Residual 3D U-Net for Lung Segmentation

Authors: Dhaval D. Kadia, Md Zahangir Alom, Ranga Burada, Tam V. Nguyen, Vijayan K. Asari

Abstract: 3D lung segmentation is essential since it processes the volumetric information of the lungs, removes the unnecessary areas of the scan, and segments the actual area of the lungs in a 3D volume. Recently, deep learning models such as U-Net have outperformed other network architectures for biomedical image segmentation. In this paper, we propose a novel model, namely, Recurrent Residual 3D U-Net (R2U3D), for the 3D lung segmentation task. In particular, the proposed model integrates 3D convolution into the Recurrent Residual Neural Network based on U-Net. It helps learn spatial dependencies in 3D and increases the propagation of 3D volumetric information. The proposed R2U3D network is trained on the publicly available dataset LUNA16 and it achieves state-of-the-art performance on both LUNA16 (testing set) and VESSEL12 dataset. In addition, we show that training the R2U3D model with a smaller number of CT scans, i.e., 100 scans, without applying data augmentation achieves an outstanding result in terms of Soft Dice Similarity Coefficient (Soft-DSC) of 0.9920.

Submitted 5 May, 2021; originally announced May 2021.

Comments: The paper is under review in a journal

arXiv:2104.02256 [pdf, other]

A clinical validation of VinDr-CXR, an AI system for detecting abnormal chest radiographs

Authors: Ngoc Huy Nguyen, Ha Quy Nguyen, Nghia Trung Nguyen, Thang Viet Nguyen, Hieu Huy Pham, Tuan Ngoc-Minh Nguyen

Abstract: Computer-Aided Diagnosis (CAD) systems for chest radiographs using artificial intelligence (AI) have recently shown a great potential as a second opinion for radiologists. The performances of such systems, however, were mostly evaluated on a fixed dataset in a retrospective manner and, thus, far from the real performances in clinical practice. In this work, we demonstrate a mechanism for validating an AI-based system for detecting abnormalities on X-ray scans, VinDr-CXR, at the Phu Tho General Hospital - a provincial hospital in the North of Vietnam. The AI system was directly integrated into the Picture Archiving and Communication System (PACS) of the hospital after being trained on a fixed annotated dataset from other sources. The performance of the system was prospectively measured by matching and comparing the AI results with the radiology reports of 6,285 chest X-ray examinations extracted from the Hospital Information System (HIS) over the last two months of 2020. The normal/abnormal status of a radiology report was determined by a set of rules and served as the ground truth. Our system achieves an F1 score - the harmonic average of the recall and the precision - of 0.653 (95% CI 0.635, 0.671) for detecting any abnormalities on chest X-rays. Despite a significant drop from the in-lab performance, this result establishes a high level of confidence in applying such a system in real-life situations.

Submitted 6 April, 2021; v1 submitted 5 April, 2021; originally announced April 2021.

Comments: This is a preprint that has been submitted to, and is under review by, the PLOS ONE journal
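
The abstract above defines the F1 score as the harmonic average of recall and precision. A minimal sketch of that definition follows; the function name `f1_score` and its count-based signature are illustrative assumptions, not part of the VinDr-CXR evaluation code, and returning 0.0 when the score is undefined is a convention chosen for this sketch.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from confusion counts.

    tp: abnormal studies correctly flagged as abnormal
    fp: normal studies wrongly flagged as abnormal
    fn: abnormal studies missed by the system
    """
    if tp == 0:
        # Precision or recall is zero (or undefined); report 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

The same value can be written as 2·tp / (2·tp + fp + fn), which shows that true negatives do not enter the score.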

arXiv:2103.17123 [pdf, other]

doi 10.1109/TIP.2021.3130490

Camouflaged Instance Segmentation In-The-Wild: Dataset, Method, and Benchmark Suite

Authors: Trung-Nghia Le, Yubo Cao, Tan-Cong Nguyen, Minh-Quan Le, Khanh-Duy Nguyen, Thanh-Toan Do, Minh-Triet Tran, Tam V. Nguyen

Abstract: This paper pushes the envelope on decomposing camouflaged regions in an image into meaningful components, namely, camouflaged instances. To promote the new task of camouflaged instance segmentation of in-the-wild images, we introduce a dataset, dubbed CAMO++, that extends our preliminary CAMO dataset (camouflaged object segmentation) in terms of quantity and diversity. The new dataset substantially increases the number of images with hierarchical pixel-wise ground truths. We also provide a benchmark suite for the task of camouflaged instance segmentation. In particular, we present an extensive evaluation of state-of-the-art instance segmentation methods on our newly constructed CAMO++ dataset in various scenarios. We also present a camouflage fusion learning (CFL) framework for camouflaged instance segmentation to further improve the performance of state-of-the-art methods. The dataset, model, evaluation suite, and benchmark will be made publicly available on our project page: https://sites.google.com/view/ltnghia/research/camo_plus_plus

Submitted 11 December, 2021; v1 submitted 31 March, 2021; originally announced March 2021.

Comments: TIP acceptance. Project page: https://sites.google.com/view/ltnghia/research/camo_plus_plus

Journal ref: IEEE Transactions on Image Processing 2021

arXiv:2102.12643 [pdf, other]

Provable Compressed Sensing with Generative Priors via Langevin Dynamics

Authors: Thanh V. Nguyen, Gauri Jagatap, Chinmay Hegde

Abstract: Deep generative models have emerged as a powerful class of priors for signals in various inverse problems such as compressed sensing, phase retrieval and super-resolution. Here, we assume an unknown signal to lie in the range of some pre-trained generative model. A popular approach for signal recovery is via gradient descent in the low-dimensional latent space. While gradient descent has achieved good empirical performance, its theoretical behavior is not well understood. In this paper, we introduce the use of stochastic gradient Langevin dynamics (SGLD) for compressed sensing with a generative prior. Under mild assumptions on the generative model, we prove the convergence of SGLD to the true signal. We also demonstrate competitive empirical performance to standard gradient descent.

Submitted 24 February, 2021; originally announced February 2021.

arXiv:2012.13762 [pdf, other]

Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks

Authors: Tuan Hoang, Thanh-Toan Do, Tam V. Nguyen, Ngai-Man Cheung

Abstract: This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations. First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights. However, this approach would result in some mismatch: the gradient descent updates full-precision weights, but it does not update the quantized weights. To address this issue, we propose a novel method that enables direct updating of quantized weights with learnable quantization levels to minimize the cost function using gradient descent. Second, to obtain low bit-width activations, existing works consider all channels equally. However, the activation quantizers could be biased toward a few channels with high-variance. To address this issue, we propose a method to take into account the quantization errors of individual channels. With this approach, we can learn activation quantizers that minimize the quantization errors in the majority of channels. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on the image classification task, using AlexNet, ResNet and MobileNetV2 architectures on CIFAR-100 and ImageNet datasets.

Submitted 26 December, 2020; originally announced December 2020.

arXiv:2012.03422 [pdf, ps, other]

A General Conditional BER Expression of Rectangular QAM in the Presence of Phase Noise

Authors: Thanh V. Pham, Thang V. Nguyen, Anh T. Pham

Abstract: In this paper, we newly present a closed-form bit-error rate (BER) expression for an $M$-ary pulse-amplitude modulation ($M$-PAM) over additive white Gaussian noise (AWGN) channels by analytically characterizing the bit decision regions and positions. The obtained expression is then used to derive the conditional BER of a rectangular quadrature amplitude modulation (QAM) for a given value of phase noise. Numerical results show that the impact of phase noise on the conditional BER performance is proportional to the constellation size. Moreover, it is observed that given a constellation size, the square QAM achieves the lowest phase noise-induced performance loss compared to other rectangular constellations.

Submitted 4 January, 2021; v1 submitted 6 December, 2020; originally announced December 2020.

arXiv:2009.12111 [pdf, other]

Enhancing MRI Brain Tumor Segmentation with an Additional Classification Network

Authors: Hieu T. Nguyen, Tung T. Le, Thang V. Nguyen, Nhan T. Nguyen

Abstract: Brain tumor segmentation plays an essential role in medical image analysis. In recent studies, deep convolution neural networks (DCNNs) have proven extremely powerful for tackling tumor segmentation tasks. We propose in this paper a novel training method that enhances the segmentation results by adding an extra classification branch to the network. The whole network was trained end-to-end on the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2020 training dataset. On the BraTS validation set, it achieved average Dice scores of 78.43%, 89.99%, and 84.22% for the enhancing tumor, the whole tumor, and the tumor core, respectively.

Submitted 28 October, 2020; v1 submitted 25 September, 2020; originally announced September 2020.

arXiv:2008.12649 [pdf, other]

doi 10.1038/s41524-020-00431-2

Active learning of deep surrogates for PDEs: Application to metasurface design

Authors: Raphaël Pestourie, Youssef Mroueh, Thanh V. Nguyen, Payel Das, Steven G. Johnson

Abstract: Surrogate models for partial-differential equations are widely used in the design of meta-materials to rapidly evaluate the behavior of composable components. However, the training cost of accurate surrogates by machine learning can rapidly increase with the number of variables. For photonic-device models, we find that this training becomes especially challenging as design regions grow larger than the optical wavelength. We present an active learning algorithm that reduces the number of training points by more than an order of magnitude for a neural-network surrogate model of optical-surface components compared to random samples. Results show that the surrogate evaluation is over two orders of magnitude faster than a direct solve, and we demonstrate how this can be exploited to accelerate large-scale engineering optimization.

Submitted 24 August, 2020; originally announced August 2020.

Comments: submitted to npj

Journal ref: npj Computational Materials (2020)6:164

arXiv:2008.00223 [pdf, other]

doi 10.1109/TIP.2020.3014727

Unsupervised Deep Cross-modality Spectral Hashing

Authors: Tuan Hoang, Thanh-Toan Do, Tam V. Nguyen, Ngai-Man Cheung

Abstract: This paper presents a novel framework, namely Deep Cross-modality Spectral Hashing (DCSH), to tackle the unsupervised learning problem of binary hash codes for efficient cross-modal retrieval. The framework is a two-step hashing approach which decouples the optimization into (1) binary optimization and (2) hashing function learning. In the first step, we propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. While the former is capable of well preserving the local structure of each modality, the latter reveals the hidden patterns from all modalities. In the second step, to learn mapping functions from informative data inputs (images and word embeddings) to binary codes obtained from the first step, we leverage the powerful CNN for images and propose a CNN-based deep architecture to learn text modality. Quantitative evaluations on three standard benchmark datasets demonstrate that the proposed DCSH method consistently outperforms other state-of-the-art methods.

Submitted 18 August, 2020; v1 submitted 1 August, 2020; originally announced August 2020.

Comments: Accepted to IEEE Transaction on Image Processing (TIP) Add Acknowledgement

arXiv:2007.12881 [pdf, other]

doi 10.1109/ACCESS.2021.3064443

MirrorNet: Bio-Inspired Camouflaged Object Segmentation

Authors: Jinnan Yan, Trung-Nghia Le, Khanh-Duy Nguyen, Minh-Triet Tran, Thanh-Toan Do, Tam V. Nguyen

Abstract: Camouflaged objects are generally difficult to be detected in their natural environment even for human beings. In this paper, we propose a novel bio-inspired network, named the MirrorNet, that leverages both instance segmentation and mirror stream for the camouflaged object segmentation. Differently from existing networks for segmentation, our proposed network possesses two segmentation streams: the main stream and the mirror stream corresponding with the original image and its flipped image, respectively. The output from the mirror stream is then fused into the main stream's result for the final camouflage map to boost up the segmentation accuracy. Extensive experiments conducted on the public CAMO dataset demonstrate the effectiveness of our proposed network. Our proposed method achieves 89% in accuracy, outperforming the state-of-the-arts. Project Page: https://sites.google.com/view/ltnghia/research/camo

Submitted 10 March, 2021; v1 submitted 25 July, 2020; originally announced July 2020.

Comments: Accepted to IEEE Access

arXiv:2005.03624 [pdf, other]

Learning Robust Models for e-Commerce Product Search

Authors: Thanh V. Nguyen, Nikhil Rao, Karthik Subbian

Abstract: Showing items that do not match search query intent degrades customer experience in e-commerce. These mismatches result from counterfactual biases of the ranking algorithms toward noisy behavioral signals such as clicks and purchases in the search logs. Mitigating the problem requires a large labeled dataset, which is expensive and time-consuming to obtain. In this paper, we develop a deep, end-to-end model that learns to effectively classify mismatches and to generate hard mismatched examples to improve the classifier. We train the model end-to-end by introducing a latent variable into the cross-entropy loss that alternates between using the real and generated samples. This not only makes the classifier more robust but also boosts the overall ranking performance. Our model achieves a relative gain compared to baselines by over 26% in F-score, and over 17% in Area Under PR curve. On live search traffic, our model gains significant improvement in multiple countries.

Submitted 7 May, 2020; originally announced May 2020.

Comments: This work has been accepted for publication at ACL2020

arXiv:1912.11274 [pdf, other]

doi 10.1063/5.0010274

Enhanced thermoelectricity at the ultra-thin film limit

Authors: Thao T. T. Nguyen, Linh T. Dang, Giang H. Bach, Tung H. Dang, Kien T. Nguyen, Hong T. Pham, Thuat T. Nguyen, Tuyen V. Nguyen, Toan T. Nguyen, Hung Q. Nguyen

Abstract: At the ultra-thin film limit, quantum confinement strongly improves thermoelectric figure of merit in materials such as Sb$_2$Te$_3$ and Bi$_2$Te$_3$. These high quality films have only been realized using well controlled techniques such as molecular beam epitaxy. We report a two fold increase in the Seebeck coefficient for both p-type Sb$_2$Te$_3$ and n-type Bi$_2$Te$_3$ using thermal co-evaporation, an affordable approach. At the thick film limit greater than 100 nm, their Seebeck coefficients are around 100 $μV/K$, similar to results obtained in other work. When the films are thinner than 50 nm, the Seebeck coefficient increases to about 500 $μV/K$. With a total Seebeck coefficient $\sim$ 1 mV/K and an estimate ZT $\sim$ 2, this pair of materials is the first step to a practical micro-cooler at room temperature.

Submitted 24 December, 2019; originally announced December 2019.

Comments: 4 pages, 4 figures

Journal ref: Appl. Phys. Lett., 117(8), 083104, 2020

arXiv:1911.11983 [pdf, ps, other]

Benefits of Jointly Training Autoencoders: An Improved Neural Tangent Kernel Analysis

Authors: Thanh V. Nguyen, Raymond K. W. Wong, Chinmay Hegde

Abstract: A remarkable recent discovery in machine learning has been that deep neural networks can achieve impressive performance (in terms of both lower training error and higher generalization capacity) in the regime where they are massively over-parameterized. Consequently, over the past year, the community has devoted growing interest in analyzing optimization and generalization properties of over-parameterized networks, and several breakthrough works have led to important theoretical progress. However, the majority of existing work only applies to supervised learning scenarios and hence are limited to settings such as classification and regression. In contrast, the role of over-parameterization in the unsupervised setting has gained far less attention. In this paper, we study the gradient dynamics of two-layer over-parameterized autoencoders with ReLU activation. We make very few assumptions about the given training dataset (other than mild non-degeneracy conditions). Starting from a randomly initialized autoencoder network, we rigorously prove the linear convergence of gradient descent in two learning regimes, namely: (i) the weakly-trained regime where only the encoder is trained, and (ii) the jointly-trained regime where both the encoder and the decoder are trained. Our results indicate the considerable benefits of joint training over weak training for finding global optima, achieving a dramatic decrease in the required level of over-parameterization.
We also analyze the case of weight-tied autoencoders (which is a commonly used architectural choice in practical settings) and prove that in the over-parameterized setting, training such networks from randomly initialized points leads to certain unexpected degeneracies.

Submitted 2 March, 2020; v1 submitted 27 November, 2019; originally announced November 2019.

Comments: Added Sections 3.2 and 3.4 on inductive biases. Fixed an error in deriving the neural tangent kernel in Section 3.3

arXiv:1909.07227 [pdf, other]

A Convolutional Transformation Network for Malware Classification

Authors: Duc-Ly Vu, Trong-Kha Nguyen, Tam V. Nguyen, Tu N. Nguyen, Fabio Massacci, Phu H. Phung

Abstract: Modern malware evolves various detection avoidance techniques to bypass the state-of-the-art detection methods. An emerging trend to deal with this issue is the combination of image transformation and machine learning techniques to classify and detect malware. However, existing works in this field only perform simple image transformation methods that limit the accuracy of the detection. In this paper, we introduce a novel approach to classify malware by using a deep network on images transformed from binary samples. In particular, we first develop a novel hybrid image transformation method to convert binaries into color images that convey the binary semantics. The images are trained by a deep convolutional neural network that later classifies the test inputs into benign or malicious categories. Through the extensive experiments, our proposed method surpasses all baselines and achieves 99.14% in terms of accuracy on the testing set.

Submitted 16 September, 2019; originally announced September 2019.

Comments: 6 pages, 4 figures

arXiv:1906.03567 [pdf, ps, other]

Optimal Energy Efficiency with Delay Constraints for Multi-layer Cooperative Fog Computing Networks

Authors: Thai T. Vu, Diep N. Nguyen, Dinh Thai Hoang, Eryk Dutkiewicz, Thuy V. Nguyen

Abstract: We develop a joint offloading and resource allocation framework for a multi-layer cooperative fog computing network, aiming to minimize the total energy consumption of multiple mobile devices subject to their service delay requirements. The resulting optimization involves both binary (offloading decisions) and real variables (resource allocations), making it an NP-hard and computationally intractable problem. To tackle it, we first propose an improved branch-and-bound algorithm (IBBA) that is implemented in a centralized manner. However, due to the large size of the cooperative fog computing network, the computational complexity of the proposed IBBA is relatively high. To speed up the optimal solution searching as well as to enable its distributed implementation, we then leverage the unique structure of the underlying problem and the parallel processing at fog nodes. To that end, we propose a distributed framework, namely feasibility finding Benders decomposition (FFBD), that decomposes the original problem into a master problem for the offloading decision and subproblems for resource allocation. The master problem (MP) is then equipped with powerful cutting-planes to exploit the fact of resource limitation at fog nodes. The subproblems (SP) for resource allocation can find their closed-form solutions using our fast solution detection method. These (simpler) subproblems can then be solved in parallel at fog nodes.
The numerical results show that the FFBD always returns the optimal solution of the problem with significantly less computation time (e.g., compared with the centralized IBBA approach). The FFBD with the fast solution detection method, namely FFBD-F, can reduce up to $60\%$ and $90\%$ of computation time, respectively, compared with those of the conventional FFBD, namely FFBD-S, and IBBA.

Submitted 23 August, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

Comments: Final revision submitted to IEEE Transactions on Communications

arXiv:1904.11820 [pdf, other]

doi 10.1109/TIP.2019.2913509

Simultaneous Feature Aggregating and Hashing for Compact Binary Code Learning

Authors: Thanh-Toan Do, Khoa Le, Tuan Hoang, Huu Le, Tam V. Nguyen, Ngai-Man Cheung

Abstract: Representing images by compact hash codes is an attractive approach for large-scale content-based image retrieval. In most state-of-the-art hashing-based image retrieval systems, for each image, local descriptors are first aggregated as a global representation vector. This global vector is then subjected to a hashing function to generate a binary hash code. In previous works, the aggregating and the hashing processes are designed independently. Hence these frameworks may generate suboptimal hash codes. In this paper, we first propose a novel unsupervised hashing framework in which feature aggregating and hashing are designed simultaneously and optimized jointly. Specifically, our joint optimization generates aggregated representations that can be better reconstructed by some binary codes. This leads to more discriminative binary hash codes and improved retrieval accuracy. In addition, the proposed method is flexible. It can be extended for supervised hashing. When the data label is available, the framework can be adapted to learn binary codes which minimize the reconstruction loss w.r.t. label vectors. Furthermore, we also propose a fast version of the state-of-the-art hashing method Binary Autoencoder to be used in our proposed frameworks. Extensive experiments on benchmark datasets under various settings show that the proposed methods outperform state-of-the-art unsupervised and supervised hashing methods.

Submitted 24 April, 2019; originally announced April 2019.

Comments: Accepted to IEEE Trans. on Image Processing (TIP), 2019. arXiv admin note: substantial text overlap with arXiv:1704.00860

arXiv:1903.07175 [pdf, ps, other]

Construction of 2-solitons with logarithmic distance for the one-dimensional cubic Schrodinger system

Authors: Yvan Martel, Tien Vinh Nguyen

Abstract: We consider a system of coupled cubic Schrödinger equations in one space dimension \begin{equation*} \begin{cases} i \partial_t u + \partial_x^2 u +(|u|^2 + ω|v|^2) u =0\\ i \partial_t v + \partial_x^2 v+ (|v|^2 + ω|u|^2) v=0 \end{cases}\quad (t,x)\in {\bf R}\times{\bf R}, \end{equation*} in the non-integrable case $0 < ω< 1$. First, we justify the existence of a symmetric 2-solitary wave with logarithmic distance, more precisely a solution of the system satisfying \[ \lim_{t\to +\infty}\left\| \begin{pmatrix} u(t) \\ v(t)\end{pmatrix} - \begin{pmatrix} e^{it}Q (\cdot - \frac{1}{2} \log (Ωt) - \frac{1}{4} \log \log t) \\ e^{it}Q (\cdot + \frac{1}{2} \log (Ωt) + \frac{1}{4} \log \log t)\end{pmatrix}\right\|_{H^1\times H^1} = 0\] where $Q = \sqrt{2}{\rm sech}$ is the explicit solution of $ Q'' - Q + Q^3 = 0$ and $Ω>0$ is a constant. This result extends to the non-integrable case the existence of symmetric 2-solitons with logarithmic distance known in the integrable case $ω=0$ and $ω=1$. Such strongly interacting symmetric $2$-solitary waves were also previously constructed for the non-integrable scalar nonlinear Schrödinger equation in any space dimension and for any energy-subcritical power nonlinearity.
Second, under the conditions $0<c<1$ and $0<ω< \frac 12 c(c+1)$, we construct solutions of the system satisfying \[ \lim_{t\to +\infty}\left\| \begin{pmatrix}u(t) \\ v(t)\end{pmatrix} - \begin{pmatrix}e^{i c^2 t}Q_c (\cdot - \frac{1}{(c+1)c} \log (Ω_c t) ) \\ e^{i t} Q (\cdot + \frac{1}{c+1} \log (Ω_c t))\end{pmatrix} \right\|_{H^1\times H^1}=0\] where $Q_c(x)=cQ(cx)$ and $Ω_c>0$ is a constant. Such logarithmic regime with non-symmetric solitons does not exist in the integrable cases $ω=0$ and $ω=1$ and is still unknown in the non-integrable scalar case.

Submitted 17 March, 2019; originally announced March 2019.

arXiv:1806.00572 [pdf, ps, other]

Autoencoders Learn Generative Linear Models

Authors: Thanh V. Nguyen, Raymond K. W. Wong, Chinmay Hegde

Abstract: We provide a series of results for unsupervised learning with autoencoders. Specifically, we study shallow two-layer autoencoder architectures with shared weights. We focus on three generative models for data that are common in statistical machine learning: (i) the mixture-of-gaussians model, (ii) the sparse coding model, and (iii) the sparsity model with non-negative coefficients. For each of these models, we prove that under suitable choices of hyperparameters, architectures, and initialization, autoencoders learned by gradient descent can successfully recover the parameters of the corresponding model. To our knowledge, this is the first result that rigorously studies the dynamics of gradient descent for weight-sharing autoencoders. Our analysis can be viewed as theoretical evidence that shallow autoencoder modules indeed can be used as feature learning mechanisms for a variety of data models, and may shed insight on how to train larger stacked architectures with autoencoders as basic building blocks.

Submitted 15 February, 2019; v1 submitted 1 June, 2018; originally announced June 2018.

Comments: Experimental study on synthesis data added. Typos fixed

arXiv:1804.09217 [pdf, ps, other]

On Learning Sparsely Used Dictionaries from Incomplete Samples

Authors: Thanh V. Nguyen, Akshay Soni, Chinmay Hegde

Abstract: Most existing algorithms for dictionary learning assume that all entries of the (high-dimensional) input data are fully observed. However, in several practical applications (such as hyper-spectral imaging or blood glucose monitoring), only an incomplete fraction of the data entries may be available. For incomplete settings, no provably correct and polynomial-time algorithm has been reported in the dictionary learning literature. In this paper, we provide provable approaches for learning - from incomplete samples - a family of dictionaries whose atoms have sufficiently "spread-out" mass. First, we propose a descent-style iterative algorithm that linearly converges to the true dictionary when provided a sufficiently coarse initial estimate. Second, we propose an initialization algorithm that utilizes a small number of extra fully observed samples to produce such a coarse initial estimate. Finally, we theoretically analyze their performance and provide asymptotic statistical and computational guarantees.

Submitted 24 April, 2018; originally announced April 2018.

arXiv:1802.02899 [pdf, other]

From Selective Deep Convolutional Features to Compact Binary Representations for Image Retrieval

Authors: Thanh-Toan Do, Tuan Hoang, Dang-Khoa Le Tan, Huu Le, Tam V. Nguyen, Ngai-Man Cheung

Abstract: In the large-scale image retrieval task, the two most important requirements are the discriminability of image representations and the efficiency in computation and storage of representations. Regarding the former requirement, Convolutional Neural Network (CNN) is proven to be a very powerful tool to extract highly discriminative local descriptors for effective image search. Additionally, in order to further improve the discriminative power of the descriptors, recent works adopt fine-tuned strategies. In this paper, taking a different approach, we propose a novel, computationally efficient, and competitive framework. Specifically, we firstly propose various strategies to compute masks, namely SIFT-mask, SUM-mask, and MAX-mask, to select a representative subset of local convolutional features and eliminate redundant features. Our in-depth analyses demonstrate that proposed masking schemes are effective to address the burstiness drawback and improve retrieval accuracy. Secondly, we propose to employ recent embedding and aggregating methods which can significantly boost the feature discriminability. Regarding the computation and storage efficiency, we include a hashing module to produce very compact binary image representations. Extensive experiments on six image retrieval benchmarks demonstrate that our proposed framework achieves the state-of-the-art retrieval performances.

Submitted 5 March, 2019; v1 submitted 7 February, 2018; originally announced February 2018.

Comments: Accepted to Transactions on Multimedia Computing Communications and Applications (TOMM)

arXiv:1711.06221 [pdf, other]

A Forward-Backward Approach for Visualizing Information Flow in Deep Networks

Authors: Aditya Balu, Thanh V. Nguyen, Apurva Kokate, Chinmay Hegde, Soumik Sarkar

Abstract: We introduce a new, systematic framework for visualizing information flow in deep networks. Specifically, given any trained deep convolutional network model and a given test image, our method produces a compact support in the image domain that corresponds to a (high-resolution) feature that contributes to the given explanation. Our method is both computationally efficient as well as numerically robust. We present several preliminary numerical results that support the benefits of our framework over existing methods.

Submitted 16 November, 2017; originally announced November 2017.

Comments: Presented at NIPS 2017 Symposium on Interpretable Machine Learning

arXiv:1711.03638 [pdf, ps, other]

Provably Accurate Double-Sparse Coding

Authors: Thanh V. Nguyen, Raymond K. W. Wong, Chinmay Hegde

Abstract: Sparse coding is a crucial subroutine in algorithms for various signal processing, deep learning, and other machine learning applications. The central goal is to learn an overcomplete dictionary that can sparsely represent a given input dataset. However, a key challenge is that storage, transmission, and processing of the learned dictionary can be untenably high if the data dimension is high. In this paper, we consider the double-sparsity model introduced by Rubinstein et al. (2010b) where the dictionary itself is the product of a fixed, known basis and a data-adaptive sparse component. First, we introduce a simple algorithm for double-sparse coding that can be amenable to efficient implementation via neural architectures. Second, we theoretically analyze its performance and demonstrate asymptotic sample complexity and running time benefits over existing (provable) approaches for sparse coding. To our knowledge, our work introduces the first computationally efficient algorithm for double-sparse coding that enjoys rigorous statistical guarantees. Finally, we support our analysis via several numerical experiments on simulated data, confirming that our method can indeed be useful in problem sizes encountered in practical applications.

Submitted 12 December, 2017; v1 submitted 9 November, 2017; originally announced November 2017.

Comments: 40 pages. An abbreviated conference version appears at AAAI 2018

arXiv:1709.07566 [pdf, other]

Smart Mirror: Intelligent Makeup Recommendation and Synthesis

Authors: Tam V. Nguyen, Luoqi Liu

Abstract: The female facial image beautification usually requires professional editing software, which is relatively difficult for common users. In this demo, we introduce a practical system for automatic and personalized facial makeup recommendation and synthesis. First, a model describing the relations among facial features, facial attributes and makeup attributes is learned as the makeup recommendation model for suggesting the most suitable makeup attributes. Then the recommended makeup attributes are seamlessly synthesized onto the input facial image.

Submitted 21 September, 2017; originally announced September 2017.

Comments: accepted to ACM MM 2017

arXiv:1709.07565 [pdf, other]

Novel Evaluation Metrics for Seam Carving based Image Retargeting

Authors: Tam V. Nguyen, Guangyu Gao

Abstract: Image retargeting effectively resizes images by preserving the recognizability of important image regions. Most retargeting methods rely on good importance maps as a cue to retain or remove certain regions in the input image. In addition, the traditional evaluation exhaustively depends on user ratings. There is a legitimate need for a methodological approach for evaluating retargeted results. Therefore, in this paper, we conduct a study and analysis on the prominent method in image retargeting, Seam Carving. First, we introduce two novel evaluation metrics which can be considered as the proxy of user ratings. Second, we exploit salient object dataset as a benchmark for this task. We then investigate different types of importance maps for this particular problem. The experiments show that humans in general agree with the evaluation metrics on the retargeted results and some importance map methods are consistently more favorable than others.

Submitted 21 September, 2017; originally announced September 2017.

Comments: 5 pages

arXiv:1705.08207 [pdf, other]

Salient Object Detection with Semantic Priors

Authors: Tam V. Nguyen, Luoqi Liu

Abstract: Salient object detection has increasingly become a popular topic in cognitive and computational sciences, including computer vision and artificial intelligence research. In this paper, we propose integrating \textit{semantic priors} into the salient object detection process. Our algorithm consists of three basic steps. Firstly, the explicit saliency map is obtained based on the semantic segmentation refined by the explicit saliency priors learned from the data. Next, the implicit saliency map is computed based on a trained model which maps the implicit saliency priors embedded into regional features with the saliency values. Finally, the explicit semantic map and the implicit map are adaptively fused to form a pixel-accurate saliency map which uniformly covers the objects of interest. We further evaluate the proposed framework on two challenging datasets, namely, ECSSD and HKUIS. The extensive experimental results demonstrate that our method outperforms other state-of-the-art methods.

Submitted 23 May, 2017; originally announced May 2017.

Comments: accepted to IJCAI 2017

Showing 1–50 of 60 results for author: Nguyen, T V