-
Building a temperature forecasting model for the city with the regression neural network (RNN)
Authors:
Nguyen Phuc Tran,
Duy Thanh Tran,
Thi Thuy Nga Duong
Abstract:
In recent years, a study by environmental organizations in Vietnam and worldwide shows that weather change is quite complex. Global warming has become a serious problem in the modern world, which is a concern for scientists. Last century, it was difficult to forecast the weather due to missing weather monitoring stations and technological limitations. This made it hard to collect data for building predictive models to make accurate simulations. In Vietnam, research on weather forecast models is a recent development, having only begun around 2000. Along with advancements in computer science, mathematical models are being built and applied with machine learning techniques to create more accurate and reliable predictive models. This article summarizes the research and solutions for applying recurrent neural networks to forecast urban temperatures.
Submitted 27 May, 2024;
originally announced May 2024.
-
Dehazing Remote Sensing and UAV Imagery: A Review of Deep Learning, Prior-based, and Hybrid Approaches
Authors:
Gao Yu Lee,
**kuan Chen,
Tanmoy Dam,
Md Meftahul Ferdaus,
Daniel Puiu Poenar,
Vu N Duong
Abstract:
High-quality images are crucial in remote sensing and UAV applications, but atmospheric haze can severely degrade image quality, making image dehazing a critical research area. Since the introduction of deep convolutional neural networks, numerous approaches have been proposed, and even more have emerged with the development of vision transformers and contrastive/few-shot learning. Simultaneously, papers describing dehazing architectures applicable to various Remote Sensing (RS) domains are also being published. This review goes beyond the traditional focus on benchmarked haze datasets, as we also explore the application of dehazing techniques to remote sensing and UAV datasets, providing a comprehensive overview of both deep learning and prior-based approaches in these domains. We identify key challenges, including the lack of large-scale RS datasets and the need for more robust evaluation metrics, and outline potential solutions and future research directions to address them. This review is the first, to our knowledge, to provide comprehensive discussions on both existing and very recent dehazing approaches (as of 2024) on benchmarked and RS datasets, including UAV-based imagery.
Submitted 13 May, 2024;
originally announced May 2024.
-
Competitive Facility Location under Random Utilities and Routing Constraints
Authors:
Hoang Giang Pham,
Tien Thanh Dam,
Ngan Ha Duong,
Tien Mai,
Minh Hoang Ha
Abstract:
In this paper, we study a facility location problem within a competitive market context, where customer demand is predicted by a random utility choice model. Unlike prior research, which primarily focuses on simple constraints such as a cardinality constraint on the number of selected locations, we introduce routing constraints that necessitate the selection of locations in a manner that guarantees the existence of a tour visiting all chosen locations while adhering to a specified tour length upper bound. Such routing constraints find crucial applications in various real-world scenarios. The problem at hand features a non-linear objective function, resulting from the utilization of random utilities, together with complex routing constraints, making it computationally challenging. To tackle this problem, we explore three types of valid cuts, namely, outer-approximation and submodular cuts to handle the nonlinear objective function, as well as sub-tour elimination cuts to address the complex routing constraints. These lead to the development of two exact solution methods: a nested cutting plane and nested branch-and-cut algorithms, where these valid cuts are iteratively added to a master problem through two nested loops. We also prove that our nested cutting plane method always converges to optimality after a finite number of iterations. Furthermore, we develop a local search-based metaheuristic tailored for solving large-scale instances and show its pros and cons compared to exact methods. Extensive experiments are conducted on problem instances of varying sizes, demonstrating that our approach excels in terms of solution quality and computation time when compared to other baseline approaches.
Submitted 9 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Ant Colony Optimization for Cooperative Inspection Path Planning Using Multiple Unmanned Aerial Vehicles
Authors:
Duy Nam Bui,
Thuy Ngan Duong,
Manh Duong Phung
Abstract:
This paper presents a new swarm intelligence-based approach to deal with the cooperative path planning problem of unmanned aerial vehicles (UAVs), which is essential for the automatic inspection of infrastructure. The approach uses a 3D model of the structure to generate viewpoints for the UAVs. The calculation of the viewpoints considers the constraints related to the UAV formation model, camera parameters, and requirements for data post-processing. The viewpoints are then used as input to formulate the path planning as an extended traveling salesman problem and the definition of a new cost function. Ant colony optimization is finally used to solve the problem to yield optimal inspection paths. Experiments with 3D models of real structures have been conducted to evaluate the performance of the proposed approach. The results show that our system is not only capable of generating feasible inspection paths for UAVs but also reducing the path length by 29.47% for complex structures when compared with another heuristic approach. The source code of the algorithm can be found at https://github.com/duynamrcv/aco_3d_ipp.
Submitted 13 February, 2024;
originally announced February 2024.
-
Unlocking the capabilities of explainable fewshot learning in remote sensing
Authors:
Gao Yu Lee,
Tanmoy Dam,
Md Meftahul Ferdaus,
Daniel Puiu Poenar,
Vu N Duong
Abstract:
Recent advancements have significantly improved the efficiency and effectiveness of deep learning methods for image-based remote sensing tasks. However, the requirement for large amounts of labeled data can limit the applicability of deep neural networks to existing remote sensing datasets. To overcome this challenge, fewshot learning has emerged as a valuable approach for enabling learning with limited data. While previous research has evaluated the effectiveness of fewshot learning methods on satellite-based datasets, little attention has been paid to exploring the applications of these methods to datasets obtained from UAVs, which are increasingly used in remote sensing studies. In this review, we provide an up-to-date overview of both existing and newly proposed fewshot classification techniques, along with appropriate datasets that are used for both satellite-based and UAV-based data. Our systematic approach demonstrates that fewshot learning can effectively adapt to the broader and more diverse perspectives that UAV-based platforms can provide. We also evaluate some SOTA fewshot approaches on a UAV disaster scene classification dataset, yielding promising results. We emphasize the importance of integrating XAI techniques like attention maps and prototype analysis to increase the transparency, accountability, and trustworthiness of fewshot models for remote sensing. Key challenges and future research directions are identified, including tailored fewshot methods for UAVs, extending to unseen tasks like segmentation, and developing optimized XAI techniques suited for fewshot remote sensing problems. This review aims to provide researchers and practitioners with an improved understanding of fewshot learning's capabilities and limitations in remote sensing, while highlighting open problems to guide future progress in efficient, reliable, and interpretable fewshot methods.
Submitted 12 October, 2023;
originally announced October 2023.
-
Optical detection of bond-dependent and frustrated spin in the two-dimensional cobalt-based honeycomb antiferromagnet Cu3Co2SbO6
Authors:
Baekjune Kang,
Uksam Choi,
Taek Sun Jung,
Seunghyeon Noh,
Gye-Hyeon Kim,
UiHyeon Seo,
Miju Park,
**-Hyun Choi,
Minjae Kim,
GwangCheol Ji,
Sehwan Song,
Hyesung Jo,
Seokjo Hong,
Nguyen Xuan Duong,
Tae Heon Kim,
Yongsoo Yang,
Sungkyun Park,
Jong Mok Ok,
Jung-Woo Yoo,
Jae Hoon Kim,
Changhee Sohn
Abstract:
Two-dimensional honeycomb antiferromagnets have become an important class of materials as they can provide a route to the Kitaev quantum spin liquid, characterized by massive quantum entanglement and fractional excitations. The signatures of proximity to the Kitaev quantum spin liquid in a honeycomb antiferromagnet include anisotropic bond-dependent magnetic responses and persistent fluctuation by frustration in the paramagnetic regime. Here, we propose Cu3Co2SbO6 heterostructures as an intriguing honeycomb antiferromagnet for quantum spin liquid, wherein bond-dependent and frustrated spins interact with optical excitons. This system exhibits antiferromagnetism at 16 K with different spin-flip magnetic fields between the bond-parallel and bond-perpendicular directions, aligning more closely with the generalized Heisenberg-Kitaev than the XXZ model. Optical spectroscopy reveals a strong excitonic transition coupled to the antiferromagnetism, enabling optical detection of its spin states. Particularly, such spin-exciton coupling presents anisotropic responses between bond-parallel and bond-perpendicular magnetic fields as well as a finite spin-spin correlation function around 40 K, higher than twice its Néel temperature. The characteristic temperature, which remains barely changed even under strong magnetic fields, highlights the robustness of the spin-fluctuation region. Our results demonstrate Cu3Co2SbO6 as a unique candidate for the quantum spin liquid phase, where the spin Hamiltonian and quasiparticle excitations can be probed and potentially controlled by light.
Submitted 27 September, 2023;
originally announced September 2023.
-
An approach to extract information from academic transcripts of HUST
Authors:
Nguyen Quang Hieu,
Nguyen Le Quy Duong,
Le Quang Hoa,
Nguyen Quang Dat
Abstract:
In many Vietnamese schools, grades are still being inputted into the database manually, which is not only inefficient but also prone to human error. Thus, the automation of this process is highly necessary, which can only be achieved if we can extract information from academic transcripts. In this paper, we test our improved CRNN model in extracting information from 126 transcripts, with 1008 vertical lines, 3859 horizontal lines, and 2139 handwritten test scores. Then, this model is compared to the Baseline model. The results show that our model significantly outperforms the Baseline model with an accuracy of 99.6% in recognizing vertical lines, 100% in recognizing horizontal lines, and 96.11% in recognizing handwritten test scores.
Submitted 22 April, 2023;
originally announced April 2023.
-
WATT-EffNet: A Lightweight and Accurate Model for Classifying Aerial Disaster Images
Authors:
Gao Yu Lee,
Tanmoy Dam,
Md Meftahul Ferdaus,
Daniel Puiu Poenar,
Vu N. Duong
Abstract:
Incorporating deep learning (DL) classification models into unmanned aerial vehicles (UAVs) can significantly augment search-and-rescue operations and disaster management efforts. In such critical situations, the UAV's ability to promptly comprehend the crisis and optimally utilize its limited power and processing resources to narrow down search areas is crucial. Therefore, developing an efficient and lightweight method for scene classification is of utmost importance. However, current approaches tend to prioritize accuracy on benchmark datasets at the expense of computational efficiency. To address this shortcoming, we introduce the Wider ATTENTION EfficientNet (WATT-EffNet), a novel method that achieves higher accuracy with a more lightweight architecture compared to the baseline EfficientNet. The WATT-EffNet leverages width-wise incremental feature modules and attention mechanisms over width-wise features to ensure the network structure remains lightweight. We evaluate our method on a UAV-based aerial disaster image classification dataset and demonstrate that it outperforms the baseline by up to 15 times in terms of classification accuracy and 38.3% in terms of computing efficiency as measured by Floating Point Operations per second (FLOPs). Additionally, we conduct an ablation study to investigate the effect of varying the width of WATT-EffNet on accuracy and computational efficiency. Our code is available at https://github.com/TanmDL/WATT-EffNet.
Submitted 1 May, 2023; v1 submitted 21 April, 2023;
originally announced April 2023.
-
Fairness in Visual Clustering: A Novel Transformer Clustering Approach
Authors:
Xuan-Bac Nguyen,
Chi Nhan Duong,
Marios Savvides,
Kaushik Roy,
Hugh Churchill,
Khoa Luu
Abstract:
Promoting fairness for deep clustering models in unsupervised clustering settings to reduce demographic bias is a challenging goal. This is because of the limitation of large-scale balanced data with well-annotated labels for sensitive or protected attributes. In this paper, we first evaluate demographic bias in deep clustering models from the perspective of cluster purity, which is measured by the ratio of positive samples within a cluster to their correlation degree. This measurement is adopted as an indication of demographic bias. Then, a novel loss function is introduced to encourage a purity consistency for all clusters to maintain the fairness aspect of the learned clustering model. Moreover, we present a novel attention mechanism, Cross-attention, to measure correlations between multiple clusters, strengthening faraway positive samples and improving the purity of clusters during the learning process. Experimental results on a large-scale dataset with numerous attribute settings have demonstrated the effectiveness of the proposed approach on both clustering accuracy and fairness enhancement on several sensitive attributes.
Submitted 18 September, 2023; v1 submitted 14 April, 2023;
originally announced April 2023.
-
CoMaL: Conditional Maximum Likelihood Approach to Self-supervised Domain Adaptation in Long-tail Semantic Segmentation
Authors:
Thanh-Dat Truong,
Chi Nhan Duong,
Pierce Helton,
Ashley Dowling,
Xin Li,
Khoa Luu
Abstract:
Research in self-supervised domain adaptation in semantic segmentation has recently received considerable attention. Although GAN-based methods have become one of the most popular approaches to domain adaptation, they have suffered from some limitations. They are insufficient to model both global and local structures of a given image, especially in small regions of tail classes. Moreover, they perform poorly on the tail classes containing a limited number of pixels or fewer training samples. In order to address these issues, we present a new self-supervised domain adaptation approach to tackle long-tail semantic segmentation in this paper. Firstly, a new metric is introduced to formulate long-tail domain adaptation in the segmentation problem. Secondly, a new Conditional Maximum Likelihood (CoMaL) approach in an autoregressive framework is presented to solve the problem of long-tail domain adaptation. Although other segmentation methods work under the pixel independence assumption, the long-tailed pixel distributions in CoMaL are generally solved in the context of structural dependency, as that is more realistic. Finally, the proposed method is evaluated on popular large-scale semantic segmentation benchmarks, i.e., "SYNTHIA to Cityscapes" and "GTA to Cityscapes", and outperforms the prior methods by a large margin in both the standard and the proposed evaluation protocols.
Submitted 14 April, 2023;
originally announced April 2023.
-
CROVIA: Seeing Drone Scenes from Car Perspective via Cross-View Adaptation
Authors:
Thanh-Dat Truong,
Chi Nhan Duong,
Ashley Dowling,
Son Lam Phung,
Jackson Cothren,
Khoa Luu
Abstract:
Understanding semantic scene segmentation of urban scenes captured from the Unmanned Aerial Vehicles (UAV) perspective plays a vital role in building a perception model for UAVs. With the limitations of large-scale densely labeled data, semantic scene segmentation for UAV views requires a broad understanding of an object from both its top and side views. Adapting from well-annotated autonomous driving data to unlabeled UAV data is challenging due to the cross-view differences between the two data types. Our work proposes a novel Cross-View Adaptation (CROVIA) approach to effectively adapt the knowledge learned from on-road vehicle views to UAV views. First, a novel geometry-based constraint to cross-view adaptation is introduced based on the geometry correlation between views. Second, cross-view correlations from image space are effectively transferred to segmentation space without any requirement of paired on-road and UAV view data via a new Geometry-Constraint Cross-View (GeiCo) loss. Third, the multi-modal bijective networks are introduced to enforce the global structural modeling across views. Experimental results on new cross-view adaptation benchmarks introduced in this work, i.e., SYNTHIA to UAVID and GTA5 to UAVID, show the State-of-the-Art (SOTA) performance of our approach over prior adaptation methods.
Submitted 14 April, 2023;
originally announced April 2023.
-
Micron-BERT: BERT-based Facial Micro-Expression Recognition
Authors:
Xuan-Bac Nguyen,
Chi Nhan Duong,
Xin Li,
Susan Gauch,
Han-Seok Seo,
Khoa Luu
Abstract:
Micro-expression recognition is one of the most challenging topics in affective computing. It aims to recognize tiny facial movements difficult for humans to perceive in a brief period, i.e., 0.25 to 0.5 seconds. Recent advances in pre-training deep Bidirectional Transformers (BERT) have significantly improved self-supervised learning tasks in computer vision. However, the standard BERT in vision problems is designed to learn only from full images or videos, and the architecture cannot accurately detect details of facial micro-expressions. This paper presents Micron-BERT (μ-BERT), a novel approach to facial micro-expression recognition. The proposed method can automatically capture these movements in an unsupervised manner based on two key ideas. First, we employ Diagonal Micro-Attention (DMA) to detect tiny differences between two frames. Second, we introduce a new Patch of Interest (PoI) module to localize and highlight micro-expression interest regions and simultaneously reduce noisy backgrounds and distractions. By incorporating these components into an end-to-end deep network, the proposed μ-BERT significantly outperforms all previous work in various micro-expression tasks. μ-BERT can be trained on a large-scale unlabeled dataset, i.e., up to 8 million images, and achieves high accuracy on new unseen facial micro-expression datasets. Empirical experiments show μ-BERT consistently outperforms state-of-the-art performance on four micro-expression benchmarks, including SAMM, CASME II, SMIC, and CASME3, by significant margins. Code will be available at https://github.com/uark-cviu/Micron-BERT
Submitted 6 April, 2023;
originally announced April 2023.
-
Spin State Disproportionation in Insulating Ferromagnetic LaCoO3 Epitaxial Thin Films
Authors:
Shanquan Chen,
Jhong-Yi Chang,
Qinghua Zhang,
Qiuyue Li,
Ting Lin,
Fanqi Meng,
Haoliang Huang,
Shengwei Zeng,
Xinmao Yin,
My Ngoc Duong,
Yalin Lu,
Lang Chen,
Er-Jia Guo,
Hanghui Chen,
Chun-Fu Chang,
Chang-Yang Kuo,
Zuhuang Chen
Abstract:
The origin of insulating ferromagnetism in epitaxial LaCoO3 films under tensile strain remains elusive despite extensive research efforts. Surprisingly, the spin state of its Co ions, the main parameter of its ferromagnetism, is still to be determined. Here, we have systematically investigated the spin state in epitaxial LaCoO3 thin films to clarify the mechanism of strain-induced ferromagnetism using element-specific x-ray absorption spectroscopy and dichroism. Combining these measurements with configuration interaction cluster calculations, we unambiguously demonstrate that Co3+ ions in LaCoO3 films under compressive strain (on LaAlO3 substrate) are practically in a low spin state, whereas Co3+ ions in LaCoO3 films under tensile strain (on SrTiO3 substrate) have mixed high spin and low spin states with a ratio close to 1:3. From the identification of this spin state ratio, we infer that the dark strips observed by high-resolution scanning transmission electron microscopy indicate the position of the Co3+ high spin state, i.e., an observation of a spin state disproportionation in tensile-strained LaCoO3 films. This consequently explains the nature of ferromagnetism in LaCoO3 films.
Submitted 12 February, 2023;
originally announced February 2023.
-
Sensitivity analysis for transportability in multi-study, multi-outcome settings
Authors:
Ngoc Q. Duong,
Amy J. Pitts,
Soohyun Kim,
Caleb H. Miles
Abstract:
Existing work in data fusion has covered identification of causal estimands when integrating data from heterogeneous sources. These results typically require additional assumptions to make valid estimation and inference. However, there is little literature on transporting and generalizing causal effects in multiple-outcome settings, where the primary outcome is systematically missing on the study level but for which other outcome variables may serve as proxies. We review an identification result developed in ongoing work that utilizes information from these proxies to obtain more efficient estimators and the corresponding key identification assumption. We then introduce methods for assessing the sensitivity of this approach to the identification assumption.
Submitted 7 January, 2023;
originally announced January 2023.
-
Multi-Camera Multi-Object Tracking on the Move via Single-Stage Global Association Approach
Authors:
Pha Nguyen,
Kha Gia Quach,
Chi Nhan Duong,
Son Lam Phung,
Ngan Le,
Khoa Luu
Abstract:
The development of autonomous vehicles generates a tremendous demand for a low-cost solution with a complete set of camera sensors capturing the environment around the car. It is essential for object detection and tracking to address these new challenges in multi-camera settings. In order to address these challenges, this work introduces novel Single-Stage Global Association Tracking approaches to associate one or more detections from multiple cameras with tracked objects. These approaches aim to solve fragment-tracking issues caused by inconsistent 3D object detection. Moreover, our models also improve the detection accuracy of the standard vision-based 3D object detectors in the nuScenes detection challenge. The experimental results on the nuScenes dataset demonstrate the benefits of the proposed method by outperforming prior vision-based tracking methods in multi-camera settings.
Submitted 17 November, 2022;
originally announced November 2022.
-
Binary-Continuous Sum-of-ratios Optimization: Discretization, Approximations, and Convex Reformulations
Authors:
Tien Mai,
Ngan Ha Duong,
Thuy Anh Ta
Abstract:
We study a class of non-convex sum-of-ratios programs which can be used for decision-making in prominent areas such as product assortment and price optimization, facility location, and security games. Such an optimization problem involves both continuous and binary decision variables and is known to be highly non-convex and intractable to solve. We explore a discretization approach to approximate the optimization problem and show that the approximate program can be reformulated as mixed-integer linear or second-order cone programs, which can be conveniently handled by an off-the-shelf solver (e.g., CPLEX or GUROBI). We further establish (mild) conditions under which solutions to the approximate problem converge to optimal solutions as the number of discretization points increases. We also provide approximation bounds for solutions obtained from the approximated problem. We show how our approach applies to product assortment and price optimization, maximum covering facility location, and Bayesian Stackelberg security games and provide experimental results to evaluate the efficiency of our approach.
Submitted 3 November, 2022;
originally announced November 2022.
-
Reversibly controlled ternary polar states and ferroelectric bias promoted by boosting square-tensile-strain
Authors:
Jun Han Lee,
Nguyen Xuan Duong,
Min-Hyoung Jung,
Hyun-Jae Lee,
Ahyoung Kim,
Youngki Yeo,
Junhyung Kim,
Gye-Hyeon Kim,
Byeong-Gwan Cho,
Jaegyu Kim,
Furqan Ul Hassan Naqvi,
Jong-Seong Bae,
Jeehoon Kim,
Chang Won Ahn,
Young-Min Kim,
Tae Kwon Song,
Jae-Hyeon Ko,
Tae-Yeong Koo,
Changhee Sohn,
Kibog Park,
Chan-Ho Yang,
Sang Mo Yang,
Jun Hee Lee,
Hu Young Jeong,
Tae Heon Kim
, et al. (1 additional author not shown)
Abstract:
Interactions between dipoles often give rise to intriguing physical phenomena, such as exchange bias in magnetic heterostructures and the magnetoelectric effect in multiferroics, which lead to advances in multifunctional heterostructures. However, defect-dipoles tend to be regarded as undesirable because they deteriorate electronic functionality. Here, we report deterministic switching between the ferroelectric and the pinched states by exploiting a new substrate of cubic perovskite, BaZrO$_{3}$, which boosts square-tensile-strain to BaTiO$_{3}$ and promotes four-variant in-plane spontaneous polarization with oxygen vacancy creation. First-principles calculations propose that a complex of an oxygen vacancy and two Ti$^{3+}$ ions forms a charge-neutral defect-dipole. Cooperative control of the defect-dipole and the spontaneous polarization reveals ternary in-plane polar states characterized by biased/pinched hysteresis loops. Furthermore, we experimentally demonstrate that three electrically controlled polar-ordering states lead to switchable and non-volatile dielectric states for application in non-destructive electro-dielectric memory. This discovery opens a new route to develop functional materials via manipulating defect-dipoles and offers a novel platform to advance heteroepitaxy beyond the prevalent perovskite substrates.
Submitted 12 September, 2022;
originally announced September 2022.
-
Vec2Face-v2: Unveil Human Faces from their Blackbox Features via Attention-based Network in Face Recognition
Authors:
Thanh-Dat Truong,
Chi Nhan Duong,
Ngan Le,
Marios Savvides,
Khoa Luu
Abstract:
In this work, we investigate the problem of face reconstruction given a facial feature representation extracted from a blackbox face recognition engine. Indeed, it is a very challenging problem in practice due to the limitations of abstracted information from the engine. We, therefore, introduce a new method named Attention-based Bijective Generative Adversarial Networks in a Distillation framework (DAB-GAN) to synthesize the faces of a subject given his/her extracted face recognition features. Given any unconstrained unseen facial features of a subject, the DAB-GAN can reconstruct his/her facial images in high definition. The DAB-GAN method includes a novel attention-based generative structure with the newly defined Bijective Metrics Learning approach. The framework starts by introducing a bijective metric so that the distance measurement and metric learning process can be directly adopted in the image domain for an image reconstruction task. The information from the blackbox face recognition engine will be optimally exploited using the global distillation process. Then an attention-based generator is presented for a highly robust generator to synthesize realistic faces with ID preservation. We have evaluated our method on the challenging face recognition databases, i.e., CelebA, LFW, CFP-FP, CP-LFW, AgeDB, CA-LFW, and consistently achieved state-of-the-art results. The advancement of DAB-GAN is also proven in both image realism and ID preservation properties.
Submitted 1 September, 2023; v1 submitted 11 September, 2022;
originally announced September 2022.
-
Depth Perspective-aware Multiple Object Tracking
Authors:
Kha Gia Quach,
Huu Le,
Pha Nguyen,
Chi Nhan Duong,
Tien Dai Bui,
Khoa Luu
Abstract:
This paper aims to tackle Multiple Object Tracking (MOT), an important problem in computer vision that remains challenging due to many practical issues, especially occlusions. We propose a new real-time Depth Perspective-aware Multiple Object Tracking (DP-MOT) approach to tackle the occlusion problem in MOT. A simple yet efficient Subject-Ordered Depth Estimation (SODE) is first proposed to automatically order the depth positions of detected subjects in a 2D scene in an unsupervised manner. Using the output from SODE, a new Active pseudo-3D Kalman filter, a simple but effective extension of the Kalman filter with dynamic control variables, is then proposed to dynamically update the movement of objects. In addition, a new high-order association approach is presented in the data association step to incorporate first-order and second-order relationships between the detected objects. The proposed approach consistently achieves state-of-the-art performance compared to recent MOT methods on standard MOT benchmarks.
Submitted 27 February, 2023; v1 submitted 10 July, 2022;
originally announced July 2022.
-
Joint Location and Cost Planning in Maximum Capture Facility Location under Multiplicative Random Utility Maximization
Authors:
Ngan Ha Duong,
Tien Thanh Dam,
Thuy Anh Ta,
Tien Mai
Abstract:
We study a joint facility location and cost planning problem in a competitive market under random utility maximization (RUM) models. The objective is to locate new facilities and make decisions on the costs (or budgets) to spend on the new facilities, aiming to maximize an expected captured customer demand, assuming that customers choose a facility among all available facilities according to a RUM model. We examine two RUM frameworks in the discrete choice literature, namely, the additive and multiplicative RUM. While the former has been widely used in facility location problems, we are the first to explore the latter in the context. We numerically show that the two RUM frameworks can well approximate each other in the context of the cost optimization problem. In addition, we show that, under the additive RUM framework, the resultant cost optimization problem becomes highly non-convex and may have several local optima. In contrast, the use of the multiplicative RUM brings several advantages to the competitive facility location problem. For instance, the cost optimization problem under the multiplicative RUM can be solved efficiently by a general convex optimization solver or can be reformulated as a conic quadratic program and handled by a conic solver available in some off-the-shelf solvers such as CPLEX or GUROBI. Furthermore, we consider a joint location and cost optimization problem under the multiplicative RUM and propose three approaches to solve the problem, namely, an equivalent conic reformulation, a multi-cut outer-approximation algorithm, and a local search heuristic. We provide numerical experiments based on synthetic instances of various sizes to evaluate the performances of the proposed algorithms in solving the cost optimization, and the joint location and cost optimization problems.
Submitted 11 February, 2023; v1 submitted 15 May, 2022;
originally announced May 2022.
-
Multi-Camera Multiple 3D Object Tracking on the Move for Autonomous Vehicles
Authors:
Pha Nguyen,
Kha Gia Quach,
Chi Nhan Duong,
Ngan Le,
Xuan-Bac Nguyen,
Khoa Luu
Abstract:
The development of autonomous vehicles provides an opportunity to have a complete set of camera sensors capturing the environment around the car. Thus, it is important for object detection and tracking to address new challenges, such as achieving consistent results across camera views. To address these challenges, this work presents a new Global Association Graph Model with Link Prediction approach to predict the locations of existing tracklets and link detections with tracklets via cross-attention motion modeling and appearance re-identification. This approach aims at solving issues caused by inconsistent 3D object detection. Moreover, our model is exploited to improve the detection accuracy of a standard 3D object detector in the nuScenes detection challenge. The experimental results on the nuScenes dataset demonstrate the benefits of the proposed method, which produces SOTA performance on the existing vision-based tracking dataset.
Submitted 19 April, 2022;
originally announced April 2022.
-
DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
Authors:
Thanh-Dat Truong,
Quoc-Huy Bui,
Chi Nhan Duong,
Han-Seok Seo,
Son Lam Phung,
Xin Li,
Khoa Luu
Abstract:
Human action recognition has recently become one of the popular research topics in the computer vision community. Various 3D-CNN based methods have been presented to tackle both the spatial and temporal dimensions in the task of video action recognition with competitive results. However, these methods have suffered some fundamental limitations such as lack of robustness and generalization, e.g., how does the temporal ordering of video frames affect the recognition results? This work presents a novel end-to-end Transformer-based Directed Attention (DirecFormer) framework for robust action recognition. The method takes a simple but novel perspective of Transformer-based approach to understand the right order of sequence actions. Therefore, the contributions of this work are three-fold. Firstly, we introduce the problem of ordered temporal learning issues to the action recognition problem. Secondly, a new Directed Attention mechanism is introduced to understand and provide attentions to human actions in the right order. Thirdly, we introduce the conditional dependency in action sequence modeling that includes orders and classes. The proposed approach consistently achieves the state-of-the-art (SOTA) results compared with the recent action recognition methods, on three standard large-scale benchmarks, i.e. Jester, Kinetics-400 and Something-Something-V2.
Submitted 18 March, 2022;
originally announced March 2022.
-
LiDAR dataset distillation within bayesian active learning framework: Understanding the effect of data augmentation
Authors:
Ngoc Phuong Anh Duong,
Alexandre Almin,
Léo Lemarié,
B Ravi Kiran
Abstract:
Autonomous driving (AD) datasets have progressively grown in size in the past few years to enable better deep representation learning. Active learning (AL) has re-gained attention recently to address reduction of annotation costs and dataset size. AL has remained relatively unexplored for AD datasets, especially on point cloud data from LiDARs. This paper performs a principled evaluation of AL based dataset distillation on 1/4th of the large Semantic-KITTI dataset. Further on, the gains in model performance due to data augmentation (DA) are demonstrated across different subsets of the AL loop. We also demonstrate how DA improves the selection of informative samples to annotate. We observe that data augmentation achieves full dataset accuracy using only 60% of samples from the selected dataset configuration. This provides faster training time and subsequent gains in annotation costs.
Submitted 5 February, 2022;
originally announced February 2022.
-
Broadband photon pair generation from a single lithium niobate microcube
Authors:
Ngoc My Hanh Duong,
Gregoire Saerens,
Flavia Timpu,
Maria Teresa Buscaglia,
Vincenzo Buscaglia,
Andrea Morandi,
Jolanda S. Muller,
Andreas Maeder,
Fabian Kaufmann,
Alexander Sonltsev,
Rachel Grange
Abstract:
Nonclassical light sources are highly sought after as they are an integral part of quantum communication and quantum computation devices. Typical sources rely on bulk crystals that are not compact and have limited bandwidth due to phase-matching conditions. In this work, we demonstrate the generation of photon pairs from a free-standing lithium niobate microcube at the telecommunication wavelength through the spontaneous parametric down-conversion process. The maximum photon pair generation rate obtained from a single microcube with the size of ~4 microns is ~80 Hz, resulting in an efficiency of ~1.2 GHz/Wm per unit volume, which is an order of magnitude higher than the efficiency of photon-pair generation in bulky nonlinear crystals. The microcubes are synthesized through a solvothermal method, offering the possibility for scalable devices via bottom-up assembly. Our work constitutes an important step forward in the realization of compact nonclassical light sources with broadband tunability for various applications in quantum communication, quantum computing, and quantum metrology.
Submitted 17 September, 2021;
originally announced September 2021.
-
BiMaL: Bijective Maximum Likelihood Approach to Domain Adaptation in Semantic Scene Segmentation
Authors:
Thanh-Dat Truong,
Chi Nhan Duong,
Ngan Le,
Son Lam Phung,
Chase Rainwater,
Khoa Luu
Abstract:
Semantic segmentation aims to predict pixel-level labels. It has become a popular task in various computer vision applications. While fully supervised segmentation methods have achieved high accuracy on large-scale vision datasets, they are unable to generalize well to a new test environment or a new domain. In this work, we first introduce a new Un-aligned Domain Score to measure the efficiency of a learned model on a new target domain in an unsupervised manner. Then, we present the new Bijective Maximum Likelihood (BiMaL) loss that is a generalized form of the Adversarial Entropy Minimization without any assumption about pixel independence. We have evaluated the proposed BiMaL on two domains. The proposed BiMaL approach consistently outperforms the SOTA methods on empirical experiments on "SYNTHIA to Cityscapes", "GTA5 to Cityscapes", and "SYNTHIA to Vistas".
Submitted 6 August, 2021;
originally announced August 2021.
-
The Right to Talk: An Audio-Visual Transformer Approach
Authors:
Thanh-Dat Truong,
Chi Nhan Duong,
The De Vu,
Hoang Anh Pham,
Bhiksha Raj,
Ngan Le,
Khoa Luu
Abstract:
Turn-taking has played an essential role in structuring the regulation of a conversation. The task of identifying the main speaker (who is properly taking his/her turn of speaking) and the interrupters (who are interrupting or reacting to the main speaker's utterances) remains a challenging task. Although some prior methods have partially addressed this task, there still remain some limitations. Firstly, a direct association of Audio and Visual features may limit the correlations to be extracted due to different modalities. Secondly, the relationship across temporal segments helping to maintain the consistency of localization, separation, and conversation contexts is not effectively exploited. Finally, the interactions between speakers that usually contain the tracking and anticipatory decisions about the transition to a new speaker are usually ignored. Therefore, this work introduces a new Audio-Visual Transformer approach to the problem of localization and highlighting the main speaker in both audio and visual channels of a multi-speaker conversation video in the wild. The proposed method exploits different types of correlations presented in both visual and audio signals. The temporal audio-visual relationships across spatial-temporal space are anticipated and optimized via the self-attention mechanism in a Transformer structure. Moreover, a newly collected dataset is introduced for the main speaker detection. To the best of our knowledge, it is one of the first studies that is able to automatically localize and highlight the main speaker in both visual and audio channels in multi-speaker conversation videos.
Submitted 6 August, 2021;
originally announced August 2021.
-
RadGraph: Extracting Clinical Entities and Relations from Radiology Reports
Authors:
Saahil Jain,
Ashwin Agrawal,
Adriel Saporta,
Steven QH Truong,
Du Nguyen Duong,
Tan Bui,
Pierre Chambon,
Yuhao Zhang,
Matthew P. Lungren,
Andrew Y. Ng,
Curtis P. Langlotz,
Pranav Rajpurkar
Abstract:
Extracting structured clinical information from free-text radiology reports can enable the use of radiology report information for a variety of critical healthcare applications. In our work, we present RadGraph, a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema we designed to structure radiology reports. We release a development dataset, which contains board-certified radiologist annotations for 500 radiology reports from the MIMIC-CXR dataset (14,579 entities and 10,889 relations), and a test dataset, which contains two independent sets of board-certified radiologist annotations for 100 radiology reports split equally across the MIMIC-CXR and CheXpert datasets. Using these datasets, we train and test a deep learning model, RadGraph Benchmark, that achieves a micro F1 of 0.82 and 0.73 on relation extraction on the MIMIC-CXR and CheXpert test sets respectively. Additionally, we release an inference dataset, which contains annotations automatically generated by RadGraph Benchmark across 220,763 MIMIC-CXR reports (around 6 million entities and 4 million relations) and 500 CheXpert reports (13,783 entities and 9,908 relations) with mappings to associated chest radiographs. Our freely available dataset can facilitate a wide range of research in medical natural language processing, as well as computer vision and multi-modal learning when linked to chest radiographs.
Submitted 29 August, 2021; v1 submitted 28 June, 2021;
originally announced June 2021.
-
DyGLIP: A Dynamic Graph Model with Link Prediction for Accurate Multi-Camera Multiple Object Tracking
Authors:
Kha Gia Quach,
Pha Nguyen,
Huu Le,
Thanh-Dat Truong,
Chi Nhan Duong,
Minh-Triet Tran,
Khoa Luu
Abstract:
Multi-Camera Multiple Object Tracking (MC-MOT) is a significant computer vision problem due to its emerging applicability in several real-world applications. Despite a large number of existing works, solving the data association problem in any MC-MOT pipeline is arguably one of the most challenging tasks. Developing a robust MC-MOT system, however, is still highly challenging due to many practical issues such as inconsistent lighting conditions, varying object movement patterns, or the trajectory occlusions of the objects between the cameras. To address these problems, this work, therefore, proposes a new Dynamic Graph Model with Link Prediction (DyGLIP) approach to solve the data association task. Compared to existing methods, our new model offers several advantages, including better feature representations and the ability to recover from lost tracks during camera transitions. Moreover, our model works gracefully regardless of the overlapping ratios between the cameras. Experimental results show that we outperform existing MC-MOT algorithms by a large margin on several practical datasets. Notably, our model works favorably on online settings but can be extended to an incremental approach for large-scale datasets.
Submitted 12 June, 2021;
originally announced June 2021.
-
Inplace knowledge distillation with teacher assistant for improved training of flexible deep neural networks
Authors:
Alexey Ozerov,
Ngoc Duong
Abstract:
Deep neural networks (DNNs) have achieved great success in various machine learning tasks. However, most existing powerful DNN models are computationally expensive and memory demanding, hindering their deployment in devices with low memory and computational resources or in applications with strict latency requirements. Thus, several resource-adaptable or flexible approaches were recently proposed that train at the same time a big model and several resource-specific sub-models. Inplace knowledge distillation (IPKD) became a popular method to train those models and consists in distilling the knowledge from a larger model (teacher) to all other sub-models (students). In this work a novel generic training method called IPKD with teacher assistant (IPKD-TA) is introduced, where sub-models themselves become teacher assistants teaching smaller sub-models. We evaluated the proposed IPKD-TA training method using two state-of-the-art flexible models (MSDNet and Slimmable MobileNet-V1) with two popular image classification benchmarks (CIFAR-10 and CIFAR-100). Our results demonstrate that the IPKD-TA is on par with the existing state of the art while improving it in most cases.
Submitted 18 May, 2021;
originally announced May 2021.
-
Colossal topological Hall effect at the transition between isolated and lattice-phase interfacial skyrmions
Authors:
M. Raju,
A. P. Petrović,
A. Yagil,
K. S. Denisov,
N. K. Duong,
B. Göbel,
E. Şaşıoğlu,
O. M. Auslaender,
I. Mertig,
I. V. Rozhansky,
C. Panagopoulos
Abstract:
The topological Hall effect is used extensively to study chiral spin textures in various materials. However, the factors controlling its magnitude in technologically-relevant thin films remain uncertain. Using variable temperature magnetotransport and real-space magnetic imaging in a series of Ir/Fe/Co/Pt heterostructures, here we report that the chiral spin fluctuations at the phase boundary between isolated skyrmions and a disordered skyrmion lattice result in a power-law enhancement of the topological Hall resistivity by up to three orders of magnitude. Our work reveals the dominant role of skyrmion stability and configuration in determining the magnitude of the topological Hall effect.
Submitted 17 May, 2021;
originally announced May 2021.
-
Magnetization Reversal Signatures of Hybrid and Pure Néel Skyrmions in Thin Film Multilayers
Authors:
Nghiep Khoan Duong,
Riccardo Tomasello,
M. Raju,
Alexander P. Petrović,
Stefano Chiappini,
Giovanni Finocchio,
Christos Panagopoulos
Abstract:
We report a study of the magnetization reversals and skyrmion configurations in two systems - Pt/Co/MgO and Ir/Fe/Co/Pt multilayers, where magnetic skyrmions are stabilized by a combination of dipolar and Dzyaloshinskii-Moriya interactions (DMI). First Order Reversal Curve (FORC) diagrams of low-DMI Pt/Co/MgO and high-DMI Ir/Fe/Co/Pt exhibit stark differences, which are identified by micromagnetic simulations to be indicative of hybrid and pure Néel skyrmions, respectively. Tracking the evolution of FORC features in multilayers with dipolar interactions and DMI, we find that the negative FORC valley, typically accompanying the positive FORC peak near saturation, disappears under both reduced dipolar interactions and enhanced DMI. As these conditions favor the formation of pure Néel skyrmions, we propose that the resultant FORC feature - a single positive FORC peak near saturation - can act as a fingerprint for pure Néel skyrmions in multilayers. Our study thus expands on the utility of FORC analysis as a tool for characterizing spin topology in multilayer thin films.
Submitted 26 March, 2021;
originally announced March 2021.
-
On the hidden treasure of dialog in video question answering
Authors:
Deniz Engin,
François Schnitzler,
Ngoc Q. K. Duong,
Yannis Avrithis
Abstract:
High-level understanding of stories in video such as movies and TV shows from raw data is extremely challenging. Modern video question answering (VideoQA) systems often use additional human-made sources like plot synopses, scripts, video descriptions or knowledge bases. In this work, we present a new approach to understand the whole story without such external sources. The secret lies in the dialog: unlike any prior work, we treat dialog as a noisy source to be converted into text description via dialog summarization, much like recent methods treat video. The input of each modality is encoded by transformers independently, and a simple fusion method combines all modalities, using soft temporal attention for localization over long inputs. Our model outperforms the state of the art on the KnowIT VQA dataset by a large margin, without using question-specific human annotation or human-made plot summaries. It even outperforms human evaluators who have never watched any whole episode before. Code is available at https://engindeniz.github.io/dialogsummary-videoqa
Submitted 19 August, 2021; v1 submitted 26 March, 2021;
originally announced March 2021.
-
Skyrmion-(Anti)Vortex Coupling in a Chiral Magnet-Superconductor Heterostructure
Authors:
A. P. Petrović,
M. Raju,
X. Y. Tee,
A. Louat,
I. Maggio-Aprile,
R. M. Menezes,
M. J. Wyszyński,
N. K. Duong,
M. Reznikov,
Ch. Renner,
M. V. Milošević,
C. Panagopoulos
Abstract:
We report experimental coupling of chiral magnetism and superconductivity in [IrFeCoPt]/Nb heterostructures. The stray field of skyrmions with radius ~50nm is sufficient to nucleate antivortices in a 25nm Nb film, with unique signatures in the magnetization, critical current and flux dynamics, corroborated via simulations. We also detect a thermally-tunable Rashba-Edelstein exchange coupling in the isolated skyrmion phase. This realization of a strongly interacting skyrmion-(anti)vortex system opens a path towards controllable topological hybrid materials, unattainable to date.
Submitted 17 March, 2021;
originally announced March 2021.
-
3D-Printed Terahertz Topological Waveguides
Authors:
Muhammad Talal Ali Khan,
Haisu Li,
Nathan Nam Minh Duong,
Andrea Blanco-Redondo,
Shaghik Atakaramians
Abstract:
Compact and robust waveguide chips are crucial for new integrated terahertz applications, such as high-speed interconnections between processors and broadband short-range wireless communications. Progress on topological photonic crystals shows potential to improve integrated terahertz systems that suffer from high losses around sharp bends. Robust terahertz topological transport through sharp bends on a silicon chip has been recently reported over a relatively narrow bandwidth. Here, we report the experimental demonstration of topological terahertz planar air-channel metallic waveguides which can be integrated into an on-chip interconnect. Our platform can be fabricated by a simple, cost-effective technique combining 3D-printing and gold-sputtering. The relative size of the measured topological bandgap is ~12.5%, which entails significant improvement over all-silicon terahertz topological waveguides (~7.8%). We further demonstrate robust THz propagation around defects and delay lines. Our work provides a promising path towards compact integrated terahertz devices as a next frontier for terahertz wireless communications.
Submitted 21 September, 2020;
originally announced October 2020.
-
Self-Attention Generative Adversarial Network for Speech Enhancement
Authors:
Huy Phan,
Huy Le Nguyen,
Oliver Y. Chén,
Philipp Koch,
Ngoc Q. K. Duong,
Ian McLoughlin,
Alfred Mertins
Abstract:
Existing generative adversarial networks (GANs) for speech enhancement solely rely on the convolution operation, which may obscure temporal dependencies across the sequence input. To remedy this issue, we propose a self-attention layer adapted from non-local attention, coupled with the convolutional and deconvolutional layers of a speech enhancement GAN (SEGAN) using raw signal input. Further, we empirically study the effect of placing the self-attention layer at the (de)convolutional layers with varying layer indices as well as at all of them when memory allows. Our experiments show that introducing self-attention to SEGAN leads to consistent improvement across the objective evaluation metrics of enhancement performance. Furthermore, applying at different (de)convolutional layers does not significantly alter performance, suggesting that it can be conveniently applied at the highest-level (de)convolutional layer with the smallest memory overhead.
Submitted 6 February, 2021; v1 submitted 18 October, 2020;
originally announced October 2020.
-
On Multitask Loss Function for Audio Event Detection and Localization
Authors:
Huy Phan,
Lam Pham,
Philipp Koch,
Ngoc Q. K. Duong,
Ian McLoughlin,
Alfred Mertins
Abstract:
Audio event localization and detection (SELD) have been commonly tackled using multitask models. Such a model usually consists of a multi-label event classification branch with sigmoid cross-entropy loss for event activity detection and a regression branch with mean squared error loss for direction-of-arrival estimation. In this work, we propose a multitask regression model, in which both (multi-label) event detection and localization are formulated as regression problems and use the mean squared error loss homogeneously for model training. We show that the common combination of heterogeneous loss functions causes the network to underfit the data whereas the homogeneous mean squared error loss leads to better convergence and performance. Experiments on the development and validation sets of the DCASE 2020 SELD task demonstrate that the proposed system also outperforms the DCASE 2020 SELD baseline across all the detection and localization metrics, reducing the overall SELD error (the combined metric) by approximately 10% absolute.
Submitted 11 September, 2020;
originally announced September 2020.
-
Optical repumping of resonantly excited quantum emitters in hexagonal boron nitride
Authors:
Simon J. U. White,
Ngoc My Hanh Duong,
Alexander S. Solntsev,
Je-Hyung Kim,
Mehran Kianinia,
Igor Aharonovich
Abstract:
Resonant excitation of solid-state quantum emitters enables coherent control of quantum states and generation of coherent single photons, which are required for scalable quantum photonics applications. However, these systems can often decay to one or more intermediate dark states or spectrally jump, resulting in the lack of photons on resonance. Here, we present an optical co-excitation scheme which uses a weak non-resonant laser to reduce transitions to a dark state and amplify the photoluminescence from quantum emitters in hexagonal boron nitride (hBN). Utilizing a two-laser repumping scheme, we achieve optically stable resonance fluorescence of hBN emitters and an overall increase of ON time by an order of magnitude compared to only resonant excitation. Our results are important for the deployment of atom-like defects in hBN as reliable building blocks for quantum photonic applications.
Submitted 11 September, 2020;
originally announced September 2020.
-
Large few-layer hexagonal boron nitride flakes for nonlinear optics
Authors:
Nils Bernhardt,
Sejeong Kim,
Johannes E. Froch,
Simon White,
Ngoc My Hanh Duong,
Zhe He,
Bo Chen,
** Liu,
Igor Aharonovich,
Alexander S. Solntsev
Abstract:
Hexagonal boron nitride (hBN) is a layered dielectric material with a wide range of applications in optics and photonics. In this work, we demonstrate a fabrication method for few-layer hBN flakes with areas up to 5000 $\rm μm^2$. We show that hBN in this form can be integrated with photonic microstructures: as an example, we use a circular Bragg grating (CBG). The layer quality of the exfoliated hBN flake on a CBG is confirmed by second-harmonic generation (SHG) microscopy. We show that the SHG signal is uniform across the hBN sample outside the CBG and is amplified in the centre of the CBG.
Submitted 11 July, 2020; v1 submitted 3 July, 2020;
originally announced July 2020.
-
Persistent starspot signals on M dwarfs: multi-wavelength Doppler observations with the Habitable-zone Planet Finder and Keck/HIRES
Authors:
Paul Robertson,
Gudmundur Stefansson,
Suvrath Mahadevan,
Michael Endl,
William D. Cochran,
Corey Beard,
Chad F. Bender,
Scott A. Diddams,
Nicholas Duong,
Eric B. Ford,
Connor Fredrick,
Samuel Halverson,
Fred Hearty,
Rae Holcomb,
Lydia Juan,
Shubham Kanodia,
Jack Lubin,
Andrew J. Metcalf,
Andrew Monson,
Joe P. Ninan,
Jonathan Palafoutas,
Lawrence W. Ramsey,
Arpita Roy,
Christian Schwab,
Ryan C. Terrien
, et al. (1 additional author not shown)
Abstract:
Young, rapidly-rotating M dwarfs exhibit prominent starspots, which create quasiperiodic signals in their photometric and Doppler spectroscopic measurements. The periodic Doppler signals can mimic radial velocity (RV) changes expected from orbiting exoplanets. Exoplanets can be distinguished from activity-induced false positives by the chromaticity and long-term incoherence of starspot signals, but these qualities are poorly constrained for fully-convective M stars. Coherent photometric starspot signals on M dwarfs may persist for hundreds of rotations, and the wavelength dependence of starspot RV signals may not be consistent between stars due to differences in their magnetic fields and active regions. We obtained precise multi-wavelength RVs of four rapidly-rotating M dwarfs (AD Leo, G 227-22, GJ 1245B, GJ 3959) using the near-infrared (NIR) Habitable-zone Planet Finder, and the optical Keck/HIRES spectrometer. Our RVs are complemented by photometry from Kepler, TESS, and the Las Cumbres Observatory (LCO) network of telescopes. We found that all four stars exhibit large spot-induced Doppler signals at their rotation periods, and investigated the longevity and optical-to-NIR chromaticity for these signals. The phase curves remain coherent much longer than is typical for Sunlike stars. Their chromaticity varies, and one star (GJ 3959) exhibits optical and NIR RV modulation consistent in both phase and amplitude. In general, though, we find that the NIR amplitudes are lower than their optical counterparts. We conclude that starspot modulation for rapidly-rotating M stars frequently remains coherent for hundreds of stellar rotations, and gives rise to Doppler signals that, due to this coherence, may be mistaken for exoplanets.
Submitted 19 May, 2020;
originally announced May 2020.
-
LIAAD: Lightweight Attentive Angular Distillation for Large-scale Age-Invariant Face Recognition
Authors:
Thanh-Dat Truong,
Chi Nhan Duong,
Kha Gia Quach,
Ngan Le,
Tien D. Bui,
Khoa Luu
Abstract:
Disentangled representations have been commonly adopted for Age-invariant Face Recognition (AiFR) tasks. However, these methods have several limitations: (1) they require large-scale face recognition (FR) training data with age labels, which is limited in practice; (2) they rely on heavy deep network architectures for high performance; and (3) their evaluations usually take place on age-related face databases while neglecting the standard large-scale FR databases needed to guarantee robustness. This work presents a novel Lightweight Attentive Angular Distillation (LIAAD) approach to Large-scale Lightweight AiFR that overcomes these limitations. Given two high-performance heavy networks as teachers with different specialized knowledge, LIAAD introduces a learning paradigm to efficiently distill the age-invariant attentive and angular knowledge from those teachers to a lightweight student network, making it more powerful, with higher FR accuracy and robustness against the age factor. Consequently, the LIAAD approach is able to take advantage of FR datasets both with and without age labels to train an AiFR model. Unlike prior distillation methods, which mainly focus on accuracy and compression ratios in closed-set problems, LIAAD aims to solve the open-set problem, i.e. large-scale face recognition. Evaluations on LFW, IJB-B and IJB-C Janus, AgeDB and MegaFace-FGNet with one million distractors have demonstrated the efficiency of the proposed approach on a lightweight structure. This work also presents a new longitudinal face aging (LogiFace) database (to be made available) for further studies of age-related facial problems.
Submitted 11 September, 2022; v1 submitted 8 April, 2020;
originally announced April 2020.
-
Vec2Face: Unveil Human Faces from their Blackbox Features in Face Recognition
Authors:
Chi Nhan Duong,
Thanh-Dat Truong,
Kha Gia Quach,
Hung Bui,
Kaushik Roy,
Khoa Luu
Abstract:
Unveiling face images of a subject given his/her high-level representations extracted from a blackbox Face Recognition engine is extremely challenging. This is because of the limited information accessible from that engine, including its structure and uninterpretable extracted features. This paper presents a novel generative structure with Bijective Metric Learning, namely Bijective Generative Adversarial Networks in a Distillation framework (DiBiGAN), for synthesizing faces of an identity given that person's features. To address this problem effectively, this work first introduces a bijective metric so that the distance measurement and metric learning process can be directly adopted in the image domain for an image reconstruction task. Second, a distillation process is introduced to maximize the information exploited from the blackbox face recognition engine. Then a Feature-Conditional Generator Structure with Exponential Weighting Strategy is presented for a more robust generator that can synthesize realistic faces with ID preservation. Results on several benchmarking datasets, including CelebA, LFW, AgeDB, and CFP-FP, against matching engines have demonstrated the effectiveness of DiBiGAN on both image realism and ID preservation properties.
Submitted 15 March, 2020;
originally announced March 2020.
-
Optical Thermometry with Quantum Emitters in Hexagonal Boron Nitride
Authors:
Yongliang Chen,
Thinh Ngoc Tran,
Ngoc My Hanh Duong,
Chi Li,
Milos Toth,
Carlo Bradac,
Igor Aharonovich,
Alexander Solntsev,
Toan Trong Tran
Abstract:
Nanoscale optical thermometry is a promising non-contact route for measuring local temperature with both high sensitivity and spatial resolution. In this work, we present a deterministic optical thermometry technique based on quantum emitters in nanoscale hexagonal boron-nitride. We show that these nanothermometers exhibit better performance than that of homologous, all-optical nanothermometers both in sensitivity and range of working temperature. We demonstrate their effectiveness as nanothermometers by monitoring the local temperature at specific locations in a variety of custom-built micro-circuits. This work opens new avenues for nanoscale temperature measurements and heat flow studies in miniaturized, integrated devices.
Submitted 8 March, 2020;
originally announced March 2020.
-
Improved sensitivity and quantification for ${}^{29}$Si NMR experiments on solids using UDEFT (Uniform Driven Equilibrium Fourier Transform)
Authors:
Nghia Tuan Duong,
Julien Trébosc,
Olivier Lafon,
Jean-Paul Amoureux
Abstract:
We demonstrate that the UDEFT (Uniform Driven Equilibrium Fourier Transform) technique can be used to improve the sensitivity and the quantification of one-dimensional ${}^{29}$Si NMR experiments under Magic-Angle Spinning (MAS). We derive an analytical expression of the signal-to-noise ratios of UDEFT and single-pulse (SP) experiments subsuming the contributions of transient and steady-state regimes. Using numerical spin dynamics simulations and experiments on ${}^{29}$Si-enriched amorphous silica and borosilicate glass, we show that the 59${}_{180}$298${}_{0}$59${}_{180}$ refocusing composite $π$-pulse and the adiabatic inversion using tanh/tan modulation improve the robustness of the UDEFT technique to rf-inhomogeneity, offset, and chemical shift anisotropy. These pulses, combined with a two-step phase cycling, limit the pulse imperfections and the artifacts produced by stimulated echoes. The sensitivities of the SP, UDEFT and CPMG (Carr-Purcell Meiboom-Gill) techniques are compared experimentally on functionalized and non-functionalized mesoporous silica. Furthermore, experiments on a flame retardant material prove that the UDEFT technique provides a better quantification of ${}^{29}$Si sites with higher sensitivity than the SP method.
Submitted 7 March, 2020;
originally announced March 2020.
-
Infinitesimal CR automorphisms and stability groups of nonminimal infinite type models in $\mathbb C^2$
Authors:
Van Thu Ninh,
Thi Ngoc Oanh Duong,
Van Hoang Pham,
Hyeseon Kim
Abstract:
We determine infinitesimal $\mathrm{CR}$ automorphisms and stability groups of real hypersurfaces in $\mathbb C^2$ in the case when the hypersurface is nonminimal and of infinite type at the reference point.
Submitted 20 April, 2020; v1 submitted 30 August, 2019;
originally announced August 2019.
-
Integrated on chip platform with quantum emitters in layered materials
Authors:
Sejeong Kim,
Ngoc My Hanh Duong,
Minh Nguyen,
Tsung-Ju Lu,
Mehran Kianinia,
Noah Mendelson,
Alexander Solntsev,
Carlo Bradac,
Dirk R. Englund,
Igor Aharonovich
Abstract:
Integrated quantum photonic circuitry is an emerging topic that requires efficient coupling of quantum light sources to waveguides and optical resonators. So far, great effort has been devoted to engineering on-chip systems from three-dimensional crystals such as diamond or gallium arsenide. In this study, we demonstrate room temperature coupling of quantum emitters embedded within a layered hexagonal boron nitride to an on-chip aluminium nitride waveguide. We achieved 1.2% light coupling efficiency of the device and realise transmission of single photons through the waveguide. Our results serve as a foundation for the integration of layered materials with on-chip components and for the realisation of integrated quantum photonic circuitry.
Submitted 10 July, 2019;
originally announced July 2019.
-
Domain Generalization via Universal Non-volume Preserving Models
Authors:
Thanh-Dat Truong,
Chi Nhan Duong,
Khoa Luu,
Minh-Triet Tran,
Ngan Le
Abstract:
Recognition across domains has recently become an active topic in the research community. However, the problem of recognition in new, unseen domains has been largely overlooked. Under this condition, the delivered deep network models cannot be updated, adapted, or fine-tuned. Therefore, recent deep learning techniques, such as domain adaptation, feature transferring, and fine-tuning, cannot be applied. This paper presents a novel approach to the problem of domain generalization in the context of deep learning. The proposed method is evaluated on different datasets in various problems, i.e. (i) digit recognition on MNIST, SVHN, and MNIST-M, (ii) face recognition on Extended Yale-B, CMU-PIE and CMU-MPIE, and (iii) pedestrian recognition on RGB and Thermal image datasets. The experimental results show that our proposed method consistently improves accuracy. It can also be easily incorporated into any other CNN framework within an end-to-end deep network design for object detection and recognition problems to improve its performance.
Submitted 10 April, 2020; v1 submitted 28 May, 2019;
originally announced May 2019.
-
Image Alignment in Unseen Domains via Domain Deep Generalization
Authors:
Thanh-Dat Truong,
Khoa Luu,
Chi Nhan Duong,
Ngan Le,
Minh-Triet Tran
Abstract:
Image alignment across domains has recently become one of the realistic and popular topics in the research community. In this problem, a deep learning-based image alignment method is usually trained on an available large-scale database. During the testing steps, this trained model is deployed on unseen images collected under different camera conditions and modalities. The delivered deep network models cannot be updated, adapted or fine-tuned in these scenarios. Thus, recent deep learning techniques, e.g. domain adaptation, feature transferring, and fine-tuning, cannot be deployed. This paper presents a novel deep learning-based approach to tackle the problem of generalizing across unseen modalities. The proposed network is then applied to image alignment as an illustration. The proposed approach is designed as an end-to-end deep convolutional neural network that optimizes the deep models to improve performance. The proposed network has been evaluated in digit recognition when the model is trained on MNIST and then tested on the unseen domain MNIST-M. Finally, the proposed method is benchmarked on the image alignment problem when training on RGB images and testing on Depth and X-Ray images.
Submitted 31 May, 2019; v1 submitted 28 May, 2019;
originally announced May 2019.
-
ShrinkTeaNet: Million-scale Lightweight Face Recognition via Shrinking Teacher-Student Networks
Authors:
Chi Nhan Duong,
Khoa Luu,
Kha Gia Quach,
Ngan Le
Abstract:
Large-scale face recognition in-the-wild has recently achieved mature performance in many real-world applications. However, such systems are built on GPU platforms and mostly deploy heavy deep network architectures. Given a high-performance heavy network as a teacher, this work presents a simple and elegant teacher-student learning paradigm, namely ShrinkTeaNet, to train a portable student network that has significantly fewer parameters and competitive accuracy against the teacher network. Unlike prior teacher-student frameworks, which mainly focus on accuracy and compression ratios in closed-set problems, our proposed teacher-student network is shown to be more robust on the open-set problem, i.e. large-scale face recognition. In addition, this work introduces a novel Angular Distillation Loss for distilling the feature direction and the sample distributions of the teacher's hypersphere to its student. The ShrinkTeaNet framework can then efficiently guide the student's learning process with the teacher's knowledge presented in both intermediate and last stages of the feature embedding. Evaluations on LFW, CFP-FP, AgeDB, IJB-B and IJB-C Janus, and MegaFace with one million distractors have demonstrated the efficiency of the proposed approach in learning robust student networks that have satisfying accuracy and compact sizes. Our ShrinkTeaNet is able to support a lightweight architecture achieving high performance, with 99.77% on LFW and 95.64% on large-scale MegaFace protocols.
Submitted 25 May, 2019;
originally announced May 2019.
-
Fast Flow Reconstruction via Robust Invertible nxn Convolution
Authors:
Thanh-Dat Truong,
Khoa Luu,
Chi Nhan Duong,
Ngan Le,
Minh-Triet Tran
Abstract:
Flow-based generative models have recently become one of the most efficient approaches to model data generation. Indeed, they are constructed with a sequence of invertible and tractable transformations. Glow first introduced a simple type of generative flow using an invertible $1 \times 1$ convolution. However, the $1 \times 1$ convolution suffers from limited flexibility compared to standard convolutions. In this paper, we propose a novel invertible $n \times n$ convolution approach that overcomes the limitations of the invertible $1 \times 1$ convolution. In addition, our proposed network is not only tractable and invertible but also uses fewer parameters than standard convolutions. Experiments on the CIFAR-10, ImageNet and Celeb-HQ datasets have shown that our invertible $n \times n$ convolution helps to improve the performance of generative models significantly.
Submitted 6 August, 2022; v1 submitted 24 May, 2019;
originally announced May 2019.
-
Stabilizing zero-field skyrmions in Ir/Fe/Co/Pt thin film multilayers by magnetic history control
Authors:
Nghiep Khoan Duong,
M. Raju,
A. P. Petrovic,
R. Tomasello,
G. Finocchio,
Christos Panagopoulos
Abstract:
We present a study of the stability of room-temperature skyrmions in [Ir/Fe/Co/Pt] thin film multilayers, using the First Order Reversal Curve (FORC) technique and magnetic force microscopy (MFM). FORC diagrams reveal irreversible changes in magnetization upon field reversals, which can be correlated with the evolution of local magnetic textures probed by MFM. Using this approach, we have identified two different mechanisms - (1) skyrmion merger and (2) skyrmion nucleation followed by stripe propagation - which facilitate magnetization reversal in a changing magnetic field. Analysing the signatures of these mechanisms in the FORC diagram allows us to identify magnetic "histories" - i.e. precursor field sweep protocols - capable of enhancing the final zero-field skyrmion density. Our results indicate that FORC measurements can play a useful role in characterizing spin topology in thin film multilayers, and are particularly suitable for identifying samples in which skyrmion populations can be stabilized at zero field.
Submitted 23 February, 2019;
originally announced February 2019.