Showing 1–50 of 66 results for author: Do, M

  1. arXiv:2403.15444  [pdf, other]

    eess.SP cs.AI cs.CV cs.LG eess.IV

    A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity Recognition

    Authors: Abhi Kamboj, Minh Do

    Abstract: Despite living in a multi-sensory world, most AI models are limited to textual and visual understanding of human motion and behavior. In fact, full situational awareness of human motion could best be understood through a combination of sensors. In this survey we investigate how knowledge can be transferred and utilized amongst modalities for Human Activity/Action Recognition (HAR), i.e. cross-moda… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

  2. arXiv:2402.08138  [pdf, other]

    cs.CV

    H2O-SDF: Two-phase Learning for 3D Indoor Reconstruction using Object Surface Fields

    Authors: Minyoung Park, Mirae Do, YeonJae Shin, Jaeseok Yoo, Jongkwang Hong, Joongrock Kim, Chul Lee

    Abstract: Advanced techniques using Neural Radiance Fields (NeRF), Signed Distance Fields (SDF), and Occupancy Fields have recently emerged as solutions for 3D indoor scene reconstruction. We introduce a novel two-phase learning approach, H2O-SDF, that discriminates between object and non-object regions within indoor environments. This method achieves a nuanced balance, carefully preserving the geometric in… ▽ More

    Submitted 8 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  3. Inclusive normalization of face images to passport format

    Authors: Hongliu Cao, Minh Nhat Do, Alexis Ravanel, Eoin Thomas

    Abstract: Face recognition has been used more and more in real world applications in recent years. However, when the skin color bias is coupled with intra-personal variations like harsh illumination, the face recognition task is more likely to fail, even during human inspection. Face normalization methods try to deal with such challenges by removing intra-personal variations from an input image while keepin… ▽ More

    Submitted 22 December, 2023; originally announced December 2023.

  4. arXiv:2312.06797  [pdf, other]

    cs.CV

    Improving the Robustness of 3D Human Pose Estimation: A Benchmark and Learning from Noisy Input

    Authors: Trung-Hieu Hoang, Mona Zehni, Huy Phan, Duc Minh Vo, Minh N. Do

    Abstract: Despite the promising performance of current 3D human pose estimation techniques, understanding and enhancing their generalization on challenging in-the-wild videos remain an open problem. In this work, we focus on the robustness of 2D-to-3D pose lifters. To this end, we develop two benchmark datasets, namely Human3.6M-C and HumanEva-I-C, to examine the robustness of video-based 3D pose lifters to… ▽ More

    Submitted 15 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  5. arXiv:2311.18193  [pdf, other]

    cs.CV

    Persistent Test-time Adaptation in Episodic Testing Scenarios

    Authors: Trung-Hieu Hoang, Duc Minh Vo, Minh N. Do

    Abstract: Current test-time adaptation (TTA) approaches aim to adapt to environments that change continuously. Yet, when the environments not only change but also recur in a correlated manner over time, such as in the case of day-night surveillance cameras, it is unclear whether the adaptability of these methods is sustained after a long run. This study aims to examine the error accumulation of TTA models w… ▽ More

    Submitted 16 January, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  6. arXiv:2311.13627  [pdf, other]

    cs.CV cs.AI

    Vamos: Versatile Action Models for Video Understanding

    Authors: Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

    Abstract: What makes good representations for video understanding, such as anticipating future activities, or answering video-conditioned questions? While earlier approaches focus on end-to-end learning directly from video pixels, we propose to revisit text-based representations, such as general-purpose video captions, which are interpretable and can be directly consumed by large language models (LLMs). Int… ▽ More

    Submitted 28 May, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: Under submission. Code and models are released at https://brown-palm.github.io/Vamos/

  7. arXiv:2310.10855  [pdf, other]

    astro-ph.IM astro-ph.CO

    Generation of realistic input parameters for simulating atmospheric point-spread functions at astronomical observatories

    Authors: Claire-Alice Hébert, Joshua E. Meyers, My H. Do, Patricia R. Burchat, the LSST Dark Energy Science Collaboration

    Abstract: High-fidelity simulated astronomical images are an important tool in developing and measuring the performance of image-processing algorithms, particularly for high precision measurements of cosmic shear -- correlated distortions of images of distant galaxies due to weak gravitational lensing caused by the large-scale mass distribution in the Universe. For unbiased measurements of cosmic shear, all… ▽ More

    Submitted 21 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 14 pages, 7 figures

  8. arXiv:2309.02688  [pdf, other]

    physics.atom-ph

    High resolution spectroscopy of thulium atoms implanted in solid noble gas crystals

    Authors: Vinod Gaire, Mi Y Do, Yiting Pei, Anthony Semenova, Colin V. Parker

    Abstract: Optically active defects in solid-state systems have many applications in quantum information and sensing. However, unlike free atoms, which have fixed optical transition frequencies, the inhomogeneous broadening of the transitions in solid-state environments limit their use as identical scatterers for such applications. Here we show that crystals of argon and neon prepared in a closed-cycle cryos… ▽ More

    Submitted 5 September, 2023; originally announced September 2023.

  9. arXiv:2307.16368  [pdf, other]

    cs.CV

    AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

    Authors: Qi Zhao, Shijie Wang, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

    Abstract: Can we better anticipate an actor's future actions (e.g. mix eggs) by knowing what commonly happens after his/her current action (e.g. crack eggs)? What if we also know the longer-term goal of the actor (e.g. making egg fried rice)? The long-term action anticipation (LTA) task aims to predict an actor's future behavior from video observations in the form of verb and noun sequences, and it is cruci… ▽ More

    Submitted 31 March, 2024; v1 submitted 30 July, 2023; originally announced July 2023.

    Comments: ICLR 2024 Camera Ready

  10. arXiv:2305.16316  [pdf, other]

    cs.CV

    Making Vision Transformers Truly Shift-Equivariant

    Authors: Renan A. Rojas-Gomez, Teck-Yian Lim, Minh N. Do, Raymond A. Yeh

    Abstract: For computer vision, Vision Transformers (ViTs) have become one of the go-to deep net architectures. Despite being inspired by Convolutional Neural Networks (CNNs), ViTs' output remains sensitive to small spatial shifts in the input, i.e., not shift invariant. To address this shortcoming, we introduce novel data-adaptive designs for each of the modules in ViTs, such as tokenization, self-attention… ▽ More

    Submitted 28 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

  11. arXiv:2305.07298  [pdf, ps, other]

    math.PR math.NA

    Tamed-adaptive Euler-Maruyama approximation for SDEs with superlinearly growing and piecewise continuous drift, superlinearly growing and locally Hölder continuous diffusion

    Authors: Minh-Thang Do, Hoang-Long Ngo, Nhat-An Pho

    Abstract: In this paper, we consider stochastic differential equations whose drift coefficient is superlinearly growing and piece-wise continuous, and whose diffusion coefficient is superlinearly growing and locally Hölder continuous. We first prove the existence and uniqueness of the solution to such stochastic differential equations and then propose a tamed-adaptive Euler-Maruyama approximation scheme. We… ▽ More

    Submitted 12 May, 2023; originally announced May 2023.

    MSC Class: 60H35; 60H10

  12. arXiv:2303.05105  [pdf, other]

    cs.CV

    MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation

    Authors: Minh-Quan Le, Tam V. Nguyen, Trung-Nghia Le, Thanh-Toan Do, Minh N. Do, Minh-Triet Tran

    Abstract: Few-shot instance segmentation extends the few-shot learning paradigm to the instance segmentation task, which tries to segment instance objects from a query image with a few annotated examples of novel categories. Conventional approaches have attempted to address the task via prototype learning, known as point estimation. However, this mechanism depends on prototypes (e.g., mean of $K$-shot) for pr… ▽ More

    Submitted 21 January, 2024; v1 submitted 9 March, 2023; originally announced March 2023.

    Comments: Accepted at AAAI 2024 (oral presentation)

  13. arXiv:2302.05356  [pdf, other]

    math.NA

    Approximation and Structured Prediction with Sparse Wasserstein Barycenters

    Authors: Minh-Hieu Do, Jean Feydy, Olga Mula

    Abstract: We develop a general theoretical and algorithmic framework for sparse approximation and structured prediction in $\mathcal{P}_2(Ω)$ with Wasserstein barycenters. The barycenters are sparse in the sense that they are computed from an available dictionary of measures but the approximations only involve a reduced number of atoms. We show that the best reconstruction from the class of sparse barycente… ▽ More

    Submitted 10 February, 2023; originally announced February 2023.

  14. FedDCT: Federated Learning of Large Convolutional Neural Networks on Resource Constrained Devices using Divide and Collaborative Training

    Authors: Quan Nguyen, Hieu H. Pham, Kok-Seng Wong, Phi Le Nguyen, Truong Thao Nguyen, Minh N. Do

    Abstract: We introduce FedDCT, a novel distributed learning paradigm that enables the usage of large, high-performance CNNs on resource-limited edge devices. As opposed to traditional FL approaches, which require each client to train the full-size neural network independently during each training round, the proposed FedDCT allows a cluster of several clients to collaboratively train a large deep learning mo… ▽ More

    Submitted 18 September, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: Update v2: Final version as published in IEEE Transactions on Network and Service Management 2023

  15. arXiv:2210.08001  [pdf, other]

    cs.CV cs.AI cs.LG

    Learnable Polyphase Sampling for Shift Invariant and Equivariant Convolutional Networks

    Authors: Renan A. Rojas-Gomez, Teck-Yian Lim, Alexander G. Schwing, Minh N. Do, Raymond A. Yeh

    Abstract: We propose learnable polyphase sampling (LPS), a pair of learnable down/upsampling layers that enable truly shift-invariant and equivariant convolutional networks. LPS can be trained end-to-end from data and generalizes existing handcrafted downsampling layers. It is widely applicable as it can be integrated into any convolutional network by replacing down/upsampling layers. We evaluate LPS on ima… ▽ More

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  16. arXiv:2210.02313  [pdf, other]

    cs.CV

    Multi-stream Fusion for Class Incremental Learning in Pill Image Classification

    Authors: Trong-Tung Nguyen, Hieu H. Pham, Phi Le Nguyen, Thanh Hung Nguyen, Minh Do

    Abstract: Classifying pill categories from real-world images is crucial for various smart healthcare applications. Although existing approaches in image classification might achieve a good performance on fixed pill categories, they fail to handle novel instances of pill categories that are frequently presented to the learning algorithm. To this end, a trivial solution is to train the model with novel classe… ▽ More

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted for publication in the Asian Conference on Computer Vision (ACCV 2022)

  17. arXiv:2208.09190  [pdf, other]

    cs.DC

    Co-scheduling Ensembles of In Situ Workflows

    Authors: Tu Mai Anh Do, Loïc Pottier, Rafael Ferreira da Silva, Frédéric Suter, Silvina Caíno-Lores, Michela Taufer, Ewa Deelman

    Abstract: Molecular dynamics (MD) simulations are widely used to study large-scale molecular systems. HPC systems are ideal platforms to run these studies, however, reaching the necessary simulation timescale to detect rare processes is challenging, even with modern supercomputers. To overcome the timescale limitation, the simulation of a long MD trajectory is replaced by multiple short-range simulations th… ▽ More

    Submitted 19 August, 2022; originally announced August 2022.

    Comments: 12 pages, 5 figures, technical report

  18. Exploiting Domain Transferability for Collaborative Inter-level Domain Adaptive Object Detection

    Authors: Mirae Do, Seogkyu Jeon, Pilhyeon Lee, Kibeom Hong, Yu-seung Ma, Hyeran Byun

    Abstract: Domain adaptation for object detection (DAOD) has recently drawn much attention owing to its capability of detecting target objects without any annotations. To tackle the problem, previous works focus on aligning features extracted from partial levels (e.g., image-level, instance-level, RPN-level) in a two-stage detector via adversarial training. However, individual levels in the object detection… ▽ More

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: Accepted to Expert Systems with Applications. The first three authors contributed equally

    Journal ref: Expert Systems with Applications 205 (2022): 117697

  19. arXiv:2207.05249  [pdf, other]

    cs.CV

    Efficient Human Vision Inspired Action Recognition using Adaptive Spatiotemporal Sampling

    Authors: Khoi-Nguyen C. Mac, Minh N. Do, Minh P. Vo

    Abstract: Adaptive sampling that exploits the spatiotemporal redundancy in videos is critical for always-on action recognition on wearable devices with limited computing and battery resources. The commonly used fixed sampling strategy is not context-aware and may under-sample the visual content, and thus adversely impacts both computation efficiency and accuracy. Inspired by the concepts of foveal vision an… ▽ More

    Submitted 14 July, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

  20. Towards a Comprehensive Solution for a Vision-based Digitized Neurological Examination

    Authors: Trung-Hieu Hoang, Mona Zehni, Huai** Xu, George Heintz, Christopher Zallek, Minh N. Do

    Abstract: The ability to use digitally recorded and quantified neurological exam information is important to help healthcare systems deliver better care, in-person and via telehealth, as they compensate for a growing shortage of neurologists. Current neurological digital biomarker pipelines, however, are narrowed down to a specific neurological exam component or applied for assessing specific conditions. In… ▽ More

    Submitted 15 May, 2022; originally announced May 2022.

  21. arXiv:2112.15067  [pdf, other]

    cs.DC

    SIM-SITU: A Framework for the Faithful Simulation of in-situ Workflows

    Authors: Valentin Honoré, Tu Mai Anh Do, Loïc Pottier, Rafael Ferreira da Silva, Ewa Deelman, Frédéric Suter

    Abstract: The amount of data generated by numerical simulations in various scientific domains such as molecular dynamics, climate modeling, biology, or astrophysics, led to a fundamental redesign of application workflows. The throughput and the capacity of storage subsystems have not evolved as fast as the computing power in extreme-scale supercomputers. As a result, the classical post-hoc analysis of simul… ▽ More

    Submitted 30 December, 2021; originally announced December 2021.

  22. arXiv:2111.13987  [pdf, other]

    cs.LG eess.SP q-bio.QM stat.AP

    Multi-modality fusion using canonical correlation analysis methods: Application in breast cancer survival prediction from histology and genomics

    Authors: Vaishnavi Subramanian, Tanveer Syeda-Mahmood, Minh N. Do

    Abstract: The availability of multi-modality datasets provides a unique opportunity to characterize the same object of interest using multiple viewpoints more comprehensively. In this work, we investigate the use of canonical correlation analysis (CCA) and penalized variants of CCA (pCCA) for the fusion of two modalities. We study a simple graphical model for the generation of two-modality data. We analytic… ▽ More

    Submitted 27 November, 2021; originally announced November 2021.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  23. arXiv:2111.12299  [pdf, other]

    cs.LG

    EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search

    Authors: Qian Jiang, Xiaofan Zhang, Deming Chen, Minh N. Do, Raymond A. Yeh

    Abstract: In hardware-aware Differentiable Neural Architecture Search (DNAS), it is challenging to compute gradients of hardware metrics to perform architecture search. Existing works rely on linear approximations with limited support to customized hardware accelerators. In this work, we propose End-to-end Hardware-aware DNAS (EH-DNAS), a seamless integration of end-to-end hardware benchmarking, and fully a… ▽ More

    Submitted 24 November, 2021; originally announced November 2021.

    Comments: 8 pages, 5 figures

  24. arXiv:2108.06031  [pdf, other]

    eess.IV eess.SP

    Multimodal Unrolled Robust PCA for Background Foreground Separation

    Authors: Spencer Markowitz, Corey Snyder, Yonina C. Eldar, Minh N. Do

    Abstract: Background foreground separation (BFS) is a popular computer vision problem where dynamic foreground objects are separated from the static background of a scene. Typically, this is performed using consumer cameras because of their low cost, human interpretability, and high resolution. Yet, cameras and the BFS algorithms that process their data have common failure modes due to lighting changes, hig… ▽ More

    Submitted 12 August, 2021; originally announced August 2021.

  25. arXiv:2106.06927  [pdf, other]

    cs.CV cs.LG cs.NE

    Inverting Adversarially Robust Networks for Image Synthesis

    Authors: Renan A. Rojas-Gomez, Raymond A. Yeh, Minh N. Do, Anh Nguyen

    Abstract: Despite unconditional feature inversion being the foundation of many image synthesis applications, training an inverter demands a high computational budget, large decoding capacity and imposing conditions such as autoregressive priors. To address these limitations, we propose the use of adversarially robust representations as a perceptual primitive for feature inversion. We train an adversarially… ▽ More

    Submitted 21 October, 2022; v1 submitted 13 June, 2021; originally announced June 2021.

    Comments: Accepted at the 16th Asian Conference on Computer Vision (ACCV 2022)

  26. Workflows Community Summit: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development

    Authors: Rafael Ferreira da Silva, Henri Casanova, Kyle Chard, Tainã Coleman, Dan Laney, Dong Ahn, Shantenu Jha, Dorran Howell, Stian Soiland-Reys, Ilkay Altintas, Douglas Thain, Rosa Filgueira, Yadu Babuji, Rosa M. Badia, Bartosz Balis, Silvina Caino-Lores, Scott Callaghan, Frederik Coppens, Michael R. Crusoe, Kaushik De, Frank Di Natale, Tu M. A. Do, Bjoern Enders, Thomas Fahringer, Anne Fouilloux, et al. (33 additional authors not shown)

    Abstract: Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role i… ▽ More

    Submitted 9 June, 2021; originally announced June 2021.

  27. arXiv:2103.08010  [pdf]

    cs.SE

    On the combination of static analysis for software security assessment -- a case study of an open-source e-government project

    Authors: Anh Nguyen-Duc, Manh Viet Do, Quan Luong Hong, Kiem Nguyen Khac

    Abstract: Static Application Security Testing (SAST) is a popular quality assurance technique in software engineering. However, integrating SAST tools into industry-level product development and security assessment poses various technical and managerial challenges. In this work, we reported a longitudinal case study of adopting SAST as a part of a human-driven security assessment for an open-source e-govern… ▽ More

    Submitted 23 March, 2021; v1 submitted 14 March, 2021; originally announced March 2021.

  28. arXiv:2103.05432  [pdf, other]

    cs.LG eess.SP q-bio.GN stat.AP

    Multimodal fusion using sparse CCA for breast cancer survival prediction

    Authors: Vaishnavi Subramanian, Tanveer Syeda-Mahmood, Minh N. Do

    Abstract: Effective understanding of a disease such as cancer requires fusing multiple sources of information captured across physical scales by multimodal data. In this work, we propose a novel feature embedding module that derives from canonical correlation analyses to account for intra-modality and inter-modality correlations. Experiments on simulated and real data demonstrate how our proposed module can… ▽ More

    Submitted 9 March, 2021; originally announced March 2021.

    Comments: Accepted for poster presentation at International Symposium on Biomedical Imaging (ISBI) 2021. 4 pages, 1 figure, 4 tables

  29. arXiv:2012.00282  [pdf, other]

    cs.CV

    FairFaceGAN: Fairness-aware Facial Image-to-Image Translation

    Authors: Sunhee Hwang, Sungho Park, Dohyung Kim, Mirae Do, Hyeran Byun

    Abstract: In this paper, we introduce FairFaceGAN, a fairness-aware facial Image-to-Image translation model, mitigating the problem of unwanted translation in protected attributes (e.g., gender, age, race) during facial attributes editing. Unlike existing models, FairFaceGAN learns fair representations with two separate latents - one related to the target attributes to translate, and the other unrelated to… ▽ More

    Submitted 2 December, 2020; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: The 31st British Machine Vision Conference (BMVC 2020)

  30. arXiv:2011.05097  [pdf, other]

    cs.LG stat.ML

    Two-stage Training of Graph Neural Networks for Graph Classification

    Authors: Manh Tuan Do, Noseong Park, Kijung Shin

    Abstract: Graph neural networks (GNNs) have received massive attention in the field of machine learning on graphs. Inspired by the success of neural networks, a line of research has been conducted to train GNNs to deal with various tasks, such as node classification, graph classification, and link prediction. In this work, our task of interest is graph classification. Several GNN models have been proposed a… ▽ More

    Submitted 8 April, 2022; v1 submitted 10 November, 2020; originally announced November 2020.

  31. arXiv:2006.07060  [pdf, other]

    cs.SI physics.soc-ph

    Structural Patterns and Generative Models of Real-world Hypergraphs

    Authors: Manh Tuan Do, Se-eun Yoon, Bryan Hooi, Kijung Shin

    Abstract: Graphs have been utilized as a powerful tool to model pairwise relationships between people or objects. Such structure is a special type of a broader concept referred to as hypergraph, in which each hyperedge may consist of an arbitrary number of nodes, rather than just two. A large number of real-world datasets are of this form - for example, list of recipients of emails sent from an organization… ▽ More

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: to be published in the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '20)

  32. arXiv:2002.10917  [pdf, other]

    quant-ph cs.AI cs.ET

    Planning for Compilation of a Quantum Algorithm for Graph Coloring

    Authors: Minh Do, Zhihui Wang, Bryan O'Gorman, Davide Venturelli, Eleanor Rieffel, Jeremy Frank

    Abstract: The problem of compiling general quantum algorithms for implementation on near-term quantum processors has been introduced to the AI community. Previous work demonstrated that temporal planning is an attractive approach for part of this compilation task, specifically, the routing of circuits that implement the Quantum Alternating Operator Ansatz (QAOA) applied to the MaxCut problem on a quantum pro… ▽ More

    Submitted 22 February, 2020; originally announced February 2020.

    Comments: 8 pages, 4 tables, 5 figures

    Journal ref: The 24th European Conference on Artificial Intelligence (ECAI 2020)

  33. arXiv:2002.01982  [pdf, other]

    eess.IV cs.LG eess.SP q-bio.GN

    Multimodal fusion of imaging and genomics for lung cancer recurrence prediction

    Authors: Vaishnavi Subramanian, Minh N. Do, Tanveer Syeda-Mahmood

    Abstract: Lung cancer has a high rate of recurrence in early-stage patients. Predicting the post-surgical recurrence in lung cancer patients has traditionally been approached using single modality information of genomics or radiology images. We investigate the potential of multimodal fusion for this task. By combining computed tomography (CT) images and genomics, we demonstrate improved prediction of recurr… ▽ More

    Submitted 5 February, 2020; originally announced February 2020.

    Comments: Accepted for presentation at International Symposium on Biomedical Imaging (ISBI) 2020 (Iowa City). 5 pages, last page references

  34. arXiv:1911.03603  [pdf, other]

    cs.RO cs.CV

    Dense 3D Reconstruction for Visual Tunnel Inspection using Unmanned Aerial Vehicle

    Authors: Ramanpreet Singh Pahwa, Kennard Yanting Chan, Jiamin Bai, Vincensius Billy Saputra, Minh N. Do, Shaohui Foong

    Abstract: Advances in Unmanned Aerial Vehicle (UAV) opens venues for application such as tunnel inspection. Owing to its versatility to fly inside the tunnels, it can quickly identify defects and potential problems related to safety. However, long tunnels, especially with repetitive or uniform structures pose a significant problem for UAV navigation. Furthermore, post-processing visual data from the camera… ▽ More

    Submitted 8 November, 2019; originally announced November 2019.

    Comments: 8 pages, 12 figures

    Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019

  35. arXiv:1812.03407  [pdf, other]

    cs.CV

    Beyond Domain Adaptation: Unseen Domain Encapsulation via Universal Non-volume Preserving Models

    Authors: Thanh-Dat Truong, Chi Nhan Duong, Khoa Luu, Minh-Triet Tran, Minh Do

    Abstract: Recognition across domains has recently become an active topic in the research community. However, it has been largely overlooked in the problem of recognition in new unseen domains. Under this condition, the delivered deep network models are unable to be updated, adapted or fine-tuned. Therefore, recent deep learning techniques, such as: domain adaptation, feature transferring, and fine-tuning, c… ▽ More

    Submitted 8 December, 2018; originally announced December 2018.

  36. arXiv:1811.08815  [pdf, other]

    cs.CV

    Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection

    Authors: Khoi-Nguyen C. Mac, Dhiraj Joshi, Raymond A. Yeh, Jinjun Xiong, Rogerio S. Feris, Minh N. Do

    Abstract: Fine-grained action detection is an important task with numerous applications in robotics and human-computer interaction. Existing methods typically utilize a two-stage approach including extraction of local spatio-temporal features followed by temporal modeling to capture long-term dependencies. While most recent papers have focused on the latter (long-temporal modeling), here, we focus on produc… ▽ More

    Submitted 6 November, 2019; v1 submitted 21 November, 2018; originally announced November 2018.

    Comments: Accepted at ICCV 2019 as oral

  37. arXiv:1806.10278  [pdf, other]

    cs.CV

    Feature-less Stitching of Cylindrical Tunnel

    Authors: Ramanpreet Singh Pahwa, Wei Kiat Leong, Shaohui Foong, Karianto Leman, Minh N. Do

    Abstract: Traditional image stitching algorithms use transforms such as homography to combine different views of a scene. They usually work well when the scene is planar or when the camera is only rotated, keeping its position static. This severely limits their use in real world scenarios where an unmanned aerial vehicle (UAV) potentially hovers around and flies in an enclosed area while rotating to capture… ▽ More

    Submitted 26 June, 2018; originally announced June 2018.

    Comments: 6 pages

  38. arXiv:1805.12301  [pdf, other]

    stat.ML cs.CV cs.LG

    Rotation Equivariance and Invariance in Convolutional Neural Networks

    Authors: Benjamin Chidester, Minh N. Do, Jian Ma

    Abstract: Performance of neural networks can be significantly improved by encoding known invariance for particular tasks. Many image classification tasks, such as those related to cellular imaging, exhibit invariance to rotation. We present a novel scheme using the magnitude response of the 2D-discrete-Fourier transform (2D-DFT) to encode rotational invariance in neural networks, along with a new, efficient… ▽ More

    Submitted 30 May, 2018; originally announced May 2018.

  39. arXiv:1803.11209  [pdf, other]

    cs.CV

    Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts

    Authors: Raymond A. Yeh, Jinjun Xiong, Wen-mei W. Hwu, Minh N. Do, Alexander G. Schwing

    Abstract: Textual grounding is an important but challenging task for human-computer interaction, robotics and knowledge mining. Existing algorithms generally formulate the task as selection from a set of bounding box proposals obtained from deep net based systems. In this work, we demonstrate that we can cast the problem of textual grounding into a unified framework that permits efficient search over all po… ▽ More

    Submitted 29 March, 2018; originally announced March 2018.

    Comments: Accepted to NIPS 2017

  40. arXiv:1803.11185  [pdf, other]

    cs.CV

    Unsupervised Textual Grounding: Linking Words to Image Concepts

    Authors: Raymond A. Yeh, Minh N. Do, Alexander G. Schwing

    Abstract: Textual grounding, i.e., linking words to objects in images, is a challenging but important task for robotics and human-computer interaction. Existing techniques benefit from recent progress in deep learning and generally formulate the task as a supervised learning problem, selecting a bounding box from a set of possible options. To train these deep net based approaches, access to a large-scale da… ▽ More

    Submitted 29 March, 2018; originally announced March 2018.

    Comments: Accepted to CVPR 2018

  41. arXiv:1803.06775  [pdf, other]

    quant-ph cs.AI cs.ET eess.SY

    Comparing and Integrating Constraint Programming and Temporal Planning for Quantum Circuit Compilation

    Authors: Kyle E. C. Booth, Minh Do, J. Christopher Beck, Eleanor Rieffel, Davide Venturelli, Jeremy Frank

    Abstract: Recently, the makespan-minimization problem of compiling a general class of quantum algorithms into near-term quantum processors has been introduced to the AI community. The research demonstrated that temporal planning is a strong approach for a class of quantum circuit compilation (QCC) problems. In this paper, we explore the use of constraint programming (CP) as an alternative and complementary… ▽ More

    Submitted 18 March, 2018; originally announced March 2018.

    Comments: 9 pages, 2 figures, Proceedings of the 28th International Conference of Automated Planning and Scheduling 2018 (ICAPS-18)

  42. arXiv:1802.08950  [pdf, other]

    eess.SP

    Multi-Segment Reconstruction Using Invariant Features

    Authors: Mona Zehni, Minh N. Do, Zhizhen Zhao

    Abstract: Multi-segment reconstruction (MSR) problem consists of recovering a signal from noisy segments with unknown positions of the observation windows. One example arises in DNA sequence assembly, which is typically solved by matching short reads to form longer sequences. Instead of trying to locate the segment within the sequence through pair-wise matching, we propose a new approach that uses shift-inv…

    Submitted 24 February, 2018; originally announced February 2018.

    Comments: 5 pages, 3 figures

  43. arXiv:1802.08910  [pdf, other]

    eess.SP eess.IV q-bio.CB q-bio.QM stat.AP

    Correlating Cellular Features with Gene Expression using CCA

    Authors: Vaishnavi Subramanian, Benjamin Chidester, Jian Ma, Minh N. Do

    Abstract: To understand the biology of cancer, joint analysis of multiple data modalities, including imaging and genomics, is crucial. The involved nature of gene-microenvironment interactions necessitates the use of algorithms which treat both data types equally. We propose the use of canonical correlation analysis (CCA) and a sparse variant as a preliminary discovery tool for identifying connections acros…

    Submitted 24 February, 2018; originally announced February 2018.

    Comments: To appear at IEEE International Symposium on Biomedical Imaging (ISBI) 2018

  44. Tracking objects using 3D object proposals

    Authors: Ramanpreet Singh Pahwa, Tian Tsong Ng, Minh N. Do

    Abstract: 3D object proposals, quickly detected regions in a 3D scene that likely contain an object of interest, are an effective approach to improve the computational efficiency and accuracy of the object detection framework. In this work, we propose a novel online method that uses our previously developed 3D object proposals, in a RGB-D video sequence, to match and track static objects in the scene using…

    Submitted 18 December, 2017; originally announced December 2017.

    Comments: 4 pages, 4 figures, published in APSIPA 2017

    Journal ref: 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

  45. Locating 3D Object Proposals: A Depth-Based Online Approach

    Authors: Ramanpreet Singh Pahwa, Jiangbo Lu, Nianjuan Jiang, Tian Tsong Ng, Minh N. Do

    Abstract: 2D object proposals, quickly detected regions in an image that likely contain an object of interest, are an effective approach for improving the computational efficiency and accuracy of object detection in color images. In this work, we propose a novel online method that generates 3D object proposals in a RGB-D video sequence. Our main observation is that depth images provide important information…

    Submitted 8 September, 2017; originally announced September 2017.

    Comments: 14 pages, 12 figures, journal

    Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, 2016

  46. Calibration of depth cameras using denoised depth images

    Authors: Ramanpreet Singh Pahwa, Minh N. Do, Tian Tsong Ng, Binh-Son Hua

    Abstract: Depth sensing devices have created various new applications in scientific and commercial research with the advent of Microsoft Kinect and PMD (Photon Mixing Device) cameras. Most of these applications require the depth cameras to be pre-calibrated. However, traditional calibration methods using a checkerboard do not work very well for depth cameras due to the low image resolution. In this paper, w…

    Submitted 8 September, 2017; originally announced September 2017.

    Comments: 5 pages, 3 figures, conference

    Journal ref: 2014 IEEE International Conference on Image Processing (ICIP), Paris, 2014, pp. 3459-3463

  47. arXiv:1705.08927  [pdf, other]

    quant-ph cs.AI cs.ET eess.SY

    Compiling quantum circuits to realistic hardware architectures using temporal planners

    Authors: Davide Venturelli, Minh Do, Eleanor Rieffel, Jeremy Frank

    Abstract: To run quantum algorithms on emerging gate-model quantum hardware, quantum circuits must be compiled to take into account constraints on the hardware. For near-term hardware, with only limited means to mitigate decoherence, it is critical to minimize the duration of the circuit. We investigate the application of temporal planners to the problem of compiling quantum circuits to newly emerging quant…

    Submitted 21 December, 2017; v1 submitted 24 May, 2017; originally announced May 2017.

    Comments: updated manuscript, more planners and results

    Journal ref: 2017 Quantum Sci. Technol. - also related to proceedings of IJCAI 2017, and ICAPS SPARK Workshop 2017

  48. arXiv:1609.04541  [pdf, other]

    stat.ML cs.CV cs.DS

    Matrix Product State for Higher-Order Tensor Compression and Classification

    Authors: Johann A. Bengua, Ho N. Phien, Hoang D. Tuan, Minh N. Do

    Abstract: This paper introduces matrix product state (MPS) decomposition as a new and systematic method to compress multidimensional data represented by higher-order tensors. It solves two major bottlenecks in tensor compression: computation and compression quality. Regardless of tensor order, MPS compresses tensors to matrices of moderate dimension which can be used for classification. Mainly based on a su…

    Submitted 15 September, 2016; originally announced September 2016.

    Comments: 12 pages, 4 figures

  49. arXiv:1607.07539  [pdf, other]

    cs.CV

    Semantic Image Inpainting with Deep Generative Models

    Authors: Raymond A. Yeh, Chen Chen, Teck Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do

    Abstract: Semantic image inpainting is a challenging task where large missing regions have to be filled based on the available visual data. Existing methods which extract information from only a single image generally produce unsatisfactory results due to the lack of high level context. In this paper, we propose a novel method for semantic image inpainting, which generates the missing content by conditionin…

    Submitted 13 July, 2017; v1 submitted 26 July, 2016; originally announced July 2016.

  50. arXiv:1607.03967  [pdf, other]

    cs.LG cs.CV cs.DS

    Concatenated image completion via tensor augmentation and completion

    Authors: Johann A. Bengua, Hoang D. Tuan, Ho N. Phien, Minh N. Do

    Abstract: This paper proposes a novel framework called concatenated image completion via tensor augmentation and completion (ICTAC), which recovers missing entries of color images with high accuracy. Typical images are second- or third-order tensors (2D/3D) depending if they are grayscale or color, hence tensor completion algorithms are ideal for their recovery. The proposed framework performs image complet…

    Submitted 13 July, 2016; originally announced July 2016.

    Comments: 7 pages, 6 figures, submitted to ICSPCS 2016