Showing 1–40 of 40 results for author: Vo, M

  1. arXiv:2312.06797  [pdf, other]

    cs.CV

    Improving the Robustness of 3D Human Pose Estimation: A Benchmark and Learning from Noisy Input

    Authors: Trung-Hieu Hoang, Mona Zehni, Huy Phan, Duc Minh Vo, Minh N. Do

    Abstract: Despite the promising performance of current 3D human pose estimation techniques, understanding and enhancing their generalization on challenging in-the-wild videos remain an open problem. In this work, we focus on the robustness of 2D-to-3D pose lifters. To this end, we develop two benchmark datasets, namely Human3.6M-C and HumanEva-I-C, to examine the robustness of video-based 3D pose lifters to…

    Submitted 15 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  2. arXiv:2311.18193  [pdf, other]

    cs.CV

    Persistent Test-time Adaptation in Episodic Testing Scenarios

    Authors: Trung-Hieu Hoang, Duc Minh Vo, Minh N. Do

    Abstract: Current test-time adaptation (TTA) approaches aim to adapt to environments that change continuously. Yet, when the environments not only change but also recur in a correlated manner over time, such as in the case of day-night surveillance cameras, it is unclear whether the adaptability of these methods is sustained after a long run. This study aims to examine the error accumulation of TTA models w…

    Submitted 16 January, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  3. arXiv:2311.15879  [pdf, other]

    cs.CV

    EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

    Authors: Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto, Hideki Nakayama

    Abstract: Large language models (LLMs)-based image captioning has the capability of describing objects not explicitly observed in training data; yet novel objects occur frequently, necessitating the requirement of sustaining up-to-date object knowledge for open-world comprehension. Instead of relying on large amounts of data and/or scaling up network parameters, we introduce a highly effective retrieval-aug…

    Submitted 7 April, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  4. arXiv:2311.12897  [pdf, other]

    cs.GR

    A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis

    Authors: Kai Katsumata, Duc Minh Vo, Hideki Nakayama

    Abstract: 3D Gaussian Splatting (3DGS) has shown remarkable success in synthesizing novel views given multiple views of a static scene. Yet, 3DGS faces challenges when applied to dynamic scenes because 3D Gaussian parameters need to be updated per timestep, requiring a large amount of memory and at least a dozen observations per timestep. To address these limitations, we present a compact dynamic 3D Gaussia…

    Submitted 4 July, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: 17 pages, 11 figures, ECCV 2024

  5. arXiv:2310.14602  [pdf, ps, other]

    cs.CL

    Generative Pre-trained Transformer for Vietnamese Community-based COVID-19 Question Answering

    Authors: Tam Minh Vo, Khiem Vinh Tran

    Abstract: Recent studies have provided empirical evidence of the wide-ranging potential of Generative Pre-trained Transformer (GPT), a pretrained language model, in the field of natural language processing. GPT has been effectively employed as a decoder within state-of-the-art (SOTA) question answering systems, yielding exceptional performance across various tasks. However, the current research landscape co…

    Submitted 31 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

  6. arXiv:2308.10005  [pdf, other]

    cs.CV

    Partition-and-Debias: Agnostic Biases Mitigation via A Mixture of Biases-Specific Experts

    Authors: Jiaxuan Li, Duc Minh Vo, Hideki Nakayama

    Abstract: Bias mitigation in image classification has been widely researched, and existing methods have yielded notable results. However, most of these methods implicitly assume that a given image contains only one type of known or unknown bias, failing to consider the complexities of real-world biases. We introduce a more challenging scenario, agnostic biases mitigation, aiming at bias removal regardless o…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  7. arXiv:2307.08995  [pdf, other]

    cs.CV

    Revisiting Latent Space of GAN Inversion for Real Image Editing

    Authors: Kai Katsumata, Duc Minh Vo, Bei Liu, Hideki Nakayama

    Abstract: The exploration of the latent space in StyleGANs and GAN inversion exemplify impressive real-world image editing, yet the trade-off between reconstruction quality and editing quality remains an open problem. In this study, we revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and combine it with highly capable latent spaces to build combined spaces that faithfully invert real images while maint…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: 10 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2306.00241

  8. arXiv:2307.08319  [pdf, other]

    cs.CV

    Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data

    Authors: Kai Katsumata, Duc Minh Vo, Tatsuya Harada, Hideki Nakayama

    Abstract: Label-noise or curated unlabeled data is used to compensate for the assumption of clean labeled data in training the conditional generative adversarial network; however, satisfying such an extended assumption is occasionally laborious or impractical. As a step towards generative modeling accessible to everyone, we introduce a novel conditional image generation framework that accepts noisy-labeled…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 10 pages, 13 figures

  9. arXiv:2307.05058  [pdf, ps, other]

    math.CO

    Multi-parameter Szemerédi-Trotter-type theorems and applications in finite fields

    Authors: Hung Le, Steven Senger, Minh-Quan Vo

    Abstract: We prove some novel multi-parameter point-line incidence estimates in vector spaces over finite fields. While these could be seen as special cases of higher-dimensional incidence results, they outperform their more general counterparts in those contexts. We go on to present a number of applications to illustrate their use in combinatorial problems from geometry and number theory.

    Submitted 7 August, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

    MSC Class: 51A20

  10. arXiv:2306.00241  [pdf, other]

    cs.CV

    Balancing Reconstruction and Editing Quality of GAN Inversion for Real Image Editing with StyleGAN Prior Latent Space

    Authors: Kai Katsumata, Duc Minh Vo, Bei Liu, Hideki Nakayama

    Abstract: The exploration of the latent space in StyleGANs and GAN inversion exemplify impressive real-world image editing, yet the trade-off between reconstruction quality and editing quality remains an open problem. In this study, we revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and $\mathcal{Z}^+$ and integrate them into seminal GAN inversion methods to improve editing quality. Besides faithful r…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: 5 pages, 9 figures, AI4CC Workshop at CVPR 2023

  11. arXiv:2305.16487  [pdf, other]

    cs.CV cs.AI

    EgoHumans: An Egocentric 3D Multi-Human Benchmark

    Authors: Rawal Khirodkar, Aayush Bansal, Lingni Ma, Richard Newcombe, Minh Vo, Kris Kitani

    Abstract: We present EgoHumans, a new multi-view multi-human video benchmark to advance the state-of-the-art of egocentric human 3D pose estimation and tracking. Existing egocentric benchmarks either capture single subject or indoor-only scenarios, which limit the generalization of computer vision algorithms for real-world applications. We propose a novel 3D capture setup to construct a comprehensive egocen…

    Submitted 18 August, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted to ICCV 2023 (Oral)

  12. Multipole Expansion for the Electron-Nucleus Scattering at High Energies in the Unified Electroweak Theory

    Authors: Z. P. Luong, M. T. Vo

    Abstract: The paper presents the multipole expansion for the electron-nucleus scattering cross section at high energies within the framework of the unified electroweak theory. The electroweak currents of the nucleus are expanded into the simple components with definite angular momenta, called the multipole form factors. The multipole expansion of the cross section is a consequence of the above expansion. Be…

    Submitted 5 August, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

    Comments: 8 pages, 2 figures

  13. arXiv:2304.06602  [pdf, other]

    cs.CV

    A-CAP: Anticipation Captioning with Commonsense Knowledge

    Authors: Duc Minh Vo, Quoc-An Luong, Akihiro Sugimoto, Hideki Nakayama

    Abstract: Humans possess the capacity to reason about the future based on a sparse collection of visual cues acquired over time. In order to emulate this ability, we introduce a novel task called Anticipation Captioning, which generates a caption for an unseen oracle image using a sparsely temporally-ordered set of images. To tackle this new task, we propose a model called A-CAP, which incorporates commonse…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPR 2023

  14. arXiv:2303.10229  [pdf, other]

    math.CO math.MG

    Distinct Distances in $R^3$ Between Quadratic and Orthogonal Curves

    Authors: Toby Aldape, Jingyi Liu, Gregory Pylypovych, Adam Sheffer, Minh-Quan Vo

    Abstract: We study the minimum number of distinct distances between point sets on two curves in $R^3$. Assume that one curve contains $m$ points and the other $n$ points. Our main results: (a) When the curves are conic sections, we characterize all cases where the number of distances is $O(m+n)$. This includes new constructions for points on two parabolas, two ellipses, and one ellipse and one hyperbola.…

    Submitted 17 March, 2023; originally announced March 2023.

  15. arXiv:2211.13470  [pdf, other]

    cs.CV cs.AI cs.LG

    Efficient Zero-shot Visual Search via Target and Context-aware Transformer

    Authors: Zhiwei Ding, Xuezhe Ren, Erwan David, Melissa Vo, Gabriel Kreiman, Mengmi Zhang

    Abstract: Visual search is a ubiquitous challenge in natural vision, including daily tasks such as finding a friend in a crowd or searching for a car in a parking lot. Humans rely heavily on relevant target features to perform goal-directed visual search. Meanwhile, context is of critical importance for locating a target object in complex scenes as it helps narrow down the search area and makes the search pr…

    Submitted 24 November, 2022; originally announced November 2022.

  16. arXiv:2210.08459  [pdf, other]

    cs.CL

    StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning

    Authors: Hong Chen, Duc Minh Vo, Hiroya Takamura, Yusuke Miyao, Hideki Nakayama

    Abstract: Existing automatic story evaluation methods place a premium on story lexical level coherence, deviating from human preference. We go beyond this limitation by considering a novel Story Evaluation method that mimics human preference when judging a story, namely StoryER, which consists of three sub-tasks: Ranking, Rating and Reasoning. Given eith…

    Submitted 21 October, 2022; v1 submitted 16 October, 2022; originally announced October 2022.

    Comments: accepted by EMNLP 2022

  17. arXiv:2209.07629  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst

    Authors: Dang-Linh Trinh, Minh-Cong Vo, Guee-Sang Lee

    Abstract: The technical report presents our emotion recognition pipeline for high-dimensional emotion task (A-VB High) in The ACII Affective Vocal Bursts (A-VB) 2022 Workshop & Competition. Our proposed method contains three stages. Firstly, we extract the latent features from the raw audio signal and its Mel-spectrogram by self-supervised learning methods. Then, the features from the raw signal are fed to…

    Submitted 26 September, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

  18. Learning to diagnose common thorax diseases on chest radiographs from radiology reports in Vietnamese

    Authors: Thao T. B. Nguyen, Tam M. Vo, Thang V. Nguyen, Hieu H. Pham, Ha Q. Nguyen

    Abstract: We propose a data collecting and annotation pipeline that extracts information from Vietnamese radiology reports to provide accurate labels for chest X-ray (CXR) images. This can benefit Vietnamese radiologists and clinicians by annotating data that closely match their endemic diagnosis categories which may vary from country to country. To assess the efficacy of the proposed labeling technique, we…

    Submitted 11 September, 2022; originally announced September 2022.

    Comments: This work has been provisionally accepted for publication by Plos One journal

  19. arXiv:2207.05249  [pdf, other]

    cs.CV

    Efficient Human Vision Inspired Action Recognition using Adaptive Spatiotemporal Sampling

    Authors: Khoi-Nguyen C. Mac, Minh N. Do, Minh P. Vo

    Abstract: Adaptive sampling that exploits the spatiotemporal redundancy in videos is critical for always-on action recognition on wearable devices with limited computing and battery resources. The commonly used fixed sampling strategy is not context-aware and may under-sample the visual content, and thus adversely impacts both computation efficiency and accuracy. Inspired by the concepts of foveal vision an…

    Submitted 14 July, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

  20. arXiv:2207.04320  [pdf, other]

    cs.CV

    Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet

    Authors: Shihao Zou, Yuanlu Xu, Chao Li, Lingni Ma, Li Cheng, Minh Vo

    Abstract: Multi-person pose understanding from RGB videos involves three complex tasks: pose estimation, tracking and motion forecasting. Intuitively, accurate multi-person pose estimation facilitates robust tracking, and robust tracking builds crucial history for correct motion forecasting. Most existing works either focus on a single task or employ multi-stage approaches to solving multiple tasks separate…

    Submitted 12 September, 2023; v1 submitted 9 July, 2022; originally announced July 2022.

  21. arXiv:2206.08929  [pdf, other]

    cs.CV cs.AI

    TAVA: Template-free Animatable Volumetric Actors

    Authors: Ruilong Li, Julian Tanke, Minh Vo, Michael Zollhofer, Jurgen Gall, Angjoo Kanazawa, Christoph Lassner

    Abstract: Coordinate-based volumetric representations have the potential to generate photo-realistic virtual avatars from images. However, virtual avatars also need to be controllable even to a novel pose that may not have been observed. Traditional techniques, such as LBS, provide such a function; yet it usually requires a hand-designed body template, 3D scan data, and limited appearance models. On the oth…

    Submitted 20 June, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: Code: https://github.com/facebookresearch/tava; Project Website: https://www.liruilong.cn/projects/tava/

  22. arXiv:2204.14249  [pdf, other]

    cs.CV

    OSSGAN: Open-Set Semi-Supervised Image Generation

    Authors: Kai Katsumata, Duc Minh Vo, Hideki Nakayama

    Abstract: We introduce a challenging training scheme of conditional GANs, called open-set semi-supervised image generation, where the training dataset consists of two parts: (i) labeled data and (ii) unlabeled data with samples belonging to one of the labeled data classes, namely, a closed-set, and samples not belonging to any of the labeled data classes, namely, an open-set. Unlike the existing semi-superv…

    Submitted 29 April, 2022; originally announced April 2022.

    Comments: Accepted at CVPR 2022

  23. arXiv:2204.01695  [pdf, other]

    cs.CV

    LISA: Learning Implicit Shape and Appearance of Hands

    Authors: Enric Corona, Tomas Hodan, Minh Vo, Francesc Moreno-Noguer, Chris Sweeney, Richard Newcombe, Lingni Ma

    Abstract: This paper proposes a do-it-all neural model of human hands, named LISA. The model can capture accurate hand shape and appearance, generalize to arbitrary hand subjects, provide dense surface correspondences, be reconstructed from images in the wild and easily animated. We train LISA by minimizing the shape and appearance losses on a large set of multi-view RGB image sequences annotated with coars…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Published at CVPR 2022

  24. arXiv:2203.14499  [pdf, other]

    cs.CV

    NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge

    Authors: Duc Minh Vo, Hong Chen, Akihiro Sugimoto, Hideki Nakayama

    Abstract: Novel object captioning aims at describing objects absent from training data, with the key ingredient being the provision of object vocabulary to the model. Although existing methods heavily rely on an object detection model, we view the detection step as vocabulary retrieval from an external knowledge in the form of embeddings for any object's definition from Wiktionary, where we use in the retri…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Accepted at CVPR 2022

  25. arXiv:2203.08456  [pdf, other]

    cs.CV

    PPCD-GAN: Progressive Pruning and Class-Aware Distillation for Large-Scale Conditional GANs Compression

    Authors: Duc Minh Vo, Akihiro Sugimoto, Hideki Nakayama

    Abstract: We push forward neural network compression research by exploiting a novel challenging task of large-scale conditional generative adversarial networks (GANs) compression. To this end, we propose a gradually shrinking GAN (PPCD-GAN) by introducing progressive pruning residual block (PP-Res) and class-aware distillation. The PP-Res is an extension of the conventional residual block where each convolu…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: accepted at WACV 2022

  26. arXiv:2202.10753  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV physics.data-an

    Convolutional Neural Network Modelling for MODIS Land Surface Temperature Super-Resolution

    Authors: Binh Minh Nguyen, Ganglin Tian, Minh-Triet Vo, Aurélie Michel, Thomas Corpetti, Carlos Granero-Belinchon

    Abstract: Nowadays, thermal infrared satellite remote sensors enable the extraction of very interesting information at large scale, in particular Land Surface Temperature (LST). However, such data are limited in spatial and/or temporal resolution, which prevents analysis at fine scales. For example, the MODIS satellite provides daily acquisitions with 1 km spatial resolution, which is not sufficient to deal with…

    Submitted 1 April, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

  27. arXiv:2112.12761  [pdf, other]

    cs.CV cs.GR

    BANMo: Building Animatable 3D Neural Models from Many Casual Videos

    Authors: Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi, Hanbyul Joo

    Abstract: Prior work for articulated 3D shape reconstruction often relies on specialized sensors (e.g., synchronized multi-camera systems), or pre-built 3D deformable models (e.g., SMAL or SMPL). Such methods are not able to scale to diverse sets of objects in the wild. We present BANMo, a method that requires neither a specialized sensor nor a pre-defined template shape. BANMo builds high-fidelity, articul…

    Submitted 3 April, 2023; v1 submitted 23 December, 2021; originally announced December 2021.

    Comments: CVPR 2022 camera-ready version (last update: May 2022)

  28. arXiv:2110.07058  [pdf, other]

    cs.CV cs.AI

    Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, et al. (60 additional authors not shown)

    Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons…

    Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

  29. arXiv:2108.10165  [pdf, other]

    cs.CV

    ODAM: Object Detection, Association, and Mapping using Posed RGB Video

    Authors: Kejie Li, Daniel DeTone, Steven Chen, Minh Vo, Ian Reid, Hamid Rezatofighi, Chris Sweeney, Julian Straub, Richard Newcombe

    Abstract: Localizing objects and estimating their extent in 3D is an important step towards high-level 3D scene understanding, which has many applications in Augmented Reality and Robotics. We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos. The proposed system relies on a deep learning front-end to detect 3D objects from a given RGB frame and associate them t…

    Submitted 23 August, 2021; originally announced August 2021.

    Comments: Accepted in ICCV 2021 as oral

  30. arXiv:2104.07267  [pdf, other]

    cs.CV

    ContactOpt: Optimizing Contact to Improve Grasps

    Authors: Patrick Grady, Chengcheng Tang, Christopher D. Twigg, Minh Vo, Samarth Brahmbhatt, Charles C. Kemp

    Abstract: Physical contact between hands and objects plays a critical role in human grasps. We show that optimizing the pose of a hand to achieve expected contact with an object can improve hand poses inferred via image-based methods. Given a hand mesh and an object mesh, a deep model trained on ground truth contact data infers desirable contact across the surfaces of the meshes. Then, ContactOpt efficientl…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2021

  31. arXiv:2012.12890  [pdf, other]

    cs.CV

    ANR: Articulated Neural Rendering for Virtual Avatars

    Authors: Amit Raj, Julian Tanke, James Hays, Minh Vo, Carsten Stoll, Christoph Lassner

    Abstract: The combination of traditional rendering with neural networks in Deferred Neural Rendering (DNR) provides a compelling balance between computational complexity and realism of the resulting images. Using skinned meshes for rendering articulating objects is a natural extension for the DNR framework and would open it up to a plethora of applications. However, in this case the neural shading step must…

    Submitted 23 December, 2020; originally announced December 2020.

  32. arXiv:2008.00158  [pdf, ps, other]

    cs.CV

    TexMesh: Reconstructing Detailed Human Texture and Geometry from RGB-D Video

    Authors: Tiancheng Zhi, Christoph Lassner, Tony Tung, Carsten Stoll, Srinivasa G. Narasimhan, Minh Vo

    Abstract: We present TexMesh, a novel approach to reconstruct detailed human meshes with high-resolution full-body texture from RGB-D video. TexMesh enables high quality free-viewpoint rendering of humans. Given the RGB frames, the captured environment map, and the coarse per-frame human mesh from RGB-D tracking, our method reconstructs spatiotemporally consistent and detailed per-frame meshes along with a…

    Submitted 20 September, 2020; v1 submitted 31 July, 2020; originally announced August 2020.

    Comments: ECCV 2020

  33. arXiv:2007.12806  [pdf, other]

    cs.CV

    Spatiotemporal Bundle Adjustment for Dynamic 3D Human Reconstruction in the Wild

    Authors: Minh Vo, Yaser Sheikh, Srinivasa G. Narasimhan

    Abstract: Bundle adjustment jointly optimizes camera intrinsics and extrinsics and 3D point triangulation to reconstruct a static scene. The triangulation constraint, however, is invalid for moving points captured in multiple unsynchronized videos and bundle adjustment is not designed to estimate the temporal alignment between cameras. We present a spatiotemporal bundle adjustment framework that jointly opt…

    Submitted 24 July, 2020; originally announced July 2020.

    Comments: Accepted to IEEE TPAMI

  34. arXiv:2007.03672  [pdf, other]

    cs.CV

    Long-term Human Motion Prediction with Scene Context

    Authors: Zhe Cao, Hang Gao, Karttikeya Mangalam, Qi-Zhi Cai, Minh Vo, Jitendra Malik

    Abstract: Human movement is goal-directed and influenced by the spatial layout of the objects in the scene. To plan future human motion, it is crucial to perceive the environment -- imagine how hard it is to navigate a new room with lights off. Existing works on predicting human motion do not pay attention to the scene context and thus struggle in long-term prediction. In this work, we propose a novel three…

    Submitted 31 July, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

    Comments: ECCV 2020 Oral. Dataset & Code: https://github.com/ZheC/GTA-IM-Dataset Video: https://people.eecs.berkeley.edu/~zhecao/hmp/index.html

  35. arXiv:2005.13532  [pdf, other]

    cs.CV cs.GR

    4D Visualization of Dynamic Events from Unconstrained Multi-View Videos

    Authors: Aayush Bansal, Minh Vo, Yaser Sheikh, Deva Ramanan, Srinivasa Narasimhan

    Abstract: We present a data-driven approach for 4D space-time visualization of dynamic events from videos captured by hand-held multiple cameras. Key to our approach is the use of self-supervised neural networks specific to the scene to compose static and dynamic aspects of an event. Though captured from discrete viewpoints, this model enables us to move around the space-time of the event continuously. This…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: Project Page - http://www.cs.cmu.edu/~aayushb/Open4D/

  36. Two-Stream FCNs to Balance Content and Style for Style Transfer

    Authors: Duc Minh Vo, Akihiro Sugimoto

    Abstract: Style transfer is to render given image contents in given styles, and it has an important role in both computer vision fundamental research and industrial applications. Following the success of deep learning based approaches, this problem has been re-launched recently, but still remains a difficult task because of trade-off between preserving contents and faithful rendering of styles. Indeed, how…

    Submitted 7 May, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

    Comments: published in Machine Vision and Applications

  37. arXiv:1908.01741  [pdf, other]

    cs.CV

    Visual-Relation Conscious Image Generation from Structured-Text

    Authors: Duc Minh Vo, Akihiro Sugimoto

    Abstract: We propose an end-to-end network for image generation from given structured-text that consists of the visual-relation layout module and the pyramid of GANs, namely stacking-GANs. Our visual-relation layout module uses relations among entities in the structured-text in two ways: comprehensive usage and individual usage. We comprehensively use all available relations together to localize initial bou…

    Submitted 18 July, 2020; v1 submitted 5 August, 2019; originally announced August 2019.

    Comments: accepted at ECCV 2020

  38. Self-supervised Multi-view Person Association and Its Applications

    Authors: Minh Vo, Ersin Yumer, Kalyan Sunkavalli, Sunil Hadap, Yaser Sheikh, Srinivasa Narasimhan

    Abstract: Reliable markerless motion tracking of people participating in a complex group activity from multiple moving cameras is challenging due to frequent occlusions, strong viewpoint and appearance variations, and asynchronous video streams. To solve this problem, reliable association of the same person across distant viewpoints and temporal instances is essential. We present a self-supervised framework…

    Submitted 18 April, 2020; v1 submitted 22 May, 2018; originally announced May 2018.

    Comments: Accepted to IEEE TPAMI

  39. arXiv:1704.05314  [pdf, ps, other]

    math.AP

    The local backward heat problem

    Authors: Thi Minh Nhat Vo

    Abstract: In this paper, we study the local backward problem of a linear heat equation with time-dependent coefficients under the Dirichlet boundary condition. Precisely, we recover the initial data from the observation on a subdomain at some later time. Thanks to the "optimal filtering" method of Seidman, we can solve the global backward problem, which determines the solution at initial time from the known…

    Submitted 18 April, 2017; originally announced April 2017.

  40. The Drift Chambers Of The Nomad Experiment

    Authors: M. Anfreville, P. Astier, M. Authier, A. Baldisseri, M. Banner, N. Besson, J. Bouchez, A. Castera, O. Cloue, J. Dumarchez, L. Dumps, E. Gangler, J. Gosset, C. Hagner, C. Jollec, C. Lachaud, A. Letessier, J. M. Levy, L. Linssen, J. P. Meyer, J. P. Ouriet, J. P. Passerieux, T. Pedrol, A. Placci, J. Poinsignon, et al. (8 additional authors not shown)

    Abstract: We present a detailed description of the drift chambers used as an active target and a tracking device in the NOMAD experiment at CERN. The main characteristics of these chambers are a large area, a self supporting structure made of light composite materials and a low cost. A spatial resolution of 150 microns has been achieved with a single hit efficiency of 97%.

    Submitted 8 June, 2001; v1 submitted 9 April, 2001; originally announced April 2001.

    Comments: 42 pages, 26 figures

    Report number: LPNHE-01-01

    Journal ref: Nucl.Instrum.Meth. A481 (2002) 339-364