Showing 1–45 of 45 results for author: Hoai, M

  1. arXiv:2406.02774  [pdf, other]

    cs.CV

    Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following

    Authors: Qiaomu Miao, Alexandros Graikos, **gwei Zhang, Sounak Mondal, Minh Hoai, Dimitris Samaras

    Abstract: Training gaze following models requires a large number of images with gaze target coordinates annotated by human annotators, which is a laborious and inherently ambiguous process. We propose the first semi-supervised method for gaze following by introducing two novel priors to the task. We obtain the first prior using a large pretrained Visual Question Answering (VQA) model, where we compute Grad-…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2404.13819  [pdf, other]

    cs.CV

    HOIST-Former: Hand-held Objects Identification, Segmentation, and Tracking in the Wild

    Authors: Supreeth Narasimhaswamy, Huy Anh Nguyen, Lihan Huang, Minh Hoai

    Abstract: We address the challenging task of identifying, segmenting, and tracking hand-held objects, which is crucial for applications such as human action segmentation and performance evaluation. This task is particularly challenging due to heavy occlusion, rapid motion, and the transitory nature of objects being hand-held, where an object may be held, released, and subsequently picked up again. To tackle…

    Submitted 21 April, 2024; originally announced April 2024.

  3. arXiv:2404.07122  [pdf, other]

    cs.CV

    Driver Attention Tracking and Analysis

    Authors: Dat Viet Thanh Nguyen, Anh Tran, Hoai Nam Vu, Cuong Pham, Minh Hoai

    Abstract: We propose a novel method to estimate a driver's points-of-gaze using a pair of ordinary cameras mounted on the windshield and dashboard of a car. This is a challenging problem due to the dynamics of traffic environments with 3D scenes of unknown depths. This problem is further complicated by the volatile distance between the driver and the camera system. To tackle these challenges, we develop a n…

    Submitted 11 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  4. arXiv:2403.16205  [pdf, other]

    cs.CV

    Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains

    Authors: Bang-Dang Pham, Phong Tran, Anh Tran, Cuong Pham, Rang Nguyen, Minh Hoai

    Abstract: This paper presents an innovative framework designed to train an image deblurring algorithm tailored to a specific camera device. This algorithm works by transforming a blurry input image, which is challenging to deblur, into another blurry image that is more amenable to deblurring. The transformation process, from one blurry state to another, leverages unpaired data consisting of sharp and blurry…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  5. arXiv:2403.01693  [pdf, other]

    cs.CV cs.AI

    HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances

    Authors: Supreeth Narasimhaswamy, Uttaran Bhattacharya, Xiang Chen, Ishita Dasgupta, Saayan Mitra, Minh Hoai

    Abstract: Text-to-image generative models can generate high-quality humans, but realism is lost when generating hands. Common artifacts include irregular hand poses, shapes, incorrect numbers of fingers, and physically implausible finger orientations. To generate images with realistic hands, we propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting hand embeddings…

    Submitted 21 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: Revisions: 1. Added a link to project page in the abstract, 2. Updated references and related work, 3. Fixed some grammatical errors

  6. arXiv:2312.17330  [pdf, other]

    cs.CV cs.AI

    Count What You Want: Exemplar Identification and Few-shot Counting of Human Actions in the Wild

    Authors: Yifeng Huang, Duc Duy Nguyen, Lam Nguyen, Cuong Pham, Minh Hoai

    Abstract: This paper addresses the task of counting human actions of interest using sensor data from wearable devices. We propose a novel exemplar-based framework, allowing users to provide exemplars of the actions they want to count by vocalizing predefined sounds "one", "two", and "three". Our method first localizes temporal positions of these utterances from the audio sequence. These positions serv…

    Submitted 28 December, 2023; originally announced December 2023.

  7. arXiv:2309.05277  [pdf, other]

    cs.CV

    Interactive Class-Agnostic Object Counting

    Authors: Yifeng Huang, Viresh Ranjan, Minh Hoai

    Abstract: We propose a novel framework for interactive class-agnostic object counting, where a human user can interactively provide feedback to improve the accuracy of a counter. Our framework consists of two main components: a user-friendly visualizer to gather feedback and an efficient mechanism to incorporate it. In each iteration, we produce a density map to show the current prediction result, and we se…

    Submitted 11 September, 2023; originally announced September 2023.

  8. arXiv:2304.01686  [pdf, other]

    cs.CV cs.AI

    HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering

    Authors: Bang-Dang Pham, Phong Tran, Anh Tran, Cuong Pham, Rang Nguyen, Minh Hoai

    Abstract: We consider the challenging task of training models for image-to-video deblurring, which aims to recover a sequence of sharp images corresponding to a given blurry image input. A critical issue disturbing the training of an image-to-video model is the ambiguity of the frame ordering since both the forward and backward sequences are plausible solutions. This paper proposes an effective self-supervi…

    Submitted 5 April, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPR 2023

  9. arXiv:2303.15274  [pdf, other]

    cs.CV

    Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention

    Authors: Sounak Mondal, Zhibo Yang, Seoyoung Ahn, Dimitris Samaras, Gregory Zelinsky, Minh Hoai

    Abstract: Predicting human gaze is important in Human-Computer Interaction (HCI). However, to practically serve HCI applications, gaze prediction models must be scalable, fast, and accurate in their spatial and temporal gaze predictions. Recent scanpath prediction models focus on goal-directed attention (search). Such models are limited in their application due to a common approach relying on trained target…

    Submitted 2 July, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023

  10. arXiv:2303.09383  [pdf, other]

    cs.CV cs.AI

    Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers

    Authors: Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Ruoyu Xue, Gregory Zelinsky, Minh Hoai, Dimitris Samaras

    Abstract: Most models of visual attention aim at predicting either top-down or bottom-up control, as studied using different visual search and free-viewing tasks. In this paper we propose the Human Attention Transformer (HAT), a single model that predicts both forms of attention control. HAT uses a novel transformer-based architecture and a simplified foveated retina that collectively create a spatio-tempor…

    Submitted 30 March, 2024; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: CVPR 2024

  11. arXiv:2211.11062  [pdf, other]

    cs.CV

    Patch-level Gaze Distribution Prediction for Gaze Following

    Authors: Qiaomu Miao, Minh Hoai, Dimitris Samaras

    Abstract: Gaze following aims to predict where a person is looking in a scene, by predicting the target location, or indicating that the target is located outside the image. Recent works detect the gaze target by training a heatmap regression task with a pixel-wise mean-square error (MSE) loss, while formulating the in/out prediction task as a binary classification task. This training formulation puts a str…

    Submitted 20 November, 2022; originally announced November 2022.

    Comments: Accepted to WACV 2023

  12. arXiv:2210.15904  [pdf, other]

    cs.CV cs.AI cs.GR

    Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis

    Authors: Bach Tran, Binh-Son Hua, Anh Tuan Tran, Minh Hoai

    Abstract: Recently, great progress has been made in 3D deep learning with the emergence of deep neural networks specifically designed for 3D point clouds. These networks are often trained from scratch or from pre-trained models learned purely from point cloud data. Inspired by the success of deep learning in the image domain, we devise a novel pre-training technique for better model initialization by utiliz…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: ACCV 2022 paper. 14 pages of content, 4 pages of references, 6 pages of supplementary material

  13. arXiv:2210.05991  [pdf, other]

    cs.CV cs.CL cs.LG

    Text-Derived Knowledge Helps Vision: A Simple Cross-modal Distillation for Video-based Action Anticipation

    Authors: Sayontan Ghosh, Tanvi Aggarwal, Minh Hoai, Niranjan Balasubramanian

    Abstract: Anticipating future actions in a video is useful for many autonomous and assistive technologies. Most prior action anticipation work treats this as a vision modality problem, where the models learn the task information primarily from the video features in the action anticipation datasets. However, knowledge about action sequences can also be obtained from external textual data. In this work, we sho…

    Submitted 21 February, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

  14. arXiv:2207.10988  [pdf, other]

    cs.CV

    Few-shot Object Counting and Detection

    Authors: Thanh Nguyen, Chau Pham, Khoi Nguyen, Minh Hoai

    Abstract: We tackle a new task of few-shot object counting and detection. Given a few exemplar bounding boxes of a target object class, we seek to count and detect all objects of the target class. This task shares the same supervision as the few-shot object counting but additionally outputs the object bounding boxes along with the total object count. To address this challenging problem, we introduce a novel…

    Submitted 28 July, 2022; v1 submitted 22 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022; The first two authors contribute equally

  15. arXiv:2207.01166  [pdf, other]

    cs.CV cs.AI

    Target-absent Human Attention

    Authors: Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Gregory Zelinsky, Minh Hoai, Dimitris Samaras

    Abstract: The prediction of human gaze behavior is important for building human-computer interactive systems that can anticipate a user's attention. Computer vision models have been developed to predict the fixations made by people as they search for target objects. But what about when the image has no target? Equally important is to know how people search when they cannot find a target, and when they would…

    Submitted 1 November, 2022; v1 submitted 3 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV2022

  16. arXiv:2205.14212  [pdf, other]

    cs.CV

    Exemplar Free Class Agnostic Counting

    Authors: Viresh Ranjan, Minh Hoai

    Abstract: We tackle the task of Class Agnostic Counting, which aims to count objects in a novel object category at test time without any access to labeled training data for that category. All previous class agnostic counting methods cannot work in a fully automated setting, and require computationally expensive test time adaptation. To address these challenges, we propose a visual counter which operates in…

    Submitted 27 May, 2022; originally announced May 2022.

  17. arXiv:2109.02288  [pdf, other]

    cs.CV

    Toward Realistic Single-View 3D Object Reconstruction with Unsupervised Learning from Multiple Images

    Authors: Long-Nhat Ho, Anh Tuan Tran, Quynh Phung, Minh Hoai

    Abstract: Recovering the 3D structure of an object from a single image is a challenging task due to its ill-posed nature. One approach is to utilize the plentiful photos of the same object category to learn a strong 3D shape prior for the object. This approach has successfully been demonstrated by a recent work of Wu et al. (2020), which obtained impressive 3D reconstruction networks with unsupervised learn…

    Submitted 7 September, 2021; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted to the main ICCV 2021 conference

  18. arXiv:2104.08391  [pdf, other]

    cs.CV

    Learning To Count Everything

    Authors: Viresh Ranjan, Udbhav Sharma, Thu Nguyen, Minh Hoai

    Abstract: Existing works on visual counting primarily focus on one specific category at a time, such as people, animals, and cells. In this paper, we are interested in counting everything, that is, to count objects from any category given only a few annotated instances from that category. To this end, we pose counting as a few-shot regression task. To tackle this task, we present a novel method that takes a…

    Submitted 16 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  19. arXiv:2104.03778  [pdf, other]

    cs.CV cs.AI

    Progressive Semantic Segmentation

    Authors: Chuong Huynh, Anh Tran, Khoa Luu, Minh Hoai

    Abstract: The objective of this work is to segment high-resolution images without overloading GPU memory usage or losing the fine details in the output segmentation map. The memory constraint means that we must either downsample the big image or divide the image into local patches for separate processing. However, the former approach would lose the fine details, while the latter can be ambiguous due to the…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: Accepted to CVPR'21

  20. arXiv:2104.01867  [pdf, other]

    cs.CV

    Lipstick ain't enough: Beyond Color Matching for In-the-Wild Makeup Transfer

    Authors: Thao Nguyen, Anh Tran, Minh Hoai

    Abstract: Makeup transfer is the task of applying on a source face the makeup style from a reference image. Real-life makeups are diverse and wild, which cover not only color-changing but also patterns, such as stickers, blushes, and jewelries. However, existing works overlooked the latter components and confined makeup transfer to color manipulation, focusing only on light makeup styles. In this work, we p…

    Submitted 5 April, 2021; originally announced April 2021.

    Comments: Accepted to CVPR'21

  21. arXiv:2104.00317  [pdf, other]

    cs.CV cs.AI

    Explore Image Deblurring via Blur Kernel Space

    Authors: Phong Tran, Anh Tran, Quynh Phung, Minh Hoai

    Abstract: This paper introduces a method to encode the blur operators of an arbitrary dataset of sharp-blur image pairs into a blur kernel space. Assuming the encoded kernel space is close enough to in-the-wild blur operators, we propose an alternating optimization algorithm for blind image deblurring. It approximates an unseen blur operator by a kernel in the encoded space and searches for the correspondin…

    Submitted 3 April, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: Accepted to CVPR'21

  22. arXiv:2103.00871  [pdf, other]

    cs.CV

    FineNet: Frame Interpolation and Enhancement for Face Video Deblurring

    Authors: Phong Tran, Anh Tran, Thao Nguyen, Minh Hoai

    Abstract: The objective of this work is to deblur face videos. We propose a method that tackles this problem from two directions: (1) enhancing the blurry frames, and (2) treating the blurry frames as missing values and estimating them by interpolation. These approaches are complementary to each other, and their combination outperforms individual ones. We also introduce a novel module that leverages the struc…

    Submitted 1 March, 2021; originally announced March 2021.

  23. arXiv:2012.12482  [pdf, other]

    cs.CV

    Localization in the Crowd with Topological Constraints

    Authors: Shahira Abousamra, Minh Hoai, Dimitris Samaras, Chao Chen

    Abstract: We address the problem of crowd localization, i.e., the prediction of dots corresponding to people in a crowded scene. Due to various challenges, a localization method is prone to spatial semantic errors, i.e., predicting multiple dots within the same person or collapsing multiple dots in a cluttered region. We propose a topological approach targeting these semantic errors. We introduce a topologica…

    Submitted 22 December, 2020; originally announced December 2020.

    Comments: AAAI 2021

  24. arXiv:2011.08543  [pdf, other]

    cs.LG cs.CL cs.CV

    Structural and Functional Decomposition for Personality Image Captioning in a Communication Game

    Authors: Thu Nguyen, Duy Phung, Minh Hoai, Thien Huu Nguyen

    Abstract: Personality image captioning (PIC) aims to describe an image with a natural language caption given a personality trait. In this work, we introduce a novel formulation for PIC based on a communication game between a speaker and a listener. The speaker attempts to generate natural language captions while the listener encourages the generated captions to contain discriminative information about the i…

    Submitted 17 November, 2020; originally announced November 2020.

    Comments: 10 pages, EMNLP-Findings 2020

    Journal ref: EMNLP-Findings 2020

  25. arXiv:2010.09676  [pdf, other]

    cs.CV

    Detecting Hands and Recognizing Physical Contact in the Wild

    Authors: Supreeth Narasimhaswamy, Trung Nguyen, Minh Hoai

    Abstract: We investigate a new problem of detecting hands and recognizing their physical contact state in unconstrained conditions. This is a challenging inference task given the need to reason beyond the local appearance of hands. The lack of training annotations indicating which object or parts of an object the hand is in contact with further complicates the task. We propose a novel convolutional network…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020

  26. arXiv:2009.14411  [pdf, other]

    cs.CV

    Uncertainty Estimation and Sample Selection for Crowd Counting

    Authors: Viresh Ranjan, Boyu Wang, Mubarak Shah, Minh Hoai

    Abstract: We present a method for image-based crowd counting, one that can predict a crowd density map together with the uncertainty values pertaining to the predicted density map. To obtain prediction uncertainty, we model the crowd density values using Gaussian distributions and develop a convolutional neural network architecture to predict these distributions. A key advantage of our method over existing…

    Submitted 4 October, 2020; v1 submitted 29 September, 2020; originally announced September 2020.

    Comments: ACCV 2020

  27. arXiv:2009.13077  [pdf, other]

    cs.CV

    Distribution Matching for Crowd Counting

    Authors: Boyu Wang, Huidong Liu, Dimitris Samaras, Minh Hoai

    Abstract: In crowd counting, each training image contains multiple people, where each person is annotated by a dot. Existing crowd counting methods need to use a Gaussian to smooth each annotated dot or to estimate the likelihood of every pixel given the annotated point. In this paper, we show that imposing Gaussians on annotations hurts generalization performance. Instead, we propose to use Distribution Ma…

    Submitted 25 October, 2020; v1 submitted 28 September, 2020; originally announced September 2020.

    Comments: NeurIPS 2020

  28. arXiv:2009.06502  [pdf, other]

    cs.CV

    A Study of Human Gaze Behavior During Visual Crowd Counting

    Authors: Raji Annadi, Yupei Chen, Viresh Ranjan, Dimitris Samaras, Gregory Zelinsky, Minh Hoai

    Abstract: In this paper, we describe our study on how humans allocate their attention during visual crowd counting. Using an eye tracker, we collect gaze behavior of human participants who are tasked with counting the number of people in crowd images. Analyzing the collected gaze behavior of ten human participants on thirty crowd images, we observe some common approaches for visual counting. For an image of…

    Submitted 27 September, 2020; v1 submitted 14 September, 2020; originally announced September 2020.

  29. arXiv:2009.02256  [pdf, other]

    cs.CV cs.GR cs.LG

    Interactive Visual Study of Multiple Attributes Learning Model of X-Ray Scattering Images

    Authors: Xinyi Huang, Suphanut Jamonnak, Ye Zhao, Boyu Wang, Minh Hoai, Kevin Yager, Wei Xu

    Abstract: Existing interactive visualization tools for deep learning are mostly applied to the training, debugging, and refinement of neural network models working on natural images. However, visual analytics tools are lacking for the specific application of x-ray image classification with multiple structural attributes. In this paper, we present an interactive system for domain scientists to visually study…

    Submitted 2 September, 2020; originally announced September 2020.

    Comments: IEEE SciVis Conference 2020

    Journal ref: IEEE Transactions on Visualization & Computer Graphics 2020

  30. arXiv:2005.14310  [pdf, other]

    cs.CV

    Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning

    Authors: Zhibo Yang, Lihan Huang, Yupei Chen, Zijun Wei, Seoyoung Ahn, Gregory Zelinsky, Dimitris Samaras, Minh Hoai

    Abstract: Being able to predict human gaze behavior has obvious importance for behavioral vision and for computer vision applications. Most models have mainly focused on predicting free-viewing behavior using saliency maps, but these predictions do not generalize to goal-directed behavior, such as when a person searches for a visual target object. We propose the first inverse reinforcement learning (IRL) mo…

    Submitted 25 June, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

    Comments: 16 pages, 13 figures, CVPR 2020

  31. arXiv:2001.11921  [pdf, other]

    cs.CV

    Predicting Goal-directed Attention Control Using Inverse-Reinforcement Learning

    Authors: Gregory J. Zelinsky, Yupei Chen, Seoyoung Ahn, Hossein Adeli, Zhibo Yang, Lihan Huang, Dimitrios Samaras, Minh Hoai

    Abstract: Understanding how goal states control behavior is a question ripe for interrogation by new methods from machine learning. These methods require large and labeled datasets to train models. To annotate a large-scale image dataset with observed search fixations, we collected 16,184 fixations from people searching for either microwaves or clocks in a dataset of 4,366 images (MS-COCO). We then used thi…

    Submitted 31 January, 2020; originally announced January 2020.

  32. arXiv:1910.04357  [pdf, other]

    cs.LG cs.CV cs.HC eess.IV stat.ML

    Visual Understanding of Multiple Attributes Learning Model of X-Ray Scattering Images

    Authors: Xinyi Huang, Suphanut Jamonnak, Ye Zhao, Boyu Wang, Minh Hoai, Kevin Yager, Wei Xu

    Abstract: This extended abstract presents a visualization system, which is designed for domain scientists to visually understand their deep learning model of extracting multiple attributes in x-ray scattering images. The system focuses on studying the model behaviors related to multiple structural attributes. It allows users to explore the images in the feature space, the classification output of different…

    Submitted 9 October, 2019; originally announced October 2019.

    Comments: 5 pages, 2 figures, ICCV conference co-held XAIC workshop 2019

  33. arXiv:1904.05410  [pdf, other]

    cs.CV

    Attentive Action and Context Factorization

    Authors: Yang Wang, Vinh Tran, Gedas Bertasius, Lorenzo Torresani, Minh Hoai

    Abstract: We propose a method for human action recognition, one that can localize the spatiotemporal regions that 'define' the actions. This is a challenging task due to the subtlety of human actions in video and the co-occurrence of contextual elements. To address this challenge, we utilize conjugate samples of human actions, which are video clips that are contextually similar to human action samples but d…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: 10 pages, 6 figures

  34. arXiv:1904.04882  [pdf, other]

    cs.CV

    Contextual Attention for Hand Detection in the Wild

    Authors: Supreeth Narasimhaswamy, Zhengwei Wei, Yang Wang, Justin Zhang, Minh Hoai

    Abstract: We present Hand-CNN, a novel convolutional network architecture for detecting hand masks and predicting hand orientations in unconstrained images. Hand-CNN extends MaskRCNN with a novel attention mechanism to incorporate contextual cues in the detection process. This attention mechanism can be implemented as an efficient network module that captures non-local dependencies between features. This ne…

    Submitted 9 April, 2019; originally announced April 2019.

    Comments: 9 pages, 9 figures

  35. arXiv:1904.04868  [pdf, other]

    cs.CV

    Knowledge Distillation for Human Action Anticipation

    Authors: Vinh Tran, Yang Wang, Minh Hoai

    Abstract: We consider the task of training a neural network to anticipate human actions in video. This task is challenging given the complexity of video data, the stochastic nature of the future, and the limited amount of annotated training data. In this paper, we propose a novel knowledge distillation framework that uses an action recognition network to supervise the training of an action anticipation netw…

    Submitted 3 October, 2021; v1 submitted 9 April, 2019; originally announced April 2019.

    Comments: 5 pages, 3 figures

    Journal ref: ICIP 2021

  36. arXiv:1902.07262  [pdf, other]

    cs.CV

    BusyHands: A Hand-Tool Interaction Database for Assembly Tasks Semantic Segmentation

    Authors: Roy Shilkrot, Zhi Chai, Minh Hoai

    Abstract: Visual segmentation has seen tremendous advancement recently with ready solutions for a wide variety of scene types, including human hands and other body parts. However, focus on segmentation of human hands while performing complex tasks, such as manual assembly, is still severely lacking. Segmenting hands from tools, work pieces, background and other body parts is extremely difficult because of s…

    Submitted 19 February, 2019; originally announced February 2019.

    Comments: 10 pages, 8 figures

  37. arXiv:1901.02840  [pdf, other]

    cs.CV

    GIF2Video: Color Dequantization and Temporal Interpolation of GIF images

    Authors: Yang Wang, Haibin Huang, Chuan Wang, Tong He, Jue Wang, Minh Hoai

    Abstract: Graphics Interchange Format (GIF) is a highly portable graphics format that is ubiquitous on the Internet. Despite their small sizes, GIF images often contain undesirable visual artifacts such as flat color regions, false contours, color shift, and dotted patterns. In this paper, we propose GIF2Video, the first learning-based method for enhancing the visual quality of GIFs in the wild. We focus on…

    Submitted 8 April, 2019; v1 submitted 9 January, 2019; originally announced January 2019.

    Comments: to appear in CVPR 2019

  38. arXiv:1808.03840  [pdf, other]

    cs.CL

    Fake Sentence Detection as a Training Task for Sentence Encoding

    Authors: Viresh Ranjan, Heeyoung Kwon, Niranjan Balasubramanian, Minh Hoai

    Abstract: Sentence encoders are typically trained on language modeling tasks with large unlabeled datasets. While these encoders achieve state-of-the-art results on many sentence-level tasks, they are difficult to train with long training cycles. We introduce fake sentence detection as a new training task for learning sentence encoders. We automatically generate fake sentences by corrupting original sentenc…

    Submitted 23 August, 2018; v1 submitted 11 August, 2018; originally announced August 2018.

  39. arXiv:1807.09959  [pdf, other]

    cs.CV

    Iterative Crowd Counting

    Authors: Viresh Ranjan, Hieu Le, Minh Hoai

    Abstract: In this work, we tackle the problem of crowd counting in images. We present a Convolutional Neural Network (CNN) based density estimation approach to solve this problem. Predicting a high resolution density map in one go is a challenging task. Hence, we present a two branch CNN architecture for generating high resolution density maps, where the first branch generates a low resolution density map,…

    Submitted 26 July, 2018; originally announced July 2018.

    Comments: ECCV 2018

  40. arXiv:1712.01361  [pdf, other]

    cs.CV

    A+D Net: Training a Shadow Detector with Adversarial Shadow Attenuation

    Authors: Hieu Le, Tomas F. Yago Vicente, Vu Nguyen, Minh Hoai, Dimitris Samaras

    Abstract: We propose a novel GAN-based framework for detecting shadows in images, in which a shadow detection network (D-Net) is trained together with a shadow attenuation network (A-Net) that generates adversarial training examples. The A-Net modifies the original training images constrained by a simplified physical shadow model and is focused on fooling the D-Net's shadow predictions. Hence, it is effecti…

    Submitted 27 July, 2018; v1 submitted 4 December, 2017; originally announced December 2017.

  41. arXiv:1708.05465  [pdf, other]

    cs.CV

    Eigen Evolution Pooling for Human Action Recognition

    Authors: Yang Wang, Vinh Tran, Minh Hoai

    Abstract: We introduce Eigen Evolution Pooling, an efficient method to aggregate a sequence of feature vectors. Eigen evolution pooling is designed to produce compact feature representations for a sequence of feature vectors, while preserving as much information about the sequence as possible, especially the temporal evolution of the features over time. Eigen evolution pooling is a general pooling…

    Submitted 17 August, 2017; originally announced August 2017.

  42. arXiv:1702.04037  [pdf, other]

    cs.CV

    Evolution-Preserving Dense Trajectory Descriptors

    Authors: Yang Wang, Vinh Tran, Minh Hoai

    Abstract: Recently Trajectory-pooled Deep-learning Descriptors were shown to achieve state-of-the-art human action recognition results on a number of datasets. This paper improves their performance by applying rank pooling to each trajectory, encoding the temporal evolution of deep learning features computed along the trajectory. This leads to Evolution-Preserving Trajectory (EPT) descriptors, a novel type…

    Submitted 13 February, 2017; originally announced February 2017.

  43. arXiv:1611.03313  [pdf, other]

    cs.CV

    X-ray Scattering Image Classification Using Deep Learning

    Authors: Boyu Wang, Kevin Yager, Dantong Yu, Minh Hoai

    Abstract: Visual inspection of x-ray scattering images is a powerful technique for probing the physical structure of materials at the molecular scale. In this paper, we explore the use of deep learning to develop methods for automatically analyzing x-ray scattering images. In particular, we apply Convolutional Neural Networks and Convolutional Autoencoders for x-ray scattering image classification. To acqui…

    Submitted 10 November, 2016; originally announced November 2016.

  44. arXiv:1605.09452  [pdf, other]

    cs.CV

    Latent Bi-constraint SVM for Video-based Object Recognition

    Authors: Yang Liu, Minh Hoai, Mang Shao, Tae-Kyun Kim

    Abstract: We address the task of recognizing objects from video input. This important problem is relatively unexplored, compared with image-based object recognition. To this end, we make the following contributions. First, we introduce two comprehensive datasets for video-based object recognition. Second, we propose Latent Bi-constraint SVM (LBSVM), a maximum-margin framework for video-based object recognit…

    Submitted 30 May, 2016; originally announced May 2016.

  45. arXiv:1604.06397  [pdf, other]

    cs.CV

    Improving Human Action Recognition by Non-action Classification

    Authors: Yang Wang, Minh Hoai

    Abstract: In this paper we consider the task of recognizing human actions in realistic video where human actions are dominated by irrelevant factors. We first study the benefits of removing non-action video segments, which are the ones that do not portray any human action. We then learn a non-action classifier and use it to down-weight irrelevant video segments. The non-action classifier is trained using Ac…

    Submitted 21 April, 2016; v1 submitted 21 April, 2016; originally announced April 2016.

    Comments: appears in CVPR16