Showing 1–50 of 65 results for author: Huang, T S

  1. arXiv:2104.14082  [pdf, other]

    cs.CV

    Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection

    Authors: Jiachen Li, Bowen Cheng, Rogerio Feris, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Humphrey Shi

    Abstract: Current anchor-free object detectors are quite simple and effective yet lack accurate label assignment methods, which limits their potential in competing with classic anchor-based models that are supported by well-designed assignment methods based on the Intersection-over-Union (IoU) metric. In this paper, we present Pseudo-Intersection-over-Union (Pseudo-IoU): a simple metric that brings… ▽ More

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 Workshop

  2. arXiv:2012.03400  [pdf, other]

    cs.CV

    CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

    Authors: Yang Fu, Linjie Yang, Ding Liu, Thomas S. Huang, Humphrey Shi

    Abstract: Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video. Previous approaches only utilize single-frame features for the detection, segmentation, and tracking of objects and they suffer in the video scenario due to several distinct challenges such as motion blur and drastic appearance change. To eliminate ambiguities introduced by… ▽ More

    Submitted 6 December, 2020; originally announced December 2020.

    Comments: Accepted to AAAI 2021

  3. arXiv:2006.15710  [pdf, other]

    eess.IV cs.CV cs.LG

    Motion Pyramid Networks for Accurate and Efficient Cardiac Motion Estimation

    Authors: Hanchao Yu, Xiao Chen, Humphrey Shi, Terrence Chen, Thomas S. Huang, Shanhui Sun

    Abstract: Cardiac motion estimation plays a key role in MRI cardiac feature tracking and function assessment such as myocardium strain. In this paper, we propose Motion Pyramid Networks, a novel deep learning-based approach for accurate and efficient cardiac motion estimation. We predict and fuse a pyramid of motion fields from multiple scales of feature representations to generate a more refined motion fie… ▽ More

    Submitted 15 September, 2020; v1 submitted 28 June, 2020; originally announced June 2020.

    Comments: Accepted by MICCAI2020

  4. arXiv:2006.04357  [pdf, other]

    cs.CV

    Neural Sparse Representation for Image Restoration

    Authors: Yuchen Fan, Jiahui Yu, Yiqun Mei, Yulun Zhang, Yun Fu, Ding Liu, Thomas S. Huang

    Abstract: Inspired by the robustness and efficiency of sparse representation in sparse coding based image restoration models, we investigate the sparsity of neurons in deep networks. Our method structurally enforces sparsity constraints upon hidden neurons. The sparsity constraints are favorable for gradient-based learning algorithms and attachable to convolution layers in various networks. Sparsity in neur… ▽ More

    Submitted 8 June, 2020; originally announced June 2020.

  5. arXiv:2006.01424  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining

    Authors: Yiqun Mei, Yuchen Fan, Yuqian Zhou, Lichao Huang, Thomas S. Huang, Humphrey Shi

    Abstract: Deep convolution-based single image super-resolution (SISR) networks embrace the benefits of learning from large-scale external image resources for local recovery, yet most existing works have ignored the long-range feature-wise similarities in natural images. Some recent works have successfully leveraged this intrinsic feature correlation by exploring non-local attention modules. However, none of… ▽ More

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: CVPR2020

  6. arXiv:2005.01056  [pdf, other]

    eess.IV cs.CV

    NTIRE 2020 Challenge on Perceptual Extreme Super-Resolution: Methods and Results

    Authors: Kai Zhang, Shuhang Gu, Radu Timofte, Taizhang Shang, Qiuju Dai, Shengchen Zhu, Tong Yang, Yandong Guo, Younghyun Jo, Sejong Yang, Seon Joo Kim, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Jing Liu, Kwangjin Yoon, Taegyun Jeon, Kazutoshi Akita, Takeru Ooba, Norimichi Ukita, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Dongliang He, et al. (38 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2020 challenge on perceptual extreme super-resolution with focus on proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor 16 based on a set of prior examples of low and corresponding high resolution images. The goal is to obtain a network design capable to produce high resolution results with the best percept… ▽ More

    Submitted 3 May, 2020; originally announced May 2020.

    Comments: CVPRW 2020

  7. arXiv:2004.13824  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Pyramid Attention Networks for Image Restoration

    Authors: Yiqun Mei, Yuchen Fan, Yulun Zhang, Jiahui Yu, Yuqian Zhou, Ding Liu, Yun Fu, Thomas S. Huang, Humphrey Shi

    Abstract: Self-similarity refers to the image prior widely used in image restoration algorithms that small but similar patterns tend to occur at different locations and scales. However, recent advanced deep convolutional neural network based methods for image restoration do not take full advantage of self-similarities by relying on self-attention neural modules that only process information at the same scal… ▽ More

    Submitted 3 June, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

  8. arXiv:2004.09754  [pdf, other]

    cs.CV cs.LG eess.IV

    The 1st Agriculture-Vision Challenge: Methods and Results

    Authors: Mang Tik Chiu, Xingqian Xu, Kai Wang, Jennifer Hobbs, Naira Hovakimyan, Thomas S. Huang, Honghui Shi, Yunchao Wei, Zilong Huang, Alexander Schwing, Robert Brunner, Ivan Dozier, Wyatt Dozier, Karen Ghandilyan, David Wilson, Hyunseong Park, Junhee Kim, Sungho Kim, Qinghui Liu, Michael C. Kampffmeyer, Robert Jenssen, Arnt B. Salberg, Alexandre Barbosa, Rodrigo Trevisan, Bingchen Zhao, et al. (17 additional authors not shown)

    Abstract: The first Agriculture-Vision Challenge aims to encourage research in developing novel and effective algorithms for agricultural pattern recognition from aerial images, especially for the semantic segmentation task associated with our challenge dataset. Around 57 participating teams from various countries compete to achieve state-of-the-art in aerial agriculture semantic segmentation. The Agricultu… ▽ More

    Submitted 23 April, 2020; v1 submitted 21 April, 2020; originally announced April 2020.

    Comments: CVPR 2020 Workshop

  9. arXiv:2004.00794  [pdf, other]

    cs.CV cs.LG

    Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation

    Authors: Zhonghao Wang, Yunchao Wei, Rogerio Feris, Jinjun Xiong, Wen-Mei Hwu, Thomas S. Huang, Humphrey Shi

    Abstract: Learning segmentation from synthetic data and adapting to real data can significantly relieve human efforts in labelling pixel-level masks. A key challenge of this task is how to alleviate the data distribution discrepancy between the source and target domains, i.e. reducing domain shift. The common approach to this problem is to minimize the discrepancy between feature distributions from differen… ▽ More

    Submitted 9 June, 2020; v1 submitted 1 April, 2020; originally announced April 2020.

    Comments: CVPRW 2020

  10. arXiv:2003.13623  [pdf, other]

    cs.CV cs.LG

    Laplacian Denoising Autoencoder

    Authors: Jianbo Jiao, Linchao Bao, Yunchao Wei, Shengfeng He, Honghui Shi, Rynson Lau, Thomas S. Huang

    Abstract: While deep neural networks have been shown to perform remarkably well in many machine learning tasks, labeling a large amount of ground truth data for supervised training is usually very costly to scale. Therefore, learning robust representations with unlabeled data is critical in relieving human effort and vital for many downstream tasks. Recent advances in unsupervised and self-supervised learni… ▽ More

    Submitted 30 March, 2020; originally announced March 2020.

  11. arXiv:2003.08040  [pdf, other]

    cs.CV cs.LG eess.IV

    Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation

    Authors: Zhonghao Wang, Mo Yu, Yunchao Wei, Rogerio Feris, Jinjun Xiong, Wen-mei Hwu, Thomas S. Huang, Humphrey Shi

    Abstract: We consider the problem of unsupervised domain adaptation for semantic segmentation by easing the domain shift between the source domain (synthetic data) and the target domain (real data) in this work. State-of-the-art approaches prove that performing semantic-level alignment is helpful in tackling the domain shift issue. Based on the observation that stuff categories usually share similar appeara… ▽ More

    Submitted 9 June, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: CVPR 2020

  12. arXiv:2003.06849  [pdf, other]

    cs.CV

    Deep Affinity Net: Instance Segmentation via Affinity

    Authors: Xingqian Xu, Mang Tik Chiu, Thomas S. Huang, Honghui Shi

    Abstract: Most of the modern instance segmentation approaches fall into two categories: region-based approaches in which object bounding boxes are detected first and later used in cropping and segmenting instances; and keypoint-based approaches in which individual instances are represented by a set of keypoints followed by a dense pixel clustering around those keypoints. Despite the maturity of these two pa… ▽ More

    Submitted 15 March, 2020; originally announced March 2020.

  13. arXiv:2003.00872  [pdf, other]

    cs.CV eess.IV

    AlignSeg: Feature-Aligned Segmentation Networks

    Authors: Zilong Huang, Yunchao Wei, Xinggang Wang, Wenyu Liu, Thomas S. Huang, Humphrey Shi

    Abstract: Aggregating features in terms of different convolutional blocks or contextual embeddings has been proven to be an effective way to strengthen feature representations for semantic segmentation. However, most of the current popular network architectures tend to ignore the misalignment issues during the feature aggregation process caused by 1) step-by-step downsampling operations, and 2) indiscrimina… ▽ More

    Submitted 1 March, 2021; v1 submitted 24 February, 2020; originally announced March 2020.

    Comments: Accepted by TPAMI 2021

  14. arXiv:2001.01306  [pdf, other]

    cs.CV cs.CY cs.LG eess.IV

    Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis

    Authors: Mang Tik Chiu, Xingqian Xu, Yunchao Wei, Zilong Huang, Alexander Schwing, Robert Brunner, Hrant Khachatrian, Hovnatan Karapetyan, Ivan Dozier, Greg Rose, David Wilson, Adrian Tudor, Naira Hovakimyan, Thomas S. Huang, Honghui Shi

    Abstract: The success of deep learning in visual recognition tasks has driven advancements in multiple fields of research. Particularly, increasing attention has been drawn towards its application in agriculture. Nevertheless, while visual pattern recognition on farmlands carries enormous economic values, little progress has been made to merge computer vision and crop sciences due to the lack of suitable ag… ▽ More

    Submitted 19 March, 2020; v1 submitted 5 January, 2020; originally announced January 2020.

    Comments: CVPR 2020

  15. arXiv:1912.09028  [pdf, other]

    cs.CV

    Scale-wise Convolution for Image Restoration

    Authors: Yuchen Fan, Jiahui Yu, Ding Liu, Thomas S. Huang

    Abstract: While scale-invariant modeling has substantially boosted the performance of visual recognition tasks, it remains largely under-explored in deep networks based image restoration. Naively applying those scale-invariant techniques (e.g. multi-scale testing, random-scale data augmentation) to image restoration tasks usually leads to inferior performance. In this paper, we show that properly modeling s… ▽ More

    Submitted 19 December, 2019; originally announced December 2019.

    Comments: AAAI 2020

  16. arXiv:1911.10194  [pdf, other]

    cs.CV

    Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

    Authors: Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, Liang-Chieh Chen

    Abstract: In this work, we introduce Panoptic-DeepLab, a simple, strong, and fast system for panoptic segmentation, aiming to establish a solid baseline for bottom-up methods that can achieve comparable performance of two-stage methods while yielding fast inference speed. In particular, Panoptic-DeepLab adopts the dual-ASPP and dual-decoder structures specific to semantic, and instance segmentation, respect… ▽ More

    Submitted 11 March, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

    Comments: CVPR 2020

  17. arXiv:1911.07346  [pdf, other]

    cs.LG cs.CV stat.ML

    Any-Precision Deep Neural Networks

    Authors: Haichao Yu, Haoxiang Li, Honghui Shi, Thomas S. Huang, Gang Hua

    Abstract: We present any-precision deep neural networks (DNNs), which are trained with a new method that allows the learned DNNs to be flexible in numerical precision during inference. The same model in runtime can be flexibly and directly set to different bit-widths, by truncating the least significant bits, to support dynamic speed and accuracy trade-off. When all layers are set to low-bits, we show that… ▽ More

    Submitted 15 January, 2021; v1 submitted 17 November, 2019; originally announced November 2019.

    Comments: AAAI 2021

  18. arXiv:1910.04751  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Panoptic-DeepLab

    Authors: Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, Liang-Chieh Chen

    Abstract: We present Panoptic-DeepLab, a bottom-up and single-shot approach for panoptic segmentation. Our Panoptic-DeepLab is conceptually simple and delivers state-of-the-art results. In particular, we adopt the dual-ASPP and dual-decoder structures specific to semantic, and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation… ▽ More

    Submitted 23 October, 2019; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: This work is presented at ICCV 2019 Joint COCO and Mapillary Recognition Challenge Workshop

  19. arXiv:1909.04110  [pdf, other]

    cs.CV cs.AI cs.LG

    One-to-one Mapping for Unpaired Image-to-image Translation

    Authors: Zengming Shen, S. Kevin Zhou, Yifan Chen, Bogdan Georgescu, Xuqi Liu, Thomas S. Huang

    Abstract: Recently image-to-image translation has attracted significant interest in the literature, starting from the successful use of the generative adversarial network (GAN), to the introduction of cyclic constraint, to extensions to multiple domains. However, in existing approaches, there is no guarantee that the mapping between two image domains is unique or one-to-one. Here we propose a self-inverse… ▽ More

    Submitted 14 January, 2020; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: Accepted by WACV 2020

  20. arXiv:1909.04104  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Towards Learning a Self-inverse Network for Bidirectional Image-to-image Translation

    Authors: Zengming Shen, Yifan Chen, S. Kevin Zhou, Bogdan Georgescu, Xuqi Liu, Thomas S. Huang

    Abstract: The one-to-one mapping is necessary for many bidirectional image-to-image translation applications, such as MRI image synthesis as MRI images are unique to the patient. State-of-the-art approaches for image synthesis from domain X to domain Y learn a convolutional neural network that meticulously maps between the domains. A different network is typically implemented to map along the opposite direc… ▽ More

    Submitted 16 September, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: 10 pages, 9 figures

  21. arXiv:1908.10357  [pdf, other]

    cs.CV cs.LG eess.IV

    HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

    Authors: Bowen Cheng, Bin Xiao, Jingdong Wang, Honghui Shi, Thomas S. Huang, Lei Zhang

    Abstract: Bottom-up human pose estimation methods have difficulties in predicting the correct pose for small persons due to challenges in scale variation. In this paper, we present HigherHRNet: a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids. Equipped with multi-resolution supervision for training and multi-resolution aggregation… ▽ More

    Submitted 12 March, 2020; v1 submitted 27 August, 2019; originally announced August 2019.

    Comments: CVPR 2020

  22. arXiv:1902.03264  [pdf, other]

    cs.LG cs.AI stat.ML

    FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary

    Authors: Yingzhen Yang, Jiahui Yu, Nebojsa Jojic, Jun Huan, Thomas S. Huang

    Abstract: We present a novel method of compression of deep Convolutional Neural Networks (CNNs) by weight sharing through a new representation of convolutional filters. The proposed method reduces the number of parameters of each convolutional layer by learning a 1D vector termed Filter Summary (FS). The convolutional filters are located in FS as overlapping 1D segments, and nearby filters in FS share weigh… ▽ More

    Submitted 10 April, 2020; v1 submitted 8 February, 2019; originally announced February 2019.

    Comments: published at ICLR 2020

  23. arXiv:1902.00873  [pdf, other]

    cs.LG stat.ML

    An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity

    Authors: Yingzhen Yang, Jiahui Yu, Xingjian Li, Jun Huan, Thomas S. Huang

    Abstract: Regularization of Deep Neural Networks (DNNs) for the sake of improving their generalization capability is important and challenging. The development in this line benefits theoretical foundation of DNNs and promotes their usability in different areas of artificial intelligence. In this paper, we investigate the role of Rademacher complexity in improving generalization of DNNs and propose a novel r… ▽ More

    Submitted 16 November, 2019; v1 submitted 3 February, 2019; originally announced February 2019.

    Comments: Updated the link to the open source PaddlePaddle code of LRC Regularization as well as the author list

  24. arXiv:1811.11721  [pdf, other]

    cs.CV

    CCNet: Criss-Cross Attention for Semantic Segmentation

    Authors: Zilong Huang, Xinggang Wang, Yunchao Wei, Lichao Huang, Humphrey Shi, Wenyu Liu, Thomas S. Huang

    Abstract: Contextual information is vital in visual understanding problems, such as semantic segmentation and object detection. We propose a Criss-Cross Network (CCNet) for obtaining full-image contextual information in a very effective and efficient way. Concretely, for each pixel, a novel criss-cross attention module harvests the contextual information of all the pixels on its criss-cross path. By taking… ▽ More

    Submitted 9 July, 2020; v1 submitted 28 November, 2018; originally announced November 2018.

    Comments: IEEE TPAMI 2020 & ICCV 2019

  25. arXiv:1811.09347  [pdf, other]

    cs.CV cs.AI cs.LG

    A Simple Non-i.i.d. Sampling Approach for Efficient Training and Better Generalization

    Authors: Bowen Cheng, Yunchao Wei, Jiahui Yu, Shiyu Chang, Jinjun Xiong, Wen-Mei Hwu, Thomas S. Huang, Humphrey Shi

    Abstract: While training on samples drawn from independent and identical distribution has been a de facto paradigm for optimizing image classification networks, humans learn new concepts in an easy-to-hard manner and on the selected examples progressively. Driven by this fact, we investigate the training paradigms where the samples are not drawn from independent and identical distribution. We propose a data… ▽ More

    Submitted 14 October, 2020; v1 submitted 22 November, 2018; originally announced November 2018.

    Comments: Technical report

  26. arXiv:1809.01826  [pdf, other]

    cs.CV

    Connecting Image Denoising and High-Level Vision Tasks via Deep Learning

    Authors: Ding Liu, Bihan Wen, Jianbo Jiao, Xianming Liu, Zhangyang Wang, Thomas S. Huang

    Abstract: Image denoising and high-level vision tasks are usually handled independently in the conventional practice of computer vision, and their connection is fragile. In this paper, we cope with the two jointly and explore the mutual influence between them with the focus on two questions, namely (1) how image denoising can help improving high-level vision tasks, and (2) how the semantic information from… ▽ More

    Submitted 6 September, 2018; originally announced September 2018.

    Comments: arXiv admin note: text overlap with arXiv:1706.04284

  27. arXiv:1806.02919  [pdf, other]

    cs.CV

    Non-Local Recurrent Network for Image Restoration

    Authors: Ding Liu, Bihan Wen, Yuchen Fan, Chen Change Loy, Thomas S. Huang

    Abstract: Many classic methods have shown non-local self-similarity in natural images to be an effective prior for image restoration. However, it remains unclear and challenging to make use of this intrinsic property via deep networks. In this paper, we propose a non-local recurrent network (NLRN) as the first attempt to incorporate non-local operations into a recurrent neural network (RNN) for image restor… ▽ More

    Submitted 11 December, 2018; v1 submitted 7 June, 2018; originally announced June 2018.

    Comments: NIPS 2018

  28. arXiv:1805.04574  [pdf, other]

    cs.CV

    Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi- Supervised Semantic Segmentation

    Authors: Yunchao Wei, Huaxin Xiao, Honghui Shi, Zequn Jie, Jiashi Feng, Thomas S. Huang

    Abstract: Despite the remarkable progress, weakly supervised segmentation approaches are still inferior to their fully supervised counterparts. We observe that the performance gap mainly comes from their limitation on learning to produce high-quality dense object localization maps from image-level supervision. To mitigate such a gap, we revisit the dilated convolution [1] and reveal how it can be utilized in a n… ▽ More

    Submitted 27 May, 2018; v1 submitted 11 May, 2018; originally announced May 2018.

    Comments: Accepted by CVPR 2018 as Spotlight

  29. arXiv:1805.02704  [pdf, other]

    cs.CV

    Image Super-Resolution via Dual-State Recurrent Networks

    Authors: Wei Han, Shiyu Chang, Ding Liu, Mo Yu, Michael Witbrock, Thomas S. Huang

    Abstract: Advances in image super-resolution (SR) have recently benefited significantly from rapid developments in deep neural networks. Inspired by these recent discoveries, we note that many state-of-the-art deep SR architectures can be reformulated as a single-state recurrent neural network (RNN) with finite unfoldings. In this paper, we explore new structures for SR based on this compact RNN view, leadi… ▽ More

    Submitted 7 May, 2018; originally announced May 2018.

  30. arXiv:1801.07892  [pdf, other]

    cs.CV cs.GR

    Generative Image Inpainting with Contextual Attention

    Authors: Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang

    Abstract: Recent deep learning based approaches have shown promising results for the challenging task of inpainting large missing regions in an image. These methods can generate visually plausible image structures and textures, but often create distorted structures or blurry textures inconsistent with surrounding areas. This is mainly due to ineffectiveness of convolutional neural networks in explicitly bor… ▽ More

    Submitted 21 March, 2018; v1 submitted 24 January, 2018; originally announced January 2018.

    Comments: Accepted in CVPR 2018; add CelebA-HQ results; open sourced; interactive demo available: http://jhyu.me/demo

  31. Enhance Visual Recognition under Adverse Conditions via Deep Networks

    Authors: Ding Liu, Bowen Cheng, Zhangyang Wang, Haichao Zhang, Thomas S. Huang

    Abstract: Visual recognition under adverse conditions is a very important and challenging problem of high practical value, due to the ubiquitous existence of quality distortions during image acquisition, transmission, or storage. While deep neural networks have been extensively exploited in the techniques of low-quality image restoration and high-quality image recognition tasks respectively, few studies hav… ▽ More

    Submitted 2 April, 2019; v1 submitted 20 December, 2017; originally announced December 2017.

    Journal ref: IEEE Transactions on Image Processing 2019

  32. arXiv:1710.02224  [pdf, other]

    cs.AI cs.LG

    Dilated Recurrent Neural Networks

    Authors: Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark Hasegawa-Johnson, Thomas S. Huang

    Abstract: Learning with recurrent neural networks (RNNs) on long sequences is a notoriously difficult task. There are three major challenges: 1) complex dependencies, 2) vanishing and exploding gradients, and 3) efficient parallelization. In this paper, we introduce a simple yet effective RNN connection structure, the DilatedRNN, which simultaneously tackles all of these challenges. The proposed architectur… ▽ More

    Submitted 1 November, 2017; v1 submitted 5 October, 2017; originally announced October 2017.

    Comments: Accepted by NIPS 2017

  33. arXiv:1709.03126  [pdf, other]

    cs.CV cs.AI cs.LG

    Robust Emotion Recognition from Low Quality and Low Bit Rate Video: A Deep Learning Approach

    Authors: Bowen Cheng, Zhangyang Wang, Zhaobin Zhang, Zhu Li, Ding Liu, Jianchao Yang, Shuai Huang, Thomas S. Huang

    Abstract: Emotion recognition from facial expressions is tremendously useful, especially when coupled with smart devices and wireless multimedia applications. However, the inadequate network bandwidth often limits the spatial resolution of the transmitted video, which will heavily degrade the recognition reliability. We develop a novel framework to achieve robust emotion recognition from low bit rate video.… ▽ More

    Submitted 10 September, 2017; originally announced September 2017.

    Comments: Accepted by the Seventh International Conference on Affective Computing and Intelligent Interaction (ACII2017)

  34. arXiv:1709.01231  [pdf, ps, other]

    stat.ML cs.LG

    Discriminative Similarity for Clustering and Semi-Supervised Learning

    Authors: Yingzhen Yang, Feng Liang, Nebojsa Jojic, Shuicheng Yan, Jiashi Feng, Thomas S. Huang

    Abstract: Similarity-based clustering and semi-supervised learning methods separate the data into clusters or classes according to the pairwise similarity between the data, and the pairwise similarity is crucial for their performance. In this paper, we propose a novel discriminative similarity learning framework which learns discriminative similarity for either data clustering or semi-supervised learning. T… ▽ More

    Submitted 5 September, 2017; originally announced September 2017.

  35. arXiv:1709.01230  [pdf, ps, other]

    math.OC cs.LG

    On the Suboptimality of Proximal Gradient Descent for $\ell^{0}$ Sparse Approximation

    Authors: Yingzhen Yang, Jiashi Feng, Nebojsa Jojic, Jianchao Yang, Thomas S. Huang

    Abstract: We study the proximal gradient descent (PGD) method for $\ell^{0}$ sparse approximation problem as well as its accelerated optimization with randomized algorithms in this paper. We first offer theoretical analysis of PGD showing the bounded gap between the sub-optimal solution by PGD and the globally optimal solution for the $\ell^{0}$ sparse approximation problem under conditions weaker than Rest… ▽ More

    Submitted 5 September, 2017; originally announced September 2017.

  36. arXiv:1706.04284  [pdf, other]

    cs.CV

    When Image Denoising Meets High-Level Vision Tasks: A Deep Learning Approach

    Authors: Ding Liu, Bihan Wen, Xianming Liu, Zhangyang Wang, Thomas S. Huang

    Abstract: Conventionally, image denoising and high-level vision tasks are handled separately in computer vision. In this paper, we cope with the two jointly and explore the mutual influence between them. First we propose a convolutional neural network for image denoising which achieves the state-of-the-art performance. Second we propose a deep neural network solution that cascades two modules for image deno… ▽ More

    Submitted 16 April, 2018; v1 submitted 13 June, 2017; originally announced June 2017.

    Comments: the 27th International Joint Conference on Artificial Intelligence (2018)

  37. arXiv:1704.06001  [pdf, other]

    cs.LG cs.CV stat.ML

    Fast Generation for Convolutional Autoregressive Models

    Authors: Prajit Ramachandran, Tom Le Paine, Pooya Khorrami, Mohammad Babaeizadeh, Shiyu Chang, Yang Zhang, Mark A. Hasegawa-Johnson, Roy H. Campbell, Thomas S. Huang

    Abstract: Convolutional autoregressive models have recently demonstrated state-of-the-art performance on a number of generation tasks. While fast, parallel training methods have been crucial for their success, generation is typically implemented in a naïve fashion where redundant computations are unnecessarily repeated. This results in slow generation, making such models infeasible for production environmen… ▽ More

    Submitted 20 April, 2017; originally announced April 2017.

    Comments: Accepted at ICLR 2017 Workshop

  38. arXiv:1612.02766  [pdf, other]

    cs.CV

    Feedback Neural Network for Weakly Supervised Geo-Semantic Segmentation

    Authors: Xianming Liu, Amy Zhang, Tobias Tiecke, Andreas Gros, Thomas S. Huang

    Abstract: Learning from weakly-supervised data is one of the main challenges in machine learning and computer vision, especially for tasks such as image semantic segmentation where labeling is extremely expensive and subjective. In this paper, we propose a novel neural network architecture to perform weakly-supervised learning by suppressing irrelevant neuron activations. It localizes objects of interest by… ▽ More

    Submitted 8 December, 2016; originally announced December 2016.

    Comments: 9 pages, 4 figures

  39. arXiv:1611.09482  [pdf, other]

    cs.SD cs.DS cs.LG

    Fast Wavenet Generation Algorithm

    Authors: Tom Le Paine, Pooya Khorrami, Shiyu Chang, Yang Zhang, Prajit Ramachandran, Mark A. Hasegawa-Johnson, Thomas S. Huang

    Abstract: This paper presents an efficient implementation of the Wavenet generation process called Fast Wavenet. Compared to a naive implementation that has complexity O(2^L) (L denotes the number of layers in the network), our proposed approach removes redundant convolution operations by caching previous calculations, thereby reducing the complexity to O(L) time. Timing experiments show significant advanta… ▽ More

    Submitted 28 November, 2016; originally announced November 2016.

    Comments: Technical Report

  40. arXiv:1608.06374  [pdf, other]

    cs.LG cs.CV

    Deep Double Sparsity Encoder: Learning to Sparsify Not Only Features But Also Parameters

    Authors: Zhangyang Wang, Thomas S. Huang

    Abstract: This paper emphasizes the significance of jointly exploiting the problem structure and the parameter structure, in the context of deep modeling. As a specific and interesting example, we describe the deep double sparsity encoder (DDSE), which is inspired by the double sparsity model for dictionary learning. DDSE simultaneously sparsifies the output features and the learned model parameters, under one… ▽ More

    Submitted 1 October, 2016; v1 submitted 22 August, 2016; originally announced August 2016.

  41. arXiv:1608.04062  [pdf, ps, other]

    cs.LG cs.CV

    Stacked Approximated Regression Machine: A Simple Deep Learning Approach

    Authors: Zhangyang Wang, Shiyu Chang, Qing Ling, Shuai Huang, Xia Hu, Honghui Shi, Thomas S. Huang

    Abstract: With the agreement of my coauthors, I, Zhangyang Wang, would like to withdraw the manuscript "Stacked Approximated Regression Machine: A Simple Deep Learning Approach". Some experimental procedures were not included in the manuscript, which makes a part of important claims not meaningful. In the relevant research, I was solely responsible for carrying out the experiments; the other coauthors joined…

    Submitted 8 September, 2016; v1 submitted 14 August, 2016; originally announced August 2016.

    Comments: This manuscript has been withdrawn by the authors. Please see the updated text for details

  42. arXiv:1607.06182  [pdf, other]

    cs.SI cs.IR cs.LG

    Streaming Recommender Systems

    Authors: Shiyu Chang, Yang Zhang, Jiliang Tang, Dawei Yin, Yi Chang, Mark A. Hasegawa-Johnson, Thomas S. Huang

    Abstract: The increasing popularity of real-world recommender systems produces data continuously and rapidly, and it becomes more realistic to study recommender systems under streaming scenarios. Data streams present distinct properties, such as being temporally ordered, continuous, and high-velocity, which pose tremendous challenges to traditional recommender systems. In this paper, we investigate the problem of…

    Submitted 21 July, 2016; originally announced July 2016.

  43. arXiv:1604.01475  [pdf, other]

    cs.LG cs.CV

    Learning A Deep $\ell_\infty$ Encoder for Hashing

    Authors: Zhangyang Wang, Yingzhen Yang, Shiyu Chang, Qing Ling, Thomas S. Huang

    Abstract: We investigate the $\ell_\infty$-constrained representation which demonstrates robustness to quantization errors, utilizing the tool of deep learning. Based on the Alternating Direction Method of Multipliers (ADMM), we formulate the original convex minimization problem as a feed-forward neural network, named \textit{Deep $\ell_\infty$ Encoder}, by introducing the novel Bounded Linear Unit (BLU) ne…

    Submitted 5 April, 2016; originally announced April 2016.

    Comments: To be presented at IJCAI'16

  44. arXiv:1602.08465  [pdf, other]

    cs.CV

    Seq-NMS for Video Object Detection

    Authors: Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang

    Abstract: Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip. Recently, there have been major advances in object detection in a single image. These methods typically contain three phases: (i) object proposal generation, (ii) object classification, and (iii) post-processing. We propose a modificatio…

    Submitted 22 August, 2016; v1 submitted 26 February, 2016; originally announced February 2016.

    Comments: Technical Report for Imagenet VID Competition 2015

  45. arXiv:1602.07377  [pdf, other]

    cs.CV

    How Deep Neural Networks Can Improve Emotion Recognition on Video Data

    Authors: Pooya Khorrami, Tom Le Paine, Kevin Brady, Charlie Dagli, Thomas S. Huang

    Abstract: We consider the task of dimensional emotion recognition on video data using deep learning. While several previous methods have shown the benefits of training temporal neural network models such as recurrent neural networks (RNNs) on hand-crafted features, few works have considered combining convolutional neural networks (CNNs) with RNNs. In this work, we present a system that performs emotion reco…

    Submitted 9 January, 2017; v1 submitted 23 February, 2016; originally announced February 2016.

    Comments: Accepted at ICIP 2016. Fixed typo in Experiments section

  46. arXiv:1601.04155  [pdf, other]

    cs.CV cs.LG cs.NE

    Brain-Inspired Deep Networks for Image Aesthetics Assessment

    Authors: Zhangyang Wang, Shiyu Chang, Florin Dolcos, Diane Beck, Ding Liu, Thomas S. Huang

    Abstract: Image aesthetics assessment has been challenging due to its subjective nature. Inspired by scientific advances in human visual perception and neuroaesthetics, we design Brain-Inspired Deep Networks (BDN) for this task. BDN first learns attributes through parallel supervised pathways, on a variety of selected feature dimensions. A high-level synthesis network is trained to associate and…

    Submitted 14 March, 2016; v1 submitted 16 January, 2016; originally announced January 2016.

  47. arXiv:1601.04153  [pdf, other]

    cs.CV cs.AI cs.LG

    Studying Very Low Resolution Recognition Using Deep Networks

    Authors: Zhangyang Wang, Shiyu Chang, Yingzhen Yang, Ding Liu, Thomas S. Huang

    Abstract: Visual recognition research often assumes a sufficient resolution of the region of interest (ROI). That assumption is usually violated in practice, inspiring us to explore the Very Low Resolution Recognition (VLRR) problem. Typically, the ROI in a VLRR problem can be smaller than $16 \times 16$ pixels, and is challenging to recognize even for human experts. We attempt to solve the VLRR problem using deep…

    Submitted 31 March, 2016; v1 submitted 16 January, 2016; originally announced January 2016.

  48. arXiv:1601.04149  [pdf, other]

    cs.CV cs.AI cs.LG

    $\mathbf{D^3}$: Deep Dual-Domain Based Fast Restoration of JPEG-Compressed Images

    Authors: Zhangyang Wang, Ding Liu, Shiyu Chang, Qing Ling, Yingzhen Yang, Thomas S. Huang

    Abstract: In this paper, we design a Deep Dual-Domain ($\mathbf{D^3}$) based fast restoration model to remove artifacts of JPEG compressed images. It leverages the large learning capacity of deep networks, as well as problem-specific expertise that has rarely been incorporated in past designs of deep architectures. For the latter, we take into consideration both the prior knowledge of the JPEG compression…

    Submitted 9 April, 2016; v1 submitted 16 January, 2016; originally announced January 2016.

  49. arXiv:1510.08520  [pdf, ps, other]

    cs.LG cs.CV

    Learning with $\ell^{0}$-Graph: $\ell^{0}$-Induced Sparse Subspace Clustering

    Authors: Yingzhen Yang, Jiashi Feng, Jianchao Yang, Thomas S. Huang

    Abstract: Sparse subspace clustering methods, such as Sparse Subspace Clustering (SSC) \cite{ElhamifarV13} and $\ell^{1}$-graph \cite{YanW09,ChengYYFH10}, are effective in partitioning the data that lie in a union of subspaces. Most of those methods use $\ell^{1}$-norm or $\ell^{2}$-norm with thresholding to impose the sparsity of the constructed sparse similarity graph, and certain assumptions, e.g. indepe…

    Submitted 18 November, 2015; v1 submitted 28 October, 2015; originally announced October 2015.

  50. arXiv:1510.02969  [pdf, other]

    cs.CV cs.LG cs.NE

    Do Deep Neural Networks Learn Facial Action Units When Doing Expression Recognition?

    Authors: Pooya Khorrami, Tom Le Paine, Thomas S. Huang

    Abstract: Despite being the appearance-based classifier of choice in recent years, relatively few works have examined how much convolutional neural networks (CNNs) can improve performance on accepted expression recognition benchmarks and, more importantly, examined what it is they actually learn. In this work, not only do we show that CNNs can achieve strong performance, but we also introduce an approach to…

    Submitted 15 March, 2017; v1 submitted 10 October, 2015; originally announced October 2015.

    Comments: Accepted at ICCV 2015 CV4AC Workshop. Corrected numbers in Tables 2 and 3