Search | arXiv e-print repository

Learning Deep $\ell_0$ Encoders

Authors: Zhangyang Wang, Qing Ling, Thomas S. Huang

Abstract: Despite its nonconvex nature, $\ell_0$ sparse approximation is desirable in many theoretical and application cases. We study the $\ell_0$ sparse approximation problem with the tool of deep learning, by proposing Deep $\ell_0$ Encoders. Two typical forms, the $\ell_0$ regularized problem and the $M$-sparse problem, are investigated. Based on solid iterative algorithms, we model them as feed-forward… ▽ More Despite its nonconvex nature, $\ell_0$ sparse approximation is desirable in many theoretical and application cases. We study the $\ell_0$ sparse approximation problem with the tool of deep learning, by proposing Deep $\ell_0$ Encoders. Two typical forms, the $\ell_0$ regularized problem and the $M$-sparse problem, are investigated. Based on solid iterative algorithms, we model them as feed-forward neural networks, through introducing novel neurons and pooling functions. Enforcing such structural priors acts as an effective network regularization. The deep encoders also enjoy faster inference, larger learning capacity, and better scalability compared to conventional sparse coding solutions. Furthermore, under task-driven losses, the models can be conveniently optimized from end to end. Numerical results demonstrate the impressive performances of the proposed encoders. △ Less

Submitted 22 November, 2015; v1 submitted 1 September, 2015; originally announced September 2015.

Comments: Full paper at AAAI 2016

arXiv:1509.00151 [pdf, other]

Learning A Task-Specific Deep Architecture For Clustering

Authors: Zhangyang Wang, Shiyu Chang, Jiayu Zhou, Meng Wang, Thomas S. Huang

Abstract: While sparse coding-based clustering methods have shown to be successful, their bottlenecks in both efficiency and scalability limit the practical usage. In recent years, deep learning has been proved to be a highly effective, efficient and scalable feature learning tool. In this paper, we propose to emulate the sparse coding-based clustering pipeline in the context of deep learning, leading to a… ▽ More While sparse coding-based clustering methods have shown to be successful, their bottlenecks in both efficiency and scalability limit the practical usage. In recent years, deep learning has been proved to be a highly effective, efficient and scalable feature learning tool. In this paper, we propose to emulate the sparse coding-based clustering pipeline in the context of deep learning, leading to a carefully crafted deep model benefiting from both. A feed-forward network structure, named TAGnet, is constructed based on a graph-regularized sparse coding algorithm. It is then trained with task-specific loss functions from end to end. We discover that connecting deep learning to sparse coding benefits not only the model performance, but also its initialization and interpretation. Moreover, by introducing auxiliary clustering tasks to the intermediate feature hierarchy, we formulate DTAGnet and obtain a further performance boost. Extensive experiments demonstrate that the proposed model gains remarkable margins over several state-of-the-art methods. △ Less

Submitted 16 October, 2015; v1 submitted 1 September, 2015; originally announced September 2015.

arXiv:1507.03196 [pdf, other]

DeepFont: Identify Your Font from An Image

Authors: Zhangyang Wang, Jianchao Yang, Hailin **, Eli Shechtman, Aseem Agarwala, Jonathan Brandt, Thomas S. Huang

Abstract: As font is one of the core design concepts, automatic font identification and similar font suggestion from an image or photo has been on the wish list of many designers. We study the Visual Font Recognition (VFR) problem, and advance the state-of-the-art remarkably by develo** the DeepFont system. First of all, we build up the first available large-scale VFR dataset, named AdobeVFR, consisting o… ▽ More As font is one of the core design concepts, automatic font identification and similar font suggestion from an image or photo has been on the wish list of many designers. We study the Visual Font Recognition (VFR) problem, and advance the state-of-the-art remarkably by develo** the DeepFont system. First of all, we build up the first available large-scale VFR dataset, named AdobeVFR, consisting of both labeled synthetic data and partially labeled real-world data. Next, to combat the domain mismatch between available training and testing data, we introduce a Convolutional Neural Network (CNN) decomposition approach, using a domain adaptation technique based on a Stacked Convolutional Auto-Encoder (SCAE) that exploits a large corpus of unlabeled real-world text images combined with synthetic data preprocessed in a specific way. Moreover, we study a novel learning-based model compression approach, in order to reduce the DeepFont model size without sacrificing its performance. The DeepFont system achieves an accuracy of higher than 80% (top-5) on our collected dataset, and also produces a good font similarity measure for font selection and suggestion. We also achieve around 6 times compression of the model without any visible loss of recognition accuracy. △ Less

Submitted 12 July, 2015; originally announced July 2015.

Comments: To Appear in ACM Multimedia as a full paper

arXiv:1504.05632 [pdf, other]

Self-Tuned Deep Super Resolution

Authors: Zhangyang Wang, Yingzhen Yang, Zhaowen Wang, Shiyu Chang, Wei Han, Jianchao Yang, Thomas S. Huang

Abstract: Deep learning has been successfully applied to image super resolution (SR). In this paper, we propose a deep joint super resolution (DJSR) model to exploit both external and self similarities for SR. A Stacked Denoising Convolutional Auto Encoder (SDCAE) is first pre-trained on external examples with proper data augmentations. It is then fine-tuned with multi-scale self examples from each input, w… ▽ More Deep learning has been successfully applied to image super resolution (SR). In this paper, we propose a deep joint super resolution (DJSR) model to exploit both external and self similarities for SR. A Stacked Denoising Convolutional Auto Encoder (SDCAE) is first pre-trained on external examples with proper data augmentations. It is then fine-tuned with multi-scale self examples from each input, where the reliability of self examples is explicitly taken into account. We also enhance the model performance by sub-model training and selection. The DJSR model is extensively evaluated and compared with state-of-the-arts, and show noticeable performance improvements both quantitatively and perceptually on a wide range of images. △ Less

Submitted 21 April, 2015; originally announced April 2015.

arXiv:1504.00028 [pdf, other]

Real-World Font Recognition Using Deep Network and Domain Adaptation

Authors: Zhangyang Wang, Jianchao Yang, Hailin **, Eli Shechtman, Aseem Agarwala, Jonathan Brandt, Thomas S. Huang

Abstract: We address a challenging fine-grain classification problem: recognizing a font style from an image of text. In this task, it is very easy to generate lots of rendered font examples but very hard to obtain real-world labeled images. This real-to-synthetic domain gap caused poor generalization to new real data in previous methods (Chen et al. (2014)). In this paper, we refer to Convolutional Neural… ▽ More We address a challenging fine-grain classification problem: recognizing a font style from an image of text. In this task, it is very easy to generate lots of rendered font examples but very hard to obtain real-world labeled images. This real-to-synthetic domain gap caused poor generalization to new real data in previous methods (Chen et al. (2014)). In this paper, we refer to Convolutional Neural Networks, and use an adaptation technique based on a Stacked Convolutional Auto-Encoder that exploits unlabeled real-world images combined with synthetic data. The proposed method achieves an accuracy of higher than 80% (top-5) on a real-world dataset. △ Less

Submitted 31 March, 2015; originally announced April 2015.

arXiv:1503.03621 [pdf, ps, other]

Designing A Composite Dictionary Adaptively From Joint Examples

Authors: Zhangyang Wang, Yingzhen Yang, Jianchao Yang, Thomas S. Huang

Abstract: We study the complementary behaviors of external and internal examples in image restoration, and are motivated to formulate a composite dictionary design framework. The composite dictionary consists of the global part learned from external examples, and the sample-specific part learned from internal examples. The dictionary atoms in both parts are further adaptively weighted to emphasize their mod… ▽ More We study the complementary behaviors of external and internal examples in image restoration, and are motivated to formulate a composite dictionary design framework. The composite dictionary consists of the global part learned from external examples, and the sample-specific part learned from internal examples. The dictionary atoms in both parts are further adaptively weighted to emphasize their model statistics. Experiments demonstrate that the joint utilization of external and internal examples leads to substantial improvements, with successful applications in image denoising and super resolution. △ Less

Submitted 8 September, 2015; v1 submitted 12 March, 2015; originally announced March 2015.

arXiv:1503.01647 [pdf, other]

Decentralized Recommender Systems

Authors: Zhangyang Wang, Xianming Liu, Shiyu Chang, Jiayu Zhou, Guo-Jun Qi, Thomas S. Huang

Abstract: This paper proposes a decentralized recommender system by formulating the popular collaborative filleting (CF) model into a decentralized matrix completion form over a set of users. In such a way, data storages and computations are fully distributed. Each user could exchange limited information with its local neighborhood, and thus it avoids the centralized fusion. Advantages of the proposed syste… ▽ More This paper proposes a decentralized recommender system by formulating the popular collaborative filleting (CF) model into a decentralized matrix completion form over a set of users. In such a way, data storages and computations are fully distributed. Each user could exchange limited information with its local neighborhood, and thus it avoids the centralized fusion. Advantages of the proposed system include a protection on user privacy, as well as better scalability and robustness. We compare our proposed algorithm with several state-of-the-art algorithms on the FlickerUserFavor dataset, and demonstrate that the decentralized algorithm can gain a competitive performance to others. △ Less

Submitted 5 March, 2015; originally announced March 2015.

arXiv:1503.01138 [pdf, other]

doi 10.1109/TIP.2015.2462113

Learning Super-Resolution Jointly from External and Internal Examples

Authors: Zhangyang Wang, Yingzhen Yang, Zhaowen Wang, Shiyu Chang, Jianchao Yang, Thomas S. Huang

Abstract: Single image super-resolution (SR) aims to estimate a high-resolution (HR) image from a lowresolution (LR) input. Image priors are commonly learned to regularize the otherwise seriously ill-posed SR problem, either using external LR-HR pairs or internal similar patterns. We propose joint SR to adaptively combine the advantages of both external and internal SR methods. We define two loss functions… ▽ More Single image super-resolution (SR) aims to estimate a high-resolution (HR) image from a lowresolution (LR) input. Image priors are commonly learned to regularize the otherwise seriously ill-posed SR problem, either using external LR-HR pairs or internal similar patterns. We propose joint SR to adaptively combine the advantages of both external and internal SR methods. We define two loss functions using sparse coding based external examples, and epitomic matching based on internal examples, as well as a corresponding adaptive weight to automatically balance their contributions according to their reconstruction errors. Extensive SR results demonstrate the effectiveness of the proposed method over the existing state-of-the-art methods, and is also verified by our subjective evaluation study. △ Less

Submitted 16 June, 2015; v1 submitted 3 March, 2015; originally announced March 2015.

arXiv:1412.6597 [pdf, other]

An Analysis of Unsupervised Pre-training in Light of Recent Advances

Authors: Tom Le Paine, Pooya Khorrami, Wei Han, Thomas S. Huang

Abstract: Convolutional neural networks perform well on object recognition because of a number of recent advances: rectified linear units (ReLUs), data augmentation, dropout, and large labelled datasets. Unsupervised data has been proposed as another way to improve performance. Unfortunately, unsupervised pre-training is not used by state-of-the-art methods leading to the following question: Is unsupervised… ▽ More Convolutional neural networks perform well on object recognition because of a number of recent advances: rectified linear units (ReLUs), data augmentation, dropout, and large labelled datasets. Unsupervised data has been proposed as another way to improve performance. Unfortunately, unsupervised pre-training is not used by state-of-the-art methods leading to the following question: Is unsupervised pre-training still useful given recent advances? If so, when? We answer this in three parts: we 1) develop an unsupervised method that incorporates ReLUs and recent unsupervised regularization techniques, 2) analyze the benefits of unsupervised pre-training compared to data augmentation and dropout on CIFAR-10 while varying the ratio of unsupervised to supervised samples, 3) verify our findings on STL-10. We discover unsupervised pre-training, as expected, helps when the ratio of unsupervised to supervised samples is high, and surprisingly, hurts when the ratio is low. We also use unsupervised pre-training with additional color augmentation to achieve near state-of-the-art performance on STL-10. △ Less

Submitted 10 April, 2015; v1 submitted 19 December, 2014; originally announced December 2014.

Comments: Accepted as a workshop contribution to ICLR 2015

arXiv:1412.5758

Decomposition-Based Domain Adaptation for Real-World Font Recognition

Authors: Zhangyang Wang, Jianchao Yang, Hailin **, Eli Shechtman, Aseem Agarwala, Jonathan Brandt, Thomas S. Huang

Abstract: We present a domain adaption framework to address a domain mismatch between synthetic training and real-world testing data. We demonstrate our method on a challenging fine-grain classification problem: recognizing a font style from an image of text. In this task, it is very easy to generate lots of rendered font examples but very hard to obtain real-world labeled images. This real-to-synthetic dom… ▽ More We present a domain adaption framework to address a domain mismatch between synthetic training and real-world testing data. We demonstrate our method on a challenging fine-grain classification problem: recognizing a font style from an image of text. In this task, it is very easy to generate lots of rendered font examples but very hard to obtain real-world labeled images. This real-to-synthetic domain gap caused poor generalization to new real data in previous font recognition methods (Chen et al. (2014)). In this paper, we introduce a Convolutional Neural Network decomposition approach, leveraging a large training corpus of synthetic data to obtain effective features for classification. This is done using an adaptation technique based on a Stacked Convolutional Auto-Encoder that exploits a large collection of unlabeled real-world text images combined with synthetic data preprocessed in a specific way. The proposed DeepFont method achieves an accuracy of higher than 80% (top-5) on a new large labeled real-world dataset we collected. △ Less

Submitted 1 April, 2015; v1 submitted 18 December, 2014; originally announced December 2014.

Comments: This paper has been withdrawn by the author due to project concerns

arXiv:1301.6731 [pdf]

Variational Learning in Mixed-State Dynamic Graphical Models

Authors: Vladimir Pavlovic, Brendan J. Frey, Thomas S. Huang

Abstract: Many real-valued stochastic time-series are locally linear (Gassian), but globally non-linear. For example, the trajectory of a human hand gesture can be viewed as a linear dynamic system driven by a nonlinear dynamic system that represents muscle actions. We present a mixed-state dynamic graphical model in which a hidden Markov model drives a linear dynamic system. This combination allows us t… ▽ More Many real-valued stochastic time-series are locally linear (Gassian), but globally non-linear. For example, the trajectory of a human hand gesture can be viewed as a linear dynamic system driven by a nonlinear dynamic system that represents muscle actions. We present a mixed-state dynamic graphical model in which a hidden Markov model drives a linear dynamic system. This combination allows us to model both the discrete and continuous causes of trajectories such as human gestures. The number of computations needed for exact inference is exponential in the sequence length, so we derive an approximate variational inference technique that can also be used to learn the parameters of the discrete and continuous models. We show how the mixed-state model and the variational technique can be used to classify human hand gestures made with a computer mouse. △ Less

Submitted 23 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI1999)

Report number: UAI-P-1999-PG-522-530

arXiv:1210.4481 [pdf, ps, other]

Epitome for Automatic Image Colorization

Authors: Yingzhen Yang, Xinqi Chu, Tian-Tsong Ng, Alex Yong-Sang Chia, Shuicheng Yan, Thomas S. Huang

Abstract: Image colorization adds color to grayscale images. It not only increases the visual appeal of grayscale images, but also enriches the information contained in scientific images that lack color information. Most existing methods of colorization require laborious user interaction for scribbles or image segmentation. To eliminate the need for human labor, we develop an automatic image colorization me… ▽ More Image colorization adds color to grayscale images. It not only increases the visual appeal of grayscale images, but also enriches the information contained in scientific images that lack color information. Most existing methods of colorization require laborious user interaction for scribbles or image segmentation. To eliminate the need for human labor, we develop an automatic image colorization method using epitome. Built upon a generative graphical model, epitome is a condensed image appearance and shape model which also proves to be an effective summary of color information for the colorization task. We train the epitome from the reference images and perform inference in the epitome to colorize grayscale images, rendering better colorization results than previous method in our experiments. △ Less

Submitted 8 October, 2012; originally announced October 2012.

arXiv:1210.0645 [pdf, ps, other]

Nonparametric Unsupervised Classification

Authors: Yingzhen Yang, Thomas S. Huang

Abstract: Unsupervised classification methods learn a discriminative classifier from unlabeled data, which has been proven to be an effective way of simultaneously clustering the data and training a classifier from the data. Various unsupervised classification methods obtain appealing results by the classifiers learned in an unsupervised manner. However, existing methods do not consider the misclassificatio… ▽ More Unsupervised classification methods learn a discriminative classifier from unlabeled data, which has been proven to be an effective way of simultaneously clustering the data and training a classifier from the data. Various unsupervised classification methods obtain appealing results by the classifiers learned in an unsupervised manner. However, existing methods do not consider the misclassification error of the unsupervised classifiers except unsupervised SVM, so the performance of the unsupervised classifiers is not fully evaluated. In this work, we study the misclassification error of two popular classifiers, i.e. the nearest neighbor classifier (NN) and the plug-in classifier, in the setting of unsupervised classification. △ Less

Submitted 20 May, 2013; v1 submitted 2 October, 2012; originally announced October 2012.

Comments: Submitted to ALT 2013

arXiv:1203.3483 [pdf]

Regularized Maximum Likelihood for Intrinsic Dimension Estimation

Authors: Mithun Das Gupta, Thomas S. Huang

Abstract: We propose a new method for estimating the intrinsic dimension of a dataset by applying the principle of regularized maximum likelihood to the distances between close neighbors. We propose a regularization scheme which is motivated by divergence minimization principles. We derive the estimator by a Poisson process approximation, argue about its convergence properties and apply it to a number of si… ▽ More We propose a new method for estimating the intrinsic dimension of a dataset by applying the principle of regularized maximum likelihood to the distances between close neighbors. We propose a regularization scheme which is motivated by divergence minimization principles. We derive the estimator by a Poisson process approximation, argue about its convergence properties and apply it to a number of simulated and real datasets. We also show it has the best overall performance compared with two other intrinsic dimension estimators. △ Less

Submitted 15 March, 2012; originally announced March 2012.

Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

Report number: UAI-P-2010-PG-220-227

arXiv:1004.3814 [pdf, other]

Bregman Distance to L1 Regularized Logistic Regression

Authors: Mithun Das Gupta, Thomas S. Huang

Abstract: In this work we investigate the relationship between Bregman distances and regularized Logistic Regression model. We present a detailed study of Bregman Distance minimization, a family of generalized entropy measures associated with convex functions. We convert the L1-regularized logistic regression into this more general framework and propose a primal-dual method based algorithm for learning the… ▽ More In this work we investigate the relationship between Bregman distances and regularized Logistic Regression model. We present a detailed study of Bregman Distance minimization, a family of generalized entropy measures associated with convex functions. We convert the L1-regularized logistic regression into this more general framework and propose a primal-dual method based algorithm for learning the parameters. We pose L1-regularized logistic regression into Bregman distance minimization and then apply non-linear constrained optimization techniques to estimate the parameters of the logistic model. △ Less

Submitted 21 April, 2010; originally announced April 2010.

Comments: 8 pages, 3 images, shorter version published in ICPR 2008 by same authors.

Showing 51–65 of 65 results for author: Huang, T S