-
DeepLab2: A TensorFlow Library for Deep Labeling
Authors:
Mark Weber,
Huiyu Wang,
Siyuan Qiao,
Jun Xie,
Maxwell D. Collins,
Yukun Zhu,
Liangzhe Yuan,
Dahun Kim,
Qihang Yu,
Daniel Cremers,
Laura Leal-Taixe,
Alan L. Yuille,
Florian Schroff,
Hartwig Adam,
Liang-Chieh Chen
Abstract:
DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision. DeepLab2 includes all our recently developed DeepLab model variants with pretrained checkpoints as well as model training and evaluation code, allowing the community to reproduce and further improve upon the state-of-the-art systems. To showcase the effectiveness of DeepLab2, our Panoptic-DeepLab employing Axial-SWideRNet as network backbone achieves 68.0% PQ or 83.5% mIoU on the Cityscapes validation set, with only single-scale inference and ImageNet-1K pretrained checkpoints. We hope that publicly sharing our library could facilitate future research on dense pixel labeling tasks and envision new applications of this technology. Code is made publicly available at https://github.com/google-research/deeplab2.
Submitted 17 June, 2021;
originally announced June 2021.
-
Approaches for handling sloping fluid-solid interfaces with the parabolic equation method
Authors:
Michael D. Collins,
Adith Ramamurti
Abstract:
Several methods for handling sloping fluid-solid interfaces with the elastic parabolic equation are tested. A single-scattering approach that is modified for the fluid-solid case is accurate for some problems but breaks down when the contrast across the interface is sufficiently large and when there is a Scholte wave. An approximate condition for conserving energy breaks down when a Scholte wave propagates along a sloping interface but otherwise performs well for a large class of problems involving gradual slopes, a wide range of sediment parameters, and ice cover. An approach based on treating part of the fluid layer as a solid with low shear speed handles Scholte waves and a wide range of sediment parameters accurately, but this approach needs further development. The variable rotated parabolic equation is not effective for problems involving frequent or continuous changes in slope, but it provides a high level of accuracy for most of the test cases, which have regions of constant slope. Approaches based on a coordinate mapping and on using a film of solid material with low shear speed on the rises of the stair steps that approximate a sloping interface are also tested and found to produce accurate results for some cases.
Submitted 21 May, 2020;
originally announced May 2020.
-
Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation
Authors:
Liang-Chieh Chen,
Raphael Gontijo Lopes,
Bowen Cheng,
Maxwell D. Collins,
Ekin D. Cubuk,
Barret Zoph,
Hartwig Adam,
Jonathon Shlens
Abstract:
Supervised learning in large discriminative models is a mainstay for modern computer vision. Such an approach necessitates investing in large-scale human-annotated datasets for achieving state-of-the-art results. In turn, the efficacy of supervised learning may be limited by the size of the human-annotated dataset. This limitation is particularly notable for image segmentation tasks, where the expense of human annotation is especially large, yet large amounts of unlabeled data may exist. In this work, we ask if we may leverage semi-supervised learning in unlabeled video sequences and extra images to improve the performance on urban scene segmentation, simultaneously tackling semantic, instance, and panoptic segmentation. The goal of this work is to avoid the construction of sophisticated, learned architectures specific to label propagation (e.g., patch matching and optical flow). Instead, we simply predict pseudo-labels for the unlabeled data and train subsequent models with both human-annotated and pseudo-labeled data. The procedure is iterated several times. As a result, our Naive-Student model, trained with such simple yet effective iterative semi-supervised learning, attains state-of-the-art results at all three Cityscapes benchmarks, reaching the performance of 67.8% PQ, 42.6% AP, and 85.2% mIOU on the test set. We view this work as a notable step towards building a simple procedure to harness unlabeled video sequences and extra images to surpass state-of-the-art performance on core computer vision tasks.
Submitted 19 July, 2020; v1 submitted 20 May, 2020;
originally announced May 2020.
-
Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation
Authors:
Bowen Cheng,
Maxwell D. Collins,
Yukun Zhu,
Ting Liu,
Thomas S. Huang,
Hartwig Adam,
Liang-Chieh Chen
Abstract:
In this work, we introduce Panoptic-DeepLab, a simple, strong, and fast system for panoptic segmentation, aiming to establish a solid baseline for bottom-up methods that can achieve performance comparable to two-stage methods while yielding fast inference speed. In particular, Panoptic-DeepLab adopts the dual-ASPP and dual-decoder structures specific to semantic and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. As a result, our single Panoptic-DeepLab simultaneously ranks first at all three Cityscapes benchmarks, setting the new state of the art of 84.2% mIoU, 39.0% AP, and 65.5% PQ on the test set. Additionally, equipped with MobileNetV3, Panoptic-DeepLab runs nearly in real time with a single 1025x2049 image (15.8 frames per second), while achieving a competitive performance on Cityscapes (54.1% PQ on the test set). On the Mapillary Vistas test set, our ensemble of six models attains 42.7% PQ, outperforming the challenge winner in 2018 by a healthy margin of 1.5%. Finally, our Panoptic-DeepLab also performs on par with several top-down approaches on the challenging COCO dataset. For the first time, we demonstrate a bottom-up approach could deliver state-of-the-art results on panoptic segmentation.
Submitted 11 March, 2020; v1 submitted 22 November, 2019;
originally announced November 2019.
-
SegSort: Segmentation by Discriminative Sorting of Segments
Authors:
Jyh-Jing Hwang,
Stella X. Yu,
Jianbo Shi,
Maxwell D. Collins,
Tien-Ju Yang,
Xiao Zhang,
Liang-Chieh Chen
Abstract:
Almost all existing deep learning approaches for semantic segmentation tackle this task as a pixel-wise classification problem. Yet humans understand a scene not in terms of pixels, but by decomposing it into perceptual groups and structures that are the basic building blocks of recognition. This motivates us to propose an end-to-end pixel-wise metric learning approach that mimics this process. In our approach, the optimal visual representation determines the right segmentation within individual images and associates segments with the same semantic classes across images. The core visual learning problem is therefore to maximize the similarity within segments and minimize the similarity between segments. Given a model trained this way, inference is performed consistently by extracting pixel-wise embeddings and clustering, with the semantic label determined by the majority vote of its nearest neighbors from an annotated set.
As a result, we present SegSort as a first attempt at using deep learning for unsupervised semantic segmentation, achieving 76% of the performance of its supervised counterpart. When supervision is available, SegSort shows consistent improvements over conventional approaches based on pixel-wise softmax training. Additionally, our approach produces more precise boundaries and consistent region predictions. The proposed SegSort further produces an interpretable result, as each choice of label can be easily understood from the retrieved nearest segments.
Submitted 30 October, 2019; v1 submitted 15 October, 2019;
originally announced October 2019.
-
Panoptic-DeepLab
Authors:
Bowen Cheng,
Maxwell D. Collins,
Yukun Zhu,
Ting Liu,
Thomas S. Huang,
Hartwig Adam,
Liang-Chieh Chen
Abstract:
We present Panoptic-DeepLab, a bottom-up and single-shot approach for panoptic segmentation. Our Panoptic-DeepLab is conceptually simple and delivers state-of-the-art results. In particular, we adopt the dual-ASPP and dual-decoder structures specific to semantic and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. Our single Panoptic-DeepLab sets the new state of the art at all three Cityscapes benchmarks, reaching 84.2% mIoU, 39.0% AP, and 65.5% PQ on the test set, and advances results on the challenging Mapillary Vistas dataset.
Submitted 23 October, 2019; v1 submitted 10 October, 2019;
originally announced October 2019.
-
DeeperLab: Single-Shot Image Parser
Authors:
Tien-Ju Yang,
Maxwell D. Collins,
Yukun Zhu,
Jyh-Jing Hwang,
Ting Liu,
Xiao Zhang,
Vivienne Sze,
George Papandreou,
Liang-Chieh Chen
Abstract:
We present a single-shot, bottom-up approach for whole image parsing. Whole image parsing, also known as Panoptic Segmentation, generalizes the tasks of semantic segmentation for 'stuff' classes and instance segmentation for 'thing' classes, assigning both semantic and instance labels to every pixel in an image. Recent approaches to whole image parsing typically employ separate standalone modules for the constituent semantic and instance segmentation tasks and require multiple passes of inference. Instead, the proposed DeeperLab image parser performs whole image parsing with a significantly simpler, fully convolutional approach that jointly addresses the semantic and instance segmentation tasks in a single-shot manner, resulting in a streamlined system that better lends itself to fast processing. For quantitative evaluation, we use both the instance-based Panoptic Quality (PQ) metric and the proposed region-based Parsing Covering (PC) metric, which better captures the image parsing quality on 'stuff' classes and larger object instances. We report experimental results on the challenging Mapillary Vistas dataset, in which our single model achieves 31.95% (val) / 31.6% PQ (test) and 55.26% PC (val) with 3 frames per second (fps) on GPU or near real-time speed (22.6 fps on GPU) with reduced accuracy.
Submitted 12 March, 2019; v1 submitted 13 February, 2019;
originally announced February 2019.
-
Searching for Efficient Multi-Scale Architectures for Dense Image Prediction
Authors:
Liang-Chieh Chen,
Maxwell D. Collins,
Yukun Zhu,
George Papandreou,
Barret Zoph,
Florian Schroff,
Hartwig Adam,
Jonathon Shlens
Abstract:
The design of neural network architectures is an important component for achieving state-of-the-art performance with machine learning systems across a broad array of tasks. Much work has endeavored to design and build architectures automatically through clever construction of a search space paired with simple learning algorithms. Recent progress has demonstrated that such meta-learning methods may exceed scalable human-invented architectures on image classification tasks. An open question is the degree to which such methods may generalize to new domains. In this work we explore the construction of meta-learning techniques for dense image prediction focused on the tasks of scene parsing, person-part segmentation, and semantic image segmentation. Constructing viable search spaces in this domain is challenging because of the multi-scale representation of visual information and the necessity to operate on high resolution imagery. Based on a survey of techniques in dense image prediction, we construct a recursive search space and demonstrate that even with efficient random search, we can identify architectures that outperform human-invented architectures and achieve state-of-the-art performance on three dense prediction tasks including 82.7% on Cityscapes (street scene parsing), 71.3% on PASCAL-Person-Part (person-part segmentation), and 87.9% on PASCAL VOC 2012 (semantic image segmentation). Additionally, the resulting architecture is more computationally efficient, requiring half the parameters and half the computational cost of previous state-of-the-art systems.
Submitted 11 September, 2018;
originally announced September 2018.
-
Impulsive Noise Immunity of Multidimensional Pulse Position Modulation
Authors:
Michael D. Collins,
William B. Johnson
Abstract:
We describe block-oriented multidimensional pulse position modulation and its resilience against impulsive noise. The modulation implements the encoder and part of the decoder of the BBC algorithm. We tested the modulation on circuits that send and detect a pulse-based signal in the presence of impulsive noise. We measured the packet error rate vs. signal-to-noise ratio and compared it with published error rates for OFDM. We found an error rate of $2 \times 10^{-5}$ at a signal-to-noise ratio of 16 dB without forward error correction and a data rate of 64 kbit/s.
Submitted 21 May, 2018;
originally announced May 2018.
-
A Deterministic Nonsmooth Frank Wolfe Algorithm with Coreset Guarantees
Authors:
Sathya N. Ravi,
Maxwell D. Collins,
Vikas Singh
Abstract:
We present a new Frank-Wolfe (FW) type algorithm that is applicable to minimization problems with a nonsmooth convex objective. We provide convergence bounds and show that the scheme yields so-called coreset results for various Machine Learning problems including 1-median, Balanced Development, Sparse PCA, Graph Cuts, and the $\ell_1$-norm-regularized Support Vector Machine (SVM) among others. This means that the algorithm provides approximate solutions to these problems in time complexity bounds that are not dependent on the size of the input problem. Our framework, motivated by a growing body of work on sublinear algorithms for various data analysis problems, is entirely deterministic and makes no use of smoothing or proximal operators. Apart from these theoretical results, we show experimentally that the algorithm is very practical and in some cases also offers significant computational advantages on large problem instances. We provide an open source implementation that can be adapted for other problems that fit the overall structure.
Submitted 22 August, 2017;
originally announced August 2017.
-
Spatially Adaptive Computation Time for Residual Networks
Authors:
Michael Figurnov,
Maxwell D. Collins,
Yukun Zhu,
Li Zhang,
Jonathan Huang,
Dmitry Vetrov,
Ruslan Salakhutdinov
Abstract:
This paper proposes a deep learning architecture based on Residual Network that dynamically adjusts the number of executed layers for the regions of the image. This architecture is end-to-end trainable, deterministic and problem-agnostic. It is therefore applicable without any modifications to a wide range of computer vision problems such as image classification, object detection and image segmentation. We present experimental results showing that this model improves the computational efficiency of Residual Networks on the challenging ImageNet classification and COCO object detection datasets. Additionally, we evaluate the computation time maps on the visual saliency dataset cat2000 and find that they correlate surprisingly well with human eye fixation positions.
Submitted 2 July, 2017; v1 submitted 7 December, 2016;
originally announced December 2016.
-
Efficient non-greedy optimization of decision trees
Authors:
Mohammad Norouzi,
Maxwell D. Collins,
Matthew Johnson,
David J. Fleet,
Pushmeet Kohli
Abstract:
Decision trees and randomized forests are widely used in computer vision and machine learning. Standard algorithms for decision tree induction optimize the split functions one node at a time according to some splitting criteria. This greedy procedure often leads to suboptimal trees. In this paper, we present an algorithm for optimizing the split functions at all levels of the tree jointly with the leaf parameters, based on a global objective. We show that the problem of finding optimal linear-combination (oblique) splits for decision trees is related to structured prediction with latent variables, and we formulate a convex-concave upper bound on the tree's empirical loss. The run-time of computing the gradient of the proposed surrogate objective with respect to each training exemplar is quadratic in the tree depth, and thus training deep trees is feasible. The use of stochastic gradient descent for optimization enables effective training with large datasets. Experiments on several classification benchmarks demonstrate that the resulting non-greedy decision trees outperform greedy decision tree baselines.
Submitted 12 November, 2015;
originally announced November 2015.
-
CO2 Forest: Improved Random Forest by Continuous Optimization of Oblique Splits
Authors:
Mohammad Norouzi,
Maxwell D. Collins,
David J. Fleet,
Pushmeet Kohli
Abstract:
We propose a novel algorithm for optimizing multivariate linear threshold functions as split functions of decision trees to create improved Random Forest classifiers. Standard tree induction methods resort to sampling and exhaustive search to find good univariate split functions. In contrast, our method computes a linear combination of the features at each node, and optimizes the parameters of the linear combination (oblique) split functions by adopting a variant of latent variable SVM formulation. We develop a convex-concave upper bound on the classification loss for a one-level decision tree, and optimize the bound by stochastic gradient descent at each internal node of the tree. Forests of up to 1000 Continuously Optimized Oblique (CO2) decision trees are created, which significantly outperform Random Forest with univariate splits and previous techniques for constructing oblique trees. Experimental results are reported on multi-class classification benchmarks and on Labeled Faces in the Wild (LFW) dataset.
Submitted 24 June, 2015; v1 submitted 19 June, 2015;
originally announced June 2015.
-
Memory Bounded Deep Convolutional Networks
Authors:
Maxwell D. Collins,
Pushmeet Kohli
Abstract:
In this work, we investigate the use of sparsity-inducing regularizers during training of Convolutional Neural Networks (CNNs). These regularizers encourage fewer connections in the convolution and fully connected layers to take non-zero values and in effect result in sparse connectivity between hidden units in the deep network. This in turn reduces the memory and runtime cost involved in deploying the learned CNNs. We show that training with such regularization can still be performed using stochastic gradient descent, implying that it can be used easily in existing codebases. Experimental evaluation of our approach on MNIST, CIFAR, and ImageNet datasets shows that our regularizers can result in dramatic reductions in memory requirements. For instance, when applied on AlexNet, our method can reduce the memory consumption by a factor of four with minimal loss in accuracy.
Submitted 3 December, 2014;
originally announced December 2014.
-
Line tensions, correlation lengths, and critical exponents in lipid membranes near critical points
Authors:
Aurelia R. Honerkamp-Smith,
Pietro Cicuta,
Marcus D. Collins,
Sarah L. Veatch,
Marcel den Nijs,
M. Schick,
Sarah L. Keller
Abstract:
Membranes containing a wide variety of ternary mixtures of high chain-melting temperature lipids, low chain-melting temperature lipids, and cholesterol undergo lateral phase separation into coexisting liquid phases at a miscibility transition. When membranes are prepared from a ternary lipid mixture at a critical composition, they pass through a miscibility critical point at the transition temperature. Since the critical temperature is typically on the order of room temperature, membranes provide an unusual opportunity in which to perform a quantitative study of biophysical systems that exhibit critical phenomena in the two-dimensional Ising universality class. As a critical point is approached from either high or low temperature, the scale of fluctuations in lipid composition, set by the correlation length, diverges. In addition, as a critical point is approached from low temperature, the line tension between coexisting phases decreases to zero. Here we quantitatively evaluate the temperature dependence of line tension between liquid domains and of fluctuation correlation lengths in lipid membranes in order to extract a critical exponent, nu. We obtain nu=1.2 plus or minus 0.2, consistent with the Ising model prediction nu=1. We also evaluate the probability distributions of pixel intensities in fluorescence images of membranes. From the temperature dependence of these distributions above the critical temperature, we extract an independent critical exponent beta=0.124 plus or minus 0.03 which is consistent with the Ising prediction of beta=1/8.
Submitted 20 February, 2008;
originally announced February 2008.
-
Imaging density disturbances in water with 41.3 attosecond time resolution
Authors:
P. Abbamonte,
K. D. Finkelstein,
M. D. Collins,
S. M. Gruner
Abstract:
We show that the momentum flexibility of inelastic x-ray scattering may be exploited to invert its loss function, allowing real-time imaging of density disturbances in a medium. We show the disturbance arising from a point source in liquid water, with a resolution of 41.3 attoseconds ($4.13 \times 10^{-17}$ sec) and 1.27 $Å$ ($1.27 \times 10^{-8}$ cm). This result is used to determine the structure of the electron cloud around a photoexcited molecule in solution, as well as the wake generated in water by a 9 MeV gold ion. We draw an analogy with pump-probe techniques and suggest that energy-loss scattering may be applied more generally to the study of attosecond phenomena.
Submitted 17 November, 2003;
originally announced November 2003.