Search | arXiv e-print repository

arXiv:2010.14580 [pdf, other]

HOPPY: An open-source and low-cost kit for dynamic robotics education

Authors: Joao Ramos, Yanran Ding, Young-woo Sim, Kevin Murphy, Daniel Block

Abstract: This letter introduces HOPPY, an open-source, low-cost, robust, and modular kit for robotics education. The robot dynamically hops around a rotating gantry with a fixed base. The kit lowers the entry barrier for studying dynamic robots and legged locomotion in real systems. The kit bridges the theoretical content of fundamental robotics courses and real dynamic robots by facilitating and guiding the software and hardware integration. This letter describes the topics which can be studied using the kit, lists its components, discusses best practices for implementation, presents results from experiments with the simulator and the real system, and suggests further improvements. A simple controller is described to achieve velocities up to 2 m/s, navigate small objects, and mitigate external disturbances (kicks). HOPPY was utilized as the topic of a semester-long project for the Robot Dynamics and Control course at the University of Illinois at Urbana-Champaign. Students provided overwhelmingly positive feedback on the hands-on activities during the course, and the instructors will continue to improve the kit for upcoming semesters.

Submitted 27 October, 2020; originally announced October 2020.

arXiv:2009.07932 [pdf, other]

doi 10.1007/s00373-022-02564-1

On Weak Flexibility in Planar Graphs

Authors: Bernard Lidický, Tomáš Masařík, Kyle Murphy, Shira Zerbib

Abstract: Recently, Dvořák, Norin, and Postle introduced flexibility as an extension of list coloring on graphs [JGT 19']. In this new setting, each vertex $v$ in some subset of $V(G)$ has a request for a certain color $r(v)$ in its list of colors $L(v)$. The goal is to find an $L$-coloring satisfying many, but not necessarily all, of the requests. The main question studied is whether there exists a universal constant $ε>0$ such that any graph $G$ in some graph class $\mathcal{C}$ satisfies at least an $ε$ proportion of the requests. More formally, for $k > 0$ the goal is to prove that for any graph $G \in \mathcal{C}$ on vertex set $V$, with any list assignment $L$ of size $k$ for each vertex, and for every $R \subseteq V$ and a request vector $(r(v): v\in R, ~r(v) \in L(v))$, there exists an $L$-coloring of $G$ satisfying at least $ε|R|$ requests. If this is true, then $\mathcal{C}$ is called $ε$-flexible for lists of size $k$. Choi et al. [arXiv 20'] introduced the notion of weak flexibility, where $R = V$. We further develop this direction by introducing a tool to handle weak flexibility. We demonstrate this new tool by showing that for every positive integer $b$ there exists $ε(b)>0$ so that the class of planar graphs without $K_4, C_5, C_6, C_7, B_b$ is weakly $ε(b)$-flexible for lists of size $4$ (here $K_n$, $C_n$ and $B_n$ are the complete graph, a cycle, and a book on $n$ vertices, respectively). We also show that the class of planar graphs without $K_4, C_5, C_6, C_7, B_5$ is $ε$-flexible for lists of size $4$.
The results are tight as these graph classes are not even 3-colorable.

Submitted 16 September, 2020; originally announced September 2020.

Comments: 30 pages, 9 figures

MSC Class: 05C15

Journal ref: Graphs and Combinatorics 38(6), 180:1-180:33, 2022

arXiv:2007.13584 [pdf, other]

doi 10.1103/PhysRevLett.125.237002

Unconventional bulk superconductivity in YFe$_2$Ge$_2$ single crystals

Authors: Jiasheng Chen, Monika B. Gamża, Jacintha Banda, Keiron Murphy, James Tarrant, Manuel Brando, F. Malte Grosche

Abstract: Using a new horizontal flux growth technique to produce high-quality crystals of the unconventional superconductor YFe$_2$Ge$_2$ has led to a seven-fold reduction in disorder scattering, resulting in mm-sized crystals with residual resistivities $\simeq 0.45\,\mu\Omega\,\mathrm{cm}$, resistivity ratios $\simeq 430$ and sharp superconducting heat capacity anomalies. This enables searching multi-probe experiments investigating the normal and superconducting states of YFe$_2$Ge$_2$. Low-temperature heat capacity measurements suggest a significant residual Sommerfeld coefficient, consistent with in-gap states induced by residual disorder as predicted for a sign-changing order parameter.

Submitted 27 July, 2020; originally announced July 2020.

Comments: 5 pages, 6 figures

arXiv:2007.03064 [pdf, other]

doi 10.1016/j.ejc.2021.103367

Maximizing five-cycles in $K_r$-free graphs

Authors: Bernard Lidický, Kyle Murphy

Abstract: The Erdős Pentagon problem asks to find an $n$-vertex triangle-free graph that maximizes the number of $5$-cycles. The problem was solved using flag algebras by Grzesik and independently by Hatami, Hladký, Král', Norin, and Razborov. Recently, Palmer suggested the general problem of maximizing the number of $5$-cycles in $K_{k+1}$-free graphs. Using flag algebras, we show that every $K_{k+1}$-free graph of order $n$ contains at most \[\frac{1}{10k^4}(k^4 - 5k^3 + 10k^2 - 10k + 4)n^5 + o(n^5)\] copies of $C_5$ for any $k \geq 3$, with the Turán graph being the extremal graph for large enough $n$.

Submitted 11 May, 2021; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: 26 pages
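As an illustrative cross-check (not taken from the paper), the leading coefficient can be recovered from the Turán graph $T(n,k)$. Asymptotically, each 5-cycle in a balanced complete $k$-partite graph corresponds to a proper $k$-coloring of $C_5$ by part labels. There are $(k-1)^5-(k-1)$ such colorings. Each one accounts for about $(n/k)^5$ vertex sequences, and every cycle is counted 10 times (5 starting vertices × 2 directions). The sketch below checks by brute force that this agrees with the stated polynomial.

```python
from fractions import Fraction
from itertools import product


def proper_cycle_colorings(k: int, length: int = 5) -> int:
    """Brute-force count of colorings of the cycle C_length with k colors
    in which cyclically adjacent vertices receive different colors."""
    return sum(
        all(c[i] != c[(i + 1) % length] for i in range(length))
        for c in product(range(k), repeat=length)
    )


def turan_c5_coefficient(k: int) -> Fraction:
    """Leading coefficient of n^5 in the number of C_5 copies in T(n, k).

    Each proper coloring of C_5 by part labels yields about (n/k)^5 ordered
    vertex sequences; each 5-cycle is counted 10 times (5 starts x 2 directions)."""
    return Fraction(proper_cycle_colorings(k), 10 * k**5)


def bound_coefficient(k: int) -> Fraction:
    """Coefficient in the stated bound: (k^4 - 5k^3 + 10k^2 - 10k + 4) / (10 k^4)."""
    return Fraction(k**4 - 5 * k**3 + 10 * k**2 - 10 * k + 4, 10 * k**4)


for k in range(3, 7):
    # The Turan-graph count and the bound's coefficient should coincide.
    assert turan_c5_coefficient(k) == bound_coefficient(k)
    print(k, bound_coefficient(k))
```

The identity holds because $(k-1)^5-(k-1) = k\,(k^4 - 5k^3 + 10k^2 - 10k + 4)$. For $k=2$ both sides vanish, as expected for bipartite graphs.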

arXiv:2006.03227 [pdf, other]


Population-Based Black-Box Optimization for Biological Sequence Design

Authors: Christof Angermueller, David Belanger, Andreea Gane, Zelda Mariet, David Dohan, Kevin Murphy, Lucy Colwell, D Sculley

Abstract: The use of black-box optimization for the design of new biological sequences is an emerging research area with potentially revolutionary impact. The cost and latency of wet-lab experiments require methods that find good sequences in few experimental rounds of large batches of sequences--a setting that off-the-shelf black-box optimization methods are ill-equipped to handle. We find that the performance of existing methods varies drastically across optimization tasks, posing a significant obstacle to real-world applications. To improve robustness, we propose Population-Based Black-Box Optimization (P3BO), which generates batches of sequences by sampling from an ensemble of methods. The number of sequences sampled from any method is proportional to the quality of sequences it previously proposed, allowing P3BO to combine the strengths of individual methods while hedging against their innate brittleness. Adapting the hyper-parameters of each of the methods online using evolutionary optimization further improves performance. Through extensive experiments on in-silico optimization tasks, we show that P3BO outperforms any single method in its population, proposing higher quality sequences as well as more diverse batches. As such, P3BO and Adaptive-P3BO are a crucial step towards deploying ML to real-world sequence design.

Submitted 10 July, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

Journal ref: Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020
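The P3BO allocation rule gives each method a share of the batch proportional to the quality of the sequences it previously proposed. A minimal sketch of that rule follows, under stated assumptions. The quality scores are taken as given non-negative numbers, because how P3BO actually scores methods is not specified here. Rounding uses the largest-remainder method so that shares sum exactly to the batch size. The helper name `allocate_batch` is hypothetical.

```python
import math
from typing import List, Sequence


def allocate_batch(batch_size: int, quality: Sequence[float]) -> List[int]:
    """Hypothetical helper: split `batch_size` samples across methods in
    proportion to their non-negative `quality` scores.

    Shares are rounded with the largest-remainder method so they sum to
    `batch_size`; if every score is zero, samples are split uniformly."""
    if batch_size < 0 or any(q < 0 for q in quality):
        raise ValueError("batch_size and quality scores must be non-negative")
    total = sum(quality)
    n = len(quality)
    weights = [q / total for q in quality] if total > 0 else [1.0 / n] * n
    ideal = [w * batch_size for w in weights]
    counts = [math.floor(x) for x in ideal]
    # Give leftover samples to methods with the largest fractional parts
    # (ties keep the original method order because the sort is stable).
    leftover = batch_size - sum(counts)
    order = sorted(range(n), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


print(allocate_batch(10, [1.0, 1.0, 2.0]))
```

Largest-remainder rounding is one simple way to turn proportional shares into integer sample counts. Sampling the counts from a multinomial would be an equally plausible alternative.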

arXiv:2005.03675 [pdf, other]

Machine Learning on Graphs: A Model and Comprehensive Taxonomy

Authors: Ines Chami, Sami Abu-El-Haija, Bryan Perozzi, Christopher Ré, Kevin Murphy

Abstract: There has been a surge of recent interest in learning representations for graph-structured data. Graph representation learning methods have generally fallen into three main categories, based on the availability of labeled data. The first, network embedding (such as shallow graph embedding or graph auto-encoders), focuses on learning unsupervised representations of relational structure. The second, graph regularized neural networks, leverages graphs to augment neural network losses with a regularization objective for semi-supervised learning. The third, graph neural networks, aims to learn differentiable functions over discrete topologies with arbitrary structure. However, despite the popularity of these areas there has been surprisingly little work on unifying the three paradigms. Here, we aim to bridge the gap between graph neural networks, network embedding and graph regularization models. We propose a comprehensive taxonomy of representation learning methods for graph-structured data, aiming to unify several disparate bodies of work. Specifically, we propose a Graph Encoder Decoder Model (GRAPHEDM), which generalizes popular algorithms for semi-supervised learning on graphs (e.g., GraphSage, Graph Convolutional Networks, Graph Attention Networks), and unsupervised learning of graph representations (e.g., DeepWalk, node2vec) into a single consistent approach. To illustrate the generality of this approach, we fit over thirty existing methods into this framework.
We believe that this unifying view both provides a solid foundation for understanding the intuition behind these methods, and enables future research in the area.

Submitted 11 April, 2022; v1 submitted 7 May, 2020; originally announced May 2020.

arXiv:2004.11938 [pdf, other]

Towards Differentiable Resampling

Authors: Michael Zhu, Kevin Murphy, Rico Jonschkowski

Abstract: Resampling is a key component of sample-based recursive state estimation in particle filters. Recent work explores differentiable particle filters for end-to-end learning. However, resampling remains a challenge in these works, as it is inherently non-differentiable. We address this challenge by replacing traditional resampling with a learned neural network resampler. We present a novel network architecture, the particle transformer, and train it for particle resampling using a likelihood-based loss function over sets of particles. Incorporated into a differentiable particle filter, our model can be end-to-end optimized jointly with the other particle filter components via gradient descent. Our results show that our learned resampler outperforms traditional resampling techniques on synthetic data and in a simulated robot localization task.

Submitted 24 April, 2020; originally announced April 2020.
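For context, here is a sketch of one standard traditional scheme that a learned resampler would replace: systematic resampling. The abstract does not say which traditional schemes were compared. The output is a list of integer ancestor indices chosen by comparing evenly spaced positions against the cumulative weights. That discrete selection is why gradients cannot flow through it. The function name and its `u` argument (a single uniform offset in [0, 1)) are illustrative.

```python
import random
from typing import List, Sequence


def systematic_resample(weights: Sequence[float], u: float) -> List[int]:
    """Systematic resampling: return len(weights) ancestor indices.

    `u` is one uniform draw in [0, 1); positions (u + i) / N are matched
    against the normalized cumulative weights. The integer outputs are
    piecewise constant in the weights, hence non-differentiable."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    n = len(weights)
    total = sum(weights)
    cumulative, acc = [], 0.0
    for w in weights:
        acc += w / total
        cumulative.append(acc)
    cumulative[-1] = 1.0  # guard against floating-point shortfall
    indices, j = [], 0
    for i in range(n):
        position = (u + i) / n
        # Advance to the first particle whose cumulative weight exceeds the position.
        while cumulative[j] <= position:
            j += 1
        indices.append(j)
    return indices


rng = random.Random(0)
print(systematic_resample([0.1, 0.2, 0.6, 0.1], rng.random()))
```

Using `<=` in the comparison ensures particles with zero weight are never selected, even when `u` is exactly 0.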

arXiv:2002.08927 [pdf, other]

Regularized Autoencoders via Relaxed Injective Probability Flow

Authors: Abhishek Kumar, Ben Poole, Kevin Murphy

Abstract: Invertible flow-based generative models are an effective method for learning to generate samples, while allowing for tractable likelihood computation and inference. However, the invertibility requirement restricts models to have the same latent dimensionality as the inputs. This imposes significant architectural, memory, and computational costs, making them more challenging to scale than other classes of generative models such as Variational Autoencoders (VAEs). We propose a generative model based on probability flows that does away with the bijectivity requirement on the model and only assumes injectivity. This also provides another perspective on regularized autoencoders (RAEs), with our final objectives resembling RAEs with specific regularizers that are derived by lower bounding the probability flow objective. We empirically demonstrate the promise of the proposed model, improving over VAEs and AEs in terms of sample quality.

Submitted 20 February, 2020; originally announced February 2020.

Comments: AISTATS 2020

arXiv:2001.01827 [pdf]

doi 10.1063/1.5144531

Broadband Free Space Impedance in $\mathrm{Co_2Z}$ Hexaferrites by Substitution of Quadrivalent Heavy Transition Metal Ions for Miniaturized RF Devices

Authors: Piotr Kulik, Gavin Winter, Alexander Sokolov, Katherine Murphy, Chengju Yu, Kun Qian, Ogheneyunume Fitchorova, Vincent Harris

Abstract: Polycrystalline samples of Z-type hexaferrites, having nominal compositions $\mathrm{Ba_3Co_{2+x}Fe_{24-2x}M_xO_{41}}$ where M = $\mathrm{Ir^{4+}, Hf^{4+}, Mo^{4+}}$ and x=0 and 0.05, were processed via ceramic processing protocols in pursuit of low magnetic and dielectric losses as well as equivalent permittivity and permeability. Fine process control was conducted to ensure optimal magnetic properties. Organic dispersants (i.e., isobutylene and maleic anhydride) were employed to achieve maximum densities. Crystallographic structure, characterized by X-ray diffraction, revealed that doping with $\mathrm{Ir^{4+}, Hf^{4+}, Mo^{4+}}$ did not adversely affect the crystal structure and phase purity of the Z-type hexaferrite. The measured microwave and magnetic properties show that the resonant frequency shifts depending on the specific dopant, allowing for tunability of the operational frequency and bandwidth. The frequency bandwidth in which permittivity and permeability are very nearly equal (i.e., ~400 MHz for $\mathrm{Mo^{4+}}$ doping with x=0.05) is shown to occur at frequencies between 0.2 and 1.0 GHz depending on dopant type. These results give rise to low loss at 650 MHz, with a size reduction of an order of magnitude, while maintaining the characteristic impedance of free space (i.e., 377 $\Omega$).
These results allow for miniaturization and optimized band-pass performance of magnetodielectric materials for communication devices such as antennas and radomes that can be engineered to operate over desired frequency ranges using cost-effective and volumetric processing methodologies.

Submitted 6 January, 2020; originally announced January 2020.

arXiv:1912.06445 [pdf, other]

The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction

Authors: Junwei Liang, Lu Jiang, Kevin Murphy, Ting Yu, Alexander Hauptmann

Abstract: This paper studies the problem of predicting the distribution over multiple possible future paths of people as they move through various visual scenes. We make two main contributions. The first contribution is a new dataset, created in a realistic 3D simulator, which is based on real world trajectory data, and then extrapolated by human annotators to achieve different latent goals. This provides the first benchmark for quantitative evaluation of the models to predict multi-future trajectories. The second contribution is a new model to generate multiple plausible future trajectories, which contains novel designs of using multi-scale location encodings and convolutional RNNs over graphs. We refer to our model as Multiverse. We show that our model achieves the best results on our dataset, as well as on the real-world VIRAT/ActEV dataset (which contains only one possible future).

Submitted 28 March, 2020; v1 submitted 13 December, 2019; originally announced December 2019.

Comments: CVPR 2020. Code, models and dataset are available at: https://next.cs.cmu.edu/multiverse/index.html

arXiv:1911.08631 [pdf, other]

doi 10.1038/s41467-019-13464-z

Weyl-like points from band inversions of spin-polarised surface states in NbGeSb

Authors: I. Marković, C. A. Hooley, O. J. Clark, F. Mazzola, M. D. Watson, J. M. Riley, K. Volckaert, K. Underwood, M. S. Dyer, P. A. E. Murgatroyd, K. J. Murphy, P. Le Fèvre, F. Bertran, J. Fujii, I. Vobornik, S. Wu, T. Okuda, J. Alaria, P. D. C. King

Abstract: Band inversions are key to stabilising a variety of novel electronic states in solids, from topological surface states in inverted bulk band gaps of topological insulators to the formation of symmetry-protected three-dimensional Dirac and Weyl points and nodal-line semimetals. Here, we create a band inversion not of bulk states, but rather between manifolds of surface states. We realise this by aliovalent substitution of Nb for Zr and Sb for S in the ZrSiS family of nonsymmorphic semimetals. Using angle-resolved photoemission and density-functional theory, we show how two pairs of surface states, known from ZrSiS, are driven to intersect each other in the vicinity of the Fermi level in NbGeSb, as well as to develop pronounced spin-orbit mediated spin splittings. We demonstrate how mirror symmetry leads to protected crossing points in the resulting spin-orbital entangled surface band structure, thereby stabilising surface state analogues of three-dimensional Weyl points. More generally, our observations suggest new opportunities for engineering topologically and symmetry-protected states via band inversions of surface states.

Submitted 19 November, 2019; originally announced November 2019.

Comments: In press at Nature Communications. This is the originally submitted manuscript prior to changes during the review process. Contains 20+6 pages, including Supplementary Information

arXiv:1911.01560 [pdf, other]

doi 10.1103/PhysRevLett.124.168002

Memory in nonmonotonic stress relaxation of a granular system

Authors: Kieran A. Murphy, Jonathon W. Kruppe, Heinrich M. Jaeger

Abstract: We demonstrate experimentally that a granular packing of glass spheres is capable of storing memory of multiple strain states in the dynamic process of stress relaxation. Modeling the system as a non-interacting population of relaxing elements, we find that the functional form of the predicted relaxation requires a quantitative correction which grows in severity with each additional memory and is suggestive of interactions between elements. Our findings have implications for the broad class of soft matter systems that display memory and anomalous relaxation.

Submitted 16 November, 2019; v1 submitted 4 November, 2019; originally announced November 2019.

Comments: 5 pages, 5 figures

Journal ref: Phys. Rev. Lett. 124, 168002 (2020)

arXiv:1910.09588 [pdf, other]

Collapsed Amortized Variational Inference for Switching Nonlinear Dynamical Systems

Authors: Zhe Dong, Bryan A. Seybold, Kevin P. Murphy, Hung H. Bui

Abstract: We propose an efficient inference method for switching nonlinear dynamical systems. The key idea is to learn an inference network which can be used as a proposal distribution for the continuous latent variables, while performing exact marginalization of the discrete latent variables. This allows us to use the reparameterization trick, and apply end-to-end training with stochastic gradient descent. We show that the proposed method can successfully segment time series data, including videos and 3D human pose, into meaningful "regimes" by using the piecewise nonlinear dynamics.

Submitted 10 February, 2020; v1 submitted 21 October, 2019; originally announced October 2019.
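In a switching model, exact marginalization of the discrete latent variables is typically carried out with the forward algorithm over the discrete regimes. The sketch below makes a simplifying assumption: the per-step log-likelihoods `log_lik[t][k]` of the continuous part under each regime $k$ are already available. In the paper these would come from the nonlinear dynamics and the inference network; here they are plain inputs. The sketch also takes an initial distribution and a transition matrix as given. It returns the log-probability summed over all regime sequences, computed in log space for numerical stability.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_marginal(log_init: Sequence[float],
                         log_trans: Sequence[Sequence[float]],
                         log_lik: Sequence[Sequence[float]]) -> float:
    """Sum out the discrete regime sequence z_{1:T} with the forward algorithm.

    log_init[k]     : log p(z_1 = k)
    log_trans[j][k] : log p(z_t = k | z_{t-1} = j)
    log_lik[t][k]   : log-likelihood of step t's continuous variables under regime k
    """
    num_states = len(log_init)
    # alpha[k] = log p(observations up to t, z_t = k)
    alpha = [log_init[k] + log_lik[0][k] for k in range(num_states)]
    for t in range(1, len(log_lik)):
        alpha = [
            logsumexp([alpha[j] + log_trans[j][k] for j in range(num_states)])
            + log_lik[t][k]
            for k in range(num_states)
        ]
    return logsumexp(alpha)
```

The cost is $O(TK^2)$ for $T$ steps and $K$ regimes, instead of the $O(K^T)$ needed to enumerate regime sequences.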

arXiv:1908.07963 [pdf, ps, other]

doi 10.1111/rssa.12712

Clustering Longitudinal Life-Course Sequences Using Mixtures of Exponential-Distance Models

Authors: Keefe Murphy, Thomas Brendan Murphy, Raffaella Piccarreta, Isobel Claire Gormley

Abstract: Sequence analysis is an increasingly popular approach for analysing life courses represented by ordered collections of activities experienced by subjects over time. Here, we analyse a survey data set containing information on the career trajectories of a cohort of Northern Irish youths tracked between the ages of 16 and 22. We propose a novel, model-based clustering approach suited to the analysis of such data from a holistic perspective, with the aims of estimating the number of typical career trajectories, identifying the relevant features of these patterns, and assessing the extent to which such patterns are shaped by background characteristics. Several criteria exist for measuring pairwise dissimilarities among categorical sequences. Typically, dissimilarity matrices are employed as input to heuristic clustering algorithms. The family of methods we develop instead clusters sequences directly using mixtures of exponential-distance models. Basing the models on weighted variants of the Hamming distance metric permits closed-form expressions for parameter estimation. Simultaneously allowing the component membership probabilities to depend on fixed covariates and accommodating sampling weights in the clustering process yields new insights on the Northern Irish data. In particular, we find that school examination performance is the single most important predictor of cluster membership.

Submitted 21 December, 2021; v1 submitted 21 August, 2019; originally announced August 2019.

Comments: Published in Journal of the Royal Statistical Society: Series A (Statistics in Society)

Journal ref: Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(4): 1414-1451 (2021)
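A weighted Hamming distance keeps such models tractable, and a small example shows why. Suppose a component assigns probability proportional to $\exp(-\sum_t \lambda_t \mathbb{1}[s_t \neq c_t])$ to a length-$T$ sequence $s$ over $V$ categories, given a central sequence $c$ and position-specific precisions $\lambda_t$. Then the normalizing constant factorizes as $\prod_t \bigl(1 + (V-1)e^{-\lambda_t}\bigr)$. The sketch below evaluates this density. The per-position parametrization is one illustrative weighted variant and may differ from the exact models in the paper.

```python
import math
from itertools import product
from typing import Sequence


def weighted_hamming(s: Sequence[int], c: Sequence[int], lam: Sequence[float]) -> float:
    """Weighted Hamming distance: sum of lam[t] over positions where s and c differ."""
    return sum(l for a, b, l in zip(s, c, lam) if a != b)


def log_density(s: Sequence[int], c: Sequence[int], lam: Sequence[float],
                num_categories: int) -> float:
    """log p(s | c, lam) for an exponential-distance model on categorical sequences.

    The distance is a sum over positions, so the normalizing constant factorizes:
    log Z = sum_t log(1 + (V - 1) * exp(-lam[t]))."""
    log_z = sum(math.log1p((num_categories - 1) * math.exp(-l)) for l in lam)
    return -weighted_hamming(s, c, lam) - log_z


# Sanity check on a tiny example: probabilities over all 3^4 sequences sum to 1.
centre = (0, 1, 2, 0)
lam = (0.5, 1.0, 2.0, 0.1)
total = sum(math.exp(log_density(s, centre, lam, 3)) for s in product(range(3), repeat=4))
print(total)
```

Because $\log Z$ is available in closed form, the precisions and central sequence can be updated without numerical integration. This is the kind of convenience the abstract attributes to Hamming-based models.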

arXiv:1907.01253 [pdf, other]

FRODO: Free rejection of out-of-distribution samples: application to chest x-ray analysis

Authors: Erdi Çallı, Keelin Murphy, Ecem Sogancioglu, Bram van Ginneken

Abstract: In this work, we propose a method to reject out-of-distribution samples which can be adapted to any network architecture and requires no additional training data. Publicly available chest x-ray data (38,353 images) is used to train a standard ResNet-50 model to detect emphysema. Feature activations of intermediate layers are used as descriptors defining the training data distribution. A novel metric, FRODO, is measured by using the Mahalanobis distance of a new test sample to the training data distribution. The method is tested using a held-out test dataset of 21,176 chest x-rays (in-distribution) and a set of 14,821 out-of-distribution x-ray images of incorrect orientation or anatomy. In classifying test samples as in- or out-of-distribution, our method achieves an AUC score of 0.99.

Submitted 2 July, 2019; originally announced July 2019.

Comments: MIDL 2019 [arXiv:1907.08612]

Report number: MIDL/2019/ExtendedAbstract/H1e7kWD794
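The core of the FRODO score is a Mahalanobis distance from a test sample's feature activations to the training feature distribution. A single-layer sketch follows, under these assumptions: features are already extracted as NumPy arrays, the distribution is summarized by its empirical mean and covariance, and a pseudo-inverse handles a possibly singular covariance. How the paper combines several intermediate layers is not shown, and the class name `MahalanobisScorer` is hypothetical.

```python
import numpy as np


class MahalanobisScorer:
    """Hypothetical helper: summarize training features by mean and covariance,
    then score new samples by their Mahalanobis distance to that summary."""

    def __init__(self, train_features: np.ndarray):
        # train_features has shape (num_samples, feature_dim).
        self.mean = train_features.mean(axis=0)
        cov = np.cov(train_features, rowvar=False)
        # Pseudo-inverse guards against a singular covariance estimate.
        self.precision = np.linalg.pinv(np.atleast_2d(cov))

    def score(self, features: np.ndarray) -> np.ndarray:
        """Distance for each row of `features`; larger means more out-of-distribution."""
        diff = np.atleast_2d(features) - self.mean
        squared = np.einsum("ij,jk,ik->i", diff, self.precision, diff)
        return np.sqrt(np.maximum(squared, 0.0))


rng = np.random.default_rng(0)
train = rng.normal(size=(500, 8))
scorer = MahalanobisScorer(train)
scores = scorer.score(np.stack([train.mean(axis=0), np.full(8, 10.0)]))
print(scores)
```

A rejection threshold on the score (or an AUC computed from it, as in the abstract) would be chosen on held-out data. The sketch stops at the distance itself.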

arXiv:1906.07889 [pdf, other]

Unsupervised Learning of Object Structure and Dynamics from Videos

Authors: Matthias Minderer, Chen Sun, Ruben Villegas, Forrester Cole, Kevin Murphy, Honglak Lee

Abstract: Extracting and predicting object structure and dynamics from videos without supervision is a major challenge in machine learning. To address this challenge, we adopt a keypoint-based image representation and learn a stochastic dynamics model of the keypoints. Future frames are reconstructed from the keypoints and a reference frame. By modeling dynamics in the keypoint coordinate space, we achieve stable learning and avoid compounding of errors in pixel space. Our method improves upon unstructured representations both for pixel-level video prediction and for downstream tasks requiring object-level understanding of motion dynamics. We evaluate our model on diverse datasets: a multi-agent sports dataset, the Human3.6M dataset, and datasets based on continuous control tasks from the DeepMind Control Suite. The spatially structured representation outperforms unstructured representations on a range of motion-related tasks such as object tracking, action recognition and reward prediction.

Submitted 2 March, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:1906.07343 [pdf, other]

Language as an Abstraction for Hierarchical Deep Reinforcement Learning

Authors: Yiding Jiang, Shixiang Gu, Kevin Murphy, Chelsea Finn

Abstract: Solving complex, temporally-extended tasks is a long-standing problem in reinforcement learning (RL). We hypothesize that one critical element of solving such problems is the notion of compositionality. With the ability to learn concepts and sub-skills that can be composed to solve longer tasks, i.e. hierarchical RL, we can acquire temporally-extended behaviors. However, acquiring effective yet general abstractions for hierarchical RL is remarkably challenging. In this paper, we propose to use language as the abstraction, as it provides unique compositional structure, enabling fast learning and combinatorial generalization, while retaining tremendous flexibility, making it suitable for a variety of problems. Our approach learns an instruction-following low-level policy and a high-level policy that can reuse abstractions across tasks, in essence, permitting agents to reason using structured language. To study compositional task learning, we introduce an open-source object interaction environment built using the MuJoCo physics engine and the CLEVR engine. We find that, using our approach, agents can learn to solve diverse, temporally-extended tasks such as object sorting and multi-object rearrangement, including from raw pixel observations. Our analysis reveals that the compositional nature of language is critical for learning diverse sub-skills and systematically generalizing to new sub-skills in comparison to non-compositional abstractions that use the same supervision.

Submitted 18 November, 2019; v1 submitted 17 June, 2019; originally announced June 2019.

Comments: Published in Neural Information Processing Systems (NeurIPS) 2019; Supplementary materials: https://sites.google.com/view/hal-demo

arXiv:1906.06792 [pdf, other]

Floors are Flat: Leveraging Semantics for Real-Time Surface Normal Prediction

Authors: Steven Hickson, Karthik Raveendran, Alireza Fathi, Kevin Murphy, Irfan Essa

Abstract: We propose 4 insights that help to significantly improve the performance of deep learning models that predict surface normals and semantic labels from a single RGB image. These insights are: (1) denoise the "ground truth" surface normals in the training set to ensure consistency with the semantic labels; (2) concurrently train on a mix of real and synthetic data, instead of pretraining on synthetic and finetuning on real; (3) jointly predict normals and semantics using a shared model, but only backpropagate errors on pixels that have valid training labels; (4) slim down the model and use grayscale instead of color inputs. Despite the simplicity of these steps, we demonstrate consistently improved results on several datasets, using a model that runs at 12 fps on a standard mobile phone.

Submitted 16 June, 2019; originally announced June 2019.
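Insight (3) above, backpropagating only on pixels that have valid labels, amounts to masking the per-pixel loss before averaging. A minimal sketch, assuming a squared-error loss on the normal vectors and a boolean validity mask; the function name `masked_mean_loss` is hypothetical and the paper's actual loss may differ:

```python
import numpy as np


def masked_mean_loss(pred, target, valid_mask):
    """Mean per-pixel squared error, restricted to pixels with valid labels.

    pred, target: float arrays of shape (H, W, C), e.g. C = 3 for normals.
    valid_mask: boolean array of shape (H, W); False marks missing labels.
    Masked-out pixels enter neither the sum nor the count, so under autodiff
    they would contribute zero gradient to `pred`.
    """
    per_pixel = np.sum((pred - target) ** 2, axis=-1)  # (H, W) squared error
    n_valid = np.count_nonzero(valid_mask)
    if n_valid == 0:
        return 0.0  # no labeled pixels in this example
    return float(per_pixel[valid_mask].sum() / n_valid)
```

Averaging over the valid count rather than over all pixels keeps the loss scale independent of how many labels happen to be missing in a given image.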

arXiv:1906.05743 [pdf, other]

Learning Video Representations using Contrastive Bidirectional Transformer

Authors: Chen Sun, Fabien Baradel, Kevin Murphy, Cordelia Schmid

Abstract: This paper proposes a self-supervised learning approach for video features that results in significantly improved performance on downstream tasks (such as video classification, captioning and segmentation) compared to existing methods. Our method extends the BERT model for text sequences to the case of sequences of real-valued feature vectors, by replacing the softmax loss with noise contrastive estimation (NCE). We also show how to learn representations from sequences of visual features and sequences of words derived from ASR (automatic speech recognition), and show that such cross-modal training (when possible) helps even more.

Submitted 27 September, 2019; v1 submitted 13 June, 2019; originally announced June 2019.
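Replacing a softmax over a discrete vocabulary with a contrastive objective over real-valued vectors can be sketched as an InfoNCE-style loss. This sketch assumes dot-product scores and uses the other targets in the batch as the noise samples; `info_nce_loss` is a hypothetical name, and the paper's exact scoring function and negative sampling may differ:

```python
import numpy as np


def info_nce_loss(predictions, targets):
    """InfoNCE-style contrastive loss with in-batch negatives.

    predictions: (N, D) vectors predicted at N masked positions.
    targets:     (N, D) true feature vectors at those positions.
    Row i of `targets` is the positive for prediction i; the other N - 1 rows
    act as the noise samples.
    """
    scores = predictions @ targets.T                     # (N, N) dot products
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise exp()
    log_norm = np.log(np.exp(scores).sum(axis=1, keepdims=True))
    log_probs = scores - log_norm                        # row-wise log-softmax
    return float(-np.mean(np.diag(log_probs)))           # positives on diagonal
```

When every prediction scores all candidates equally, the loss equals log N; it approaches 0 as each prediction singles out its own target.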

arXiv:1905.10474 [pdf, ps, other]

A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization

Authors: David H. Brookes, Akosua Busia, Clara Fannjiang, Kevin Murphy, Jennifer Listgarten

Abstract: We show that a large class of Estimation of Distribution Algorithms, including, but not limited to, Covariance Matrix Adaptation, can be written as a Monte Carlo Expectation-Maximization algorithm, and as exact EM in the limit of infinite samples. Because EM sits on a rigorous statistical foundation and has been thoroughly analyzed, this connection provides a new coherent framework with which to reason about EDAs.

Submitted 10 June, 2022; v1 submitted 24 May, 2019; originally announced May 2019.
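To make the EM reading concrete, here is a sketch of a simple Gaussian EDA (a cross-entropy-method-style update, not CMA-ES itself). Weighting the samples plays the role of a Monte Carlo E-step, and the weighted maximum-likelihood refit of the Gaussian plays the role of the M-step. The truncation-selection weights, the name `gaussian_eda_step`, and all parameter values are assumptions for illustration:

```python
import numpy as np


def gaussian_eda_step(mean, cov, objective, n_samples, n_elite, rng):
    """One iteration of a simple Gaussian EDA, read as a Monte Carlo EM step.

    E-step (Monte Carlo): sample from the current search distribution and
    weight each sample; here the n_elite lowest-objective samples get weight
    1 / n_elite and the rest get 0 (truncation selection).
    M-step: refit the Gaussian by weighted maximum likelihood.
    """
    x = rng.multivariate_normal(mean, cov, size=n_samples)  # (n_samples, d)
    f = np.array([objective(xi) for xi in x])
    elite = np.argsort(f)[:n_elite]                         # minimisation
    w = np.zeros(n_samples)
    w[elite] = 1.0 / n_elite
    new_mean = w @ x                                        # weighted mean
    centered = x - new_mean
    new_cov = (w[:, None] * centered).T @ centered          # weighted covariance
    new_cov = 0.5 * (new_cov + new_cov.T)                   # enforce symmetry
    return new_mean, new_cov + 1e-9 * np.eye(len(mean))     # keep it positive definite
```

For example, iterating from `mean = np.zeros(2)`, `cov = 25 * np.eye(2)` on the objective `||x - (3, -2)||^2` with 200 samples and 40 elites drives the mean toward (3, -2). A broad initial covariance matters: with truncation selection, the covariance can shrink before the mean reaches the optimum if the search starts far away with a narrow distribution.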

arXiv:1904.04231 [pdf, other]

Relational Action Forecasting

Authors: Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, Cordelia Schmid

Abstract: This paper focuses on multi-person action forecasting in videos. More precisely, given a history of H previous frames, the goal is to detect actors and to predict their future actions for the next T frames. Our approach jointly models temporal and spatial interactions among different actors by constructing a recurrent graph, using actor proposals obtained with Faster R-CNN as nodes. Our method learns to select a subset of discriminative relations without requiring explicit supervision, thus enabling us to tackle challenging visual data. We refer to our model as Discriminative Relational Recurrent Network (DRRN). Evaluation of action prediction on AVA demonstrates the effectiveness of our proposed method compared to simpler baselines. Furthermore, we significantly improve performance on the task of early action classification on J-HMDB, from the previous SOTA of 48% to 60%.

Submitted 8 April, 2019; originally announced April 2019.

Comments: CVPR 2019 (oral)

arXiv:1904.03701 [pdf, other]

doi 10.1088/1748-0221/14/06/P06022

Unsupervised Learning Methods in X-ray Spectral Imaging Material Segmentation

Authors: Jericho O'Connell, Kevin Murphy, Spencer Robinson, Kris Iniewski, Magdalena Bazalova-Carter

Abstract: In this work, we have investigated a number of unsupervised learning methods for material segmentation in projection x-ray imaging with a spectral detector. A phantom containing two hard materials (glass, steel) and three soft materials (PVC, polypropylene, and PTFE), all embedded in PMMA, was imaged with a 5-energy-bin spectral detector. The projection images were utilized to test nine unsupervised learning algorithms for automated material segmentation. Each algorithm was investigated using single energy (SE), dual energy (DE) and multi energy (ME) images. Clustering results were scored based on homogeneity and completeness of the clusters, which were combined into Rosenberg and Hirschberg's V-measure. Principal component analysis (PCA), independent component analysis (ICA), and non-negative matrix factorization (NMF) were tested as dimensionality reduction methods. ME, DE and SE material segmentation was performed using five, two, and single energy images, respectively. ME had the highest V-measure on the soft materials using PCA and a novel interpolating Bayesian Gaussian mixture model (BGMM) clustering, with a V-measure of 0.71. This was 3.5% better than DE and 20.3% better than SE. Conversely, SE imaging was most capable of hard tissue segmentation using the standard BGMM, with a V-measure of 0.84. This was 6.3% better than DE and 5.0% better than ME. This work demonstrated that ME x-ray imaging might be superior in segmenting soft tissues compared to conventional SE x-ray imaging.

Submitted 10 June, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

Journal ref: JINST 14 (2019)
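The V-measure used to score the clusterings above is the (weighted) harmonic mean of homogeneity, h = 1 - H(C|K)/H(C), and completeness, c = 1 - H(K|C)/H(K), where C are the true materials and K the clusters. A dependency-free sketch of that definition (scikit-learn also provides `sklearn.metrics.v_measure_score`); the function names here are our own:

```python
import math
from collections import Counter


def _entropy(counts, n):
    """Shannon entropy (natural log) of a distribution given by integer counts."""
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def v_measure(labels_true, labels_pred, beta=1.0):
    """V-measure: weighted harmonic mean of homogeneity and completeness."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))   # contingency counts n_ck
    classes = Counter(labels_true)                  # n_c
    clusters = Counter(labels_pred)                 # n_k
    h_c = _entropy(classes.values(), n)
    h_k = _entropy(clusters.values(), n)
    # Conditional entropies H(C|K) and H(K|C) from the contingency counts.
    h_c_given_k = -sum(m / n * math.log(m / clusters[k]) for (c, k), m in joint.items())
    h_k_given_c = -sum(m / n * math.log(m / classes[c]) for (c, k), m in joint.items())
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)
```

The score ignores how clusters are numbered, so `[0, 0, 1, 1]` against `[1, 1, 0, 0]` scores 1.0; splitting one material over two clusters keeps homogeneity at 1 but lowers completeness.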

arXiv:1904.01766 [pdf, other]

VideoBERT: A Joint Model for Video and Language Representation Learning

Authors: Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid

Abstract: Self-supervised learning has become increasingly important to leverage the abundance of unlabeled data available on platforms like YouTube. Whereas most existing approaches learn low-level representations, we propose a joint visual-linguistic model to learn high-level features without any explicit supervision. In particular, inspired by its recent success in language modeling, we build upon the BERT model to learn bidirectional joint distributions over sequences of visual and linguistic tokens, derived from vector quantization of video data and off-the-shelf speech recognition outputs, respectively. We use VideoBERT in numerous tasks, including action classification and video captioning. We show that it can be applied directly to open-vocabulary classification, and confirm that large amounts of training data and cross-modal information are critical to performance. Furthermore, we outperform the state-of-the-art on video captioning, and quantitative results verify that the model learns high-level semantic features.

Submitted 11 September, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

Comments: ICCV 2019 camera ready
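The visual tokens mentioned above come from vector quantization of video features. A minimal sketch, assuming per-clip feature vectors are already extracted and using flat k-means (Lloyd's algorithm) to build the codebook; the abstract does not specify the feature extractor, clustering variant, or codebook size, so `kmeans` and `quantize` are hypothetical helpers:

```python
import numpy as np


def quantize(features, centroids):
    """Return, for each feature vector, the index of its nearest centroid (its token)."""
    # Squared Euclidean distances, shape (num_features, num_centroids).
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(features, k, n_iters=20, seed=0):
    """Plain Lloyd's k-means; the k centroids form the visual-token codebook."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        assign = quantize(features, centroids)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids
```

Once the codebook is fixed, each video becomes a sequence of integer token ids that a BERT-style model can consume alongside word tokens.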

arXiv:1903.05136 [pdf, other]

Unsupervised Discovery of Parts, Structure, and Dynamics

Authors: Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

Abstract: Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future. In this paper, we propose a novel formulation that simultaneously learns a hierarchical, disentangled object representation and a dynamics model for object parts from unlabeled videos. Our Parts, Structure, and Dynamics (PSD) model learns to, first, recognize the object parts via a layered image representation; second, predict hierarchy via a structural descriptor that composes low-level concepts into a hierarchical structure; and third, model the system dynamics by predicting the future. Experiments on multiple real and synthetic datasets demonstrate that our PSD model works well on all three tasks: segmenting object parts, building their hierarchical structure, and capturing their motion distributions.

Submitted 12 March, 2019; originally announced March 2019.

Comments: ICLR 2019. The first two authors contributed equally to this work

arXiv:1903.03349 [pdf]

doi 10.1038/s41598-020-62148-y

Computer aided detection of tuberculosis on chest radiographs: An evaluation of the CAD4TB v6 system

Authors: Keelin Murphy, Shifa Salman Habib, Syed Mohammad Asad Zaidi, Saira Khowaja, Aamir Khan, Jaime Melendez, Ernst T. Scholten, Farhan Amad, Steven Schalekamp, Maurits Verhagen, Rick H. H. M. Philipsen, Annet Meijers, Bram van Ginneken

Abstract: There is a growing interest in the automated analysis of chest X-rays (CXR) as a sensitive and inexpensive means of screening susceptible populations for pulmonary tuberculosis. In this work we evaluate the latest version of CAD4TB, a commercial software platform designed for this purpose. Version 6 of CAD4TB was released in 2018 and is here tested on a fully independent dataset of 5565 CXR images with GeneXpert (Xpert) sputum test results available (854 Xpert positive subjects). A subset of 500 subjects (50% Xpert positive) was reviewed and annotated by 5 expert observers independently to obtain a radiological reference standard. The latest version of CAD4TB is found to outperform all previous versions in terms of area under the receiver operating characteristic (ROC) curve with respect to both Xpert and radiological reference standards. Improvements with respect to Xpert are most apparent at high sensitivity levels, with a specificity of 76% obtained at a fixed 90% sensitivity. When compared with the radiological reference standard, CAD4TB v6 also outperformed previous versions by a considerable margin and achieved 98% specificity at the 90% sensitivity setting. No substantial difference was found between the performance of CAD4TB v6 and any of the various expert observers against the Xpert reference standard.
A cost and efficiency analysis on this dataset demonstrates that in a standard clinical situation, operating at 90% sensitivity, users of CAD4TB v6 can process 132 subjects per day at an average cost per screen of $5.95 per subject, while users of version 3 process only 85 subjects per day at a cost of $8.38 per subject. At all tested operating points version 6 is shown to be more efficient and cost effective than any other version.

Submitted 2 April, 2020; v1 submitted 8 March, 2019; originally announced March 2019.

Comments: Published in Scientific Reports

Journal ref: Scientific Reports 10, 5492 (2020)
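Figures such as "76% specificity at a fixed 90% sensitivity" come from choosing an operating threshold on the score scale. A generic sketch of one way to read off that operating point; the scores below are made up, and this is not CAD4TB's own procedure:

```python
import math

import numpy as np


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.9):
    """Operating point that reaches a target sensitivity, and its specificity.

    scores: higher means more suspicious; labels: 1 = reference-positive, 0 = negative.
    A subject is flagged positive when score >= threshold. The threshold is the
    highest one that still flags at least target_sensitivity of the positives.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos_desc = np.sort(scores[labels])[::-1]
    # Number of positives that must be flagged (small tolerance guards float error).
    k = max(1, math.ceil(target_sensitivity * len(pos_desc) - 1e-9))
    threshold = pos_desc[k - 1]
    flagged = scores >= threshold
    sensitivity = float(flagged[labels].mean())
    specificity = float((~flagged[~labels]).mean())
    return threshold, sensitivity, specificity
```

With tied scores the achieved sensitivity can exceed the target, which is why the function reports the sensitivity it actually reached.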

arXiv:1902.09641 [pdf, other]

Stochastic Prediction of Multi-Agent Interactions from Partial Observations

Authors: Chen Sun, Per Karlsson, Jiajun Wu, Joshua B Tenenbaum, Kevin Murphy

Abstract: We present a method that learns to integrate temporal information, from a learned dynamics model, with ambiguous visual information, from a learned vision model, in the context of interacting agents. Our method is based on a graph-structured variational recurrent neural network (Graph-VRNN), which is trained end-to-end to infer the current state of the (partially observed) world, as well as to forecast future states. We show that our method outperforms various baselines on two sports datasets, one based on real basketball trajectories, and one generated by a soccer game engine.

Submitted 25 February, 2019; originally announced February 2019.

Comments: ICLR 2019 camera ready

arXiv:1902.09635 [pdf, other]

NAS-Bench-101: Towards Reproducible Neural Architecture Search

Authors: Chris Ying, Aaron Klein, Esteban Real, Eric Christiansen, Kevin Murphy, Frank Hutter

Abstract: Recent advances in neural architecture search (NAS) demand tremendous computational resources, which makes it difficult to reproduce experiments and imposes a barrier-to-entry to researchers without access to large-scale computation. We aim to ameliorate these problems by introducing NAS-Bench-101, the first public architecture dataset for NAS research. To build NAS-Bench-101, we carefully constructed a compact, yet expressive, search space, exploiting graph isomorphisms to identify 423k unique convolutional architectures. We trained and evaluated all of these architectures multiple times on CIFAR-10 and compiled the results into a large dataset of over 5 million trained models. This allows researchers to evaluate the quality of a diverse range of models in milliseconds by querying the pre-computed dataset. We demonstrate its utility by analyzing the dataset as a whole and by benchmarking a range of architecture optimization algorithms.

Submitted 14 May, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

Comments: Published in the Proceedings of the 36th International Conference on Machine Learning
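The abstract mentions exploiting graph isomorphisms to count unique architectures. One simple way to detect isomorphic cells, sketched here under our own assumptions, is a brute-force canonical form: relabel the nodes under every permutation and keep the lexicographically smallest (operations, edges) encoding. It is only practical for very small graphs and is not claimed to be the procedure the benchmark used; `canonical_form` and the operation names are hypothetical:

```python
from itertools import permutations


def canonical_form(adjacency, ops):
    """Smallest (ops, edges) encoding over all node relabelings of a small labeled DAG.

    adjacency: list of lists of 0/1 (adjacency[i][j] == 1 means edge i -> j).
    ops: list of operation names, one per node.
    Two cells are isomorphic exactly when their canonical forms are equal.
    """
    n = len(ops)
    best = None
    for perm in permutations(range(n)):  # perm[i] is the new label of node i
        new_ops = [None] * n
        for i in range(n):
            new_ops[perm[i]] = ops[i]
        edges = sorted((perm[i], perm[j]) for i in range(n) for j in range(n) if adjacency[i][j])
        key = (tuple(new_ops), tuple(edges))
        if best is None or key < best:
            best = key
    return best
```

Collecting canonical forms in a set then counts unique cells: two listings of the same cell with nodes in a different order collapse to one entry, while a cell with a different operation stays separate.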

arXiv:1902.09345 [pdf, other]

The homeostatic dynamics of feeding behaviour identify novel mechanisms of anorectic agents

Authors: Thomas M McGrath, Eleanor Spreckley, Aina Fernandez Rodriguez, Carlo Viscomi, Amin Alamshah, Elina Akalestou, Kevin G Murphy, Nick S Jones

Abstract: Better understanding of feeding behaviour will be vital in reducing obesity and metabolic syndrome, but we lack a standard model that captures the complexity of feeding behaviour. We construct an accurate stochastic model of rodent feeding at the bout level in order to perform quantitative behavioural analysis. Analysing the different effects on feeding behaviour of PYY 3-36, lithium chloride, GLP-1 and leptin shows the precise behavioural changes caused by each anorectic agent, and demonstrates that these changes do not mimic satiety. In the ad libitum fed state during the light period, meal initiation is governed by complete stomach emptying, whereas in all other conditions there is a graduated response. We show how robust homeostatic control of feeding thwarts attempts to reduce food intake, and how this might be overcome. In silico experiments suggest that introducing a minimum intermeal interval or modulating gastric emptying can be as effective as anorectic drug administration.

Submitted 9 May, 2019; v1 submitted 22 February, 2019; originally announced February 2019.

arXiv:1902.03280 [pdf, other]

The intertwined roles of particle shape and surface roughness in controlling the shear strength of a granular material

Authors: Kieran A. Murphy, Arthur K. MacKeith, Leah K. Roth, Heinrich M. Jaeger

Abstract: Both the shape of individual particles and their surface properties contribute to the strength of a granular material under shear. Here we show the degree to which these two aspects can be intertwined. In experiments on assemblies of 3D printed, convex lens-shaped particles, we measure the stress-strain response under repeated compressive loading and find that the aggregate's shear strength falls rapidly when compared to other particle shapes. We probe the granular material at mm-scales with X-ray computed tomography and μm-scales with high-resolution surface metrology to look for the cause of the degradation. We find that wear due to accumulated deformation smooths out the lens surfaces in a controlled and systematic manner that correlates with a significant loss of shear strength observed for the assembly as a whole. The sensitivity of lenses to changes in surface properties contrasts with results for assemblies of 3D printed tetrahedra and spheres, which under the same load cycling are found to exhibit only minor degradation in strength. This case study provides insight into the relationship between particle shape, surface wear, and the overall material response, and suggests new strategies when designing a granular material with desired evolution of properties under repeated deformation.

Submitted 24 June, 2019; v1 submitted 8 February, 2019; originally announced February 2019.

Comments: 6 pages, 5 figures

arXiv:1901.08656 [pdf, ps, other]

Edges control clustering in levitated granular matter

Authors: Melody Xuan Lim, Kieran A. Murphy, Heinrich M. Jaeger

Abstract: The properties of small clusters depend dramatically on the interactions between their constituent particles. However, it remains challenging to design and tune the interactions between macroscopic particles, such as in a granular material. Here, we use acoustic levitation to trap macroscopic grains and induce forces between them. Our main results show that particles levitated in an acoustic field prefer to make contact along sharp edges. The radius of curvature of the edges directly controls the magnitude of these forces. These highly directional interactions, combined with local contact forces, give rise to a diverse array of cluster shapes. Our results open up new possibilities for the design of specific forces between macroscopic particles, directing their assembly, and actuating their motion.

Submitted 6 June, 2019; v1 submitted 24 January, 2019; originally announced January 2019.

Comments: Submitted to the special edition of Granular Matter, In Memoriam of Robert P. Behringer. For supplementary movies, see https://youtu.be/u_IJsMC69-8 and https://youtu.be/rxNYclVQFUY

arXiv:1812.07119 [pdf, other]

Composing Text and Image for Image Retrieval - An Empirical Odyssey

Authors: Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays

Abstract: In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image. For example, we may present an image of the Eiffel Tower, and ask the system to find images which are visually similar but are modified in small ways, such as being taken at nighttime instead of during the day. To tackle this task, we learn a similarity metric between a target image and a source image plus source text, i.e., an embedding and composition function such that the target image feature is close to the composed source image-plus-text feature. We propose a new way to combine image and text using such a function that is designed for the retrieval task. We show this outperforms existing approaches on 3 different datasets, namely Fashion-200k, MIT-States and a new synthetic dataset we create based on CLEVR. We also show that our approach can be used to classify input queries, in addition to image retrieval.

Submitted 17 December, 2018; originally announced December 2018.

arXiv:1812.03899 [pdf, other]

doi 10.1371/journal.pone.0212852

Diversity of Artists in Major U.S. Museums

Authors: Chad M. Topaz, Bernhard Klingenberg, Daniel Turek, Brianna Heggeseth, Pamela E. Harris, Julie C. Blackwood, C. Ondine Chavoya, Steven Nelson, Kevin M. Murphy

Abstract: The U.S. art museum sector is grappling with diversity. While previous work has investigated the demographic diversity of museum staffs and visitors, the diversity of artists in their collections has remained unreported. We conduct the first large-scale study of artist diversity in museums. By scraping the public online catalogs of 18 major U.S. museums, deploying a sample of 10,000 artist records comprising over 9,000 unique artists to crowdsourcing, and analyzing 45,000 responses, we infer artist genders, ethnicities, geographic origins, and birth decades. Our results are threefold. First, we provide estimates of gender and ethnic diversity at each museum, and overall, we find that 85% of artists are white and 87% are men. Second, we identify museums that are outliers, having significantly higher or lower representation of certain demographic groups than the rest of the pool. Third, we find that the relationship between museum collection mission and artist diversity is weak, suggesting that a museum wishing to increase diversity might do so without changing its emphases on specific time periods and regions. Our methodology can be used to broadly and efficiently assess diversity in other fields.

Submitted 11 February, 2019; v1 submitted 10 December, 2018; originally announced December 2018.

Comments: 15 pages, 2 figures, minor revisions of and enhancements to text

arXiv:1810.00319 [pdf, other]

Modeling Uncertainty with Hedged Instance Embedding

Authors: Seong Joon Oh, Kevin Murphy, Jiyan Pan, Joseph Roth, Florian Schroff, Andrew Gallagher

Abstract: Instance embeddings are an efficient and versatile image representation that facilitates applications like recognition, verification, retrieval, and clustering. Many metric learning methods represent the input as a single point in the embedding space. Often the distance between points is used as a proxy for match confidence. However, this can fail to represent uncertainty arising when the input is ambiguous, e.g., due to occlusion or blurriness. This work addresses this issue and explicitly models the uncertainty by hedging the location of each input in the embedding space. We introduce the hedged instance embedding (HIB) in which embeddings are modeled as random variables and the model is trained under the variational information bottleneck principle. Empirical results on our new N-digit MNIST dataset show that our method leads to the desired behavior of hedging its bets across the embedding space upon encountering ambiguous inputs. This results in improved performance for image matching and classification tasks, more structure in the learned embedding space, and an ability to compute a per-exemplar uncertainty measure that is correlated with downstream performance.

Submitted 26 August, 2019; v1 submitted 30 September, 2018; originally announced October 2018.

Comments: 15 pages, 11 figures, updated version of ICLR'19
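With embeddings treated as random variables, a match probability can be estimated by sampling. A sketch, assuming each embedding is a diagonal Gaussian and that a pair of sampled embeddings matches with probability sigmoid(-a * ||z1 - z2|| + b), averaged over k x k Monte Carlo sample pairs. This form is our reading of the approach, and the parameters `a`, `b`, `k` and the function name are assumptions for illustration:

```python
import numpy as np


def match_probability(mu1, sigma1, mu2, sigma2, a=1.0, b=0.0, k=8, seed=0):
    """Monte Carlo estimate of the probability that two inputs match.

    Each input is embedded as a diagonal Gaussian N(mu, diag(sigma**2)).
    Draw k samples per input, score every sample pair with
    sigmoid(-a * ||z1 - z2|| + b), and average over the k * k pairs.
    """
    rng = np.random.default_rng(seed)
    z1 = mu1 + sigma1 * rng.standard_normal((k, mu1.shape[0]))
    z2 = mu2 + sigma2 * rng.standard_normal((k, mu2.shape[0]))
    dist = np.linalg.norm(z1[:, None, :] - z2[None, :, :], axis=-1)  # (k, k)
    return float(np.mean(1.0 / (1.0 + np.exp(a * dist - b))))
```

Nearby, confident embeddings give a probability near sigmoid(b), and distant ones give a probability near 0; a large `sigma` spreads the samples, which is how an ambiguous input can hedge between the two.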

arXiv:1808.06271 [pdf, other]

doi 10.1103/PhysRevX.9.011014

Transforming mesoscale granular plasticity through particle shape

Authors: Kieran A. Murphy, Karin A. Dahmen, Heinrich M. Jaeger

Abstract: When an amorphous material is strained beyond the point of yielding it enters a state of continual reconfiguration via dissipative, avalanche-like slip events that relieve built-up local stress. However, how the statistics of such events depend on local interactions among the constituent units remains debated. To address this we perform experiments on granular material in which we use particle shape to vary the interactions systematically. Granular material, confined under constant pressure boundary conditions, is uniaxially compressed while stress is measured and internal rearrangements are imaged with x-rays. We introduce volatility, a quantity from economic theory, as a powerful new tool to quantify the magnitude of stress fluctuations, finding systematic, shape-dependent trends. For all 22 investigated shapes the magnitude $s$ of relaxation events is well-fit by a truncated power law distribution $P(s) \sim s^{-\tau} \exp(-s/s^*)$, as has been proposed within the context of plasticity models. The power law exponent $\tau$ for all shapes tested clusters around $\tau = 1.5$, within experimental uncertainty covering the range 1.3 to 1.7. The shape independence of $\tau$ and its compatibility with mean field models indicate that the granularity of the system, but not particle shape, modifies the stress redistribution after a slip event away from that of continuum elasticity. Meanwhile, the characteristic maximum event size $s^*$ changes by two orders of magnitude and tracks the shape dependence of volatility.
Particle shape in granular materials is therefore a powerful new factor influencing the distance at which an amorphous system operates from scale-free criticality. These experimental results are not captured by current models and suggest a need to reexamine the mechanisms driving mesoscale plastic deformation in amorphous systems.

Submitted 21 December, 2018; v1 submitted 19 August, 2018; originally announced August 2018.

Comments: 11 pages, 8 figures. v3 adds a new appendix and figure about event rates and changes several parts of the text

Journal ref: Phys. Rev. X 9, 011014 (2019)
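The truncated power law quoted above can be normalized explicitly. A short derivation, assuming a lower cutoff $s_{\min} > 0$ (not stated in the abstract, but needed because $\int_0 s^{-\tau}\,ds$ diverges for $\tau \ge 1$):

```latex
P(s) = \frac{s^{-\tau} e^{-s/s^*}}{Z}, \qquad
Z = \int_{s_{\min}}^{\infty} s^{-\tau} e^{-s/s^*}\, ds
  = (s^*)^{1-\tau}\, \Gamma\!\left(1-\tau,\ \frac{s_{\min}}{s^*}\right)
```

The second equality follows from substituting $u = s/s^*$, where $\Gamma(a, x) = \int_x^\infty u^{a-1} e^{-u}\, du$ is the upper incomplete gamma function. For $s \ll s^*$ the exponential factor is close to 1 and the distribution is a pure power law with exponent $\tau$; for $s \gtrsim s^*$ the exponential cutoff takes over, which is why $s^*$ acts as the characteristic maximum event size.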

arXiv:1807.10982 [pdf, other]

Actor-Centric Relation Network

Authors: Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid

Abstract: Current state-of-the-art approaches for spatio-temporal action localization rely on detections at the frame level and model temporal context with 3D ConvNets. Here, we go one step further and model spatio-temporal relations to capture the interactions between human actors, relevant objects and scene elements essential to differentiate similar human actions. Our approach is weakly supervised and mines the relevant elements automatically with an actor-centric relational network (ACRN). ACRN computes and accumulates pair-wise relation information from actor and global scene features, and generates relation features for action classification. It is implemented as neural networks and can be trained jointly with an existing action detection system. We show that ACRN outperforms alternative approaches which capture relation information, and that the proposed framework improves upon the state-of-the-art performance on JHMDB and AVA. A visualization of the learned relation features confirms that our approach is able to attend to the relevant relations for each action.

Submitted 28 July, 2018; originally announced July 2018.

Comments: ECCV 2018 camera ready

arXiv:1806.09594 [pdf, other]

Tracking Emerges by Colorizing Videos

Authors: Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, Kevin Murphy

Abstract: We use large amounts of unlabeled video to learn models for visual tracking without manual human supervision. We leverage the natural temporal coherency of color to create a model that learns to colorize gray-scale videos by copying colors from a reference frame. Quantitative and qualitative experiments suggest that this task causes the model to automatically learn to track visual regions. Although the model is trained without any ground-truth labels, our method learns to track well enough to outperform the latest methods based on optical flow. Moreover, our results suggest that failures to track are correlated with failures to colorize, indicating that advancing video colorization may further improve self-supervised visual tracking.

Submitted 27 July, 2018; v1 submitted 25 June, 2018; originally announced June 2018.

Comments: ECCV 2018. Blog post: https://ai.googleblog.com/2018/06/self-supervised-tracking-via-video.html
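The colorization abstract says colors are copied from a reference frame. One way to realize such copying, sketched here under stated assumptions, is soft attention: each target pixel compares its embedding with every reference-pixel embedding, and the softmax-normalized similarities weight an average of reference values. The embeddings below are hand-written stand-ins for the output of a learned network, and `copy_from_reference` and `temperature` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def copy_from_reference(target_emb: List[Vector], ref_emb: List[Vector],
                        ref_values: List[Vector], temperature: float = 1.0) -> List[Vector]:
    """For each target pixel, attend over all reference pixels using dot-product
    similarity of embeddings, and return the attention-weighted average of the
    reference values (colors during training, labels at tracking time)."""
    outputs = []
    for f_i in target_emb:
        scores = [sum(a * b for a, b in zip(f_i, f_j)) / temperature for f_j in ref_emb]
        weights = softmax(scores)
        dim = len(ref_values[0])
        outputs.append([sum(w * v[d] for w, v in zip(weights, ref_values)) for d in range(dim)])
    return outputs


# Usage: two reference pixels carrying one-hot labels (e.g. object vs. background).
ref_emb = [[10.0, 0.0], [0.0, 10.0]]
ref_labels = [[1.0, 0.0], [0.0, 1.0]]
print(copy_from_reference([[10.0, 0.0]], ref_emb, ref_labels))
```

The target pixel's embedding matches the first reference pixel, so it inherits (almost exactly) that pixel's label. Because only the copied values change, the same function can propagate object labels instead of colors, which is how a colorization model can be reused as a tracker.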

arXiv:1803.08225 [pdf, other]

PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model

Authors: George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris, Jonathan Tompson, Kevin Murphy

Abstract: We present a box-free bottom-up approach for the tasks of pose estimation and instance segmentation of people in multi-person images using an efficient single-shot model. The proposed PersonLab model tackles both semantic-level reasoning and object-part associations using part-based modeling. Our model employs a convolutional network which learns to detect individual keypoints and predict their relative displacements, allowing us to group keypoints into person pose instances. Further, we propose a part-induced geometric embedding descriptor which allows us to associate semantic person pixels with their corresponding person instance, delivering instance-level person segmentations. Our system is based on a fully-convolutional architecture and allows for efficient inference, with runtime essentially independent of the number of people present in the scene. Trained on COCO data alone, our system achieves COCO test-dev keypoint average precision of 0.665 using single-scale inference and 0.687 using multi-scale inference, significantly outperforming all previous bottom-up pose estimation systems. We are also the first bottom-up method to report competitive results for the person class in the COCO instance segmentation task, achieving a person category average precision of 0.417.

Submitted 22 March, 2018; originally announced March 2018.

Comments: Person detection and pose estimation, segmentation and grouping

arXiv:1712.04851 [pdf, other]

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

Authors: Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy

Abstract: Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic than that in 2D static image classification. Three main challenges exist: spatial (image) feature representation, temporal information representation, and model/computation complexity. It was recently shown by Carreira and Zisserman that 3D CNNs, inflated from 2D networks and pretrained on ImageNet, could be a promising way for spatial and temporal representation learning. However, as for model/computation complexity, 3D CNNs are much more expensive than 2D CNNs and prone to overfitting. We seek a balance between speed and accuracy by building an effective and efficient video classification system through systematic exploration of critical network design choices. In particular, we show that it is possible to replace many of the 3D convolutions by low-cost 2D convolutions. Rather surprisingly, the best result (in both speed and accuracy) is achieved when replacing the 3D convolutions at the bottom of the network, suggesting that temporal representation learning on high-level semantic features is more useful. Our conclusion generalizes to datasets with very different properties. When combined with several other cost-effective designs including separable spatial/temporal convolution and feature gating, our system results in an effective video classification system that produces very competitive results on several action classification benchmarks (Kinetics, Something-something, UCF101 and HMDB), as well as two action detection (localization) benchmarks (JHMDB and UCF101-24).

Submitted 26 July, 2018; v1 submitted 13 December, 2017; originally announced December 2017.

Comments: ECCV 2018 camera ready
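To make the cost comparison in the abstract above concrete, this sketch counts convolution weights (biases ignored) for a single 64-to-64-channel layer with 3x3 spatial and 3-frame temporal kernels. The channel and kernel sizes are illustrative assumptions, not figures from the paper, and the separable variant is assumed to use `c_out` channels between its spatial and temporal convolutions.

```python
def conv2d_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a k x k 2D convolution (biases ignored)."""
    return k * k * c_in * c_out


def conv3d_params(c_in: int, c_out: int, kt: int = 3, k: int = 3) -> int:
    """Weights in a full kt x k x k 3D convolution (biases ignored)."""
    return kt * k * k * c_in * c_out


def separable3d_params(c_in: int, c_out: int, kt: int = 3, k: int = 3) -> int:
    """Weights in a 1 x k x k spatial convolution (c_in -> c_out) followed by a
    kt x 1 x 1 temporal convolution (c_out -> c_out), biases ignored."""
    return k * k * c_in * c_out + kt * c_out * c_out


for name, fn in [("2D", conv2d_params), ("3D", conv3d_params), ("separable 3D", separable3d_params)]:
    print(name, fn(64, 64))
# 2D 36864
# 3D 110592
# separable 3D 49152
```

Under these assumptions the full 3D layer has three times the weights of the 2D layer, while the separable form sits in between. Weight count is only one axis of cost; compute also scales with the number of frames each layer processes.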

arXiv:1712.00559 [pdf, other]

Progressive Neural Architecture Search

Authors: Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy

Abstract: We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms. Our approach uses a sequential model-based optimization (SMBO) strategy, in which we search for structures in order of increasing complexity, while simultaneously learning a surrogate model to guide the search through structure space. Direct comparison under the same search space shows that our method is up to 5 times more efficient than the RL method of Zoph et al. (2018) in terms of number of models evaluated, and 8 times faster in terms of total compute. The structures we discover in this way achieve state of the art classification accuracies on CIFAR-10 and ImageNet.

Submitted 26 July, 2018; v1 submitted 2 December, 2017; originally announced December 2017.

Comments: To appear in ECCV 2018 as oral. The code and checkpoint for PNASNet-5 trained on ImageNet (both Mobile and Large) can now be downloaded from https://github.com/tensorflow/models/tree/master/research/slim#Pretrained. Also see https://github.com/chenxi116/PNASNet.TF for refactored and simplified TensorFlow code; see https://github.com/chenxi116/PNASNet.pytorch for exact conversion to PyTorch
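The progressive strategy in the PNAS abstract (evaluate small structures, fit a surrogate, expand by one block, and train only the expansions the surrogate ranks highest) can be sketched as a beam search. Everything concrete here is hypothetical: structures are tuples of operation names, `toy_accuracy` stands in for training and validating a network, and the surrogate is a crude per-operation average rather than a learned surrogate model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Structure = Tuple[str, ...]


def fit_surrogate(history: Dict[Structure, float]) -> Callable[[Structure], float]:
    """Toy surrogate (hypothetical): score a structure by averaging the mean
    observed accuracy of each operation it contains."""
    op_scores: Dict[str, List[float]] = {}
    for struct, acc in history.items():
        for op in struct:
            op_scores.setdefault(op, []).append(acc)

    def predict(struct: Structure) -> float:
        vals = [sum(op_scores[op]) / len(op_scores[op]) for op in struct if op in op_scores]
        return sum(vals) / len(vals) if vals else 0.0

    return predict


def progressive_search(ops: Sequence[str], max_blocks: int, beam_width: int,
                       evaluate: Callable[[Structure], float]) -> Tuple[Structure, int]:
    """Search structures in order of increasing size; only the beam_width
    expansions ranked highest by the surrogate are actually evaluated."""
    history: Dict[Structure, float] = {}
    candidates: List[Structure] = [(op,) for op in ops]  # every 1-block structure
    for num_blocks in range(1, max_blocks + 1):
        for s in candidates:
            history[s] = evaluate(s)  # the expensive step: train and validate
        if num_blocks == max_blocks:
            break
        surrogate = fit_surrogate(history)
        expanded = [s + (op,) for s in candidates for op in ops]
        candidates = sorted(expanded, key=surrogate, reverse=True)[:beam_width]
    best = max(history, key=history.get)
    return best, len(history)


# Hypothetical "accuracy": each sep3x3 block helps more than other operations.
def toy_accuracy(s: Structure) -> float:
    return sum(0.3 if op == "sep3x3" else 0.05 for op in s)


best, n_evaluated = progressive_search(["sep3x3", "maxpool", "identity"], 3, 2, toy_accuracy)
print(best, n_evaluated)
# ('sep3x3', 'sep3x3', 'sep3x3') 7
```

Here the search evaluates 7 structures instead of the 3 + 9 + 27 = 39 an exhaustive sweep over up to three blocks would need; the savings depend entirely on how well the surrogate ranks unseen expansions.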

arXiv:1711.05632 [pdf, ps, other]

doi 10.1007/s11634-019-00373-8

Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

Authors: Keefe Murphy, Thomas Brendan Murphy

Abstract: We consider model-based clustering methods for continuous, correlated data that account for external information available in the presence of mixed-type fixed covariates by proposing the MoEClust suite of models. These models allow different subsets of covariates to influence the component weights and/or component densities by modelling the parameters of the mixture as functions of the covariates. A familiar range of constrained eigen-decomposition parameterisations of the component covariance matrices are also accommodated. This paper thus addresses the equivalent aims of including covariates in Gaussian parsimonious clustering models and incorporating parsimonious covariance structures into all special cases of the Gaussian mixture of experts framework. The MoEClust models demonstrate significant improvement from both perspectives in applications to both univariate and multivariate data sets. Novel extensions to include a uniform noise component for capturing outliers and to address initialisation of the EM algorithm, model selection, and the visualisation of results are also proposed.

Submitted 13 July, 2021; v1 submitted 15 November, 2017; originally announced November 2017.

Comments: Published in Advances in Data Analysis and Classification

Journal ref: Advances in Data Analysis and Classification, 14(2): 293-325 (2020)
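In the mixture-of-experts view the MoEClust abstract refers to, a conditional density takes the form f(y | w) = sum over g of tau_g(w) N(y; mu_g(w), Sigma_g), where the component weights tau_g and possibly the component means depend on covariates w. A common choice for the weights, assumed here rather than taken from the paper, is a multinomial-logit gating network; this sketch computes those weights for one covariate vector. The coefficient layout (intercept first) and the name `gating_weights` are illustrative.

```python
import math
from typing import List, Sequence


def gating_weights(covariates: Sequence[float], betas: List[Sequence[float]]) -> List[float]:
    """Multinomial-logit gating: the weight of component g is proportional to
    exp(beta_g[0] + beta_g[1:] . w), where w is the covariate vector."""
    logits = [b[0] + sum(bj * wj for bj, wj in zip(b[1:], covariates)) for b in betas]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: two components, one covariate; the second component's weight grows with w.
betas = [[0.0, 0.0], [0.0, 1.0]]
print(gating_weights([math.log(3.0)], betas))  # approximately [0.25, 0.75]
```

Fixing one component's coefficients at zero, as in the usage above, is a common way to make the remaining coefficients identifiable.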

arXiv:1711.05139 [pdf, other]

XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings

Authors: Amélie Royer, Konstantinos Bousmalis, Stephan Gouws, Fred Bertsch, Inbar Mosseri, Forrester Cole, Kevin Murphy

Abstract: Style transfer usually refers to the task of applying color and texture information from a specific style image to a given content image while preserving the structure of the latter. Here we tackle the more generic problem of semantic style transfer: given two unpaired collections of images, we aim to learn a mapping between the corpus-level style of each collection, while preserving semantic content shared across the two domains. We introduce XGAN ("Cross-GAN"), a dual adversarial autoencoder, which captures a shared representation of the common domain semantic content in an unsupervised way, while jointly learning the domain-to-domain image translations in both directions. We exploit ideas from the domain adaptation literature and define a semantic consistency loss which encourages the model to preserve semantics in the learned embedding space. We report promising qualitative results for the task of face-to-cartoon translation. The cartoon dataset, CartoonSet, we collected for this purpose is publicly available at google.github.io/cartoonset/ as a new benchmark for semantic style transfer.

Submitted 10 July, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

Comments: Domain Adaptation for Visual Understanding at ICML'18

arXiv:1711.00464 [pdf, other]

Fixing a Broken ELBO

Authors: Alexander A. Alemi, Ben Poole, Ian Fischer, Joshua V. Dillon, Rif A. Saurous, Kevin Murphy

Abstract: Recent work in unsupervised representation learning has focused on learning deep directed latent-variable models. Fitting these models by maximizing the marginal likelihood or evidence is typically intractable, thus a common approximation is to maximize the evidence lower bound (ELBO) instead. However, maximum likelihood training (whether exact or approximate) does not necessarily result in a good latent representation, as we demonstrate both theoretically and empirically. In particular, we derive variational lower and upper bounds on the mutual information between the input and the latent variable, and use these bounds to derive a rate-distortion curve that characterizes the tradeoff between compression and reconstruction accuracy. Using this framework, we demonstrate that there is a family of models with identical ELBO, but different quantitative and qualitative characteristics. Our framework also suggests a simple new method to ensure that latent variable models with powerful stochastic decoders do not ignore their latent code.

Submitted 13 February, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

Comments: 21 pages, 9 figures
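The bounds mentioned in the ELBO abstract can be written compactly. The notation below is a sketch of how we read the setup: p*(x) is the data distribution with entropy H, e(z|x) the encoder, d(x|z) the decoder, and m(z) the marginal (prior) over the latent code; rate R and distortion D are then

```latex
\begin{aligned}
R &= \mathbb{E}_{p^*(x)}\,\mathbb{E}_{e(z \mid x)}\!\left[\log \frac{e(z \mid x)}{m(z)}\right],
\qquad
D = -\,\mathbb{E}_{p^*(x)}\,\mathbb{E}_{e(z \mid x)}\!\left[\log d(x \mid z)\right], \\[4pt]
H - D &\le I(X;Z) \le R,
\qquad
\mathrm{ELBO} = -(D + R).
\end{aligned}
```

As a hypothetical illustration, models with (R, D) = (0, 100) nats and (20, 80) nats both have ELBO = -100, yet the first has I(X;Z) <= 0, meaning its decoder ignores the latent code entirely, while the second can carry up to 20 nats of information about x.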

arXiv:1705.10762 [pdf, other]

Generative Models of Visually Grounded Imagination

Authors: Ramakrishna Vedantam, Ian Fischer, Jonathan Huang, Kevin Murphy

Abstract: It is easy for people to imagine what a man with pink hair looks like, even if they have never seen such a person before. We call the ability to create images of novel semantic concepts visually grounded imagination. In this paper, we show how we can modify variational auto-encoders to perform this task. Our method uses a novel training objective, and a novel product-of-experts inference network, which can handle partially specified (abstract) concepts in a principled and efficient way. We also propose a set of easy-to-compute evaluation metrics that capture our intuitive notions of what it means to have good visual imagination, namely correctness, coverage, and compositionality (the 3 C's). Finally, we perform a detailed comparison of our method with two existing joint image-attribute VAE methods (the JMVAE method of Suzuki et al. and the BiVCCA method of Wang et al.) by applying them to two datasets: the MNIST-with-attributes dataset (which we introduce here), and the CelebA dataset.

Submitted 9 November, 2018; v1 submitted 30 May, 2017; originally announced May 2017.

Comments: International Conference on Learning Representations (ICLR), 2018
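The product-of-experts inference network in the abstract above handles partially specified concepts. For Gaussian experts this has a closed form: precisions add, and the combined mean is the precision-weighted average of the expert means. The sketch below shows that mechanism for a single latent dimension, assuming a standard-normal prior expert and hand-picked per-attribute experts (in the model these would be predicted by networks). The attribute names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Gaussian = Tuple[float, float]  # (mean, variance) for a single latent dimension


def product_of_gaussians(experts: Sequence[Gaussian]) -> Gaussian:
    """The product of Gaussian densities is (up to normalization) Gaussian:
    precisions add, and the mean is the precision-weighted average of means."""
    precision = sum(1.0 / var for _, var in experts)
    mean = sum(mu / var for mu, var in experts) / precision
    return mean, 1.0 / precision


def infer_latent(attribute_experts: Dict[str, Gaussian], observed: List[str],
                 prior: Gaussian = (0.0, 1.0)) -> Gaussian:
    """Combine a prior expert with one expert per *specified* attribute;
    unspecified attributes simply contribute no expert."""
    return product_of_gaussians([prior] + [attribute_experts[a] for a in observed])


# Hypothetical experts for two attributes.
experts = {"pink_hair": (2.0, 1.0), "smiling": (-1.0, 0.5)}
print(infer_latent(experts, []))             # (0.0, 1.0): nothing specified, prior only
print(infer_latent(experts, ["pink_hair"]))  # (1.0, 0.5)
```

Leaving an attribute unspecified just drops its expert, so the posterior stays broader in the directions that attribute would have constrained; this is what lets one network handle abstract as well as fully specified concepts.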

arXiv:1705.07208 [pdf, other]

PixColor: Pixel Recursive Colorization

Authors: Sergio Guadarrama, Ryan Dahl, David Bieber, Mohammad Norouzi, Jonathon Shlens, Kevin Murphy

Abstract: We propose a novel approach to automatically produce multiple colorized versions of a grayscale image. Our method results from the observation that the task of automated colorization is relatively easy given a low-resolution version of the color image. We first train a conditional PixelCNN to generate a low-resolution color image for a given grayscale image. Then, given the generated low-resolution color image and the original grayscale image as inputs, we train a second CNN to generate a high-resolution colorization of an image. We demonstrate that our approach produces more diverse and plausible colorizations than existing methods, as judged by human raters in a "Visual Turing Test".

Submitted 5 June, 2017; v1 submitted 19 May, 2017; originally announced May 2017.

arXiv:1704.03549 [pdf, other]

Attention-based Extraction of Structured Information from Street View Imagery

Authors: Zbigniew Wojna, Alex Gorban, Dar-Shyang Lee, Kevin Murphy, Qian Yu, Yeqing Li, Julian Ibarz

Abstract: We present a neural network model - based on CNNs, RNNs and a novel attention mechanism - which achieves 84.2% accuracy on the challenging French Street Name Signs (FSNS) dataset, significantly outperforming the previous state of the art (Smith'16), which achieved 72.46%. Furthermore, our new method is much simpler and more general than the previous approach. To demonstrate the generality of our model, we show that it also performs well on an even more challenging dataset derived from Google Street View, in which the goal is to extract business names from store fronts. Finally, we study the speed/accuracy tradeoff that results from using CNN feature extractors of different depths. Surprisingly, we find that deeper is not always better (in terms of accuracy, as well as speed). Our resulting model is simple, accurate and fast, allowing it to be used at scale on a variety of challenging real-world text extraction problems.

Submitted 20 August, 2017; v1 submitted 11 April, 2017; originally announced April 2017.

Comments: Updated references, added link to the source code

arXiv:1703.10277 [pdf, other]

Semantic Instance Segmentation via Deep Metric Learning

Authors: Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, Kevin P. Murphy

Abstract: We propose a new method for semantic instance segmentation, by first computing how likely two pixels are to belong to the same object, and then by grouping similar pixels together. Our similarity metric is based on a deep, fully convolutional embedding model. Our grouping method is based on selecting all points that are sufficiently similar to a set of "seed points", chosen from a deep, fully convolutional scoring model. We show competitive results on the Pascal VOC instance segmentation benchmark.

Submitted 29 March, 2017; originally announced March 2017.
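The two steps in the instance-segmentation abstract (a pairwise similarity from embeddings, then grouping pixels around seed points) can be sketched in a few lines. The similarity form 2 / (1 + exp(d2)), with d2 the squared embedding distance, is an assumption used for illustration; it equals 1 for identical embeddings and decays toward 0. Seeds are supplied by hand here, whereas the paper chooses them with a scoring model; the pixel ids, embeddings, and threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def similarity(e_p: Sequence[float], e_q: Sequence[float]) -> float:
    """Map squared embedding distance d2 to (0, 1]: 2 / (1 + exp(d2)).
    Written as 2*exp(-d2) / (1 + exp(-d2)) so large distances underflow to 0
    instead of overflowing."""
    d2 = sum((a - b) ** 2 for a, b in zip(e_p, e_q))
    t = math.exp(-d2)
    return 2.0 * t / (1.0 + t)


def group_pixels(embeddings: Dict[int, Sequence[float]], seeds: List[int],
                 threshold: float = 0.5) -> Dict[int, Set[int]]:
    """One mask per seed: every pixel whose similarity to the seed exceeds the threshold."""
    return {
        seed: {p for p, e in embeddings.items() if similarity(embeddings[seed], e) > threshold}
        for seed in seeds
    }


# Four pixels forming two well-separated clusters in embedding space.
embeddings = {0: (0.0, 0.0), 1: (0.1, 0.0), 2: (5.0, 5.0), 3: (5.1, 5.0)}
print(group_pixels(embeddings, seeds=[0, 2]))  # {0: {0, 1}, 2: {2, 3}}
```

Nothing in this sketch prevents two seeds from claiming the same pixel; a full system needs a rule for resolving such overlaps.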

arXiv:1701.07010 [pdf, other]

doi 10.1214/19-BA1179

Infinite Mixtures of Infinite Factor Analysers

Authors: Keefe Murphy, Cinzia Viroli, Isobel Claire Gormley

Abstract: Factor-analytic Gaussian mixture models are often employed as a model-based approach to clustering high-dimensional data. Typically, the numbers of clusters and latent factors must be specified in advance of model fitting, and remain fixed. The pair which optimises some model selection criterion is then chosen. For computational reasons, models in which the number of latent factors differs across clusters are rarely considered. Here the infinite mixture of infinite factor analysers (IMIFA) model is introduced. IMIFA employs a Pitman-Yor process prior to facilitate automatic inference of the number of clusters using the stick-breaking construction and a slice sampler. Furthermore, IMIFA employs multiplicative gamma process shrinkage priors to allow cluster-specific numbers of factors, automatically inferred via an adaptive Gibbs sampler. IMIFA is presented as the flagship of a family of factor-analytic mixture models, providing flexible approaches to clustering high-dimensional data. Applications to a benchmark data set, metabolomic spectral data, and a manifold learning handwritten digit example illustrate the IMIFA model and its advantageous features. These include obviating the need for model selection criteria, reducing the computational burden associated with the search of the model space, improving clustering performance by allowing cluster-specific numbers of factors, and quantifying uncertainty in the numbers of clusters and cluster-specific factors.

Submitted 13 July, 2021; v1 submitted 24 January, 2017; originally announced January 2017.

Comments: Published in Bayesian Analysis

Journal ref: Bayesian Analysis, 15(3): 937-963 (2020)
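The IMIFA abstract mentions the stick-breaking construction of the Pitman-Yor process. In the standard construction, with discount d and concentration alpha, one draws v_k ~ Beta(1 - d, alpha + k d) and sets pi_k = v_k times the product of (1 - v_l) over l < k. The sketch below draws the first few weights; the fixed truncation is only for illustration (a slice sampler, as in the abstract, avoids committing to one), and the function name and parameter values are hypothetical.

```python
import random
from typing import List


def pitman_yor_weights(alpha: float, discount: float, truncation: int,
                       rng: random.Random) -> List[float]:
    """First `truncation` mixture weights from the Pitman-Yor stick-breaking
    construction: v_k ~ Beta(1 - d, alpha + k*d), pi_k = v_k * prod_{l<k} (1 - v_l).
    Requires 0 <= d < 1 and alpha > -d; d = 0 gives the Dirichlet process."""
    if not (0.0 <= discount < 1.0 and alpha > -discount):
        raise ValueError("need 0 <= discount < 1 and alpha > -discount")
    weights: List[float] = []
    remaining = 1.0  # length of the stick not yet broken off
    for k in range(1, truncation + 1):
        v = rng.betavariate(1.0 - discount, alpha + k * discount)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


rng = random.Random(0)
w = pitman_yor_weights(alpha=1.0, discount=0.25, truncation=20, rng=rng)
print(sum(w))  # at most 1; the leftover stick belongs to components beyond the truncation
```

A positive discount makes the weights decay more slowly than under a Dirichlet process, which is one reason Pitman-Yor priors are used when many small clusters are plausible.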

arXiv:1701.03757 [pdf, ps, other]

Deep Probabilistic Programming

Authors: Dustin Tran, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, David M. Blei

Abstract: We propose Edward, a Turing-complete probabilistic programming language. Edward defines two compositional representations---random variables and inference. By treating inference as a first class citizen, on a par with modeling, we show that probabilistic programming can be as flexible and computationally efficient as traditional deep learning. For flexibility, Edward makes it easy to fit the same model using a variety of composable inference methods, ranging from point estimation to variational inference to MCMC. In addition, Edward can reuse the modeling representation as part of inference, facilitating the design of rich variational models and generative adversarial networks. For efficiency, Edward is integrated into TensorFlow, providing significant speedups over existing probabilistic systems. For example, we show on a benchmark logistic regression task that Edward is at least 35x faster than Stan and 6x faster than PyMC3. Further, Edward incurs no runtime overhead: it is as fast as handwritten TensorFlow.

Submitted 7 March, 2017; v1 submitted 13 January, 2017; originally announced January 2017.

Comments: Appears in International Conference on Learning Representations, 2017. A companion webpage for this paper is available at http://edwardlib.org/iclr2017

arXiv:1701.02870 [pdf, other]

Context-aware Captions from Context-agnostic Supervision

Authors: Ramakrishna Vedantam, Samy Bengio, Kevin Murphy, Devi Parikh, Gal Chechik

Abstract: We introduce an inference technique to produce discriminative context-aware image captions (captions that describe differences between images or visual concepts) using only generic context-agnostic training data (captions that describe a concept or an image in isolation). For example, given images and captions of "siamese cat" and "tiger cat", we generate language that describes the "siamese cat" in a way that distinguishes it from "tiger cat". Our key novelty is that we show how to do joint inference over a language model that is context-agnostic and a listener which distinguishes closely-related concepts. We first apply our technique to a justification task, namely to describe why an image contains a particular fine-grained category as opposed to another closely-related category of the CUB-200-2011 dataset. We then study discriminative image captioning to generate language that uniquely refers to one of two semantically-similar images in the COCO dataset. Evaluations with discriminative ground truth for justification and human studies for discriminative image captioning reveal that our approach outperforms baseline generative and speaker-listener approaches for discrimination.

Submitted 31 July, 2017; v1 submitted 11 January, 2017; originally announced January 2017.

Comments: Accepted to CVPR 2017 (Spotlight)
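The captioning abstract above describes joint inference over a context-agnostic language model and a listener that tells closely related concepts apart. One simple way to combine the two, assumed here for illustration, is to score a caption s by lambda * log p(s | target) + (1 - lambda) * log[p(s | target) / p(s | distractor)]: the first term keeps the caption fluent and accurate for the target, and the second rewards captions the distractor explains poorly. The sketch reranks whole candidate sentences, a simplification of applying such a score during decoding; the captions, log-probabilities, and function names are hypothetical.

```python
from typing import Dict, Tuple


def emitter_suppressor_score(logp_target: float, logp_distractor: float, lam: float) -> float:
    """lam * log p(s | target) + (1 - lam) * log[p(s | target) / p(s | distractor)].
    lam = 1 ignores the distractor; smaller lam rewards captions that the
    distractor explains poorly."""
    return lam * logp_target + (1.0 - lam) * (logp_target - logp_distractor)


def pick_discriminative(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """candidates maps caption -> (log p(s | target), log p(s | distractor))."""
    return max(candidates, key=lambda s: emitter_suppressor_score(*candidates[s], lam))


# Hypothetical log-probabilities from a context-agnostic captioning model.
candidates = {
    "a cat sitting on a mat": (-2.0, -2.0),
    "a cream cat with a dark face": (-6.0, -12.0),
}
print(pick_discriminative(candidates, lam=1.0))  # a cat sitting on a mat
print(pick_discriminative(candidates, lam=0.3))  # a cream cat with a dark face
```

The generic caption is most likely under the target but says nothing that separates it from the distractor; lowering lambda shifts the choice toward the caption that is much less likely for the distractor.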

arXiv:1701.01779 [pdf, other]

Towards Accurate Multi-person Pose Estimation in the Wild

Authors: George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy

Abstract: We propose a method for multi-person detection and 2-D pose estimation that achieves state-of-the-art results on the challenging COCO keypoints task. It is a simple yet powerful top-down approach consisting of two stages. In the first stage, we predict the location and scale of boxes which are likely to contain people; for this we use the Faster RCNN detector. In the second stage, we estimate the keypoints of the person potentially contained in each proposed bounding box. For each keypoint type we predict dense heatmaps and offsets using a fully convolutional ResNet. To combine these outputs we introduce a novel aggregation procedure to obtain highly localized keypoint predictions. We also use a novel form of keypoint-based Non-Maximum-Suppression (NMS), instead of the cruder box-level NMS, and a novel form of keypoint-based confidence score estimation, instead of box-level scoring. Trained on COCO data alone, our final system achieves average precision of 0.649 on the COCO test-dev set and 0.643 on the test-standard set, outperforming the winner of the 2016 COCO keypoints challenge and other recent state-of-the-art methods. Further, by using additional in-house labeled data we obtain an even higher average precision of 0.685 on the test-dev set and 0.673 on the test-standard set, more than 5% absolute improvement compared to the previous best performing method on the same dataset.

Submitted 14 April, 2017; v1 submitted 6 January, 2017; originally announced January 2017.

Comments: Paper describing an improved version of the G-RMI entry to the 2016 COCO keypoints challenge (http://image-net.org/challenges/ilsvrc+coco2016). Camera ready version to appear in the Proceedings of CVPR 2017
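The pose-estimation abstract above combines per-keypoint heatmaps and offsets with an aggregation step. A Hough-style voting scheme is one way to do this: every pixel casts a vote, weighted by its heatmap probability, at the location its predicted offset points to, and the keypoint is placed where votes accumulate. This sketch rounds each vote to the nearest pixel, which is a simplifying assumption (a smoother kernel such as bilinear interpolation would spread each vote); the heatmap and offsets are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Pixel = Tuple[int, int]


def aggregate_votes(heatmap: Dict[Pixel, float],
                    offsets: Dict[Pixel, Tuple[float, float]]) -> Dict[Pixel, float]:
    """Each pixel casts a vote, weighted by its heatmap probability, at the
    location its predicted offset points to (rounded to the nearest pixel)."""
    scores: Dict[Pixel, float] = defaultdict(float)
    for (y, x), prob in heatmap.items():
        dy, dx = offsets[(y, x)]
        scores[(round(y + dy), round(x + dx))] += prob
    return dict(scores)


def localize(heatmap: Dict[Pixel, float],
             offsets: Dict[Pixel, Tuple[float, float]]) -> Pixel:
    """Return the pixel that received the largest total vote."""
    scores = aggregate_votes(heatmap, offsets)
    return max(scores, key=scores.get)


heatmap = {(0, 0): 0.6, (0, 1): 0.7, (2, 2): 0.9}
offsets = {(0, 0): (1.0, 1.0), (0, 1): (1.0, 0.0), (2, 2): (0.0, 0.0)}
print(localize(heatmap, offsets))  # (1, 1): two agreeing votes (0.6 + 0.7) beat the single 0.9 peak
```

Taking the heatmap maximum alone would pick (2, 2); aggregating votes lets several moderately confident pixels that agree on a location outweigh one isolated peak.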

Showing 51–100 of 148 results for author: Murphy, K