Showing 101–150 of 177 results for author: Fei-Fei, L

  1. arXiv:1902.07817  [pdf, other]

    cs.SD cs.CL eess.AS

    Audio-Linguistic Embeddings for Spoken Sentences

    Authors: Albert Haque, Michelle Guo, Prateek Verma, Li Fei-Fei

    Abstract: We propose spoken sentence embeddings which capture both acoustic and linguistic content. While existing works operate at the character, phoneme, or word level, our method learns long-term dependencies by modeling speech at the sentence level. Formulated as an audio-linguistic multitask learning problem, our encoder-decoder model simultaneously reconstructs acoustic and natural language features f…

    Submitted 20 February, 2019; originally announced February 2019.

    Comments: International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2019

  2. arXiv:1902.03748  [pdf, other]

    cs.CV

    Peeking into the Future: Predicting Future Person Activities and Locations in Videos

    Authors: Junwei Liang, Lu Jiang, Juan Carlos Niebles, Alexander Hauptmann, Li Fei-Fei

    Abstract: Deciphering human behaviors to predict their future paths/trajectories and what they would do from videos is important in many applications. Motivated by this idea, this paper studies predicting a pedestrian's future path jointly with future activities. We propose an end-to-end, multi-task learning system utilizing rich visual features about human behavioral information and interaction with their…

    Submitted 31 May, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

    Comments: In CVPR 2019. Code, models and more results are available at: https://next.cs.cmu.edu/

  3. arXiv:1901.04780  [pdf, other]

    cs.CV cs.RO

    DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion

    Authors: Chen Wang, Danfei Xu, Yuke Zhu, Roberto Martín-Martín, Cewu Lu, Li Fei-Fei, Silvio Savarese

    Abstract: A key technical challenge in performing 6D object pose estimation from RGB-D image is to fully leverage the two complementary data sources. Prior works either extract information from the RGB image and depth separately or use costly post-processing steps, limiting their performances in highly cluttered scenes and real-time applications. In this work, we present DenseFusion, a generic framework for…

    Submitted 15 January, 2019; originally announced January 2019.

  4. arXiv:1901.02985  [pdf, other]

    cs.CV cs.LG

    Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation

    Authors: Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan Yuille, Li Fei-Fei

    Abstract: Recently, Neural Architecture Search (NAS) has successfully identified neural network architectures that exceed human designed ones on large-scale image classification. In this paper, we study NAS for semantic image segmentation. Existing works often focus on searching the repeatable cell structure, while hand-designing the outer network structure that controls the spatial resolution changes. This…

    Submitted 6 April, 2019; v1 submitted 9 January, 2019; originally announced January 2019.

    Comments: To appear in CVPR 2019 as oral. Code for Auto-DeepLab released at https://github.com/tensorflow/models/tree/master/research/deeplab

  5. arXiv:1901.02598  [pdf, other]

    cs.CV

    D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation

    Authors: Chien-Yi Chang, De-An Huang, Yanan Sui, Li Fei-Fei, Juan Carlos Niebles

    Abstract: We address weakly supervised action alignment and segmentation in videos, where only the order of occurring actions is available during training. We propose Discriminative Differentiable Dynamic Time Warping (D3TW), the first discriminative model using weak ordering supervision. The key technical challenge for discriminative modeling with weak supervision is that the loss function of the ordering…

    Submitted 11 April, 2019; v1 submitted 8 January, 2019; originally announced January 2019.

    Comments: To appear in CVPR 2019

  6. arXiv:1812.07119  [pdf, other]

    cs.CV

    Composing Text and Image for Image Retrieval - An Empirical Odyssey

    Authors: Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays

    Abstract: In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image. For example, we may present an image of the Eiffel tower, and ask the system to find images which are visually similar but are modified in small ways, such as being taken at nighttime instead of during the day. To ta…

    Submitted 17 December, 2018; originally announced December 2018.

  7. arXiv:1812.00169  [pdf, other]

    cs.CV

    Vision-Based Gait Analysis for Senior Care

    Authors: David Xue, Anin Sayana, Evan Darke, Kelly Shen, Jun-Ting Hsieh, Zelun Luo, Li-Jia Li, N. Lance Downing, Arnold Milstein, Li Fei-Fei

    Abstract: As the senior population rapidly increases, it is challenging yet crucial to provide effective long-term care for seniors who live at home or in senior care facilities. Smart senior homes, which have gained widespread interest in the healthcare community, have been proposed to improve the well-being of seniors living independently. In particular, non-intrusive, cost-effective sensors placed in the…

    Submitted 1 December, 2018; originally announced December 2018.

    Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

    Report number: ML4H/2018/78

  8. arXiv:1811.09953  [pdf, other]

    cs.CR

    Faster CryptoNets: Leveraging Sparsity for Real-World Encrypted Inference

    Authors: Edward Chou, Josh Beal, Daniel Levy, Serena Yeung, Albert Haque, Li Fei-Fei

    Abstract: Homomorphic encryption enables arbitrary computation over data while it remains encrypted. This privacy-preserving feature is attractive for machine learning, but requires significant computational time due to the large overhead of the encryption scheme. We present Faster CryptoNets, a method for efficient encrypted inference using neural networks. We develop a pruning and quantization approach th…

    Submitted 25 November, 2018; originally announced November 2018.

  9. arXiv:1811.09951  [pdf, other]

    cs.CR

    A Fully Private Pipeline for Deep Learning on Electronic Health Records

    Authors: Edward Chou, Thao Nguyen, Josh Beal, Albert Haque, Li Fei-Fei

    Abstract: We introduce an end-to-end private deep learning framework, applied to the task of predicting 30-day readmission from electronic health records. By using differential privacy during training and homomorphic encryption during inference, we demonstrate that our proposed pipeline could maintain high performance while providing robust privacy guarantees against information leak from data transmission…

    Submitted 25 November, 2018; originally announced November 2018.

  10. arXiv:1811.09950  [pdf, other]

    cs.CV

    Privacy-Preserving Action Recognition for Smart Hospitals using Low-Resolution Depth Images

    Authors: Edward Chou, Matthew Tan, Cherry Zou, Michelle Guo, Albert Haque, Arnold Milstein, Li Fei-Fei

    Abstract: Computer-vision hospital systems can greatly assist healthcare workers and improve medical facility treatment, but often face patient resistance due to the perceived intrusiveness and violation of privacy associated with visual surveillance. We downsample video frames to extremely low resolutions to degrade private information from surveillance videos. We measure the amount of activity-recognition…

    Submitted 25 November, 2018; originally announced November 2018.

    Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

    Report number: ML4H/2018/154

  11. arXiv:1811.08592  [pdf, other]

    cs.CV cs.SD eess.AS

    Measuring Depression Symptom Severity from Spoken Language and 3D Facial Expressions

    Authors: Albert Haque, Michelle Guo, Adam S Miner, Li Fei-Fei

    Abstract: With more than 300 million people depressed worldwide, depression is a global problem. Due to access barriers such as social stigma, cost, and treatment availability, 60% of mentally-ill adults do not receive any mental health services. Effective and efficient diagnosis relies on detecting clinical symptoms of depression. Automatic detection of depressive symptoms would potentially improve diagnos…

    Submitted 26 November, 2018; v1 submitted 20 November, 2018; originally announced November 2018.

    Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

    Report number: ML4H/2018/9

  12. arXiv:1811.02790  [pdf, other]

    cs.RO cs.AI cs.LG

    RoboTurk: A Crowdsourcing Platform for Robotic Skill Learning through Imitation

    Authors: Ajay Mandlekar, Yuke Zhu, Animesh Garg, Jonathan Booher, Max Spero, Albert Tung, Julian Gao, John Emmons, Anchit Gupta, Emre Orbay, Silvio Savarese, Li Fei-Fei

    Abstract: Imitation Learning has empowered recent advances in learning robotic manipulation tasks by addressing shortcomings of Reinforcement Learning such as exploration and reward specification. However, research in this area has been limited to modest-sized datasets due to the difficulty of collecting large quantities of task demonstrations through existing mechanisms. This work introduces RoboTurk to ad…

    Submitted 7 November, 2018; originally announced November 2018.

    Comments: Published at the Conference on Robot Learning (CoRL) 2018

  13. arXiv:1810.10191  [pdf, other]

    cs.RO cs.AI cs.LG

    Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks

    Authors: Michelle A. Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg

    Abstract: Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. However, it is non-trivial to manually design a robot controller that combines modalities with very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy on…

    Submitted 7 March, 2019; v1 submitted 24 October, 2018; originally announced October 2018.

    Comments: ICRA 2019

  14. arXiv:1807.09937  [pdf, other]

    cs.CV cs.LG

    HiDDeN: Hiding Data With Deep Networks

    Authors: Jiren Zhu, Russell Kaplan, Justin Johnson, Li Fei-Fei

    Abstract: Recent work has shown that deep neural networks are highly sensitive to tiny perturbations of input images, giving rise to adversarial examples. Though this property is usually considered a weakness of learned models, we explore whether it can be beneficial. We find that neural networks can learn to use invisible perturbations to encode a rich amount of useful information. In fact, one can exploit…

    Submitted 25 July, 2018; originally announced July 2018.

  15. arXiv:1807.03480  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration

    Authors: De-An Huang, Suraj Nair, Danfei Xu, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles

    Abstract: Our goal is to generate a policy to complete an unseen task given just a single video demonstration of the task in a given domain. We hypothesize that to successfully generalize to unseen complex tasks from a single video demonstration, it is necessary to explicitly incorporate the compositional structure of the tasks into the model. To this end, we propose Neural Task Graph (NTG) Networks, which…

    Submitted 6 March, 2019; v1 submitted 10 July, 2018; originally announced July 2018.

    Comments: CVPR 2019

  16. arXiv:1806.09266  [pdf, other]

    cs.RO cs.CV cs.LG stat.ML

    Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision

    Authors: Kuan Fang, Yuke Zhu, Animesh Garg, Andrey Kurenkov, Viraj Mehta, Li Fei-Fei, Silvio Savarese

    Abstract: Tool manipulation is vital for facilitating robots to complete challenging task goals. It requires reasoning about the desired effect of the task and thus properly grasping and manipulating the tool to achieve the task. Task-agnostic grasping optimizes for grasp robustness while ignoring crucial task-specific constraints. In this paper, we propose the Task-Oriented Grasping Network (TOG-Net) to jo…

    Submitted 24 June, 2018; originally announced June 2018.

    Comments: RSS 2018

  17. arXiv:1806.08047  [pdf, other]

    cs.AI cs.CV cs.LG cs.NE

    Flexible Neural Representation for Physics Prediction

    Authors: Damian Mrowca, Chengxu Zhuang, Elias Wang, Nick Haber, Li Fei-Fei, Joshua B. Tenenbaum, Daniel L. K. Yamins

    Abstract: Humans have a remarkable capacity to understand the physical dynamics of objects in their environment, flexibly capturing complex structures and interactions at multiple levels of detail. Inspired by this ability, we propose a hierarchical particle-based object representation that covers a wide variety of types of three-dimensional objects, including both arbitrary rigid geometrical shapes and def…

    Submitted 27 October, 2018; v1 submitted 20 June, 2018; originally announced June 2018.

    Comments: 23 pages, 20 figures

  18. arXiv:1806.04166  [pdf, other]

    cs.LG cs.CV stat.ML

    Learning to Decompose and Disentangle Representations for Video Prediction

    Authors: Jun-Ting Hsieh, Bingbin Liu, De-An Huang, Li Fei-Fei, Juan Carlos Niebles

    Abstract: Our goal is to predict future video frames given a sequence of input frames. Despite large amounts of video data, this remains a challenging task because of the high-dimensionality of video frames. We address this challenge by proposing the Decompositional Disentangled Predictive Auto-Encoder (DDPAE), a framework that combines structured probabilistic models and deep networks to automatically (i)…

    Submitted 17 October, 2018; v1 submitted 11 June, 2018; originally announced June 2018.

  19. arXiv:1804.01622  [pdf, other]

    cs.CV cs.LG

    Image Generation from Scene Graphs

    Authors: Justin Johnson, Agrim Gupta, Li Fei-Fei

    Abstract: To truly understand the visual world our models should be able not only to recognize images but also generate them. To this end, there has been exciting recent progress on generating images from natural language descriptions. These methods give stunning results on limited domains such as descriptions of birds or flowers, but struggle to faithfully reproduce complex sentences with many objects and…

    Submitted 4 April, 2018; originally announced April 2018.

    Comments: To appear at CVPR 2018

  20. arXiv:1803.11189  [pdf, other]

    cs.CV

    Iterative Visual Reasoning Beyond Convolutions

    Authors: Xinlei Chen, Li-Jia Li, Li Fei-Fei, Abhinav Gupta

    Abstract: We present a novel framework for iterative visual reasoning. Our framework goes beyond current recognition systems that lack the capability to reason beyond stack of convolutions. The framework consists of two core modules: a local module that uses spatial memory to store previous beliefs with parallel updates; and a global graph-reasoning module. Our graph module has three components: a) a knowle…

    Submitted 29 March, 2018; originally announced March 2018.

    Comments: CVPR 2018

  21. arXiv:1803.10892  [pdf, other]

    cs.CV

    Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks

    Authors: Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, Alexandre Alahi

    Abstract: Understanding human motion behavior is critical for autonomous moving platforms (like self-driving cars and social robots) if they are to navigate human-centric environments. This is challenging because human motion is inherently multimodal: given a history of human motion paths, there are many socially plausible ways that people could move in the future. We tackle this problem by combining tools…

    Submitted 28 March, 2018; originally announced March 2018.

  22. arXiv:1803.10362  [pdf, other]

    cs.CV

    Referring Relationships

    Authors: Ranjay Krishna, Ines Chami, Michael Bernstein, Li Fei-Fei

    Abstract: Images are not simply sets of objects: each image represents a web of interconnected relationships. These relationships between entities carry semantic meaning and help a viewer differentiate between instances of an entity. For example, in an image of a soccer match, there may be multiple persons present, but each participates in different relationships: one is kicking the ball, and the other is g…

    Submitted 29 March, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

    Comments: CVPR 2018, 19 pages, 12 figures, includes supplementary material

  23. arXiv:1802.08774  [pdf, other]

    cs.CV

    Tool Detection and Operative Skill Assessment in Surgical Videos Using Region-Based Convolutional Neural Networks

    Authors: Amy Jin, Serena Yeung, Jeffrey Jopling, Jonathan Krause, Dan Azagury, Arnold Milstein, Li Fei-Fei

    Abstract: Five billion people in the world lack access to quality surgical care. Surgeon skill varies dramatically, and many surgical patients suffer complications and avoidable harm. Improving surgical training and feedback would help to reduce the rate of complications, half of which have been shown to be preventable. To do this, it is essential to assess operative skill, a process that currently requires…

    Submitted 21 July, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: arXiv admin note: text overlap with arXiv:1806.02031 by other authors

  24. arXiv:1802.07461  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Emergence of Structured Behaviors from Curiosity-Based Intrinsic Motivation

    Authors: Nick Haber, Damian Mrowca, Li Fei-Fei, Daniel L. K. Yamins

    Abstract: Infants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. We seek to replicate some of these abilities with a neural network that implements curiosity-driven intrinsic motivation. Using a simple but ecologically naturalistic simulated environment in which the agent can move and interact with…

    Submitted 21 February, 2018; originally announced February 2018.

    Comments: 6 pages, 5 figures

    MSC Class: 68

  25. arXiv:1802.07442  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Learning to Play with Intrinsically-Motivated Self-Aware Agents

    Authors: Nick Haber, Damian Mrowca, Li Fei-Fei, Daniel L. K. Yamins

    Abstract: Infants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. We seek to mathematically formalize these abilities using a neural network that implements curiosity-driven intrinsic motivation. Using a simple but ecologically naturalistic simulated environment in which an agent can move and intera…

    Submitted 30 October, 2018; v1 submitted 21 February, 2018; originally announced February 2018.

    Comments: In NIPS 2018. 10 pages, 5 figures

    MSC Class: 68

  26. arXiv:1712.05055  [pdf, other]

    cs.CV

    MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels

    Authors: Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei

    Abstract: Recent deep networks are capable of memorizing the entire data even when the labels are completely random. To overcome the overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep networks, namely, StudentNet. During training, MentorNet provides a curriculum (sample weighting scheme) for StudentNe…

    Submitted 13 August, 2018; v1 submitted 13 December, 2017; originally announced December 2017.

    Journal ref: published at ICML 2018

  27. arXiv:1712.00559  [pdf, other]

    cs.CV cs.LG stat.ML

    Progressive Neural Architecture Search

    Authors: Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy

    Abstract: We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms. Our approach uses a sequential model-based optimization (SMBO) strategy, in which we search for structures in order of increasing complexity, while simultaneously learning a surrogate mode…

    Submitted 26 July, 2018; v1 submitted 2 December, 2017; originally announced December 2017.

    Comments: To appear in ECCV 2018 as oral. The code and checkpoint for PNASNet-5 trained on ImageNet (both Mobile and Large) can now be downloaded from https://github.com/tensorflow/models/tree/master/research/slim#Pretrained. Also see https://github.com/chenxi116/PNASNet.TF for refactored and simplified TensorFlow code; see https://github.com/chenxi116/PNASNet.pytorch for exact conversion to PyTorch

  28. arXiv:1712.00123  [pdf, other]

    stat.ML cs.CV

    Label Efficient Learning of Transferable Representations across Domains and Tasks

    Authors: Zelun Luo, Yuliang Zou, Judy Hoffman, Li Fei-Fei

    Abstract: We propose a framework that learns a representation transferable across different domains and tasks in a label efficient manner. Our approach battles domain shift with a domain adversarial loss, and generalizes the embedding to novel task using a metric learning-based approach. Our model is simultaneously optimized on labeled source data and unlabeled or sparsely labeled data in the target domain.…

    Submitted 30 November, 2017; originally announced December 2017.

    Comments: NIPS 2017

  29. arXiv:1712.00108  [pdf, other]

    cs.CV

    Graph Distillation for Action Detection with Privileged Modalities

    Authors: Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei

    Abstract: We propose a technique that tackles action detection in multimodal videos under a realistic and challenging condition in which only limited training data and partially observed modalities are available. Common methods in transfer learning do not take advantage of the extra modalities potentially available in the source domain. On the other hand, previous work on multimodal learning only focuses on…

    Submitted 27 July, 2018; v1 submitted 30 November, 2017; originally announced December 2017.

    Comments: ECCV 2018

  30. arXiv:1711.06373  [pdf, other]

    cs.CV stat.ML

    Thoracic Disease Identification and Localization with Limited Supervision

    Authors: Zhe Li, Chong Wang, Mei Han, Yuan Xue, Wei Wei, Li-Jia Li, Li Fei-Fei

    Abstract: Accurate identification and localization of abnormalities from radiology images play an integral part in clinical diagnosis and treatment planning. Building a highly accurate prediction model for these tasks usually requires a large number of images manually annotated with labels and finding sites of abnormalities. In reality, however, such annotated data are expensive to acquire, especially the o…

    Submitted 20 June, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

    Comments: Conference on Computer Vision and Pattern Recognition 2018 (CVPR 2018). V1: CVPR submission; V2: +supplementary; V3: CVPR camera-ready; V4: correction, update reference baseline results according to their latest post; V5: minor correction; V6: Identification results using NIH data splits and various image models

  31. arXiv:1710.01813  [pdf, other]

    cs.AI cs.LG cs.RO

    Neural Task Programming: Learning to Generalize Across Hierarchical Tasks

    Authors: Danfei Xu, Suraj Nair, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese

    Abstract: In this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the idea of few-shot learning from demonstration and neural program induction. NTP takes as input a task specification (e.g., video demonstration of a task) and recursively decomposes it into finer sub-task specifications. These specifications are fed to a hierarchical neural program, wher…

    Submitted 14 March, 2018; v1 submitted 4 October, 2017; originally announced October 2017.

    Comments: ICRA 2018

  32. arXiv:1709.02482  [pdf, other]

    cs.HC cs.CV

    Scalable Annotation of Fine-Grained Categories Without Experts

    Authors: Timnit Gebru, Jonathan Krause, Jia Deng, Li Fei-Fei

    Abstract: We present a crowdsourcing workflow to collect image annotations for visually similar synthetic categories without requiring experts. In animals, there is a direct link between taxonomy and visual similarity: e.g. a collie (type of dog) looks more similar to other collies (e.g. smooth collie) than a greyhound (another type of dog). However, in synthetic categories such as cars, objects with simila…

    Submitted 7 September, 2017; originally announced September 2017.

    Comments: CHI 2017

  33. arXiv:1709.02480  [pdf, other]

    cs.CV

    Fine-Grained Car Detection for Visual Census Estimation

    Authors: Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Li Fei-Fei

    Abstract: Targeted socioeconomic policies require an accurate understanding of a country's demographic makeup. To that end, the United States spends more than 1 billion dollars a year gathering census data such as race, gender, education, occupation and unemployment rates. Compared to the traditional method of collecting surveys across many years which is costly and labor intensive, data-driven, machine lea…

    Submitted 7 September, 2017; originally announced September 2017.

    Comments: AAAI 2016

  34. arXiv:1709.02476  [pdf, other]

    cs.CV

    Fine-grained Recognition in the Wild: A Multi-Task Domain Adaptation Approach

    Authors: Timnit Gebru, Judy Hoffman, Li Fei-Fei

    Abstract: While fine-grained object recognition is an important problem in computer vision, current models are unlikely to accurately classify objects in the wild. These fully supervised models need additional annotated images to classify objects in every new scenario, a task that is infeasible. However, sources such as e-commerce websites and field guides provide annotated images for many classes. In this…

    Submitted 7 September, 2017; originally announced September 2017.

    Comments: ICCV 2017

  35. arXiv:1708.00163  [pdf, other]

    cs.CV

    Towards Vision-Based Smart Hospitals: A System for Tracking and Monitoring Hand Hygiene Compliance

    Authors: Albert Haque, Michelle Guo, Alexandre Alahi, Serena Yeung, Zelun Luo, Alisha Rege, Jeffrey Jopling, Lance Downing, William Beninati, Amit Singh, Terry Platchek, Arnold Milstein, Li Fei-Fei

    Abstract: One in twenty-five patients admitted to a hospital will suffer from a hospital acquired infection. If we can intelligently track healthcare staff, patients, and visitors, we can better understand the sources of such infections. We envision a smart hospital capable of increasing operational efficiency and improving patient care with less spending. In this paper, we propose a non-intrusive vision-ba…

    Submitted 24 April, 2018; v1 submitted 1 August, 2017; originally announced August 2017.

    Comments: Machine Learning for Healthcare Conference (MLHC)

    Journal ref: PMLR 68:75-87, 2017

  36. arXiv:1707.04674  [pdf, other]

    cs.RO

    ADAPT: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems

    Authors: James Harrison, Animesh Garg, Boris Ivanovic, Yuke Zhu, Silvio Savarese, Li Fei-Fei, Marco Pavone

    Abstract: Model-free policy learning has enabled robust performance of complex tasks with relatively simple algorithms. However, this simplicity comes at the cost of requiring an Oracle and arguably very poor sample complexity. This renders such methods unsuitable for physical systems. Variants of model-based methods address this problem through the use of simulators, however, this gives rise to the problem…

    Submitted 8 November, 2017; v1 submitted 14 July, 2017; originally announced July 2017.

    Comments: International Symposium on Robotics Research (ISRR), 2017

  37. arXiv:1706.03643  [pdf, other]

    cs.LG

    Tackling Over-pruning in Variational Autoencoders

    Authors: Serena Yeung, Anitha Kannan, Yann Dauphin, Li Fei-Fei

    Abstract: Variational autoencoders (VAE) are directed generative models that learn factorial latent variables. As noted by Burda et al. (2015), these models exhibit the problem of factor over-pruning where a significant number of stochastic factors fail to learn anything and become inactive. This can limit their modeling power and their ability to learn diverse and meaningful latent representations. In this…

    Submitted 6 August, 2017; v1 submitted 9 June, 2017; originally announced June 2017.

  38. arXiv:1706.02884  [pdf, other]

    cs.CV

    Learning to Learn from Noisy Web Videos

    Authors: Serena Yeung, Vignesh Ramanathan, Olga Russakovsky, Liyue Shen, Greg Mori, Li Fei-Fei

    Abstract: Understanding the simultaneously very diverse and intricately fine-grained set of possible human actions is a critical open problem in computer vision. Manually labeling training videos is feasible for some action classes but doesn't scale to the full long-tailed distribution of actions. A promising way to address this is to leverage noisy data from web queries to learn new actions, using semi-sup…

    Submitted 9 June, 2017; originally announced June 2017.

    Comments: To appear in CVPR 2017

  39. arXiv:1705.08080  [pdf, other]

    cs.CV cs.LG cs.RO

    Visual Semantic Planning using Deep Successor Representations

    Authors: Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi

    Abstract: A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world. In this work, we address the problem of visual semantic planning: the task of predicting a sequence of actions from visual observations that transform a dynamic environment from an initial state to a goal state. Doing so entails knowledge about objects an…

    Submitted 15 August, 2017; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: ICCV 2017 camera ready

  40. arXiv:1705.03633  [pdf, other]

    cs.CV cs.CL cs.LG

    Inferring and Executing Programs for Visual Reasoning

    Authors: Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

    Abstract: Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes. As a result, these black-box models often learn to exploit biases in the data rather than learning to perform visual reasoning. Inspired by module networks, this paper proposes a model for visual reasoning that consists of a p…

    Submitted 10 May, 2017; originally announced May 2017.

  41. arXiv:1705.02092  [pdf, other]

    cs.CV

    Characterizing and Improving Stability in Neural Style Transfer

    Authors: Agrim Gupta, Justin Johnson, Alexandre Alahi, Li Fei-Fei

    Abstract: Recent progress in style transfer on images has focused on improving the quality of stylized images and speed of methods. However, real-time methods are highly unstable resulting in visible flickering when applied to videos. In this work we characterize the instability of these methods by examining the solution set of the style transfer objective. We show that the trace of the Gram matrix represen…

    Submitted 5 May, 2017; originally announced May 2017.

  42. arXiv:1705.00754  [pdf, other]

    cs.CV

    Dense-Captioning Events in Videos

    Authors: Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, Juan Carlos Niebles

    Abstract: Most natural videos contain numerous events. For example, in a video of a "man playing a piano", the video might also contain "another man dancing" or "a crowd clapping". We introduce the task of dense-captioning events, which involves both detecting and describing events in a video. We propose a new model that is able to identify all events in a single pass of the video while simultaneously descr…

    Submitted 1 May, 2017; originally announced May 2017.

    Comments: 16 pages, 16 figures

  43. arXiv:1703.02521  [pdf, other]

    cs.CV

    Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos

    Authors: De-An Huang, Joseph J. Lim, Li Fei-Fei, Juan Carlos Niebles

    Abstract: We propose an unsupervised method for reference resolution in instructional videos, where the goal is to temporally link an entity (e.g., "dressing") to the action (e.g., "mix yogurt") that produced it. The key challenge is the inevitable visual-linguistic ambiguities arising from the changes in both visual appearance and referring expression of an entity in the video. This challenge is amplified…

    Submitted 20 May, 2017; v1 submitted 7 March, 2017; originally announced March 2017.

    Comments: CVPR 2017

  44. Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US

    Authors: Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, Li Fei-Fei

    Abstract: The United States spends more than $1B each year on initiatives such as the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors. Although a comprehensive source of data, the lag between demographic changes and their appearance in the ACS can exceed half a decade.…

    Submitted 2 March, 2017; v1 submitted 22 February, 2017; originally announced February 2017.

    Comments: 41 pages including supplementary material. Under review at PNAS

  45. arXiv:1701.02426  [pdf, other]

    cs.CV

    Scene Graph Generation by Iterative Message Passing

    Authors: Danfei Xu, Yuke Zhu, Christopher B. Choy, Li Fei-Fei

    Abstract: Understanding a visual scene goes beyond recognizing individual objects in isolation. Relationships between objects also constitute rich semantic information about the scene. In this work, we explicitly model the objects and their relationships using scene graphs, a visually-grounded graphical structure of an image. We propose a novel end-to-end model that generates such structured scene represent…

    Submitted 12 April, 2017; v1 submitted 9 January, 2017; originally announced January 2017.

    Comments: CVPR 2017

  46. arXiv:1701.01821  [pdf, other]

    cs.CV

    Unsupervised Learning of Long-Term Motion Dynamics for Videos

    Authors: Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei

    Abstract: We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the complexity of the learning framework, we propose to describe the motion as a sequence of atomic 3D flows computed with RGB-D modality. We use a Recurrent Neural Network…

    Submitted 11 April, 2017; v1 submitted 7 January, 2017; originally announced January 2017.

    Comments: CVPR 2017

  47. arXiv:1612.06890  [pdf, other]

    cs.CV cs.CL cs.LG

    CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

    Authors: Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

    Abstract: When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help, but have strong biases that models can exploit to correctly answer questions without reasoning. They also conflate multiple sources of error, making it hard to pi…

    Submitted 20 December, 2016; originally announced December 2016.

  48. arXiv:1611.07212  [pdf, other]

    cs.CV

    Recurrent Attention Models for Depth-Based Person Identification

    Authors: Albert Haque, Alexandre Alahi, Li Fei-Fei

    Abstract: We present an attention-based model that reasons on human body shape and motion dynamics to identify individuals in the absence of RGB information, hence in the dark. Our approach leverages unique 4D spatio-temporal signatures to address the identification problem across days. Formulated as a reinforcement learning task, our model is based on a combination of convolutional and recurrent neural net…

    Submitted 22 November, 2016; originally announced November 2016.

    Comments: Computer Vision and Pattern Recognition (CVPR) 2016

  49. arXiv:1611.06607  [pdf, other]

    cs.CV cs.CL

    A Hierarchical Approach for Generating Descriptive Image Paragraphs

    Authors: Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei

    Abstract: Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail. While one new captioning approach, dense captioning, can potentially describe images in finer levels of detail by captioning many regions within an image, it in turn is unable t…

    Submitted 10 April, 2017; v1 submitted 20 November, 2016; originally announced November 2016.

    Comments: CVPR 2017 spotlight

  50. Crowdsourcing in Computer Vision

    Authors: Adriana Kovashka, Olga Russakovsky, Li Fei-Fei, Kristen Grauman

    Abstract: Computer vision systems require large amounts of manually annotated data to properly learn challenging visual concepts. Crowdsourcing platforms offer an inexpensive method to capture human knowledge and understanding, for a vast number of visual perception tasks. In this survey, we describe the types of annotations computer vision researchers have collected using crowdsourcing, and how they have e…

    Submitted 7 November, 2016; originally announced November 2016.

    Comments: A 69-page meta review of the field, Foundations and Trends in Computer Graphics and Vision, 2016