
Showing 1–30 of 30 results for author: Erhan, D

Searching in archive cs.
  1. arXiv:2308.11606  [pdf, other]

    cs.CV cs.CL

    StoryBench: A Multifaceted Benchmark for Continuous Story Visualization

    Authors: Emanuele Bugliarello, Hernan Moraldo, Ruben Villegas, Mohammad Babaeizadeh, Mohammad Taghi Saffar, Han Zhang, Dumitru Erhan, Vittorio Ferrari, Pieter-Jan Kindermans, Paul Voigtlaender

    Abstract: Generating video stories from text prompts is a complex task. In addition to having high visual quality, videos need to realistically adhere to a sequence of text prompts whilst being consistent throughout the frames. Creating a benchmark for video generation requires data annotated over time, which contrasts with the single caption often used in video datasets. To fill this gap, we collect compre…

    Submitted 12 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: NeurIPS D&B 2023

  2. arXiv:2210.02399  [pdf, other]

    cs.CV cs.AI

    Phenaki: Variable Length Video Generation From Open Domain Textual Description

    Authors: Ruben Villegas, Mohammad Babaeizadeh, Pieter-Jan Kindermans, Hernan Moraldo, Han Zhang, Mohammad Taghi Saffar, Santiago Castro, Julius Kunze, Dumitru Erhan

    Abstract: We present Phenaki, a model capable of realistic video synthesis, given a sequence of textual prompts. Generating videos from text is particularly challenging due to the computational cost, limited quantities of high-quality text-video data and variable length of videos. To address these issues, we introduce a new model for learning video representation which compresses the video to a small repres…

    Submitted 5 October, 2022; originally announced October 2022.

  3. arXiv:2204.08585  [pdf, other]

    cs.RO cs.AI cs.LG

    INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL

    Authors: Homanga Bharadhwaj, Mohammad Babaeizadeh, Dumitru Erhan, Sergey Levine

    Abstract: Model-based reinforcement learning (RL) algorithms designed for handling complex visual observations typically learn some sort of latent state representation, either explicitly or implicitly. Standard methods of this sort do not distinguish between functionally relevant aspects of the state and irrelevant distractors, instead aiming to represent all available information equally. We propose a modi…

    Submitted 18 April, 2022; originally announced April 2022.

    Comments: Published in International Conference on Learning Representations (ICLR 2022)

  4. arXiv:2106.13195  [pdf, other]

    cs.CV cs.LG

    FitVid: Overfitting in Pixel-Level Video Prediction

    Authors: Mohammad Babaeizadeh, Mohammad Taghi Saffar, Suraj Nair, Sergey Levine, Chelsea Finn, Dumitru Erhan

    Abstract: An agent that is capable of predicting what happens next can perform a variety of tasks through planning with no additional training. Furthermore, such an agent can internally represent the complex dynamics of the real world and therefore can acquire a representation useful for a variety of visual perception tasks. This makes predicting the future frames of a video, conditioned on the observed pas…

    Submitted 24 June, 2021; originally announced June 2021.

  5. arXiv:2012.04603  [pdf, other]

    cs.LG

    Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning

    Authors: Mohammad Babaeizadeh, Mohammad Taghi Saffar, Danijar Hafner, Harini Kannan, Chelsea Finn, Sergey Levine, Dumitru Erhan

    Abstract: Model-based reinforcement learning (MBRL) methods have shown strong sample efficiency and performance across a variety of tasks, including when faced with high-dimensional visual observations. These methods learn to predict the environment dynamics and expected reward from interaction and use this predictive model to plan and perform the task. However, MBRL methods vary in their fundamental design…

    Submitted 8 December, 2020; originally announced December 2020.
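
    The abstract describes the general MBRL loop: learn to predict dynamics and reward, then plan with the learned model. As a hedged illustration only, the sketch below plans by random shooting with hand-written stand-ins for the learned models; `dynamics`, `reward`, and `plan_random_shooting` are hypothetical names, and nothing here reflects the specific design choices the paper evaluates.

```python
import numpy as np

def dynamics(state, action):
    # Hypothetical stand-in for a learned dynamics model: 1-D point mass.
    return state + action

def reward(state):
    # Hypothetical stand-in for a learned reward model: stay close to 2.0.
    return -abs(state - 2.0)

def plan_random_shooting(state, horizon=3, num_candidates=500, seed=0):
    """Return the first action of the best randomly sampled action sequence."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, size=(num_candidates, horizon))
    returns = np.zeros(num_candidates)
    for i, actions in enumerate(candidates):
        s = state
        for a in actions:
            s = dynamics(s, a)        # roll the model forward one step
            returns[i] += reward(s)   # accumulate the predicted reward
    best = int(np.argmax(returns))
    return float(candidates[best, 0])  # execute only the first action

first_action = plan_random_shooting(0.0)
print(first_action)
```

    Only the first action of the best sequence is executed and planning is repeated at the next step, which keeps the planner from committing to an entire sequence chosen under an imperfect model.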

  6. arXiv:2005.03844  [pdf, other]

    cs.CV

    SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving

    Authors: Zhenpei Yang, Yuning Chai, Dragomir Anguelov, Yin Zhou, Pei Sun, Dumitru Erhan, Sean Rafferty, Henrik Kretzschmar

    Abstract: Autonomous driving system development is critically dependent on the ability to replay complex and diverse traffic scenarios in simulation. In such scenarios, the ability to accurately simulate the vehicle sensors such as cameras, lidar or radar is essential. However, current sensor simulators leverage gaming engines such as Unreal or Unity, requiring manual creation of environments, objects and m…

    Submitted 25 June, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

    Journal ref: CVPR 2020

  7. arXiv:1911.01655  [pdf, other]

    cs.CV

    High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

    Authors: Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee

    Abstract: Predicting future video frames is extremely challenging, as there are many factors of variation that make up the dynamics of how frames change through time. Previously proposed solutions require complex inductive biases inside network architectures with highly specialized computation, including segmentation masks, optical flow, and foreground and background separation. In this work, we question if…

    Submitted 5 November, 2019; originally announced November 2019.

    Comments: In Advances in Neural Information Processing Systems (NeurIPS), 2019

  8. arXiv:1903.01434  [pdf, other]

    cs.CV cs.AI cs.LG

    VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation

    Authors: Manoj Kumar, Mohammad Babaeizadeh, Dumitru Erhan, Chelsea Finn, Sergey Levine, Laurent Dinh, Durk Kingma

    Abstract: Generative models that can model and predict sequences of future events can, in principle, learn to capture complex real-world phenomena, such as physical interactions. However, a central challenge in video prediction is that the future is highly uncertain: a sequence of past observations of events can imply many possible futures. Although a number of recent works have studied probabilistic models…

    Submitted 12 February, 2020; v1 submitted 4 March, 2019; originally announced March 2019.

    Comments: ICLR 2020 Camera-Ready. Previous title: VideoFlow: A Flow-Based Generative Model for Video

  9. arXiv:1903.00374  [pdf, other]

    cs.LG stat.ML

    Model-Based Reinforcement Learning for Atari

    Authors: Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski

    Abstract: Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction -- substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer may be that people can learn how the game works and…

    Submitted 3 April, 2024; v1 submitted 1 March, 2019; originally announced March 2019.

  10. arXiv:1806.10758  [pdf, other]

    cs.LG cs.AI stat.ML

    A Benchmark for Interpretability Methods in Deep Neural Networks

    Authors: Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim

    Abstract: We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks. Our results across several large-scale image classification datasets show that many popular interpretability methods produce estimates of feature importance that are no better than a random designation of feature importance. Only certain ensemble-based approaches, VarGrad and Smoo…

    Submitted 4 November, 2019; v1 submitted 27 June, 2018; originally announced June 2018.

    Comments: In NeurIPS 2019
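
    The abstract names VarGrad as one of the ensemble-based approaches that holds up. A minimal NumPy sketch, assuming the usual definition (per-feature variance of input gradients over Gaussian-perturbed copies of the input) and a toy model with an analytic gradient; `toy_grad`, `vargrad`, `sigma`, and `num_samples` are hypothetical names and settings, not taken from the paper.

```python
import numpy as np

def toy_grad(x):
    """Analytic input gradient of a hypothetical toy model f(x) = sum(x**2) + 3 * x[0]."""
    g = 2.0 * x
    g[0] += 3.0
    return g

def vargrad(x, grad_fn, sigma=0.1, num_samples=50, seed=0):
    """Per-feature variance of input gradients over noisy copies of x."""
    rng = np.random.default_rng(seed)
    # Draw num_samples Gaussian-perturbed inputs around x.
    noisy = x + sigma * rng.standard_normal((num_samples, x.size))
    # Gradient at each perturbed input, stacked to shape (num_samples, x.size).
    grads = np.stack([grad_fn(xn) for xn in noisy])
    return grads.var(axis=0)

x = np.array([0.5, -1.0, 2.0])
print(vargrad(x, toy_grad))
```

    For a linear model the input gradient is constant, so this VarGrad-style map is zero everywhere; the variance only reflects how the gradient changes around `x`.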

  11. arXiv:1806.04768  [pdf, other]

    cs.CV

    Hierarchical Long-term Video Prediction without Supervision

    Authors: Nevan Wichers, Ruben Villegas, Dumitru Erhan, Honglak Lee

    Abstract: Much recent research has been devoted to video prediction and generation, yet most previous works have demonstrated only limited success in generating videos on short-term horizons. The hierarchical video prediction method by Villegas et al. (2017) is an example of a state-of-the-art method for long-term video prediction, but their method is limited because it requires ground truth annot…

    Submitted 12 June, 2018; originally announced June 2018.

    Comments: International Conference on Machine Learning (ICML) 2018

  12. arXiv:1711.00867  [pdf, other]

    stat.ML cs.LG

    The (Un)reliability of saliency methods

    Authors: Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, Been Kim

    Abstract: Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. We use a simple and common pre-processing step (adding a constant shift to the input data) to show that a transformation with no effect on the model can cause numerous methods to incorrectly attribut…

    Submitted 2 November, 2017; originally announced November 2017.
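
    The abstract's argument can be reproduced on a linear model. The sketch below (hypothetical weights and inputs) builds a second model that sees inputs shifted by a constant vector `c` but compensates through its bias, so both models produce identical outputs. Plain input gradients agree, while gradient-times-input attributions differ; gradient times input is used here only as one example of a method that depends on raw input values, not as a list of the methods the paper tests.

```python
import numpy as np

w = np.array([1.5, -2.0, 0.5])   # hypothetical weights of a linear model
b = 0.3                          # hypothetical bias
x = np.array([0.2, 0.4, -1.0])   # hypothetical input
c = np.array([1.0, 1.0, 1.0])    # constant shift applied to every input

def f(inputs, weights, bias):
    return weights @ inputs + bias

# The second model sees shifted inputs but absorbs the shift into its bias,
# so it computes exactly the same function of the original data.
b_shifted = b - w @ c
x_shifted = x + c

out_original = f(x, w, b)
out_shifted = f(x_shifted, w, b_shifted)

# The input gradient of a linear model is w for both models.
grad = w
# Gradient * input changes, because the input values themselves changed.
attr_original = grad * x
attr_shifted = grad * x_shifted

print(out_original, out_shifted)
print(attr_original, attr_shifted)
```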

  13. arXiv:1710.11252  [pdf, other]

    cs.CV cs.RO

    Stochastic Variational Video Prediction

    Authors: Mohammad Babaeizadeh, Chelsea Finn, Dumitru Erhan, Roy H. Campbell, Sergey Levine

    Abstract: Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images require the predictive model to build an intricate understanding of the natural world. Many existing methods tackle this problem by making simplifyi…

    Submitted 6 March, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

  14. arXiv:1705.05598  [pdf, other]

    stat.ML cs.LG

    Learning how to explain neural networks: PatternNet and PatternAttribution

    Authors: Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus-Robert Müller, Dumitru Erhan, Been Kim, Sven Dähne

    Abstract: DeConvNet, Guided BackProp, and LRP were invented to better understand deep neural networks. We show that these methods do not produce the theoretically correct explanation for a linear model. Yet they are used on multi-layer networks with millions of parameters. This is a cause for concern since linear models are simple neural networks. We argue that explanation methods for neural nets should work r…

    Submitted 24 October, 2017; v1 submitted 16 May, 2017; originally announced May 2017.

  15. arXiv:1612.05424  [pdf, other]

    cs.CV

    Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks

    Authors: Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan

    Abstract: Collecting well-annotated image datasets to train modern machine learning algorithms is prohibitively expensive for many tasks. One appealing alternative is rendering synthetic data where ground-truth annotations are generated automatically. Unfortunately, models trained purely on rendered images often fail to generalize to real images. To address this shortcoming, prior work introduced unsupervis…

    Submitted 23 August, 2017; v1 submitted 16 December, 2016; originally announced December 2016.

    Comments: Final CVPR 2017 paper and supplementary material

  16. Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

    Authors: Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan

    Abstract: Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The mod…

    Submitted 21 September, 2016; originally announced September 2016.

    Comments: arXiv admin note: substantial text overlap with arXiv:1411.4555

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: PP, Issue: 99, July 2016)

  17. arXiv:1608.06019  [pdf, other]

    cs.CV

    Domain Separation Networks

    Authors: Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, Dumitru Erhan

    Abstract: The cost of large-scale data collection and annotation often makes the application of machine learning algorithms to new tasks or datasets prohibitively expensive. One approach circumventing this cost is training models on synthetic data where annotations are provided automatically. Despite their appeal, such models often fail to generalize from synthetic to real images, necessitating domain adapt…

    Submitted 21 August, 2016; originally announced August 2016.

    Comments: This work will be presented at NIPS 2016

  18. arXiv:1605.02688  [pdf, other]

    cs.SC cs.LG cs.MS

    Theano: A Python framework for fast computation of mathematical expressions

    Authors: The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, et al. (88 additional authors not shown)

    Abstract: Theano is a Python library that allows users to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano has been actively and continuously developed since 2008, mu…

    Submitted 9 May, 2016; originally announced May 2016.

    Comments: 19 pages, 5 figures

  19. SSD: Single Shot MultiBox Detector

    Authors: Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg

    Abstract: We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box…

    Submitted 29 December, 2016; v1 submitted 7 December, 2015; originally announced December 2015.

    Comments: ECCV 2016
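
    The abstract describes discretizing box outputs into default boxes over several aspect ratios and scales at each feature-map location. A plain-Python sketch of that tiling for one square feature map, assuming boxes centred on each cell and sized so that width times height equals scale squared and width divided by height equals the aspect ratio; this parameterization and the example values for `fmap_size`, `scale`, and `aspect_ratios` are assumptions for illustration.

```python
import itertools
import math

def default_boxes(fmap_size, scale, aspect_ratios):
    """Default boxes (cx, cy, w, h) in normalized [0, 1] image coordinates
    for one square feature map of size fmap_size x fmap_size."""
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        # Box centre sits in the middle of feature-map cell (row i, column j).
        cx = (j + 0.5) / fmap_size
        cy = (i + 0.5) / fmap_size
        for ar in aspect_ratios:
            # Chosen so that w * h == scale**2 and w / h == ar.
            w = scale * math.sqrt(ar)
            h = scale / math.sqrt(ar)
            boxes.append((cx, cy, w, h))
    return boxes

boxes = default_boxes(fmap_size=4, scale=0.2, aspect_ratios=[1.0, 2.0, 0.5])
print(len(boxes))  # 4 * 4 * 3 = 48
```

    Per the abstract, the network then scores every object category for each default box and predicts adjustments to the box, so the number of boxes fixes the size of the prediction head's output.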

  20. arXiv:1503.02834  [pdf, ps, other]

    stat.ME cs.AI

    Doubly Robust Policy Evaluation and Optimization

    Authors: Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li

    Abstract: We study sequential decision making in environments where rewards are only partially observed, but can be modeled as a function of observed contexts and the chosen action by the decision maker. This setting, known as contextual bandits, encompasses a wide variety of applications such as health care, content recommendation and Internet advertising. A central task is evaluation of a new policy given…

    Submitted 10 March, 2015; originally announced March 2015.

    Comments: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/14-STS500

    Report number: IMS-STS-STS500

    Journal ref: Statistical Science 2014, Vol. 29, No. 4, 485-511
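
    The abstract sets up policy evaluation for contextual bandits. A minimal sketch of the standard doubly robust estimate for a deterministic target policy, assuming logged tuples of context, action, reward, and logging propensity plus some reward model `r_hat`: it adds the model's predicted reward for the target action to an importance-weighted correction of that prediction's error. All names and data are hypothetical, and the sketch covers evaluation only, not the optimization part of the paper.

```python
def doubly_robust_value(contexts, actions, rewards, propensities, policy, r_hat):
    """Doubly robust estimate of a deterministic policy's mean reward.

    propensities[i] is the probability the logging policy gave to actions[i];
    policy(x) returns the target action; r_hat(x, a) predicts the reward.
    """
    total = 0.0
    for x, a, r, p in zip(contexts, actions, rewards, propensities):
        target = policy(x)
        # Direct-method term: predicted reward of the target policy's action.
        estimate = r_hat(x, target)
        # Importance-weighted correction, applied only when the logged action matches.
        if a == target:
            estimate += (r - r_hat(x, a)) / p
        total += estimate
    return total / len(rewards)

# Toy logged data: two contexts, two actions, uniform logging policy (p = 0.5).
contexts = [0, 1, 0, 1]
actions = [0, 1, 1, 0]
rewards = [1.0, 0.0, 0.5, 0.2]
propensities = [0.5, 0.5, 0.5, 0.5]

value = doubly_robust_value(contexts, actions, rewards, propensities,
                            policy=lambda x: x, r_hat=lambda x, a: 0.4)
print(value)
```

    With `r_hat` fixed at zero only the correction term remains, which is plain inverse propensity scoring. The "doubly robust" name refers to the estimate remaining unbiased when either the propensities or the reward model is correct.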

  21. arXiv:1412.6596  [pdf, other]

    cs.CV cs.LG cs.NE

    Training Deep Neural Networks on Noisy Labels with Bootstrapping

    Authors: Scott Reed, Honglak Lee, Dragomir Anguelov, Christian Szegedy, Dumitru Erhan, Andrew Rabinovich

    Abstract: Current state-of-the-art deep learning systems for visual object recognition and detection use purely supervised training with regularization such as dropout to avoid overfitting. The performance depends critically on the amount of labeled examples, and in current practice the labels are assumed to be unambiguous and accurate. However, this assumption often does not hold; e.g. in recognition, clas…

    Submitted 15 April, 2015; v1 submitted 19 December, 2014; originally announced December 2014.
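
    The abstract is cut off before the method, but the title names bootstrapping. One common form, a "soft" bootstrapping objective, replaces the one-hot target with a convex combination of the given label and the model's own predicted distribution, so confident predictions can partly override noisy labels. The sketch below assumes that form; `soft_bootstrap_loss`, the mixing weight `beta`, and the example data are hypothetical, not taken from the listing.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_bootstrap_loss(logits, targets, beta=0.95):
    """Batch-mean cross-entropy against a target that mixes the given
    (possibly noisy) one-hot labels with the model's own predictions."""
    q = softmax(logits)
    mixed = beta * targets + (1.0 - beta) * q
    return float(-np.mean(np.sum(mixed * np.log(q), axis=-1)))

logits = np.array([[2.0, 0.0, -1.0], [0.5, 1.5, 0.0]])
targets = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])  # second label may be noisy
print(soft_bootstrap_loss(logits, targets))
```

    At `beta = 1.0` this reduces to ordinary cross-entropy, and lowering `beta` shifts weight toward the model's own prediction. In this NumPy form it is only a loss value; whether gradients should flow through the prediction inside the target is left open.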

  22. arXiv:1412.1441  [pdf, other]

    cs.CV

    Scalable, High-Quality Object Detection

    Authors: Christian Szegedy, Scott Reed, Dumitru Erhan, Dragomir Anguelov, Sergey Ioffe

    Abstract: Current high-quality object detection approaches use the scheme of salience-based object proposal methods followed by post-classification using deep convolutional features. This spurred recent research in improving object proposal methods. However, domain agnostic proposal generation has the principal drawback that the proposals come unranked or with very weak ranking, making it hard to trade-off…

    Submitted 8 December, 2015; v1 submitted 3 December, 2014; originally announced December 2014.

  23. arXiv:1411.4555  [pdf, other]

    cs.CV

    Show and Tell: A Neural Image Caption Generator

    Authors: Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan

    Abstract: Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The mod…

    Submitted 20 April, 2015; v1 submitted 17 November, 2014; originally announced November 2014.

  24. arXiv:1409.4842  [pdf, other]

    cs.CV

    Going Deeper with Convolutions

    Authors: Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

    Abstract: We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully c…

    Submitted 16 September, 2014; originally announced September 2014.

  25. arXiv:1312.6199  [pdf, other]

    cs.CV cs.LG cs.NE

    Intriguing properties of neural networks

    Authors: Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus

    Abstract: Deep neural networks are highly expressive models that have recently achieved state-of-the-art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction betwee…

    Submitted 19 February, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

  26. arXiv:1312.5697  [pdf, other]

    cs.CV cs.LG

    Using Web Co-occurrence Statistics for Improving Image Categorization

    Authors: Samy Bengio, Jeff Dean, Dumitru Erhan, Eugene Ie, Quoc Le, Andrew Rabinovich, Jonathon Shlens, Yoram Singer

    Abstract: Object recognition and localization are important tasks in computer vision. The focus of this work is the incorporation of contextual information in order to improve object recognition and localization. For instance, it is natural not to expect an elephant to appear in the middle of an ocean. We consider a simple approach to encapsulate such common sense knowledge using co-occurrence statis…

    Submitted 20 December, 2013; v1 submitted 19 December, 2013; originally announced December 2013.

  27. arXiv:1312.2249  [pdf, other]

    cs.CV stat.ML

    Scalable Object Detection using Deep Neural Networks

    Authors: Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov

    Abstract: Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012). The winning model on the localization sub-task was a network that predicts a single bounding box and a confidence score for each object category in the image. Such a model captures the whol…

    Submitted 8 December, 2013; originally announced December 2013.

  28. arXiv:1307.0414  [pdf, other]

    stat.ML cs.LG

    Challenges in Representation Learning: A report on three machine learning contests

    Authors: Ian J. Goodfellow, Dumitru Erhan, Pierre Luc Carrier, Aaron Courville, Mehdi Mirza, Ben Hamner, Will Cukierski, Yichuan Tang, David Thaler, Dong-Hyun Lee, Yingbo Zhou, Chetan Ramaiah, Fangxiang Feng, Ruifan Li, Xiaojie Wang, Dimitris Athanasakis, John Shawe-Taylor, Maxim Milakov, John Park, Radu Ionescu, Marius Popescu, Cristian Grozea, James Bergstra, Jingjing Xie, Lukasz Romaszko, et al. (3 additional authors not shown)

    Abstract: The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kin…

    Submitted 1 July, 2013; originally announced July 2013.

    Comments: 8 pages, 2 figures

  29. arXiv:1210.4862  [pdf]

    cs.LG stat.ML

    Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits

    Authors: Miroslav Dudik, Dumitru Erhan, John Langford, Lihong Li

    Abstract: We present and prove properties of a new offline policy evaluator for an exploration learning setting which is superior to previous evaluators. In particular, it simultaneously and correctly incorporates techniques from importance weighting, doubly robust evaluation, and nonstationary policy evaluation approaches. In addition, our approach allows generating longer histories by careful control of a…

    Submitted 16 October, 2012; originally announced October 2012.

    Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

    Report number: UAI-P-2012-PG-247-254

  30. arXiv:1009.3589  [pdf, other]

    cs.LG cs.CV cs.NE

    Deep Self-Taught Learning for Handwritten Character Recognition

    Authors: Frédéric Bastien, Yoshua Bengio, Arnaud Bergeron, Nicolas Boulanger-Lewandowski, Thomas Breuel, Youssouf Chherawala, Moustapha Cisse, Myriam Côté, Dumitru Erhan, Jeremy Eustache, Xavier Glorot, Xavier Muller, Sylvain Pannetier Lebeuf, Razvan Pascanu, Salah Rifai, Francois Savard, Guillaume Sicard

    Abstract: Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple non-linear transformations. Self-taught learning (exploiting unlabeled examples or examples from other distributions) has already been applied to deep learners, but mostly to show the advantage of…

    Submitted 18 September, 2010; originally announced September 2010.

    Report number: 1353, Dept. IRO, U. Montreal

    MSC Class: 68T05

    ACM Class: I.2.6