Showing 1–13 of 13 results for author: Kiros, R

  1. arXiv:1707.05612  [pdf, other]

    cs.LG cs.CL cs.CV

    VSE++: Improving Visual-Semantic Embeddings with Hard Negatives

    Authors: Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, Sanja Fidler

    Abstract: We present a new technique for learning visual-semantic embeddings for cross-modal retrieval. Inspired by hard negative mining, the use of hard negatives in structured prediction, and ranking loss functions, we introduce a simple change to common loss functions used for multi-modal embeddings. That, combined with fine-tuning and use of augmented data, yields significant gains in retrieval performa…

    Submitted 29 July, 2018; v1 submitted 18 July, 2017; originally announced July 2017.

    Comments: Accepted as spotlight presentation at British Machine Vision Conference (BMVC) 2018. Code: https://github.com/fartashf/vsepp

  2. arXiv:1607.06450  [pdf, other]

    stat.ML cs.LG

    Layer Normalization

    Authors: Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

    Abstract: Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that n…

    Submitted 21 July, 2016; originally announced July 2016.

  3. arXiv:1511.06361  [pdf, other]

    cs.LG cs.CL cs.CV

    Order-Embeddings of Images and Language

    Authors: Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun

    Abstract: Hypernymy, textual entailment, and image captioning can be seen as special cases of a single visual-semantic hierarchy over words, sentences, and images. In this paper we advocate for explicitly modeling the partial order structure of this hierarchy. Towards this goal, we introduce a general method for learning ordered representations, and show how it can be applied to a variety of tasks involving…

    Submitted 1 March, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: ICLR camera-ready version

  4. arXiv:1511.04119  [pdf, other]

    cs.LG cs.CV

    Action Recognition using Visual Attention

    Authors: Shikhar Sharma, Ryan Kiros, Ruslan Salakhutdinov

    Abstract: We propose a soft attention based model for the task of action recognition in videos. We use multi-layered Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units which are deep both spatially and temporally. Our model learns to focus selectively on parts of the video frames and classifies videos after taking a few glimpses. The model essentially learns which parts in the frames…

    Submitted 14 February, 2016; v1 submitted 12 November, 2015; originally announced November 2015.

  5. arXiv:1506.06726  [pdf, other]

    cs.CL cs.LG

    Skip-Thought Vectors

    Authors: Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler

    Abstract: We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage. Sentences that share semantic and syntactic properties are thus mapped to similar vector representations. We next introduce a simple vocabulary expansion me…

    Submitted 22 June, 2015; originally announced June 2015.

    Comments: 11 pages

  6. arXiv:1506.06724  [pdf, other]

    cs.CV cs.CL

    Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books

    Authors: Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler

    Abstract: Books are a rich source of both fine-grained information, how a character, an object or a scene looks like, as well as high-level semantics, what someone is thinking, feeling and how these states evolve through a story. This paper aims to align books to their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available i…

    Submitted 22 June, 2015; originally announced June 2015.

  7. arXiv:1505.02074  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Exploring Models and Data for Image Question Answering

    Authors: Mengye Ren, Ryan Kiros, Richard Zemel

    Abstract: This work aims to address the problem of image-based question-answering (QA) with new models and datasets. In our work, we propose to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images. Our model performs 1.8 times better than the only published results on an existing i…

    Submitted 29 November, 2015; v1 submitted 8 May, 2015; originally announced May 2015.

    Comments: 12 pages. Conference paper at NIPS 2015

  8. arXiv:1502.05700  [pdf, other]

    stat.ML

    Scalable Bayesian Optimization Using Deep Neural Networks

    Authors: Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Prabhat, Ryan P. Adams

    Abstract: Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for this distribution over functions is critical to the effectiveness of the approach, and is typically fit using Gaussian processes (GPs). However, since GPs scale…

    Submitted 13 July, 2015; v1 submitted 19 February, 2015; originally announced February 2015.

  9. arXiv:1502.03044  [pdf, other]

    cs.LG cs.CV

    Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

    Authors: Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio

    Abstract: Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through visualization how the model is able to auto…

    Submitted 19 April, 2016; v1 submitted 10 February, 2015; originally announced February 2015.

  10. arXiv:1411.2539  [pdf, other]

    cs.LG cs.CL cs.CV

    Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

    Authors: Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel

    Abstract: Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a): a multimodal joint embedding space with images and text and (b): a novel language model for decoding distributed representations from our space. Our pipeline effectively unifies joint image-text embedding models with multimodal neural language models. We introduce t…

    Submitted 10 November, 2014; originally announced November 2014.

    Comments: 13 pages. NIPS 2014 deep learning workshop

  11. arXiv:1406.2710  [pdf, other]

    cs.LG cs.CL

    A Multiplicative Model for Learning Distributed Text-Based Attribute Representations

    Authors: Ryan Kiros, Richard S. Zemel, Ruslan Salakhutdinov

    Abstract: In this paper we propose a general framework for learning distributed representations of attributes: characteristics of text whose representations can be jointly learned with word embeddings. Attributes can correspond to document indicators (to learn sentence vectors), language indicators (to learn distributed language representations), meta-data and side information (such as the age, gender and i…

    Submitted 10 June, 2014; originally announced June 2014.

    Comments: 11 pages. An earlier version was accepted to the ICML-2014 Workshop on Knowledge-Powered Deep Learning for Text Mining

  12. arXiv:1301.3641  [pdf, ps, other]

    cs.LG cs.NE stat.ML

    Training Neural Networks with Stochastic Hessian-Free Optimization

    Authors: Ryan Kiros

    Abstract: Hessian-free (HF) optimization has been successfully used for training deep autoencoders and recurrent networks. HF uses the conjugate gradient algorithm to construct update directions through curvature-vector products that can be computed on the same order of time as gradients. In this paper we exploit this property and study stochastic HF with gradient and curvature mini-batches independent of t…

    Submitted 1 May, 2013; v1 submitted 16 January, 2013; originally announced January 2013.

    Comments: 11 pages, ICLR 2013

  13. arXiv:1206.6455  [pdf]

    cs.LG stat.ML

    Regularizers versus Losses for Nonlinear Dimensionality Reduction: A Factored View with New Convex Relaxations

    Authors: Yaoliang Yu, James Neufeld, Ryan Kiros, Xinhua Zhang, Dale Schuurmans

    Abstract: We demonstrate that almost all non-parametric dimensionality reduction methods can be expressed by a simple procedure: regularized loss minimization plus singular value truncation. By distinguishing the role of the loss and regularizer in such a process, we recover a factored perspective that reveals some gaps in the current literature. Beyond identifying a useful new loss for manifold unfolding,…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)