Showing 1–33 of 33 results for author: Al-Rfou, R

Searching in archive cs.

  1. arXiv:2404.19531

    cs.CV

    MoST: Multi-modality Scene Tokenization for Motion Prediction

    Authors: Norman Mu, Jingwei Ji, Zhenpei Yang, Nate Harada, Haotian Tang, Kan Chen, Charles R. Qi, Runzhou Ge, Kratarth Goel, Zoey Yang, Scott Ettinger, Rami Al-Rfou, Dragomir Anguelov, Yin Zhou

    Abstract: Many existing motion prediction approaches rely on symbolic perception outputs to generate agent trajectories, such as bounding boxes, road graph information and traffic lights. This symbolic representation is a high-level abstraction of the real world, which may render the motion prediction model vulnerable to perception errors (e.g., failures in detecting open-vocabulary obstacles) while missing…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  2. arXiv:2404.03843

    cs.RO cs.LG

    Scaling Motion Forecasting Models with Ensemble Distillation

    Authors: Scott Ettinger, Kratarth Goel, Avikalp Srivastava, Rami Al-Rfou

    Abstract: Motion forecasting has become an increasingly critical component of autonomous robotic systems. Onboard compute budgets typically limit the accuracy of real-time systems. In this work we propose methods of improving motion forecasting systems subject to limited compute budgets by combining model ensemble and distillation techniques. The use of ensembles of deep neural networks has been shown to im…

    Submitted 13 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: 11 pages, 14 figures

  3. arXiv:2402.05862

    cs.LG cs.AI cs.SI stat.ML

    Let Your Graph Do the Talking: Encoding Structured Data for LLMs

    Authors: Bryan Perozzi, Bahare Fatemi, Dustin Zelle, Anton Tsitsulin, Mehran Kazemi, Rami Al-Rfou, Jonathan Halcrow

    Abstract: How can we best encode structured data into sequential form for use in large language models (LLMs)? In this work, we introduce a parameter-efficient method to explicitly represent structured data for LLMs. Our method, GraphToken, learns an encoding function to extend prompts with explicit structured information. Unlike other work which focuses on limited domains (e.g. knowledge graph representati…

    Submitted 8 February, 2024; originally announced February 2024.

    ACM Class: I.5.1; I.2.6; I.2.7

  4. arXiv:2309.16534

    cs.CV cs.AI cs.LG cs.RO

    MotionLM: Multi-Agent Motion Forecasting as Language Modeling

    Authors: Ari Seff, Brian Cera, Dian Chen, Mason Ng, Aurick Zhou, Nigamaa Nayakanti, Khaled S. Refaat, Rami Al-Rfou, Benjamin Sapp

    Abstract: Reliable forecasting of the future behavior of road agents is a critical component to safe planning in autonomous vehicles. Here, we represent continuous trajectories as sequences of discrete motion tokens and cast multi-agent motion prediction as a language modeling task over this domain. Our model, MotionLM, provides several advantages: First, it does not require anchors or explicit latent varia…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: To appear at the International Conference on Computer Vision (ICCV) 2023

  5. arXiv:2303.14588

    cs.CL

    Fine-Tashkeel: Finetuning Byte-Level Models for Accurate Arabic Text Diacritization

    Authors: Bashar Al-Rfooh, Gheith Abandah, Rami Al-Rfou

    Abstract: Most of previous work on learning diacritization of the Arabic language relied on training models from scratch. In this paper, we investigate how to leverage pre-trained language models to learn diacritization. We finetune token-free pre-trained multilingual models (ByT5) to learn to predict and insert missing diacritics in Arabic text, a complex task that requires understanding the sentence seman…

    Submitted 25 March, 2023; originally announced March 2023.

  6. arXiv:2207.05844

    cs.CV

    Wayformer: Motion Forecasting via Simple & Efficient Attention Networks

    Authors: Nigamaa Nayakanti, Rami Al-Rfou, Aurick Zhou, Kratarth Goel, Khaled S. Refaat, Benjamin Sapp

    Abstract: Motion forecasting for autonomous driving is a challenging task because complex driving scenarios result in a heterogeneous mix of static and dynamic inputs. It is an open problem how best to represent and fuse information about road geometry, lane connectivity, time-varying traffic light state, and history of a dynamic set of agents and their interactions into an effective encoding. To model this…

    Submitted 12 July, 2022; originally announced July 2022.

  7. arXiv:2206.04176

    cs.CV cs.LG cs.RO

    VN-Transformer: Rotation-Equivariant Attention for Vector Neurons

    Authors: Serge Assaad, Carlton Downey, Rami Al-Rfou, Nigamaa Nayakanti, Ben Sapp

    Abstract: Rotation equivariance is a desirable property in many practical applications such as motion forecasting and 3D perception, where it can offer benefits like sample efficiency, better generalization, and robustness to input perturbations. Vector Neurons (VN) is a recently developed framework offering a simple yet effective approach for deriving rotation-equivariant analogs of standard machine learni…

    Submitted 24 January, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: Published in Transactions on Machine Learning Research (TMLR), 2023; Previous version appeared in Workshop on Machine Learning for Autonomous Driving, Conference on Neural Information Processing Systems (NeurIPS), 2022

  8. arXiv:2206.03970

    cs.CV cs.AI cs.LG cs.RO

    Narrowing the Coordinate-frame Gap in Behavior Prediction Models: Distillation for Efficient and Accurate Scene-centric Motion Forecasting

    Authors: DiJia Su, Bertrand Douillard, Rami Al-Rfou, Cheolho Park, Benjamin Sapp

    Abstract: Behavior prediction models have proliferated in recent years, especially in the popular real-world robotics application of autonomous driving, where representing the distribution over possible futures of moving agents is essential for safe and comfortable motion planning. In these models, the choice of coordinate frames to represent inputs and outputs has crucial trade offs which broadly fall into…

    Submitted 10 June, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: Accepted at ICRA 2022

  9. arXiv:2110.07904

    cs.CL

    SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer

    Authors: Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou, Daniel Cer

    Abstract: There has been growing interest in parameter-efficient methods to apply pre-trained language models to downstream tasks. Building on the Prompt Tuning approach of Lester et al. (2021), which learns task-specific soft prompts to condition a frozen pre-trained model to perform different tasks, we propose a novel prompt-based transfer learning approach called SPoT: Soft Prompt Transfer. SPoT first le…

    Submitted 16 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Accepted as a main conference paper at ACL 2022, 21 pages, 8 figures, 7 tables

  10. arXiv:2106.02171

    cs.CL

    nmT5 -- Is parallel data still relevant for pre-training massively multilingual language models?

    Authors: Mihir Kale, Aditya Siddhant, Noah Constant, Melvin Johnson, Rami Al-Rfou, Linting Xue

    Abstract: Recently, mT5 - a massively multilingual version of T5 - leveraged a unified text-to-text format to attain state-of-the-art results on a wide variety of multilingual NLP tasks. In this paper, we investigate the impact of incorporating parallel data into mT5 pre-training. We find that multi-tasking language modeling with objectives such as machine translation during pre-training is a straightforwar…

    Submitted 3 June, 2021; originally announced June 2021.

    Comments: Accepted at ACL-IJCNLP 2021

  11. arXiv:2105.13626

    cs.CL

    ByT5: Towards a token-free future with pre-trained byte-to-byte models

    Authors: Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel

    Abstract: Most widely-used pre-trained language models operate on sequences of tokens corresponding to word or subword units. By comparison, token-free models that operate directly on raw text (bytes or characters) have many benefits: they can process text in any language out of the box, they are more robust to noise, and they minimize technical debt by removing complex and error-prone text preprocessing pi…

    Submitted 7 March, 2022; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: To be published in TACL 2022

  12. arXiv:2104.08691

    cs.CL

    The Power of Scale for Parameter-Efficient Prompt Tuning

    Authors: Brian Lester, Rami Al-Rfou, Noah Constant

    Abstract: In this work, we explore "prompt tuning", a simple yet effective mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3'…

    Submitted 2 September, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: Accepted to EMNLP 2021

  13. arXiv:2010.12688

    cs.CL

    Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training

    Authors: Oshin Agarwal, Heming Ge, Siamak Shakeri, Rami Al-Rfou

    Abstract: Prior work on Data-To-Text Generation, the task of converting knowledge graph (KG) triples into natural text, focused on domain-specific benchmark datasets. In this paper, however, we verbalize the entire English Wikidata KG, and discuss the unique challenges associated with a broad, open-domain, large-scale verbalization. We further show that verbalizing a comprehensive, encyclopedic KG like Wiki…

    Submitted 13 March, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Accepted at NAACL 2021

  14. arXiv:2010.11934

    cs.CL

    mT5: A massively multilingual pre-trained text-to-text transformer

    Authors: Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel

    Abstract: The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its s…

    Submitted 11 March, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

  15. arXiv:2004.05484

    cs.CL cs.LG

    LAReQA: Language-agnostic answer retrieval from a multilingual pool

    Authors: Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

    Abstract: We present LAReQA, a challenging new benchmark for language-agnostic answer retrieval from a multilingual candidate pool. Unlike previous cross-lingual tasks, LAReQA tests for "strong" cross-lingual alignment, requiring semantically related cross-language pairs to be closer in representation space than unrelated same-language pairs. Building on multilingual BERT (mBERT), we study different strateg…

    Submitted 11 April, 2020; originally announced April 2020.

  16. arXiv:1908.10322

    cs.CL cs.AI cs.IR cs.LG

    Bridging the Gap for Tokenizer-Free Language Models

    Authors: Dokook Choe, Rami Al-Rfou, Mandy Guo, Heeyoung Lee, Noah Constant

    Abstract: Purely character-based language models (LMs) have been lagging in quality on large scale datasets, and current state-of-the-art LMs rely on word tokenization. It has been assumed that injecting the prior knowledge of a tokenizer into the model is essential to achieving competitive results. In this paper, we show that contrary to this conventional wisdom, tokenizer-free LMs with sufficient capacity…

    Submitted 27 August, 2019; originally announced August 2019.

  17. arXiv:1904.09671

    cs.LG cs.IR cs.SI stat.ML

    DDGK: Learning Graph Representations for Deep Divergence Graph Kernels

    Authors: Rami Al-Rfou, Dustin Zelle, Bryan Perozzi

    Abstract: Can neural networks learn to compare graphs without feature engineering? In this paper, we show that it is possible to learn representations for graph similarity with neither domain knowledge nor supervision (i.e., feature engineering or labeled graphs). We propose Deep Divergence Graph Kernels, an unsupervised method for learning representations over graphs that encodes a relaxed notion of graph…

    Submitted 21 April, 2019; originally announced April 2019.

    Comments: WWW '19

    Journal ref: Proceedings of the 2019 World Wide Web Conference (WWW '19), May 13–17, 2019, San Francisco, CA, USA

  18. arXiv:1808.04444

    cs.CL cs.AI cs.LG stat.ML

    Character-Level Language Modeling with Deeper Self-Attention

    Authors: Rami Al-Rfou, Dokook Choe, Noah Constant, Mandy Guo, Llion Jones

    Abstract: LSTMs and other RNN variants have shown strong performance on character-level language modeling. These models are typically trained using truncated backpropagation through time, and it is common to assume that their success stems from their ability to remember long-term contexts. In this paper, we show that a deep (64-layer) transformer model with fixed context outperforms RNN variants by a large…

    Submitted 10 December, 2018; v1 submitted 9 August, 2018; originally announced August 2018.

    Comments: 8 pages, 7 figures

  19. arXiv:1808.02590

    cs.SI

    A Tutorial on Network Embeddings

    Authors: Haochen Chen, Bryan Perozzi, Rami Al-Rfou, Steven Skiena

    Abstract: Network embedding methods aim at learning low-dimensional latent representation of nodes in a network. These representations can be used as features for a wide range of tasks on graphs such as classification, clustering, link prediction, and visualization. In this survey, we give an overview of network embeddings by summarizing and categorizing recent advancements in this research field. We first…

    Submitted 7 August, 2018; originally announced August 2018.

    Comments: 23 pages, 6 figures

  20. arXiv:1710.09599

    cs.LG cs.SI stat.ML

    Watch Your Step: Learning Node Embeddings via Graph Attention

    Authors: Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou, Alex Alemi

    Abstract: Graph embedding methods represent nodes in a continuous vector space, preserving information from the graph (e.g. by sampling random walks). There are many hyper-parameters to these methods (such as random walk length) which have to be manually tuned for every graph. In this paper, we replace random walk hyper-parameters with trainable parameters that we automatically learn via backpropagation. In…

    Submitted 12 September, 2018; v1 submitted 26 October, 2017; originally announced October 2017.

  21. CosmoGAN: creating high-fidelity weak lensing convergence maps using Generative Adversarial Networks

    Authors: Mustafa Mustafa, Deborah Bard, Wahid Bhimji, Zarija Lukić, Rami Al-Rfou, Jan M. Kratochvil

    Abstract: Inferring model parameters from experimental data is a grand challenge in many sciences, including cosmology. This often relies critically on high fidelity numerical simulations, which are prohibitively computationally expensive. The application of deep learning techniques to generative modeling is renewing interest in using high dimensional density estimators as computationally inexpensive emulat…

    Submitted 22 May, 2019; v1 submitted 7 June, 2017; originally announced June 2017.

    Comments: 11 pages, 8 figures

    Journal ref: Computational Astrophysics and Cosmology: Simulations, Data Analysis and Algorithms 2019 6:1

  22. arXiv:1705.05615

    cs.LG cs.SI stat.ML

    Learning Edge Representations via Low-Rank Asymmetric Projections

    Authors: Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou

    Abstract: We propose a new method for embedding graphs while preserving directed edge information. Learning such continuous-space vector representations (or embeddings) of nodes in a graph is an important first step for using network information (from social networks, user-item graphs, knowledge bases, etc.) in many machine learning tasks. Unlike previous work, we (1) explicitly model an edge as a functio…

    Submitted 13 September, 2017; v1 submitted 16 May, 2017; originally announced May 2017.

    Journal ref: ACM International Conference on Information and Knowledge Management, 2017

  23. arXiv:1705.00652

    cs.CL

    Efficient Natural Language Response Suggestion for Smart Reply

    Authors: Matthew Henderson, Rami Al-Rfou, Brian Strope, Yun-hsuan Sung, Laszlo Lukacs, Ruiqi Guo, Sanjiv Kumar, Balint Miklos, Ray Kurzweil

    Abstract: This paper presents a computationally efficient machine-learned method for natural language response suggestion. Feed-forward neural networks using n-gram embedding features encode messages into vectors which are optimized to give message-response pairs a high dot-product value. An optimized search finds response suggestions. The method is evaluated in a large-scale commercial e-mail application,…

    Submitted 1 May, 2017; originally announced May 2017.

  24. arXiv:1611.06478

    cs.CL cs.HC

    Visualizing Linguistic Shift

    Authors: Salman Mahmood, Rami Al-Rfou, Klaus Mueller

    Abstract: Neural network based models are a very powerful tool for creating word embeddings, the objective of these models is to group similar words together. These embeddings have been used as features to improve results in various applications such as document classification, named entity recognition, etc. Neural language models are able to learn word representations which have been used to capture semant…

    Submitted 20 November, 2016; originally announced November 2016.

  25. arXiv:1610.06402

    cs.AI cs.LG cs.NE

    A Growing Long-term Episodic & Semantic Memory

    Authors: Marc Pickett, Rami Al-Rfou, Louis Shao, Chris Tar

    Abstract: The long-term memory of most connectionist systems lies entirely in the weights of the system. Since the number of weights is typically fixed, this bounds the total amount of knowledge that can be learned and stored. Though this is not normally a problem for a neural network designed for a specific task, such a bound is undesirable for a system that continually learns over an open range of domains…

    Submitted 20 October, 2016; originally announced October 2016.

    Comments: Submission to NIPS workshop on Continual Learning. 4 page extended abstract plus 5 more pages of references, figures, and supplementary material

  26. arXiv:1606.00372

    cs.CL cs.LG

    Conversational Contextual Cues: The Case of Personalization and History for Response Ranking

    Authors: Rami Al-Rfou, Marc Pickett, Javier Snaider, Yun-hsuan Sung, Brian Strope, Ray Kurzweil

    Abstract: We investigate the task of modeling open-domain, multi-turn, unstructured, multi-participant, conversational dialogue. We specifically study the effect of incorporating different elements of the conversation. Unlike previous efforts, which focused on modeling messages and responses, we extend the modeling to long context and participant's history. Our system does not rely on handwritten rules or e…

    Submitted 1 June, 2016; originally announced June 2016.

    Comments: 10 pages, 6 figures

  27. arXiv:1605.02688

    cs.SC cs.LG cs.MS

    Theano: A Python framework for fast computation of mathematical expressions

    Authors: The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, et al. (88 additional authors not shown)

    Abstract: Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, mu…

    Submitted 9 May, 2016; originally announced May 2016.

    Comments: 19 pages, 5 figures

  28. arXiv:1411.3315

    cs.CL cs.IR cs.LG

    Statistically Significant Detection of Linguistic Change

    Authors: Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, Steven Skiena

    Abstract: We propose a new computational approach for tracking and detecting statistically significant linguistic shifts in the meaning and usage of words. Such linguistic shifts are especially prevalent on the Internet, where the rapid exchange of ideas can quickly change a word's meaning. Our meta-analysis approach constructs property time series of word usage, and then uses statistically sound change poi…

    Submitted 12 November, 2014; originally announced November 2014.

    Comments: 11 pages, 7 figures, 4 tables

    ACM Class: H.3.3; I.2.6

  29. arXiv:1410.3791

    cs.CL cs.LG

    POLYGLOT-NER: Massive Multilingual Named Entity Recognition

    Authors: Rami Al-Rfou, Vivek Kulkarni, Bryan Perozzi, Steven Skiena

    Abstract: The increasing diversity of languages used on the web introduces a new level of complexity to Information Retrieval (IR) systems. We can no longer assume that textual content is written in one language or even the same language family. In this paper, we demonstrate how to build massive multilingual annotators with minimal human expertise and intervention. We describe a system that builds Named Ent…

    Submitted 14 October, 2014; originally announced October 2014.

    Comments: 9 pages, 4 figures, 5 tables

    ACM Class: I.2.7; I.2.6

  30. DeepWalk: Online Learning of Social Representations

    Authors: Bryan Perozzi, Rami Al-Rfou, Steven Skiena

    Abstract: We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses loca…

    Submitted 27 June, 2014; v1 submitted 26 March, 2014; originally announced March 2014.

    Comments: 10 pages, 5 figures, 4 tables

    ACM Class: H.2.8; I.2.6; I.5.1

  31. arXiv:1403.1252

    cs.LG cs.CL cs.SI

    Inducing Language Networks from Continuous Space Word Representations

    Authors: Bryan Perozzi, Rami Al-Rfou, Vivek Kulkarni, Steven Skiena

    Abstract: Recent advancements in unsupervised feature learning have developed powerful latent representations of words. However, it is still not clear what makes one representation better than another and how we can learn the ideal representation. Understanding the structure of latent spaces attained is key to any future advancement in unsupervised learning. In this work, we introduce a new view of continuo…

    Submitted 27 June, 2014; v1 submitted 5 March, 2014; originally announced March 2014.

    Comments: 14 pages

  32. arXiv:1307.1662

    cs.CL cs.LG

    Polyglot: Distributed Word Representations for Multilingual NLP

    Authors: Rami Al-Rfou, Bryan Perozzi, Steven Skiena

    Abstract: Distributed word representations (word embeddings) have recently contributed to competitive performance in language modeling and several NLP tasks. In this work, we train word embeddings for more than 100 languages using their corresponding Wikipedias. We quantitatively demonstrate the utility of our word embeddings by using them as the sole features for training a part of speech tagger for a subs…

    Submitted 27 June, 2014; v1 submitted 5 July, 2013; originally announced July 2013.

    Comments: 10 pages, 2 figures, Proceedings of Conference on Computational Natural Language Learning CoNLL'2013

  33. arXiv:1301.3226

    cs.LG cs.CL stat.ML

    The Expressive Power of Word Embeddings

    Authors: Yanqing Chen, Bryan Perozzi, Rami Al-Rfou, Steven Skiena

    Abstract: We seek to better understand the difference in quality of the several publicly released embeddings. We propose several tasks that help to distinguish the characteristics of different embeddings. Our evaluation of sentiment polarity and synonym/antonym relations shows that embeddings are able to capture surprisingly nuanced semantics even in the absence of sentence structure. Moreover, benchmarking…

    Submitted 29 May, 2013; v1 submitted 14 January, 2013; originally announced January 2013.

    Comments: submitted to ICML 2013, Deep Learning for Audio, Speech and Language Processing Workshop. 8 pages, 8 figures