Showing 1–10 of 10 results for author: Ahuja, C

  1. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  2. arXiv:2301.05339  [pdf, other]

    cs.GR cs.CV cs.HC cs.LG

    A Comprehensive Review of Data-Driven Co-Speech Gesture Generation

    Authors: Simbarashe Nyatsanga, Taras Kucherenko, Chaitanya Ahuja, Gustav Eje Henter, Michael Neff

    Abstract: Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co-speech gestures is a long-standing problem in computer animation and is considered an enabling technology in film, games, virtual social spaces, and for interaction with social robots. The problem is made challenging by the idiosyncratic and non-periodic n…

    Submitted 10 April, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: Accepted for EUROGRAPHICS 2023

    ACM Class: I.3.7

  3. arXiv:2208.08080  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.MM

    Multimodal Lecture Presentations Dataset: Understanding Multimodality in Educational Slides

    Authors: Dong Won Lee, Chaitanya Ahuja, Paul Pu Liang, Sanika Natu, Louis-Philippe Morency

    Abstract: Lecture slide presentations, a sequence of pages that contain text and figures accompanied by speech, are constructed and presented carefully in order to optimally transfer knowledge to students. Previous studies in multimedia and psychology attribute the effectiveness of lecture presentations to their multimodal nature. As a step toward developing AI to aid in student learning as intelligent teac…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 9 pages, 5 figures

  4. arXiv:2007.12553  [pdf, other]

    cs.CV cs.RO

    Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach

    Authors: Chaitanya Ahuja, Dong Won Lee, Yukiko I. Nakano, Louis-Philippe Morency

    Abstract: How can we teach robots or virtual assistants to gesture naturally? Can we go further and adapt the gesturing style to follow a specific speaker? Gestures that are naturally timed with corresponding speech during human communication are called co-speech gestures. A key challenge, called gesture style transfer, is to learn a model that generates these gestures for a speaking agent 'A' in the gestur…

    Submitted 24 July, 2020; originally announced July 2020.

    Comments: 24 pages, 12 figures

    Journal ref: European Conference on Computer Vision 2020

  5. arXiv:1910.02181  [pdf, other]

    cs.CV cs.AI

    To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations

    Authors: Chaitanya Ahuja, Shugao Ma, Louis-Philippe Morency, Yaser Sheikh

    Abstract: Nonverbal behaviours such as gestures, facial expressions, body posture, and para-linguistic cues have been shown to complement or clarify verbal messages. Hence, to improve telepresence in the form of an avatar, it is important to model these behaviours, especially in dyadic interactions. Creating such personalized avatars not only requires modeling intrapersonal dynamics between an avatar's speech a…

    Submitted 4 October, 2019; originally announced October 2019.

  6. arXiv:1907.01108  [pdf, other]

    cs.CV cs.CL

    Language2Pose: Natural Language Grounded Pose Forecasting

    Authors: Chaitanya Ahuja, Louis-Philippe Morency

    Abstract: Generating animations from natural language sentences finds its applications in a number of domains such as movie script visualization, virtual human animation, and robot motion planning. These sentences can describe different kinds of actions, speeds and direction of these actions, and possibly a target destination. The core modeling challenge in this language-to-pose application is how to map…

    Submitted 27 November, 2019; v1 submitted 1 July, 2019; originally announced July 2019.

  7. arXiv:1809.06215  [pdf]

    cs.CV

    An Automatic Method for Complete Brain Matter Segmentation from Multislice CT scan

    Authors: Soumi Ray, Vinod Kumar, Chirag Ahuja, Niranjan Khandelwal

    Abstract: Computed tomography imaging is well accepted for its imaging speed, image contrast and resolution, and cost. It is therefore widely used in the detection and diagnosis of brain diseases. Unfortunately, reported work on CT segmentation is not very significant. In this paper, a robust automatic segmentation system is presented that is capable of segmenting complete brain matter from CT slices, without any lose…

    Submitted 22 October, 2018; v1 submitted 11 September, 2018; originally announced September 2018.

  8. arXiv:1710.02254  [pdf, other]

    cs.LG cs.AI cs.NE

    Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling

    Authors: Chaitanya Ahuja, Louis-Philippe Morency

    Abstract: Recurrent neural networks have shown remarkable success in modeling sequences. However, low-resource situations still adversely affect the generalizability of these models. We introduce a new family of models, called Lattice Recurrent Units (LRU), to address the challenge of learning deep multi-layer recurrent models with limited resources. LRU models achieve this goal by creating distinct (but cou…

    Submitted 22 November, 2017; v1 submitted 5 October, 2017; originally announced October 2017.

    Comments: 8 pages, 7 figures

  9. arXiv:1705.09406  [pdf, other]

    cs.LG

    Multimodal Machine Learning: A Survey and Taxonomy

    Authors: Tadas Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency

    Abstract: Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced and a research problem is characterized as multimodal when it includes multiple such modalities. In order for Artificial Intelligence to make progress in understanding the world around us, it needs to be able…

    Submitted 1 August, 2017; v1 submitted 25 May, 2017; originally announced May 2017.

  10. arXiv:1411.6741  [pdf, other]

    cs.SD

    A Complex Matrix Factorization approach to Joint Modeling of Magnitude and Phase for Source Separation

    Authors: Chaitanya Ahuja, Karan Nathwani, Rajesh M. Hegde

    Abstract: Conventional NMF methods for source separation factorize the matrix of spectral magnitudes. Spectral phase is not included in the decomposition process of these methods. However, the phase of the speech mixture is generally used in reconstructing the target speech signal. This results in undesired traces of interfering sources in the target signal. In this paper, the spectral phase is incorporated in t…

    Submitted 25 November, 2014; originally announced November 2014.

    Comments: 5 pages, 3 figures