Showing 1–9 of 9 results for author: Guha, T

Searching in archive eess.
  1. arXiv:2303.02665  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Heterogeneous Graph Learning for Acoustic Event Classification

    Authors: Amir Shirian, Mona Ahmadian, Krishna Somandepalli, Tanaya Guha

    Abstract: Heterogeneous graphs provide a compact, efficient, and scalable way to model data involving multiple disparate modalities. This makes modeling audiovisual data using heterogeneous graphs an attractive option. However, graph structure does not appear naturally in audiovisual data. Graphs for audiovisual data are constructed manually which is both difficult and sub-optimal. In this work, we address…

    Submitted 12 March, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Comments: arXiv admin note: text overlap with arXiv:2207.07935

  2. arXiv:2207.07935  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Visually-aware Acoustic Event Detection using Heterogeneous Graphs

    Authors: Amir Shirian, Krishna Somandepalli, Victor Sanchez, Tanaya Guha

    Abstract: Perception of auditory events is inherently multimodal relying on both audio and visual cues. A large number of existing multimodal approaches process each modality using modality-specific models and then fuse the embeddings to encode the joint information. In contrast, we employ heterogeneous graphs to explicitly capture the spatial and temporal relationships between the modalities and represent…

    Submitted 16 July, 2022; originally announced July 2022.

  3. arXiv:2008.02661  [pdf, other]

    cs.CV cs.MM eess.AS

    Dynamic Emotion Modeling with Learnable Graphs and Graph Inception Network

    Authors: A. Shirian, S. Tripathi, T. Guha

    Abstract: Human emotion is expressed, perceived and captured using a variety of dynamic data modalities, such as speech (verbal), videos (facial expressions) and motion sensors (body gestures). We propose a generalized approach to emotion recognition that can adapt across modalities by modeling dynamic data as structured graphs. The motivation behind the graph approach is to build compact models without com…

    Submitted 8 February, 2021; v1 submitted 6 August, 2020; originally announced August 2020.

    Journal ref: 10.1109/TMM.2021.3059169

  4. arXiv:2008.02063  [pdf, other]

    cs.CV cs.LG eess.AS

    Compact Graph Architecture for Speech Emotion Recognition

    Authors: A. Shirian, T. Guha

    Abstract: We propose a deep graph approach to address the task of speech emotion recognition. A compact, efficient and scalable way to represent data is in the form of graphs. Following the theory of graph signal processing, we propose to model speech signal as a cycle graph or a line graph. Such graph structure enables us to construct a Graph Convolution Network (GCN)-based architecture that can perform an…

    Submitted 2 February, 2021; v1 submitted 5 August, 2020; originally announced August 2020.

  5. arXiv:2006.03898  [pdf, other]

    cs.CV cs.MM eess.IV

    Ensemble Network for Ranking Images Based on Visual Appeal

    Authors: Sachin Singh, Victor Sanchez, Tanaya Guha

    Abstract: We propose a computational framework for ranking images (group photos in particular) taken at the same event within a short time span. The ranking is expected to correspond with human perception of overall appeal of the images. We hypothesize and provide evidence through subjective analysis that the factors that appeal to humans are its emotional content, aesthetics and image quality. We propose a…

    Submitted 6 June, 2020; originally announced June 2020.

  6. arXiv:1910.08732  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos

    Authors: Kranti Kumar Parida, Neeraj Matiyali, Tanaya Guha, Gaurav Sharma

    Abstract: We present an audio-visual multimodal approach for the task of zeroshot learning (ZSL) for classification and retrieval of videos. ZSL has been studied extensively in the recent past but has primarily been limited to visual modality and to images. We demonstrate that both audio and visual modalities are important for ZSL for videos. Since a dataset to study the task is currently not available, we…

    Submitted 19 October, 2019; originally announced October 2019.

    Comments: To appear in WACV 2020, Project Page: https://cse.iitk.ac.in/users/kranti/avzsl.html

  7. arXiv:1904.00150  [pdf, other]

    cs.MM cs.LG cs.SD eess.AS

    Learning Affective Correspondence between Music and Image

    Authors: Gaurav Verma, Eeshan Gunesh Dhekane, Tanaya Guha

    Abstract: We introduce the problem of learning affective correspondence between audio (music) and visual data (images). For this task, a music clip and an image are considered similar (having true correspondence) if they have similar emotion content. In order to estimate this crossmodal, emotion-centric similarity, we propose a deep neural network architecture that learns to project the data from the two mo…

    Submitted 16 April, 2019; v1 submitted 30 March, 2019; originally announced April 2019.

    Comments: 5 pages, International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019

  8. arXiv:1712.04753  [pdf, other]

    eess.AS cs.CL cs.HC cs.SD

    Learning Spontaneity to Improve Emotion Recognition In Speech

    Authors: Karttikeya Mangalam, Tanaya Guha

    Abstract: We investigate the effect and usefulness of spontaneity (i.e. whether a given speech is spontaneous or not) in speech in the context of emotion recognition. We hypothesize that emotional content in speech is interrelated with its spontaneity, and use spontaneity classification as an auxiliary task to the problem of emotion recognition. We propose two supervised learning settings that utilize spont…

    Submitted 13 June, 2018; v1 submitted 12 December, 2017; originally announced December 2017.

    Comments: Accepted at Interspeech 2018

  9. arXiv:1306.2727  [pdf, other]

    cs.CV cs.MM eess.IV

    Sparse Representation-based Image Quality Assessment

    Authors: Tanaya Guha, Ehsan Nezhadarya, Rabab K Ward

    Abstract: A successful approach to image quality assessment involves comparing the structural information between a distorted and its reference image. However, extracting structural information that is perceptually important to our visual system is a challenging task. This paper addresses this issue by employing a sparse representation-based approach and proposes a new metric called the \emph{sparse represe…

    Submitted 12 June, 2013; originally announced June 2013.

    Comments: 10 pages, 3 figures, 3 tables, submitted to a journal