Showing 1–2 of 2 results for author: Khosravani, H

  1. arXiv:2406.06612  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound

    Authors: Rishit Dagli, Shivesh Prakash, Robert Wu, Houman Khosravani

    Abstract: Generating combined visual and auditory sensory experiences is critical for the consumption of immersive content. Recent advances in neural generative models have enabled the creation of high-resolution content across multiple modalities such as images, text, speech, and videos. Despite these successes, there remains a significant gap in the generation of high-quality spatial audio that complement…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://see2sound.github.io/

  2. arXiv:2402.10100  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data

    Authors: Hamza Mahdi, Eptehal Nashnoush, Rami Saab, Arjun Balachandar, Rishit Dagli, Lucas X. Perri, Houman Khosravani

    Abstract: This study assesses deep learning models for audio classification in a clinical setting with the constraint of small datasets reflecting real-world prospective data collection. We analyze CNNs, including DenseNet and ConvNeXt, alongside transformer models like ViT, SWIN, and AST, and compare them against pre-trained audio models such as YAMNet and VGGish. Our method highlights the benefits of pre-…

    Submitted 5 April, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: CHIL 2024