Showing 1–14 of 14 results for author: Oore, S

  1. arXiv:2406.17229  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-Supervised Embeddings for Detecting Individual Symptoms of Depression

    Authors: Sri Harsha Dumpala, Katerina Dikaios, Abraham Nunes, Frank Rudzicz, Rudolf Uher, Sageev Oore

    Abstract: Depression, a prevalent mental health disorder impacting millions globally, demands reliable assessment systems. Unlike previous studies that focus solely on either detecting depression or predicting its severity, our work identifies individual symptoms of depression while also predicting its severity using speech input. We leverage self-supervised learning (SSL)-based speech models to better util…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  2. arXiv:2406.16000  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Predicting Individual Depression Symptoms from Acoustic Features During Speech

    Authors: Sebastian Rodriguez, Sri Harsha Dumpala, Katerina Dikaios, Sheri Rempel, Rudolf Uher, Sageev Oore

    Abstract: Current automatic depression detection systems provide predictions directly without relying on the individual symptoms/items of depression as denoted in the clinical depression rating scales. In contrast, clinicians assess each item in the depression rating scale in a clinical setting, thus implicitly providing a more detailed rationale for a depression diagnosis. In this work, we make a first ste…

    Submitted 22 June, 2024; originally announced June 2024.

  3. arXiv:2404.05071  [pdf, other]

    cs.LG cs.SD eess.AS

    Test-Time Training for Depression Detection

    Authors: Sri Harsha Dumpala, Chandramouli Shama Sastry, Rudolf Uher, Sageev Oore

    Abstract: Previous works on depression detection use datasets collected in similar environments to train and test the models. In practice, however, the train and test distributions cannot be guaranteed to be identical. Distribution shifts can be introduced due to variations such as recording environment (e.g., background noise) and demographics (e.g., gender, age, etc). Such distributional shifts can surpri…

    Submitted 7 April, 2024; originally announced April 2024.

  4. arXiv:2402.14285  [pdf, other]

    cs.SD cs.LG eess.AS

    Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

    Authors: Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli S Sastry, Siddharth Gururani, Sageev Oore, Yisong Yue

    Abstract: We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression, many of which are non-differentiable, which poses a challenge when using them for guided diffusion. We propose Stochastic Control Guidance (SCG), a novel gui…

    Submitted 2 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: ICML 2024 (Oral)

  5. arXiv:2312.04690  [pdf, other]

    cs.HC cs.AI cs.SD eess.AS

    SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration

    Authors: Stephen Brade, Bryan Wang, Mauricio Sousa, Gregory Lee Newsome, Sageev Oore, Tovi Grossman

    Abstract: Synthesizers are powerful tools that allow musicians to create dynamic and original sounds. Existing commercial interfaces for synthesizers typically require musicians to interact with complex low-level parameters or to manage large libraries of premade sounds. To address these challenges, we implement SynthScribe -- a fullstack system that uses multimodal deep learning to let users express their…

    Submitted 20 February, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

  6. arXiv:2309.10930  [pdf, other]

    cs.SD cs.LG eess.AS

    Test-Time Training for Speech

    Authors: Sri Harsha Dumpala, Chandramouli Sastry, Sageev Oore

    Abstract: In this paper, we study the application of Test-Time Training (TTT) as a solution to handling distribution shifts in speech applications. In particular, we introduce distribution-shifts to the test datasets of standard speech-classification tasks -- for example, speaker-identification and emotion-detection -- and explore how Test-Time Training (TTT) can help adjust to the distribution-shift. In ou…

    Submitted 28 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

  7. arXiv:2207.12816  [pdf, other]

    cs.CR cs.SD eess.AS

    Generative Extraction of Audio Classifiers for Speaker Identification

    Authors: Tejumade Afonja, Lucas Bourtoule, Varun Chandrasekaran, Sageev Oore, Nicolas Papernot

    Abstract: It is perhaps no longer surprising that machine learning models, especially deep neural networks, are particularly vulnerable to attacks. One such vulnerability that has been well studied is model extraction: a phenomenon in which the attacker attempts to steal a victim's model by training a surrogate model to mimic the decision boundaries of the victim model. Previous works have demonstrated the…

    Submitted 26 July, 2022; originally announced July 2022.

  8. arXiv:2202.09648  [pdf, other]

    cs.LG cs.CV eess.SP

    Echofilter: A Deep Learning Segmentation Model Improves the Automation, Standardization, and Timeliness for Post-Processing Echosounder Data in Tidal Energy Streams

    Authors: Scott C. Lowe, Louise P. McGarry, Jessica Douglas, Jason Newport, Sageev Oore, Christopher Whidden, Daniel J. Hasselman

    Abstract: Understanding the abundance and distribution of fish in tidal energy streams is important to assess risks presented by introducing tidal energy devices to the habitat. However, tidal current flows suitable for tidal energy are often highly turbulent, complicating the interpretation of echosounder data. The portion of the water column contaminated by returns from entrained air must be excluded from…

    Submitted 18 August, 2022; v1 submitted 19 February, 2022; originally announced February 2022.

    Journal ref: Front. Mar. Sci. 9:867857 (2022)

  9. arXiv:2108.01043  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Musical Speech: A Transformer-based Composition Tool

    Authors: Jason d'Eon, Sri Harsha Dumpala, Chandramouli Shama Sastry, Dani Oore, Sageev Oore

    Abstract: In this paper, we propose a new compositional tool that will generate a musical outline of speech recorded/provided by the user for use as a musical building block in their compositions. The tool allows any user to use their own speech to generate musical material, while still being able to hear the direct connection between their recorded speech and the resulting music. The tool is built on our p…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: NeurIPS 2020 Demonstration Track; extended for PMLR

  10. arXiv:2107.13969  [pdf, other]

    cs.CY cs.LG cs.SD eess.AS

    Significance of Speaker Embeddings and Temporal Context for Depression Detection

    Authors: Sri Harsha Dumpala, Sebastian Rodriguez, Sheri Rempel, Rudolf Uher, Sageev Oore

    Abstract: Depression detection from speech has attracted a lot of attention in recent years. However, the significance of speaker-specific information in depression detection has not yet been explored. In this work, we analyze the significance of speaker embeddings for the task of depression detection from speech. Experimental results show that the speaker embeddings provide important cues to achieve state-…

    Submitted 24 July, 2021; originally announced July 2021.

  11. arXiv:1907.04352  [pdf, other]

    cs.SD cs.LG eess.AS

    Exploring Conditioning for Generative Music Systems with Human-Interpretable Controls

    Authors: Nicholas Meade, Nicholas Barreyre, Scott C. Lowe, Sageev Oore

    Abstract: Performance RNN is a machine-learning system designed primarily for the generation of solo piano performances using an event-based (rather than audio) representation. More specifically, Performance RNN is a long short-term memory (LSTM) based recurrent neural network that models polyphonic music with expressive timing and dynamics (Oore et al., 2018). The neural network uses a simple language mode…

    Submitted 3 August, 2019; v1 submitted 9 July, 2019; originally announced July 2019.

    Journal ref: International Conference on Computational Creativity, 2019

  12. arXiv:1811.09620  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer

    Authors: Sicong Huang, Qiyang Li, Cem Anil, Xuchan Bao, Sageev Oore, Roger B. Grosse

    Abstract: In this work, we address the problem of musical timbre transfer, where the goal is to manipulate the timbre of a sound sample from one instrument to match another instrument while preserving other musical content, such as pitch, rhythm, and loudness. In principle, one could apply image-based style transfer techniques to a time-frequency representation of an audio signal, but this depends on having…

    Submitted 22 October, 2023; v1 submitted 22 November, 2018; originally announced November 2018.

    Comments: 17 pages, published as a conference paper at ICLR 2019

    Journal ref: ICLR 2019

  13. arXiv:1808.03715  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    This Time with Feeling: Learning Expressive Musical Performance

    Authors: Sageev Oore, Ian Simon, Sander Dieleman, Douglas Eck, Karen Simonyan

    Abstract: Music generation has generally been focused on either creating scores or interpreting them. We discuss differences between these two problems and propose that, in fact, it may be valuable to work in the space of direct performance generation: jointly predicting the notes and also their expressive timing and dynamics. We consider the significance and qualities of the data set need…

    Submitted 10 August, 2018; originally announced August 2018.

    Comments: Includes links to urls for audio samples

  14. arXiv:1710.11153  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Onsets and Frames: Dual-Objective Piano Transcription

    Authors: Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, Douglas Eck

    Abstract: We advance the state of the art in polyphonic piano music transcription by using a deep convolutional and recurrent neural network which is trained to jointly predict onsets and frames. Our model predicts pitch onset events and then uses those predictions to condition framewise pitch predictions. During inference, we restrict the predictions from the framewise detector by not allowing a new note t…

    Submitted 5 June, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

    Comments: Examples available at https://goo.gl/magenta/onsets-frames-examples