Search | arXiv e-print repository

Test-Time Training for Depression Detection

Authors: Sri Harsha Dumpala, Chandramouli Shama Sastry, Rudolf Uher, Sageev Oore

Abstract: Previous works on depression detection use datasets collected in similar environments to train and test the models. In practice, however, the train and test distributions cannot be guaranteed to be identical. Distribution shifts can be introduced by variations such as recording environment (e.g., background noise) and demographics (e.g., gender, age, etc.). Such distributional shifts can surprisingly lead to severe performance degradation of the depression detection models. In this paper, we analyze the application of test-time training (TTT) to improve the robustness of models trained for depression detection. Compared to regular testing of the models, we find that TTT can significantly improve the robustness of the model under a variety of distributional shifts introduced by: (a) background noise, (b) gender bias, and (c) data collection and curation procedure (i.e., train and test samples are from separate datasets).

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2402.14285

Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

Authors: Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli S Sastry, Siddharth Gururani, Sageev Oore, Yisong Yue

Abstract: We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression, many of which are non-differentiable, which poses a challenge when using them for guided diffusion. We propose Stochastic Control Guidance (SCG), a novel guidance method that only requires forward evaluation of rule functions and can work with pre-trained diffusion models in a plug-and-play way, thus achieving training-free guidance for non-differentiable rules for the first time. Additionally, we introduce a latent diffusion architecture for symbolic music generation with high time resolution, which can be composed with SCG in a plug-and-play fashion. Compared to standard strong baselines in symbolic music generation, this framework demonstrates marked advancements in music quality and rule-based controllability, outperforming current state-of-the-art generators in a variety of settings. For detailed demonstrations, code and model checkpoints, please visit our project website: https://scg-rule-guided-music.github.io/.

Submitted 2 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: ICML 2024 (Oral)

arXiv:2309.10930

Test-Time Training for Speech

Authors: Sri Harsha Dumpala, Chandramouli Sastry, Sageev Oore

Abstract: In this paper, we study the application of Test-Time Training (TTT) as a solution to handling distribution shifts in speech applications. In particular, we introduce distribution shifts to the test datasets of standard speech-classification tasks (for example, speaker identification and emotion detection) and explore how TTT can help adjust to the distribution shift. In our experiments, which include distribution shifts due to background noise and natural variations in speech such as gender and age, we identify some key challenges with TTT, including sensitivity to optimization hyperparameters (e.g., number of optimization steps and subset of parameters chosen for TTT) and scalability (e.g., as each example gets its own set of parameters, TTT is not scalable). Finally, we propose using BitFit, a parameter-efficient fine-tuning algorithm proposed for text applications that only considers the bias parameters for fine-tuning, as a solution to the aforementioned challenges and demonstrate that it is consistently more stable than fine-tuning all the parameters of the model.

Submitted 28 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

arXiv:2108.01043

Musical Speech: A Transformer-based Composition Tool

Authors: Jason d'Eon, Sri Harsha Dumpala, Chandramouli Shama Sastry, Dani Oore, Sageev Oore

Abstract: In this paper, we propose a new compositional tool that will generate a musical outline of speech recorded/provided by the user for use as a musical building block in their compositions. The tool allows any user to use their own speech to generate musical material, while still being able to hear the direct connection between their recorded speech and the resulting music. The tool is built on our proposed pipeline. This pipeline begins with speech-based signal processing, after which some simple musical heuristics are applied, and finally these pre-processed signals are passed through Transformer models trained on new musical tasks. We illustrate the effectiveness of our pipeline -- which does not require a paired dataset for training -- through examples of music created by musicians making use of our tool.

Submitted 2 August, 2021; originally announced August 2021.

Comments: NeurIPS 2020 Demonstration Track; extended for PMLR

arXiv:1807.03232

Robust Heartbeat Detection from Multimodal Data via CNN-based Generalizable Information Fusion

Authors: B S Chandra, C S Sastry, S Jana

Abstract: Objective: Heartbeat detection remains central to cardiac disease diagnosis and management, and is traditionally performed based on electrocardiogram (ECG). To improve robustness and accuracy of detection, especially in certain critical-care scenarios, the use of additional physiological signals such as arterial blood pressure (BP) has recently been suggested. There, estimation of heartbeat location requires information fusion from multiple signals. However, reported efforts in this direction often obtain multimodal estimates somewhat indirectly, by voting among separately obtained signal-specific intermediate estimates. In contrast, we propose to directly fuse information from multiple signals without requiring intermediate estimates, and thence estimate heartbeat location in a robust manner. Method: We propose, as a heartbeat detector, a convolutional neural network (CNN) that learns fused features from multiple physiological signals. This method eliminates the need for hand-picked signal-specific features and ad hoc fusion schemes. Further, being data-driven, the same algorithm learns suitable features from an arbitrary set of signals. Results: Using ECG and BP signals of the PhysioNet 2014 Challenge database, we obtained a score of 94%. Further, using two ECG channels of the MIT-BIH arrhythmia database, we scored 99.92%. Both scores compare favourably with previously reported database-specific results. Also, our detector achieved high accuracy in a variety of clinical conditions. Conclusion: The proposed CNN-based information fusion (CIF) algorithm is generalizable, robust and efficient in detecting heartbeat location from multiple signals. Significance: In medical signal monitoring systems, our technique would accurately estimate heartbeat locations even when only a subset of channels is reliable.

Submitted 29 June, 2018; originally announced July 2018.

arXiv:1806.04874

Novel Light Weight Compressed Data Aggregation Using Sparse Measurements for IoT Networks

Authors: Amarlingam M, Pradeep Kumar Mishra, P Rajalakshmi, Sumohana S. Channappayya, C. S. Sastry

Abstract: Optimal data aggregation aimed at maximizing IoT network lifetime by minimizing constrained on-board resource utilization continues to be a challenging task. The existing data aggregation methods have proven that compressed sensing is promising for data aggregation. However, they compromise either on energy efficiency or recovery fidelity and require complex on-node computations. In this paper, we propose a novel Light Weight Compressed Data Aggregation (LWCDA) algorithm that randomly divides the entire network into non-overlapping clusters for data aggregation. The random non-overlapping clustering offers two important advantages: 1) energy efficiency, as each node has to send its measurement only to its cluster head, 2) highly sparse measurement matrix, which leads to a practically implementable framework with low complexity. We analyze the properties of our measurement matrix using restricted isometry property, the associated coherence and phase transition. Through extensive simulations on practical data, we show that the measurement matrix can reconstruct data with high fidelity. Further, we demonstrate that the LWCDA algorithm reduces transmission cost significantly against baseline approaches, implying thereby the enhancement of the network lifetime.

Submitted 13 June, 2018; originally announced June 2018.

Showing 1–6 of 6 results for author: Sastry, C