Showing 1–25 of 25 results for author: Rudovic, O

  1. arXiv:2406.09617  [pdf, other]

    cs.CL cs.HC eess.AS

    Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

    Authors: Shruti Palaskar, Oggi Rudovic, Sameer Dharur, Florian Pesce, Gautam Krishna, Aswin Sivaraman, Jack Berkowitz, Ahmed Hussen Abdelaziz, Saurabh Adya, Ahmed Tewfik

    Abstract: Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2310.15261  [pdf, ps, other]

    cs.SD cs.HC cs.LG eess.AS

    Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

    Authors: Gautam Krishna, Sameer Dharur, Oggi Rudovic, Pranay Dighe, Saurabh Adya, Ahmed Hussen Abdelaziz, Ahmed H Tewfik

    Abstract: Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g., acoustic, text and/or automatic speech recognition system (ASR) features, to classify speech as device-directed or otherwise, and often have to contend with one or…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 5 pages

  3. arXiv:2210.12134  [pdf, other]

    cs.CL cs.HC cs.SD eess.AS

    Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR

    Authors: Pranay Dighe, Prateeth Nayak, Oggi Rudovic, Erik Marchi, Xiaochuan Niu, Ahmed Tewfik

    Abstract: Accurate prediction of the user intent to interact with a voice assistant (VA) on a device (e.g. on the phone) is critical for achieving naturalistic, engaging, and privacy-centric interactions with the VA. To this end, we present a novel approach to predict the user's intent (the user speaking to the device or not) directly from acoustic and textual information encoded at subword tokens which are…

    Submitted 21 October, 2022; originally announced October 2022.

  4. arXiv:2203.15975  [pdf, other]

    eess.AS cs.HC cs.LG cs.SD

    Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models

    Authors: Vineet Garg, Ognjen Rudovic, Pranay Dighe, Ahmed H. Abdelaziz, Erik Marchi, Saurabh Adya, Chandra Dhir, Ahmed Tewfik

    Abstract: We address the problem of detecting speech directed to a device that does not contain a specific wake-word. Specifically, we focus on audio coming from a touch-based invocation. Mitigating virtual assistants (VAs) activation due to accidental button presses is critical for user experience. While the majority of approaches to false trigger mitigation (FTM) are designed to detect the presence of a t…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Submitted to INTERSPEECH 2022

  5. arXiv:2110.04656  [pdf, other]

    cs.SD cs.LG eess.AS

    Streaming on-device detection of device directed speech from voice and touch-based invocation

    Authors: Ognjen Rudovic, Akanksha Bindal, Vineet Garg, Pramod Simha, Pranay Dighe, Sachin Kajarekar

    Abstract: When interacting with smart devices such as mobile phones or wearables, the user typically invokes a virtual assistant (VA) by saying a keyword or by pressing a button on the device. However, in many cases, the VA can accidentally be invoked by the keyword-like speech or accidental button press, which may have implications on user experience and privacy. To this end, we propose an acoustic false-t…

    Submitted 9 October, 2021; originally announced October 2021.

  6. arXiv:2101.10580  [pdf, other]

    cs.RO cs.HC

    Toward Personalized Affect-Aware Socially Assistive Robot Tutors in Long-Term Interventions for Children with Autism

    Authors: Zhonghao Shi, Thomas R Groechel, Shomik Jain, Kourtney Chima, Ognjen Rudovic, Maja J Matarić

    Abstract: Affect-aware socially assistive robotics (SAR) has shown great potential for augmenting interventions for children with autism spectrum disorders (ASD). However, current SAR cannot yet perceive the unique and diverse set of atypical cognitive-affective behaviors from children with ASD in an automatic and personalized fashion in long-term (multi-session) real-world interactions. To bridge this gap,…

    Submitted 29 January, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

  7. arXiv:2101.04800  [pdf, other]

    cs.LG cs.CV

    Personalized Federated Deep Learning for Pain Estimation From Face Images

    Authors: Ognjen Rudovic, Nicolas Tobis, Sebastian Kaltwang, Björn Schuller, Daniel Rueckert, Jeffrey F. Cohn, Rosalind W. Picard

    Abstract: Standard machine learning approaches require centralizing the users' data in one computer or a shared database, which raises data privacy and confidentiality concerns. Therefore, limiting central access is important, especially in healthcare settings, where data regulations are strict. A potential approach to tackling this is Federated Learning (FL), which enables multiple parties to collaborative…

    Submitted 12 January, 2021; originally announced January 2021.

    Comments: 12 pages, 6 figures

  8. arXiv:1909.12158  [pdf, other]

    cs.CV

    Fast and Effective Adaptation of Facial Action Unit Detection Deep Model

    Authors: Mihee Lee, Ognjen Rudovic, Vladimir Pavlovic, Maja Pantic

    Abstract: Detecting facial action units (AU) is one of the fundamental steps in automatic recognition of facial expression of emotions and cognitive states. Though there have been a variety of approaches proposed for this task, most of these models are trained only for the specific target AUs, and as such they fail to easily adapt to the task of recognition of new AUs (i.e., those not initially used to trai…

    Submitted 27 November, 2019; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: Presented at 2019 IJCAI Affective Computing Workshop

  9. arXiv:1906.03098  [pdf, other]

    cs.LG cs.AI cs.HC cs.RO stat.ML

    Multi-modal Active Learning From Human Data: A Deep Reinforcement Learning Approach

    Authors: Ognjen Rudovic, Meiru Zhang, Bjorn Schuller, Rosalind W. Picard

    Abstract: Human behavior expression and experience are inherently multi-modal, and characterized by vast individual and contextual heterogeneity. To achieve meaningful human-computer and human-robot interactions, multi-modal models of the users' states (e.g., engagement) are therefore needed. Most of the existing works that try to build classifiers for the users' states assume that the data to train the model…

    Submitted 7 June, 2019; originally announced June 2019.

  10. arXiv:1904.09370  [pdf, other]

    cs.LG stat.ML

    Meta-Weighted Gaussian Process Experts for Personalized Forecasting of AD Cognitive Changes

    Authors: Ognjen Rudovic, Yuria Utsumi, Ricardo Guerrero, Kelly Peterson, Daniel Rueckert, Rosalind W. Picard

    Abstract: We introduce a novel personalized Gaussian Process Experts (pGPE) model for predicting per-subject ADAS-Cog13 cognitive scores -- a significant predictor of Alzheimer's Disease (AD) in the cognitive domain -- over the future 6, 12, 18, and 24 months. We start by training a population-level model using multi-modal data from previously seen subjects using a base Gaussian Process (GP) regression. The…

    Submitted 19 April, 2019; originally announced April 2019.

    Journal ref: Machine Learning for Healthcare Conference (ML4HC2019)

  11. arXiv:1810.11547  [pdf, other]

    cs.CV

    Unsupervised Multi-Target Domain Adaptation: An Information Theoretic Approach

    Authors: Behnam Gholami, Pritish Sahu, Ognjen Rudovic, Konstantinos Bousmalis, Vladimir Pavlovic

    Abstract: Unsupervised domain adaptation (uDA) models focus on pairwise adaptation settings where there is a single, labeled, source and a single target domain. However, in many real-world settings one seeks to adapt to multiple, but somewhat similar, target domains. Applying pairwise adaptation approaches to this setting may be suboptimal, as they fail to leverage shared information among multiple domains.…

    Submitted 26 October, 2018; originally announced October 2018.

    Comments: 19 pages, 5 Figures, 5 Tables

  12. Multi-Instance Dynamic Ordinal Random Fields for Weakly-supervised Facial Behavior Analysis

    Authors: Adria Ruiz, Ognjen Rudovic, Xavier Binefa, Maja Pantic

    Abstract: We propose a Multi-Instance-Learning (MIL) approach for weakly-supervised learning problems, where a training set is formed by bags (sets of feature vectors or instances) and only labels at bag-level are provided. Specifically, we consider the Multi-Instance Dynamic-Ordinal-Regression (MI-DOR) setting, where the instance labels are naturally represented as ordinal variables and bags are structured…

    Submitted 28 February, 2018; originally announced March 2018.

    Comments: submitted TIP (June 2017). arXiv admin note: text overlap with arXiv:1609.01465

  13. arXiv:1802.08561  [pdf, other]

    cs.LG stat.AP

    Personalized Gaussian Processes for Forecasting of Alzheimer's Disease Assessment Scale-Cognition Sub-Scale (ADAS-Cog13)

    Authors: Yuria Utsumi, Ognjen Rudovic, Kelly Peterson, Ricardo Guerrero, Rosalind W. Picard

    Abstract: In this paper, we introduce the use of a personalized Gaussian Process model (pGP) to predict per-patient changes in ADAS-Cog13 -- a significant predictor of Alzheimer's Disease (AD) in the cognitive domain -- using data from each patient's previous visits, and testing on future (held-out) data. We start by learning a population-level model using multi-modal data from previously seen patients usin…

    Submitted 4 May, 2018; v1 submitted 21 February, 2018; originally announced February 2018.

    Comments: International Engineering in Medicine and Biology Conference (EMBC) 2018 - accepted. 5 pages. arXiv admin note: text overlap with arXiv:1712.00181

  14. arXiv:1802.04480  [pdf, other]

    cs.RO cs.HC

    RoboChain: A Secure Data-Sharing Framework for Human-Robot Interaction

    Authors: Eduardo Castelló Ferrer, Ognjen Rudovic, Thomas Hardjono, Alex Pentland

    Abstract: Robots have potential to revolutionize the way we interact with the world around us. One of their largest potentials is in the domain of mobile health where they can be used to facilitate clinical interventions. However, to accomplish this, robots need to have access to our private data in order to learn from these data and improve their interaction capabilities. Furthermore, to enhance this learn…

    Submitted 26 March, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

    Comments: 7 pages, 6 figures

    ACM Class: I.2.9; I.2.11; I.2.6; C.2.4; H.3.2; H.3.3; H.5.0; J.3; K.4.2; K.6.5

  15. arXiv:1802.01186   

    cs.RO cs.AI cs.CV cs.HC

    Personalized Machine Learning for Robot Perception of Affect and Engagement in Autism Therapy

    Authors: Ognjen Rudovic, Jaeryoung Lee, Miles Dai, Bjorn Schuller, Rosalind Picard

    Abstract: Robots have great potential to facilitate future therapies for children on the autism spectrum. However, existing robots lack the ability to automatically perceive and respond to human affect, which is necessary for establishing and maintaining engaging interactions. Moreover, their inference challenge is made harder by the fact that many individuals with autism have atypical and unusually diverse…

    Submitted 18 June, 2018; v1 submitted 4 February, 2018; originally announced February 2018.

    Comments: The paper has undergone a major revision and its content is outdated

  16. arXiv:1712.00181  [pdf, other]

    cs.LG q-bio.QM stat.ML

    Personalized Gaussian Processes for Future Prediction of Alzheimer's Disease Progression

    Authors: Kelly Peterson, Ognjen Rudovic, Ricardo Guerrero, Rosalind W. Picard

    Abstract: In this paper, we introduce the use of a personalized Gaussian Process model (pGP) to predict the key metrics of Alzheimer's Disease progression (MMSE, ADAS-Cog13, CDRSB and CS) based on each patient's previous visits. We start by learning a population-level model using multi-modal data from previously seen patients using the base Gaussian Process (GP) regression. Then, this model is adapted seque…

    Submitted 3 May, 2018; v1 submitted 30 November, 2017; originally announced December 2017.

    Comments: 13 pages

  17. arXiv:1711.04036  [pdf, other]

    cs.AI cs.HC

    Physiological and behavioral profiling for nociceptive pain estimation using personalized multitask learning

    Authors: Daniel Lopez-Martinez, Ognjen Rudovic, Rosalind Picard

    Abstract: Pain is a subjective experience commonly measured through patient's self report. While there exist numerous situations in which automatic pain estimation methods may be preferred, inter-subject variability in physiological and behavioral pain responses has hindered the development of such methods. In this work, we address this problem by introducing a novel personalized multitask machine learning…

    Submitted 10 November, 2017; originally announced November 2017.

    Comments: NIPS Machine Learning for Health 2017

  18. arXiv:1710.00018  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Unsupervised Domain Adaptation with Copula Models

    Authors: Cuong D. Tran, Ognjen Rudovic, Vladimir Pavlovic

    Abstract: We study the task of unsupervised domain adaptation, where no labeled data from the target domain is provided during training time. To deal with the potential discrepancy between the source and target distributions, both in features and labels, we exploit a copula-based regression framework. The benefits of this approach are two-fold: (a) it allows us to model a broader range of conditional predic…

    Submitted 29 September, 2017; originally announced October 2017.

    Comments: IEEE International Workshop On Machine Learning for Signal Processing 2017

  19. arXiv:1706.07154  [pdf, other]

    cs.CV

    Personalized Automatic Estimation of Self-reported Pain Intensity from Facial Expressions

    Authors: Daniel Lopez Martinez, Ognjen Rudovic, Rosalind Picard

    Abstract: Pain is a personal, subjective experience that is commonly evaluated through visual analog scales (VAS). While this is often convenient and useful, automatic pain detection systems can reduce pain score acquisition efforts in large-scale studies by estimating it directly from the participants' facial expressions. In this paper, we propose a novel two-stage learning approach for VAS estimation: fir…

    Submitted 23 June, 2017; v1 submitted 21 June, 2017; originally announced June 2017.

    Comments: Computer Vision and Pattern Recognition Conference, The 1st International Workshop on Deep Affective Learning and Context Modeling

  20. arXiv:1704.02206  [pdf, other]

    cs.CV

    DeepCoder: Semi-parametric Variational Autoencoders for Automatic Facial Action Coding

    Authors: Dieu Linh Tran, Robert Walecki, Ognjen Rudovic, Stefanos Eleftheriadis, Bjørn Schuller, Maja Pantic

    Abstract: Human face exhibits an inherent hierarchy in its representations (i.e., holistic facial expressions can be encoded via a set of facial action units (AUs) and their intensity). Variational (deep) auto-encoders (VAE) have shown great results in unsupervised extraction of hierarchical latent representations from large amounts of image data, while being robust to noise and other undesired artifacts. P…

    Submitted 5 August, 2017; v1 submitted 7 April, 2017; originally announced April 2017.

    Comments: ICCV 2017 - accepted

  21. arXiv:1609.01465  [pdf, ps, other]

    cs.CV

    Multi-instance Dynamic Ordinal Random Fields for Weakly-Supervised Pain Intensity Estimation

    Authors: Adria Ruiz, Ognjen Rudovic, Xavier Binefa, Maja Pantic

    Abstract: In this paper, we address the Multi-Instance-Learning (MIL) problem when bag labels are naturally represented as ordinal variables (Multi-Instance-Ordinal Regression). Moreover, we consider the case where bags are temporal sequences of ordinal instances. To model this, we propose the novel Multi-Instance Dynamic Ordinal Random Fields (MI-DORF). In this model, we treat instance-labels inside the…

    Submitted 6 September, 2016; originally announced September 2016.

  22. arXiv:1608.04664  [pdf, other]

    stat.ML cs.CV

    Variational Gaussian Process Auto-Encoder for Ordinal Prediction of Facial Action Units

    Authors: Stefanos Eleftheriadis, Ognjen Rudovic, Marc P. Deisenroth, Maja Pantic

    Abstract: We address the task of simultaneous feature fusion and modeling of discrete ordinal outputs. We propose a novel Gaussian process (GP) auto-encoder modeling approach. In particular, we introduce GP encoders to project multiple observed features onto a latent space, while GP decoders are responsible for reconstructing the original features. Inference is performed in a novel variational framework, whe…

    Submitted 5 September, 2016; v1 submitted 16 August, 2016; originally announced August 2016.

  23. arXiv:1604.02917  [pdf, other]

    stat.ML cs.CV cs.LG

    Gaussian Process Domain Experts for Model Adaptation in Facial Behavior Analysis

    Authors: Stefanos Eleftheriadis, Ognjen Rudovic, Marc P. Deisenroth, Maja Pantic

    Abstract: We present a novel approach for supervised domain adaptation that is based upon the probabilistic framework of Gaussian processes (GPs). Specifically, we introduce domain-specific GPs as local experts for facial expression classification from face images. The adaptation of the classifier is facilitated in probabilistic fashion by conditioning the target expert on multiple source experts. Furthermo…

    Submitted 2 May, 2016; v1 submitted 11 April, 2016; originally announced April 2016.

  24. arXiv:1510.03909  [pdf, other]

    cs.CV cs.HC

    Variable-state Latent Conditional Random Fields for Facial Expression Recognition and Action Unit Detection

    Authors: Robert Walecki, Ognjen Rudovic, Vladimir Pavlovic, Maja Pantic

    Abstract: Automated recognition of facial expressions of emotions, and detection of facial action units (AUs), from videos depends critically on modeling of their dynamics. These dynamics are characterized by changes in temporal phases (onset-apex-offset) and intensity of emotion expressions and AUs, the appearance of which may vary considerably among target subjects, making the recognition/detection task v…

    Submitted 13 October, 2015; originally announced October 2015.

  25. arXiv:1301.5063   

    cs.CV cs.LG stat.ML

    Heteroscedastic Conditional Ordinal Random Fields for Pain Intensity Estimation from Facial Images

    Authors: Ognjen Rudovic, Maja Pantic, Vladimir Pavlovic

    Abstract: We propose a novel method for automatic pain intensity estimation from facial images based on the framework of kernel Conditional Ordinal Random Fields (KCORF). We extend this framework to account for heteroscedasticity on the output labels (i.e., pain intensity scores) and introduce novel dynamic features, dynamic ranks, that impose temporal ordinal constraints on the static ranks (i.e., intensi…

    Submitted 3 April, 2013; v1 submitted 21 January, 2013; originally announced January 2013.

    Comments: This paper has been withdrawn by the authors due to a crucial sign error in equations 2 and 3