
ViLCo-Bench: VIdeo Language COntinual learning Benchmark

Authors: Tianqi Tang, Shohreh Deldari, Hao Xue, Celso De Melo, Flora D. Salim

Abstract: Video language continual learning involves continuously adapting to information from video and text inputs, enhancing a model's ability to handle new tasks while retaining prior knowledge. This field is a relatively under-explored area, and establishing appropriate datasets is crucial for facilitating communication and research in this field. In this study, we present the first dedicated benchmark, ViLCo-Bench, designed to evaluate continual learning models across a range of video-text tasks. The dataset comprises ten-minute-long videos and corresponding language queries collected from publicly available datasets. Additionally, we introduce a novel memory-efficient framework that incorporates self-supervised learning and mimics long-term and short-term memory effects. This framework addresses challenges including memory complexity from long video clips, natural language complexity from open queries, and text-video misalignment. We posit that ViLCo-Bench, with greater complexity compared to existing continual learning benchmarks, would serve as a critical tool for exploring the video-language domain, extending beyond conventional class-incremental tasks, and addressing complex and limited annotation issues. The curated data, evaluations, and our novel method are available at https://github.com/cruiseresearchgroup/ViLCo.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures, 8 tables, under review

arXiv:2403.10561

A collection of the accepted papers for the Human-Centric Representation Learning workshop at AAAI 2024

Authors: Dimitris Spathis, Aaqib Saeed, Ali Etemad, Sana Tonekaboni, Stefanos Laskaridis, Shohreh Deldari, Chi Ian Tang, Patrick Schwab, Shyam Tailor

Abstract: This non-archival index is not complete, as some accepted papers chose to opt out of inclusion. The list of all accepted papers is available on the workshop website.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2307.16847 [pdf, other]

CroSSL: Cross-modal Self-Supervised Learning for Time-series through Latent Masking

Authors: Shohreh Deldari, Dimitris Spathis, Mohammad Malekzadeh, Fahim Kawsar, Flora Salim, Akhil Mathur

Abstract: Limited availability of labeled data for machine learning on multimodal time-series extensively hampers progress in the field. Self-supervised learning (SSL) is a promising approach to learning data representations without relying on labels. However, existing SSL methods require expensive computations of negative pairs and are typically designed for single modalities, which limits their versatility. We introduce CroSSL (Cross-modal SSL), which puts forward two novel concepts: masking intermediate embeddings produced by modality-specific encoders, and their aggregation into a global embedding through a cross-modal aggregator that can be fed to downstream classifiers. CroSSL allows for handling missing modalities and end-to-end cross-modal learning without requiring prior data preprocessing for handling missing inputs or negative-pair sampling for contrastive learning. We evaluate our method on a wide range of data, including motion sensors such as accelerometers or gyroscopes and biosignals (heart rate, electroencephalograms, electromyograms, electrooculograms, and electrodermal activity) to investigate the impact of masking ratios and masking strategies for various data types and the robustness of the learned representations to missing data. Overall, CroSSL outperforms previous SSL and supervised benchmarks using minimal labeled data, and also sheds light on how latent masking can improve cross-modal learning. Our code is open-sourced at https://github.com/dr-bell/CroSSL.

Submitted 19 February, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

Comments: Accepted in WSDM24. Short version presented in ML4MHD @ICML23

arXiv:2305.00619 [pdf, other]

Self-supervised Activity Representation Learning with Incremental Data: An Empirical Study

Authors: Jason Liu, Shohreh Deldari, Hao Xue, Van Nguyen, Flora D. Salim

Abstract: In the context of mobile sensing environments, various sensors on mobile devices continually generate a vast amount of data. Analyzing this ever-increasing data presents several challenges, including limited access to annotated data and a constantly changing environment. Recent advancements in self-supervised learning have been utilized as a pre-training step to enhance the performance of conventional supervised models to address the absence of labelled datasets. This research examines the impact of using a self-supervised representation learning model for time series classification tasks in which data is incrementally available. We proposed and evaluated a workflow in which a model learns to extract informative features using a corpus of unlabeled time series data and then conducts classification on labelled data using features extracted by the model. We analyzed the effect of varying the size, distribution, and source of the unlabeled data on the final classification performance across four public datasets, including various types of sensors in diverse applications.

Submitted 30 April, 2023; originally announced May 2023.

Comments: 6 pages, accepted in the 24th IEEE International Conference on Mobile Data Management (MDM2023)

arXiv:2208.00467 [pdf, other]

doi 10.1145/3550316

COCOA: Cross Modality Contrastive Learning for Sensor Data

Authors: Shohreh Deldari, Hao Xue, Aaqib Saeed, Daniel V. Smith, Flora D. Salim

Abstract: Self-Supervised Learning (SSL) is a new paradigm for learning discriminative representations without labelled data and has reached comparable or even state-of-the-art results in comparison to supervised counterparts. Contrastive Learning (CL) is one of the most well-known approaches in SSL that attempts to learn general, informative representations of data. CL methods have been mostly developed for applications in computer vision and natural language processing where only a single sensor modality is used. A majority of pervasive computing applications, however, exploit data from a range of different sensor modalities. While existing CL methods are limited to learning from one or two data sources, we propose COCOA (Cross mOdality COntrastive leArning), a self-supervised model that employs a novel objective function to learn quality representations from multisensor data by computing the cross-correlation between different data modalities and minimizing the similarity between irrelevant instances. We evaluate the effectiveness of COCOA against eight recently introduced state-of-the-art self-supervised models and two supervised baselines across five public datasets. We show that COCOA achieves superior classification performance to all other approaches. COCOA is also far more label-efficient than the other baselines, including the fully supervised model, while using only one-tenth of the available labelled data.

Submitted 3 August, 2022; v1 submitted 31 July, 2022; originally announced August 2022.

Comments: 27 pages, 10 figures, 6 tables, Accepted with minor revision at IMWUT Vol. 6 No. 3

arXiv:2207.03405 [pdf, other]

Investigating the Effects of Mood & Usage Behaviour on Notification Response Time

Authors: Judith S. Heinisch, Nan Gao, Christoph Anderson, Shohreh Deldari, Klaus David, Flora Salim

Abstract: Notifications are one of the most prevailing mechanisms on smartphones and personal computers to convey timely and important information. Despite these benefits, smartphone notifications demand individuals' attention and can cause stress and frustration when delivered at inopportune times. This paper investigates the effect of individuals' smartphone usage behavior and mood on notification response time. We conduct an in-the-wild study with more than 18 participants for five weeks. Extensive experiment results show that the proposed regression model is able to accurately predict the response time of smartphone notifications using the current user's mood and physiological signals. We explored the effect of different features for each participant to choose the most important user-oriented features in order to achieve a meaningful and personalised notification response prediction. On average across all participants, our regression model achieved an MAE of 0.7764 ms and an RMSE of 1.0527 ms. We also investigate how physiological signals (collected from E4 wristbands) are used as an indicator for mood and discuss the effect of individual differences in application usage and categories of smartphone applications on the response time of notifications. Our research sheds light on future intelligent notification management systems.

Submitted 7 July, 2022; originally announced July 2022.

arXiv:2206.02353 [pdf, other]

Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data

Authors: Shohreh Deldari, Hao Xue, Aaqib Saeed, Jiayuan He, Daniel V. Smith, Flora D. Salim

Abstract: Recently, Self-Supervised Representation Learning (SSRL) has attracted much attention in the fields of computer vision, speech, and natural language processing (NLP), and, more recently, for other types of modalities, including time series from sensors. The popularity of self-supervised learning is driven by the fact that traditional models typically require a huge amount of well-annotated data for training. Acquiring annotated data can be a difficult and costly process. Self-supervised methods have been introduced to improve the efficiency of training data through discriminative pre-training of models using supervisory signals that have been freely obtained from the raw data. Unlike existing reviews of SSRL that have predominantly focused upon methods in the fields of CV or NLP for a single modality, we aim to provide the first comprehensive review of multimodal self-supervised learning methods for temporal data. To this end, we 1) provide a comprehensive categorization of existing SSRL methods, 2) introduce a generic pipeline by defining the key components of an SSRL framework, 3) compare existing models in terms of their objective function, network architecture and potential applications, and 4) review existing multimodal techniques in each category and various modalities. Finally, we present existing weaknesses and future opportunities. We believe our work develops a perspective on the requirements of SSRL in domains that utilise multimodal and/or temporal data.

Submitted 7 June, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: 36 pages, 5 figures, 9 tables, Survey paper

arXiv:2106.04265 [pdf, other]

doi 10.1109/MPRV.2022.3229905

Towards Social Role-Based Interruptibility Management

Authors: Christoph Anderson, Judith Simone Heinisch, Shohreh Deldari, Flora D. Salim, Sandra Ohly, Klaus David, Veljko Pejovic

Abstract: Pervasive and ubiquitous computing facilitates immediate access to information in the sense of always-on. Information such as news, messages, or reminders can significantly enhance our daily routines but is rendered useless or disturbing when not aligned with our intrinsic interruptibility preferences. Attention management systems use machine learning to identify short-term opportune moments, so that information delivery leads to fewer interruptions. Humans' intrinsic interruptibility preferences - established for and across social roles and life domains - would complement short-term attention and interruption management approaches. In this article, we present our comprehensive results towards social role-based attention and interruptibility management. Our approach combines on-device sensing and machine learning with theories from social science to form a personalized two-stage classification model. Finally, we discuss the challenges of the current and future AI-driven attention management systems concerning privacy, ethical issues, and future directions.

Submitted 18 December, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: 10 pages, 6 figures, submitted in December 2022, to appear in IEEE Pervasive Computing, Special Issue - Human-Centered AI

arXiv:2011.14097 [pdf, other]

doi 10.1145/3442381.3449903

Time Series Change Point Detection with Self-Supervised Contrastive Predictive Coding

Authors: Shohreh Deldari, Daniel V. Smith, Hao Xue, Flora D. Salim

Abstract: Change Point Detection (CPD) methods identify the times associated with changes in the trends and properties of time series data in order to describe the underlying behaviour of the system. For instance, detecting the changes and anomalies associated with web service usage, application usage or human behaviour can provide valuable insights for downstream modelling tasks. We propose a novel self-supervised Time Series Change Point detection method based on Contrastive Predictive Coding (TS-CP^2). TS-CP^2 is the first approach to employ a contrastive learning strategy for CPD by learning an embedded representation that separates pairs of embeddings of time adjacent intervals from pairs of interval embeddings separated across time. Through extensive experiments on three diverse, widely used time series datasets, we demonstrate that our method outperforms five state-of-the-art CPD methods, which include unsupervised and semi-supervised approaches. TS-CP^2 is shown to improve the performance of methods that use either handcrafted statistical or temporal features by 79.4% and deep learning-based methods by 17.0% with respect to the F1-score averaged across the three datasets.

Submitted 4 March, 2021; v1 submitted 28 November, 2020; originally announced November 2020.

Comments: Accepted at The WEB Conference 2021 (WWW'21)

arXiv:2008.03230 [pdf, other]

doi 10.1145/3411832

ESPRESSO: Entropy and ShaPe awaRe timE-Series SegmentatiOn for processing heterogeneous sensor data

Authors: Shohreh Deldari, Daniel V. Smith, Amin Sadri, Flora D. Salim

Abstract: Extracting informative and meaningful temporal segments from high-dimensional wearable sensor data, smart devices, or IoT data is a vital preprocessing step in applications such as Human Activity Recognition (HAR), trajectory prediction, gesture recognition, and lifelogging. In this paper, we propose ESPRESSO (Entropy and ShaPe awaRe timE-Series SegmentatiOn), a hybrid segmentation model for multi-dimensional time-series that is formulated to exploit the entropy and temporal shape properties of time-series. ESPRESSO differs from existing methods that focus upon particular statistical or temporal properties of time-series exclusively. As part of model development, a novel temporal representation of time-series $WCAC$ was introduced along with a greedy search approach that estimates segments based upon the entropy metric. ESPRESSO was shown to offer superior performance to four state-of-the-art methods across seven public datasets of wearable and wear-free sensing. In addition, we undertake a deeper investigation of these datasets to understand how ESPRESSO and its constituent methods perform with respect to different dataset characteristics. Finally, we provide two interesting case-studies to show how applying ESPRESSO can assist in inferring daily activity routines and the emotional state of humans.

Submitted 24 July, 2020; originally announced August 2020.

Comments: 23 pages, 11 figures, accepted at IMWUT Volume 4, Issue 3

Showing 1–10 of 10 results for author: Deldari, S