A Survey on Multimodal Wearable Sensor-based Human Action Recognition

Authors: Jianyuan Ni, Hao Tang, Syed Tousiful Haque, Yan Yan, Anne H. H. Ngu

Abstract: The combination of increased life expectancy and falling birth rates is resulting in an aging population. Wearable Sensor-based Human Activity Recognition (WSHAR) emerges as a promising assistive technology to support the daily lives of older individuals, unlocking vast potential for human-centric applications. However, recent surveys in WSHAR have been limited, focusing either solely on deep learning approaches or on a single sensor modality. In real life, humans interact with the world in a multi-sensory way, where diverse information sources are intricately processed and interpreted to accomplish a complex and unified sensing system. To give machines similar intelligence, multimodal machine learning, which merges data from various sources, has become a popular research area with recent advancements. In this study, we present a comprehensive survey from a novel perspective on how to apply multimodal learning to the WSHAR domain for newcomers and researchers. We begin by presenting the recent sensor modalities as well as deep learning approaches in HAR. Subsequently, we explore the techniques used in present multimodal systems for WSHAR. This includes inter-multimodal systems, which utilize sensor modalities from both visual and non-visual systems, and intra-multimodal systems, which take modalities only from non-visual systems. After that, we focus on current multimodal learning approaches that have been applied to solve some of the challenges existing in WSHAR. Specifically, we make extra efforts by connecting the existing multimodal literature from other domains, such as computer vision and natural language processing, with the current WSHAR area. Finally, we identify the corresponding challenges and potential research directions in the current WSHAR area for further improvement.

Submitted 14 April, 2024; originally announced April 2024.

Comments: Multimodal Survey for Wearable Sensor-based Human Action Recognition

arXiv:2307.03638

Physical-aware Cross-modal Adversarial Network for Wearable Sensor-based Human Action Recognition

Authors: Jianyuan Ni, Hao Tang, Anne H. H. Ngu, Gaowen Liu, Yan Yan

Abstract: Wearable sensor-based Human Action Recognition (HAR) has made significant strides in recent times. However, the accuracy of wearable sensor-based HAR still lags behind that of visual modality-based systems, such as those using RGB video and depth data. Although diverse input modalities can provide complementary cues and improve the accuracy of HAR, wearable devices can only capture limited kinds of non-visual time-series input, such as accelerometer and gyroscope data. This limitation hinders the deployment of multimodal systems that use visual and non-visual modality data in parallel on current wearable devices. To address this issue, we propose a novel Physical-aware Cross-modal Adversarial (PCA) framework that utilizes only time-series accelerometer data from four inertial sensors for the wearable sensor-based HAR problem. Specifically, we propose an effective IMU2SKELETON network to produce corresponding synthetic skeleton joints from accelerometer data. Subsequently, we impose additional constraints on the synthetic skeleton data from a physical perspective, as accelerometer data can be regarded as the second derivative of the skeleton sequence coordinates. After that, the original accelerometer data and the constrained skeleton sequence are fused to make the final classification. In this way, when individuals wear wearable devices, the devices can not only capture accelerometer data, but can also generate synthetic skeleton sequences for real-time wearable sensor-based HAR applications that need to be conducted anytime and anywhere. To demonstrate the effectiveness of our proposed PCA framework, we conduct extensive experiments on the Berkeley-MHAD, UTD-MHAD, and MMAct datasets. The results confirm that the proposed PCA approach has competitive performance compared to previous methods on the mono sensor-based HAR classification problem.
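
The physical constraint mentioned in the abstract, that accelerometer readings relate to the second time derivative of joint positions, can be illustrated with a central finite difference. The sketch below is a minimal illustration and not the authors' implementation: the function names `second_derivative` and `physical_consistency_loss`, the fixed sampling interval `dt`, and the mean-squared-error form of the loss are all assumptions. It also ignores gravity and sensor-frame orientation, which a real accelerometer signal would include.

```python
def second_derivative(positions, dt):
    """Central finite-difference estimate of acceleration for one coordinate.

    positions: samples of one joint coordinate taken at a fixed interval dt.
    Returns len(positions) - 2 values; the two endpoints have no estimate.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    return [
        (positions[i + 1] - 2.0 * positions[i] + positions[i - 1]) / (dt * dt)
        for i in range(1, len(positions) - 1)
    ]


def physical_consistency_loss(positions, measured_accel, dt):
    """Hypothetical loss: mean squared gap between derived and measured acceleration.

    measured_accel must be aligned with the interior samples of positions.
    """
    derived = second_derivative(positions, dt)
    if len(derived) != len(measured_accel):
        raise ValueError("measured_accel must have len(positions) - 2 samples")
    if not derived:
        raise ValueError("need at least three position samples")
    return sum((d - m) ** 2 for d, m in zip(derived, measured_accel)) / len(derived)
```

For a coordinate that follows p(t) = t^2, the derived acceleration is constant, so a measured signal with the same constant value gives a loss of zero under these assumptions.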

Submitted 19 May, 2024; v1 submitted 7 July, 2023; originally announced July 2023.

Comments: We will be making some significant changes to the paper, including the title and methodology. We therefore wish to withdraw the paper for now

arXiv:2301.08141 [pdf, other]

Self-supervised Learning for Segmentation and Quantification of Dopamine Neurons in Parkinson's Disease

Authors: Fatemeh Haghighi, Soumitra Ghosh, Hai Ngu, Sarah Chu, Han Lin, Mohsen Hejrati, Baris Bingol, Somaye Hashemifar

Abstract: Parkinson's Disease (PD) is the second most common neurodegenerative disease in humans. PD is characterized by the gradual loss of dopaminergic neurons in the Substantia Nigra (SN). Counting the number of dopaminergic neurons in the SN is one of the most important indexes in evaluating drug efficacy in PD animal models. Currently, analyzing and quantifying dopaminergic neurons is conducted manually by experts through analysis of digital pathology images, which is laborious, time-consuming, and highly subjective. As such, a reliable and unbiased automated system is needed for the quantification of dopaminergic neurons in digital pathology images. Recent years have seen a surge in adopting deep learning solutions in medical image processing. However, developing high-performing deep learning models hinges on the availability of large-scale, high-quality annotated data, which can be expensive to acquire, especially in applications like digital pathology image analysis. To this end, we propose an end-to-end deep learning framework based on self-supervised learning for the segmentation and quantification of dopaminergic neurons in PD animal models. To the best of our knowledge, this is the first deep learning model that detects the cell body of dopaminergic neurons, counts the number of dopaminergic neurons, and provides characteristics of individual dopaminergic neurons as a numerical output. Extensive experiments demonstrate the effectiveness of our model in quantifying neurons with high precision, which can provide a faster turnaround for drug efficacy studies, better understanding of dopaminergic neuronal health status, and unbiased results in PD pre-clinical research. As part of our contributions, we also provide the first publicly available dataset of histology digital images along with expert annotations for the segmentation of TH-positive DA neuronal soma.

Submitted 12 October, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

arXiv:2301.02925 [pdf, other]

doi 10.1016/j.neuri.2023.100131

Multiclass Semantic Segmentation to Identify Anatomical Sub-Regions of Brain and Measure Neuronal Health in Parkinson's Disease

Authors: Hosein Barzekar, Hai Ngu, Han Hui Lin, Mohsen Hejrati, Steven Ray Valdespino, Sarah Chu, Baris Bingol, Somaye Hashemifar, Soumitra Ghosh

Abstract: Automated segmentation of anatomical sub-regions with high precision has become a necessity to enable the quantification and characterization of cells/tissues in histology images. Currently, no machine learning model is available for analyzing sub-anatomical regions of the brain in 2D histological images. Scientists rely on manually segmenting anatomical sub-regions of the brain, which is extremely time-consuming and prone to labeler-dependent bias. One of the major challenges in accomplishing such a task is the lack of high-quality annotated images that can be used to train a generic artificial intelligence model. In this study, we employed a UNet-based architecture and compared model performance with various combinations of encoders, image sizes, and sample selection techniques. Additionally, to increase the sample set we resorted to data augmentation, which provided data diversity and robust learning. We trained our best-fit model on approximately one thousand annotated 2D brain images stained with Nissl/Haematoxylin and Tyrosine Hydroxylase enzyme (TH, an indicator of dopaminergic neuron viability). The dataset comprises different animal studies, enabling the model to be trained on different datasets. The model is able to effectively detect two sub-regions, compacta (SNCD) and reticulata (SNr), in all the images. In spite of limited training data, our best model achieves a mean intersection over union (IOU) of 79% and a mean dice coefficient of 87%. In conclusion, the UNet-based model with EfficientNet as an encoder outperforms all other encoders, resulting in a first-of-its-kind robust model for multiclass segmentation of sub-brain regions in 2D images.
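
The two metrics reported in the abstract, intersection over union and the Dice coefficient, can be sketched for a single class as follows. This is a generic illustration and not the authors' evaluation code: the function name `iou_and_dice`, the flat 0/1 mask representation, and the convention for two empty masks are assumptions. For a multiclass result, per-class scores computed this way would typically be averaged.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two equal-length binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks are empty: treated as perfect agreement (a convention assumed here).
        return 1.0, 1.0
    iou = intersection / union
    dice = 2.0 * intersection / (pred_area + truth_area)
    return iou, dice
```

For a single pair of masks, Dice is never lower than IoU, which is consistent with the Dice figure in the abstract being the higher of the two.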

Submitted 7 January, 2023; originally announced January 2023.

arXiv:2208.08090 [pdf]

Progressive Cross-modal Knowledge Distillation for Human Action Recognition

Authors: Jianyuan Ni, Anne H. H. Ngu, Yan Yan

Abstract: Wearable sensor-based Human Action Recognition (HAR) has achieved remarkable success recently. However, the accuracy of wearable sensor-based HAR is still far behind that of visual modality-based systems (i.e., RGB video, skeleton, and depth). Diverse input modalities can provide complementary cues and thus improve the accuracy of HAR, but how to take advantage of multi-modal data in wearable sensor-based HAR has rarely been explored. Currently, wearable devices, i.e., smartwatches, can only capture limited kinds of non-visual modality data. This hinders multi-modal HAR, as such devices are unable to use visual and non-visual modality data simultaneously. Another major challenge lies in how to efficiently utilize multimodal data on wearable devices with their limited computation resources. In this work, we propose a novel Progressive Skeleton-to-sensor Knowledge Distillation (PSKD) model which utilizes only time-series data, i.e., accelerometer data, from a smartwatch for solving the wearable sensor-based HAR problem. Specifically, we construct multiple teacher models using data from both teacher (human skeleton sequence) and student (time-series accelerometer data) modalities. In addition, we propose an effective progressive learning scheme to eliminate the performance gap between teacher and student models. We also design a novel loss function called Adaptive-Confidence Semantic (ACS), which allows the student model to adaptively select either one of the teacher models or the ground-truth label it needs to mimic. To demonstrate the effectiveness of our proposed PSKD method, we conduct extensive experiments on the Berkeley-MHAD, UTD-MHAD, and MMAct datasets. The results confirm that the proposed PSKD method has competitive performance compared to previous mono sensor-based HAR methods.

Submitted 17 August, 2022; originally announced August 2022.

Comments: ACM MM 2022

arXiv:2206.13676 [pdf, other]

TTS-CGAN: A Transformer Time-Series Conditional GAN for Biosignal Data Augmentation

Authors: Xiaomin Li, Anne Hee Hiong Ngu, Vangelis Metsis

Abstract: Signal measurements appearing in the form of time series are one of the most common types of data used in medical machine learning applications. Such datasets are often small in size, expensive to collect and annotate, and might involve privacy issues, which hinders our ability to train large, state-of-the-art deep learning models for biomedical applications. For time-series data, the suite of data augmentation strategies we can use to expand the size of the dataset is limited by the need to maintain the basic properties of the signal. Generative Adversarial Networks (GANs) can be utilized as another data augmentation tool. In this paper, we present TTS-CGAN, a transformer-based conditional GAN model that can be trained on existing multi-class datasets and generate class-specific synthetic time-series sequences of arbitrary length. We elaborate on the model architecture and design strategies. Synthetic sequences generated by our model are indistinguishable from real ones, and can be used to complement or replace real signals of the same type, thus achieving the goal of data augmentation. To evaluate the quality of the generated data, we modify the wavelet coherence metric to be able to compare the similarity between two sets of signals, and also conduct a case study where a mix of synthetic and real data is used to train a deep learning model for sequence classification. Together with other visualization techniques and qualitative evaluation approaches, we demonstrate that TTS-CGAN generated synthetic data are similar to real data, and that our model performs better than the other state-of-the-art GAN models built for time-series data generation.

Submitted 27 June, 2022; originally announced June 2022.

Comments: under review

arXiv:2202.02691 [pdf, other]

TTS-GAN: A Transformer-based Time-Series Generative Adversarial Network

Authors: Xiaomin Li, Vangelis Metsis, Huangyingrui Wang, Anne Hee Hiong Ngu

Abstract: Signal measurements appearing in the form of time series are one of the most common types of data used in medical machine learning applications. However, such datasets are often small, making the training of deep neural network architectures ineffective. For time-series, the suite of data augmentation tricks we can use to expand the size of the dataset is limited by the need to maintain the basic properties of the signal. Data generated by a Generative Adversarial Network (GAN) can be utilized as another data augmentation tool. RNN-based GANs suffer from the fact that they cannot effectively model long sequences of data points with irregular temporal relations. To tackle these problems, we introduce TTS-GAN, a transformer-based GAN which can successfully generate realistic synthetic time-series data sequences of arbitrary length, similar to the real ones. Both the generator and discriminator networks of the GAN model are built using a pure transformer encoder architecture. We use visualizations and dimensionality reduction techniques to demonstrate the similarity of real and generated time-series data. We also compare the quality of our generated data with the best existing alternative, which is an RNN-based time-series GAN.

Submitted 26 June, 2022; v1 submitted 5 February, 2022; originally announced February 2022.

Comments: The paper has been accepted for publication in the 20th International Conference on Artificial Intelligence in Medicine (AIME 2022)

arXiv:2112.01849 [pdf, ps, other]

Cross-modal Knowledge Distillation for Vision-to-Sensor Action Recognition

Authors: Jianyuan Ni, Raunak Sarbajna, Yang Liu, Anne H. H. Ngu, Yan Yan

Abstract: Human activity recognition (HAR) based on a multi-modal approach has recently been shown to improve the accuracy of HAR. However, the restricted computational resources of wearable devices, i.e., smartwatches, cannot directly support such advanced methods. To tackle this issue, this study introduces an end-to-end Vision-to-Sensor Knowledge Distillation (VSKD) framework. In this VSKD framework, only time-series data, i.e., accelerometer data, is needed from wearable devices during the testing phase. Therefore, this framework will not only reduce the computational demands on edge devices, but also produce a learning model that closely matches the performance of the computationally expensive multi-modal approach. In order to retain the local temporal relationship and facilitate visual deep learning models, we first convert time-series data to two-dimensional images by applying the Gramian Angular Field (GAF) based encoding method. We adopted ResNet18 and multi-scale TRN with BN-Inception as teacher and student network in this study, respectively. A novel loss function, named Distance and Angle-wised Semantic Knowledge loss (DASK), is proposed to mitigate the modality variations between the vision and the sensor domain. Extensive experimental results on the UTD-MHAD, MMAct, and Berkeley-MHAD datasets demonstrate the effectiveness and competitiveness of the proposed VSKD model, which can be deployed on wearable sensors.
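
The time-series-to-image step named in the abstract can be sketched as follows. This is a minimal illustration of the summation variant of the Gramian Angular Field, not the authors' code: the abstract does not say which GAF variant is used, and the function name `gramian_angular_field` and the min-max rescaling to [-1, 1] are assumptions.

```python
import math


def gramian_angular_field(series):
    """Encode a 1-D series as a Gramian Angular Summation Field (list of rows)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [(2.0 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against floating-point drift just outside [-1, 1].
    angles = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    # Entry (i, j) is cos(phi_i + phi_j), which couples every pair of time steps.
    return [[math.cos(a + b) for b in angles] for a in angles]
```

A series of length n yields an n-by-n symmetric matrix, which is what lets an image model consume accelerometer windows.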

Submitted 8 October, 2021; originally announced December 2021.

Comments: 5 pages, 2 figures, submitted to ICASSP2022

arXiv:1811.12573 [pdf, other]

ContextServ: Towards Model-Driven Development of Context-Aware Web Services

Authors: Quan Z. Sheng, Jian Yu, Hanchuan Xu, Wei Emma Zhang, Anne H. H. Ngu, Jun Han, Ruilin Liu

Abstract: In the era of Web of Things and Services, Context-aware Web Services (CASs) are emerging as an important technology for building innovative context-aware applications. CASs enable the integration of information from both the physical and virtual worlds, which affects human living. However, it is challenging to build CASs, due to the lack of context provisioning management approaches and the limited generic approaches for formalizing the development process. We therefore propose ContextServ, a platform that uses a model-driven approach to support the full life cycle of CAS development, hence offering significant design and management flexibility. ContextServ implements a proposed UML-based modelling language, ContextUML, to support multiple modelling languages. It also supports dynamic adaptation of WS-BPEL based context-aware composite services by weaving context-aware rules into the process. Extensive experimental evaluations on ContextServ and its components showcase that ContextServ can support effective development and efficient execution of context-aware Web services.

Submitted 19 December, 2018; v1 submitted 29 November, 2018; originally announced November 2018.

Comments: 29 pages

arXiv:1708.02029 [pdf, other]

From Appearance to Essence: Comparing Truth Discovery Methods without Using Ground Truth

Authors: Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Wei Emma Zhang, Anne H. H. Ngu

Abstract: Truth discovery has been widely studied in recent years as a fundamental means for resolving the conflicts in multi-source data. Although many truth discovery methods have been proposed based on different considerations and intuitions, investigations show that no single method consistently outperforms the others. To select the right truth discovery method for a specific application scenario, it becomes essential to evaluate and compare the performance of different methods. A drawback of current research efforts is that they commonly assume the availability of certain ground truth for the evaluation of methods. However, the ground truth may be very limited or even out of reach in practice, rendering the evaluation biased by the small ground truth or even infeasible. In this paper, we present CompTruthHyp, a general approach for comparing the performance of truth discovery methods without using ground truth. In particular, our approach calculates the probability of observations in a dataset based on the output of different methods. The probabilities are then ranked to reflect the performance of these methods. We review and compare twelve existing truth discovery methods and consider both single-valued and multi-valued objects. Empirical studies on both real-world and synthetic datasets demonstrate the effectiveness of our approach for comparing truth discovery methods.

Submitted 7 August, 2017; originally announced August 2017.

arXiv:1708.02018 [pdf, ps, other]

SmartMTD: A Graph-Based Approach for Effective Multi-Truth Discovery

Authors: Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Anne H. H. Ngu

Abstract: The Big Data era features a huge amount of data that are contributed by numerous sources and used by many critical data-driven applications. Due to the varying reliability of sources, it is common to see conflicts among the multi-source data, making it difficult to determine which data sources to trust. Recently, truth discovery has emerged as a means of addressing this challenging issue by determining data veracity jointly with estimating the reliability of data sources. A fundamental issue with current truth discovery methods is that they generally assume only one true value for each object, while in reality, objects may have multiple true values. In this paper, we propose a graph-based approach, called SmartMTD, to unravel the truth discovery problem beyond the single-truth assumption, or the multi-truth discovery problem. SmartMTD models and quantifies two types of source relations to estimate source reliability precisely and to detect malicious agreement among sources for effective multi-truth discovery. In particular, two graphs are constructed based on the modeled source relations. They are further used to derive the two aspects of source reliability (i.e., positive precision and negative precision) via random walk computation. Empirical studies on two large real-world datasets demonstrate the effectiveness of our approach.

Submitted 7 August, 2017; originally announced August 2017.

arXiv:1512.08493 [pdf, other]

Unveiling Contextual Similarity of Things via Mining Human-Thing Interactions in the Internet of Things

Authors: Lina Yao, Quan Z. Sheng, Anne H. H. Ngu, Xue Li, Boualem Benatallah

Abstract: With recent advances in radio-frequency identification (RFID), wireless sensor networks, and Web services, physical things are becoming an integral part of the emerging ubiquitous Web. Finding correlations of ubiquitous things is a crucial prerequisite for many important applications such as things search, discovery, classification, recommendation, and composition. This article presents DisCor-T, a novel graph-based method for discovering underlying connections of things via mining the rich content embodied in human-thing interactions in terms of user, temporal and spatial information. We model this information using two graphs, namely a spatio-temporal graph and a social graph. Then, random walk with restart (RWR) is applied to find proximities among things, and a relational graph of things (RGT) indicating implicit correlations of things is learned. The correlation analysis lays a solid foundation contributing to improved effectiveness in things management. To demonstrate the utility, we develop a flexible feature-based classification framework on top of RGT and perform a systematic case study. Our evaluation exhibits the strength and feasibility of the proposed approach.
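
The random walk with restart step named in the abstract can be sketched on a single small graph. This is a generic illustration only, not the DisCor-T method, which according to the abstract works over two graphs: the function name `random_walk_with_restart`, the dense adjacency-matrix input, the restart probability of 0.15, and the stopping tolerance are all assumptions.

```python
def random_walk_with_restart(adjacency, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Proximity of every node to `seed` on a graph with no isolated nodes.

    adjacency: square list-of-lists of non-negative edge weights.
    """
    n = len(adjacency)
    # Column sums turn edge weights into transition probabilities.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    if any(s == 0 for s in col_sums):
        raise ValueError("every node needs at least one edge")
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    scores = start[:]
    for _ in range(max_iter):
        # One step: follow an edge with probability 1 - restart, else jump back to the seed.
        updated = [
            (1.0 - restart)
            * sum(adjacency[i][j] / col_sums[j] * scores[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        if max(abs(u - s) for u, s in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores
```

The returned scores sum to one, and nodes that are better connected to the seed receive higher scores, which is the sense of "proximity" used here.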

Submitted 17 July, 2017; v1 submitted 24 December, 2015; originally announced December 2015.

arXiv:1512.06257 [pdf, other]

Up in the Air: When Homes Meet the Web of Things

Authors: Lina Yao, Quan Z. Sheng, Boualem Benatallah, Schahram Dustdar, Xianzhi Wang, Ali Shemshadi, Anne H. H. Ngu

Abstract: The emerging Internet of Things (IoT) will comprise billions of Web-enabled objects (or "things") where such objects can sense, communicate, compute and potentially actuate. The Web of Things (WoT) is essentially the embodiment of the evolution from systems linking digital documents to systems relating digital information to real-world physical items. It is widely understood that significant technical challenges exist in developing applications in the WoT environment. In this paper, we report our practical experience in the design and development of a smart home system in a WoT environment. Our system provides a layered framework for managing and sharing the information produced by physical things as well as the residents. We particularly focus on a research prototype named WITS, which helps the elderly live independently and safely in their own homes, with minimal support from the decreasing number of individuals in the working-age population. WITS enables unobtrusive monitoring of elderly people in a real-world, inhabited home environment, by leveraging WoT technologies in building context-aware, personalized services.

Submitted 18 July, 2017; v1 submitted 19 December, 2015; originally announced December 2015.

Showing 1–13 of 13 results for author: Ngu, H