Search | arXiv e-print repository

eCAR: edge-assisted Collaborative Augmented Reality Framework

Abstract: We propose a novel edge-assisted multi-user collaborative augmented reality framework in a large indoor environment. In Collaborative Augmented Reality (CAR), data communication that synchronizes virtual objects has large network traffic and high network latency. Due to drift, CAR applications without continuous data communication for coordinate system alignment have virtual object inconsistency. In addition, synchronization messages for online virtual object updates have high latency as the number of collaborative devices increases. To solve this problem, we implement the CAR framework, called eCAR, which utilizes edge computing to continuously match the device's coordinate system with less network traffic. Furthermore, we extend the co-visibility graph of the edge server to maintain virtual object spatial-temporal consistency in neighboring devices by synchronizing a local graph. We evaluate the system quantitatively and qualitatively in the public dataset and a physical indoor environment. eCAR communicates data for coordinate system alignment between the edge server and devices with less network traffic and latency. In addition, collaborative augmented reality synchronization algorithms quickly and accurately host and resolve virtual objects. The proposed system continuously aligns coordinate systems to multiple devices in a large indoor environment and shares augmented reality content. Through our system, users interact with virtual objects and share augmented reality experiences with neighboring users.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2404.17179 [pdf, other]

Meta-Object: Interactive and Multisensory Virtual Object Learned from the Real World for the Post-Metaverse

Authors: Dooyoung Kim, Taewook Ha, Jinseok Hong, Seonji Kim, Selin Choi, Heejeong Ko, Woontack Woo

Abstract: With the proliferation of wearable Augmented Reality/Virtual Reality (AR/VR) devices, ubiquitous virtual experiences seamlessly integrate into daily life through metaverse platforms. To support immersive metaverse experiences akin to reality, we propose a next-generation virtual object, a meta-object, a property-embedded virtual object that contains interactive and multisensory characteristics learned from the real world. Current virtual objects differ significantly from real-world objects due to restricted sensory feedback based on limited physical properties. To leverage meta-objects in the metaverse, three key components are needed: meta-object modeling and property embedding, interaction-adaptive multisensory feedback, and an intelligence simulation-based post-metaverse platform. Utilizing meta-objects that enable both on-site and remote users to interact as if they were engaging with real objects could contribute to the advent of the post-metaverse era through wearable AR/VR devices.

Submitted 28 April, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

Comments: 12 pages, 4 figures, under review in the IEEE CG&A magazine

arXiv:2404.01151 [pdf, other]

Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs

Authors: Jialou Wang, Manli Zhu, Yulei Li, Honglei Li, Longzhi Yang, Wai Lok Woo

Abstract: Localization plays a crucial role in enhancing the practicality and precision of VQA systems. By enabling fine-grained identification and interaction with specific parts of an object, it significantly improves the system's ability to provide contextually relevant and spatially accurate responses, crucial for applications in dynamic environments like robotics and augmented reality. However, traditional systems face challenges in accurately mapping objects within images to generate nuanced and spatially aware responses. In this work, we introduce "Detect2Interact", which addresses these challenges by introducing an advanced approach for fine-grained object visual key field detection. First, we use the segment anything model (SAM) to generate detailed spatial maps of objects in images. Next, we use Vision Studio to extract semantic object descriptions. Third, we employ GPT-4's common sense knowledge, bridging the gap between an object's semantics and its spatial map. As a result, Detect2Interact achieves consistent qualitative results on object key field detection across extensive test cases and outperforms the existing VQA system with object detection by providing a more reasonable and finer visual representation.

Submitted 1 April, 2024; originally announced April 2024.

Comments: Accepted to IEEE Intelligent Systems

arXiv:2403.18067 [pdf, other]

State of the art applications of deep learning within tracking and detecting marine debris: A survey

Authors: Zoe Moorton, Dr. Zeyneb Kurt, Dr. Wai Lok Woo

Abstract: Deep learning techniques have been explored within the marine litter problem for approximately 20 years, but the majority of the research has developed rapidly in the last five years. We provide an in-depth, up-to-date summary and analysis of 28 of the most recent and significant contributions of deep learning in marine debris. From cross-referencing the research paper results, the YOLO family significantly outperforms all other methods of object detection, but many respected contributions to this field have categorically agreed that a comprehensive database of underwater debris is not currently available for machine learning. Using a small dataset curated and labelled by us, we tested YOLOv5 on a binary classification task and found the accuracy was low and the rate of false positives was high, highlighting the importance of a comprehensive database. We conclude this survey with over 40 future research recommendations and open challenges.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Review paper, 60 pages including references, 1 figure, 3 tables, 1 supplementary data

arXiv:2401.12648

Consistency Enhancement-Based Deep Multiview Clustering via Contrastive Learning

Authors: Hao Yang, Hua Mao, Wai Lok Woo, Jie Chen, Xi Peng

Abstract: Multiview clustering (MVC) segregates data samples into meaningful clusters by synthesizing information across multiple views. Moreover, deep learning-based methods have demonstrated their strong feature learning capabilities in MVC scenarios. However, effectively generalizing feature representations while maintaining consistency is still an intractable problem. In addition, most existing deep clustering methods based on contrastive learning overlook the consistency of the clustering representations during the clustering process. In this paper, we show how the above problems can be overcome and propose a consistent enhancement-based deep MVC method via contrastive learning (CCEC). Specifically, semantic connection blocks are incorporated into a feature representation to preserve the consistent information among multiple views. Furthermore, the representation process for clustering is enhanced through spectral clustering, and the consistency across multiple views is improved. Experiments conducted on five datasets demonstrate the effectiveness and superiority of our method in comparison with the state-of-the-art (SOTA) methods. The code for this method can be accessed at https://anonymous.4open.science/r/CCEC-E84E/.

Submitted 21 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: There are multiple errors that need to be corrected, including some formulas and concept descriptions. We will re upload the paper after the modifications are completed

arXiv:2311.11821 [pdf, ps, other]

Cross-View Graph Consistency Learning for Invariant Graph Representations

Authors: Jie Chen, Zhiming Li, Hua Mao, Wai Lok Woo, Xi Peng

Abstract: Graph representation learning is fundamental for analyzing graph-structured data. Exploring invariant graph representations remains a challenge for most existing graph representation learning methods. In this paper, we propose a cross-view graph consistency learning (CGCL) method that learns invariant graph representations for link prediction. First, two complementary augmented views are derived from an incomplete graph structure through a bidirectional graph structure augmentation scheme. This augmentation scheme mitigates the potential information loss that is commonly associated with various data augmentation techniques involving raw graph data, such as edge perturbation, node removal, and attribute masking. Second, we propose a CGCL model that can learn invariant graph representations. A cross-view training scheme is proposed to train the proposed CGCL model. This scheme attempts to maximize the consistency information between one augmented view and the graph structure reconstructed from the other augmented view. Furthermore, we offer a comprehensive theoretical CGCL analysis. This paper empirically and experimentally demonstrates the effectiveness of the proposed CGCL method, achieving competitive results on graph datasets in comparisons with several state-of-the-art algorithms.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 8 pages

arXiv:2310.17158 [pdf, other]

CosmosDSR -- a methodology for automated detection and tracking of orbital debris using the Unscented Kalman Filter

Authors: Daniel S. Roll, Zeyneb Kurt, Wai Lok Woo

Abstract: The Kessler syndrome refers to the escalating space debris from frequent space activities, threatening future space exploration. Addressing this issue is vital. Several AI models, including Convolutional Neural Networks, Kernel Principal Component Analysis, and Model-Agnostic Meta-Learning, have been assessed with various data types. Earlier studies highlighted the combination of the YOLO object detector and a linear Kalman filter (LKF) for object detection and tracking. Advancing this, the current paper introduces a novel methodology for the Comprehensive Orbital Surveillance and Monitoring Of Space by Detecting Satellite Residuals (CosmosDSR) by combining YOLOv3 with an Unscented Kalman Filter (UKF) for tracking satellites in sequential images. Using the Spacecraft Recognition Leveraging Knowledge of Space Environment (SPARK) dataset for training and testing, the YOLOv3 precisely detected and classified all satellite categories (Mean Average Precision=97.18%, F1=0.95) with few errors (TP=4163, FP=209, FN=237). Both CosmosDSR and an implemented LKF used for comparison tracked satellites accurately, with a mean squared error (MSE) and root mean squared error (RMSE) of MSE=2.83/RMSE=1.66 for UKF and MSE=2.84/RMSE=1.66 for LKF. The current study is limited to images generated in a space simulation environment, but the CosmosDSR methodology shows great potential in detecting and tracking satellites, paving the way for solutions to the Kessler syndrome.

Submitted 31 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: 7 figures, 15 pages inc refs

MSC Class: 68 ACM Class: I.2.6; K.3.2
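
As a quick consistency check on the detection figures quoted in the CosmosDSR abstract, the sketch below recomputes precision, recall, and F1 from the reported counts (TP=4163, FP=209, FN=237) using the standard definitions. The function name `detection_metrics` is illustrative only and is not taken from the paper.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw detection counts."""
    precision = tp / (tp + fp)   # share of detections that were correct
    recall = tp / (tp + fn)      # share of true objects that were detected
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Counts as reported in the abstract above.
precision, recall, f1 = detection_metrics(tp=4163, fp=209, fn=237)
print(f"precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

Rounded to two decimals, the recomputed F1 agrees with the F1=0.95 stated in the abstract.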

arXiv:2309.07170 [pdf, other]

Overview of Human Activity Recognition Using Sensor Data

Authors: Rebeen Ali Hamad, Wai Lok Woo, Bo Wei, Longzhi Yang

Abstract: Human activity recognition (HAR) is an essential research field that has been used in different applications including home and workplace automation, security and surveillance as well as healthcare. Starting from conventional machine learning methods to the recently developing deep learning techniques and the Internet of things, significant contributions have been shown in the HAR area in the last decade. Even though several review and survey studies have been published, there is a lack of sensor-based HAR overview studies focusing on summarising the usage of wearable sensors and smart home sensors data as well as applications of HAR and deep learning techniques. Hence, we overview sensor-based HAR, discuss several important applications that rely on HAR, and highlight the most common machine learning methods that have been used for HAR. Finally, several challenges of HAR are explored that should be addressed to further improve the robustness of HAR.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2308.11210 [pdf, other]

Edge-Centric Space Rescaling with Redirected Walking for Dissimilar Physical-Virtual Space Registration

Authors: Dooyoung Kim, Woontack Woo

Abstract: We propose a novel space-rescaling technique for registering dissimilar physical-virtual spaces by utilizing the effects of adjusting physical space with redirected walking. Achieving a seamless immersive Virtual Reality (VR) experience requires overcoming the spatial heterogeneities between the physical and virtual spaces and accurately aligning the VR environment with the user's tracked physical space. However, existing space-matching algorithms that rely on one-to-one scale mapping are inadequate when dealing with highly dissimilar physical and virtual spaces, and redirected walking controllers could not utilize basic geometric information from physical space in the virtual space due to coordinate distortion. To address these issues, we apply relative translation gains to partitioned space grids based on the main interactable object's edge, which enables space-adaptive modification effects of physical space without coordinate distortion. Our evaluation results demonstrate the effectiveness of our algorithm in aligning the main object's edge, surface, and wall, as well as securing the largest registered area compared to alternative methods under all conditions. These findings can be used to create an immersive play area for VR content where users can receive passive feedback from the plane and edge in their physical environment.

Submitted 22 August, 2023; originally announced August 2023.

Comments: This paper has been accepted as a paper for the 2023 ISMAR conference (2023/10/16-2023/10/20) 10 pages, 5 figures

arXiv:2304.10769 [pdf, ps, other]

doi 10.1109/ICCV51070.2023.01536

Deep Multiview Clustering by Contrasting Cluster Assignments

Authors: Jie Chen, Hua Mao, Wai Lok Woo, Xi Peng

Abstract: Multiview clustering (MVC) aims to reveal the underlying structure of multiview data by categorizing data samples into clusters. Deep learning-based methods exhibit strong feature learning capabilities on large-scale datasets. For most existing deep MVC methods, exploring the invariant representations of multiple views is still an intractable problem. In this paper, we propose a cross-view contrastive learning (CVCL) method that learns view-invariant representations and produces clustering results by contrasting the cluster assignments among multiple views. Specifically, we first employ deep autoencoders to extract view-dependent features in the pretraining stage. Then, a cluster-level CVCL strategy is presented to explore consistent semantic label information among the multiple views in the fine-tuning stage. Thus, the proposed CVCL method is able to produce more discriminative cluster assignments by virtue of this learning strategy. Moreover, we provide a theoretical analysis of soft cluster assignment alignment. Extensive experimental results obtained on several datasets demonstrate that the proposed CVCL method outperforms several state-of-the-art approaches.

Submitted 10 August, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

Comments: 10 pages, 7 figures

Journal ref: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

arXiv:2301.00552 [pdf]

Neural source/sink phase connectivity in developmental dyslexia by means of interchannel causality

Authors: I. Rodríguez-Rodríguez, A. Ortiz, N. J. Gallego-Molina, M. A. Formoso, W. L. Woo

Abstract: While the brain connectivity network can inform the understanding and diagnosis of developmental dyslexia, its cause-effect relationships have not yet been sufficiently examined. Employing electroencephalography signals and band-limited white noise stimulus at 4.8 Hz (prosodic-syllabic frequency), we measure the phase Granger causalities among channels to identify differences between dyslexic learners and controls, thereby proposing a method to calculate directional connectivity. As causal relationships run in both directions, we explore three scenarios, namely channels' activity as sources, as sinks, and in total. Our proposed method can be used for both classification and exploratory analysis. In all scenarios, we find confirmation of the established right-lateralized Theta sampling network anomaly, in line with the temporal sampling framework's assumption of oscillatory differences in the Theta and Gamma bands. Further, we show that this anomaly primarily occurs in the causal relationships of channels acting as sinks, where it is significantly more pronounced than when only total activity is observed. In the sink scenario, our classifier obtains 0.84 and 0.88 accuracy and 0.87 and 0.93 AUC for the Theta and Gamma bands, respectively.

Submitted 2 January, 2023; originally announced January 2023.

arXiv:2212.06834 [pdf]

Deep Neural Networks integrating genomics and histopathological images for predicting stages and survival time-to-event in colon cancer

Authors: Olalekan Ogundipe, Zeyneb Kurt, Wai Lok Woo

Abstract: There exists unexplained diverse variation within the predefined colon cancer stages when using only features from either genomics or histopathological whole slide images as prognostic factors. Unraveling this variation will bring about improvements in staging and treatment outcomes. Hence, motivated by the advancement of Deep Neural Network libraries and the different structures and factors within some genomic datasets, we aggregate atypical patterns in histopathological images with diverse carcinogenic expression from mRNA, miRNA and DNA Methylation as an integrative input source into an ensemble deep neural network for colon cancer stage classification and sample stratification into low or high risk survival groups. The results of our Ensemble Deep Convolutional Neural Network model show an improved performance in stage classification on the integrated dataset. The fused input features return an Area Under the Receiver Operating Characteristic Curve (AUC ROC) of 0.95, compared with AUC ROC of 0.71 and 0.68 obtained when only genomics and image features are used for stage classification, respectively. Also, the extracted features were used to split the patients into low or high risk survival groups. Among the 2548 fused features, 1695 features showed statistically significant survival probability differences between the two risk groups defined by the extracted features.

Submitted 13 December, 2022; originally announced December 2022.

Comments: 21 pages, 5 figures, 4 tables

arXiv:2211.00382 [pdf, other]

Seg&Struct: The Interplay Between Part Segmentation and Structure Inference for 3D Shape Parsing

Authors: Jeonghyun Kim, Kaichun Mo, Minhyuk Sung, Woontack Woo

Abstract: We propose Seg&Struct, a supervised learning framework leveraging the interplay between part segmentation and structure inference and demonstrating their synergy in an integrated framework. Both part segmentation and structure inference have been extensively studied in the recent deep learning literature, while the supervisions used for each task have not been fully exploited to assist the other task. Namely, structure inference has been typically conducted with an autoencoder that does not leverage the point-to-part associations. Also, segmentation has been mostly performed without structural priors that tell the plausibility of the output segments. We present how these two tasks can be best combined while fully utilizing supervision to improve performance. Our framework first decomposes a raw input shape into part segments using an off-the-shelf algorithm, whose outputs are then mapped to nodes in a part hierarchy, establishing point-to-part associations. Following this, ours predicts the structural information, e.g., part bounding boxes and part relationships. Lastly, the segmentation is rectified by examining the confusion of part boundaries using the structure-based part features. Our experimental results based on the StructureNet and PartNet demonstrate that the interplay between the two tasks results in remarkable improvements in both tasks: 27.91% in structure inference and 0.5% in segmentation.

Submitted 1 November, 2022; originally announced November 2022.

Comments: WACV 2023 (Algorithm Track)

arXiv:2206.05522 [pdf, other]

The Effects of Spatial Configuration on Relative Translation Gain Thresholds in Redirected Walking

Authors: Dooyoung Kim, Seonji Kim, Jae-eun Shin, Boram Yoon, Jinwook Kim, Jeongmi Lee, Woontack Woo

Abstract: In this study, we explore how spatial configurations can be reflected in determining the threshold range of Relative Translation Gains (RTGs), a translation gain-based Redirected Walking (RDW) technique that scales the user's movement in Virtual Reality (VR) in different ratios for width and depth. While previous works have shown that various cognitive factors or individual differences influence the RDW threshold, constructive studies investigating the impact of the environmental composition on the RDW threshold with regard to the user's visual perception were lacking. Therefore, we examined the effect of spatial configurations on the RTG threshold by analyzing the participants' responses and gaze distribution data in two user studies. The first study concerned the size of the virtual room and the existence of objects within it, and the second study focused on the combined impact of room size and the spatial layout. Our results show that three components of spatial configuration (size, object existence, spatial layout) significantly affect the RTG threshold range. Based on our findings, we propose virtual space rescaling guidelines for developers to increase the range of adjustable movable space with RTGs: placing distractors in the room, setting the perceived movable space to be larger than the adjusted movable space if it is an empty room, and avoiding placing objects together in a centered layout. Our findings can be used to adaptively rescale VR users' space according to the target virtual space's configuration with a unified coordinate system that enables the utilization of physical objects in a virtual scene.

Submitted 30 October, 2022; v1 submitted 11 June, 2022; originally announced June 2022.

Comments: 21 pages, 11 figures, Under review in the Springer VR Journal
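
The abstract above describes relative translation gains as scaling the user's movement by different ratios along width and depth. A minimal sketch of that idea follows, assuming a 2D floor plane with x as width and z as depth. The class name, field names, and the gain values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisGains:
    """Hypothetical per-axis translation gains (illustrative values only)."""
    gain_x: float  # multiplier for movement along the room's width
    gain_z: float  # multiplier for movement along the room's depth

    def to_virtual(self, dx: float, dz: float) -> tuple[float, float]:
        """Map a physical displacement (metres) to a virtual displacement."""
        return self.gain_x * dx, self.gain_z * dz


# Assumed example: stretch width by 20% while leaving depth unchanged.
gains = AxisGains(gain_x=1.2, gain_z=1.0)
virtual_dx, virtual_dz = gains.to_virtual(dx=0.5, dz=0.5)
print(virtual_dx, virtual_dz)
```

With unequal gains, a physical room maps to a virtual room with a different width-to-depth proportion, which is the effect the threshold studies above measure.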

arXiv:2204.13584 [pdf, ps, other]

Predicting Sleeping Quality using Convolutional Neural Networks

Authors: Vidya Rohini Konanur Sathish, Wai Lok Woo, Edmond S. L. Ho

Abstract: Identifying sleep stages and patterns is an essential part of diagnosing and treating sleep disorders. With the advancement of smart technologies, sensor data related to sleeping patterns can be captured easily. In this paper, we propose a Convolutional Neural Network (CNN) architecture that improves the classification performance. In particular, we benchmark the classification performance from different methods, including traditional machine learning methods such as Logistic Regression (LR), Decision Trees (DT), k-Nearest Neighbour (k-NN), Naive Bayes (NB) and Support Vector Machine (SVM), on 3 publicly available sleep datasets. The accuracy, sensitivity, specificity, precision, recall, and F-score are reported and will serve as a baseline to stimulate research in this direction in the future.

Submitted 24 April, 2022; originally announced April 2022.

ACM Class: I.2.10

arXiv:2203.17085 [pdf, other]

RobIn: A Robust Interpretable Deep Network for Schizophrenia Diagnosis

Authors: Daniel Organisciak, Hubert P. H. Shum, Ephraim Nwoye, Wai Lok Woo

Abstract: Schizophrenia is a severe mental health condition that requires a long and complicated diagnostic process. However, early diagnosis is vital to control symptoms. Deep learning has recently become a popular way to analyse and interpret medical data. Past attempts to use deep learning for schizophrenia diagnosis from brain-imaging data have shown promise but suffer from a large training-application gap - it is difficult to apply lab research to the real world. We propose to reduce this training-application gap by focusing on readily accessible data. We collect a data set of psychiatric observations of patients based on DSM-5 criteria. Because similar data is already recorded in all mental health clinics that diagnose schizophrenia using DSM-5, our method could be easily integrated into current processes as a tool to assist clinicians, whilst abiding by formal diagnostic criteria. To facilitate real-world usage of our system, we show that it is interpretable and robust. Understanding how a machine learning tool reaches its diagnosis is essential to allow clinicians to trust that diagnosis. To interpret the framework, we fuse two complementary attention mechanisms, 'squeeze and excitation' and 'self-attention', to determine global attribute importance and attribute interactivity, respectively. The model uses these importance scores to make decisions. This allows clinicians to understand how a diagnosis was reached, improving trust in the model. Because machine learning models often struggle to generalise to data from different sources, we perform experiments with augmented test data to evaluate the model's applicability to the real world. We find that our model is more robust to perturbations, and should therefore perform better in a clinical setting. It achieves 98% accuracy with 10-fold cross-validation.

Submitted 31 March, 2022; originally announced March 2022.

arXiv:2201.05046 [pdf]

Flood Prediction and Analysis on the Relevance of Features using Explainable Artificial Intelligence

Authors: Sai Prasanth Kadiyala, Wai Lok Woo

Abstract: This paper presents flood prediction models for the state of Kerala in India by analyzing the monthly rainfall data and applying machine learning algorithms including Logistic Regression, K-Nearest Neighbors, Decision Trees, Random Forests, and Support Vector Machine. Although these models have shown high accuracy prediction of the occurrence of flood in a particular year, they do not quantitatively and qualitatively explain the prediction decision. This paper shows how the background features are learned that contributed to the prediction decision and further extended to explain the inner workings with the development of explainable artificial intelligence modules. The obtained results have confirmed the validity of the findings uncovered by the explainer modules basing on the historical flood monthly rainfall data in Kerala.

Submitted 13 January, 2022; originally announced January 2022.

Comments: Proceedings of the 2nd Artificial Intelligence and Complex Systems Conference (AICSconf), accepted, 2021

ACM Class: I.2.6; I.2.1

arXiv:2201.04273 [pdf, other]

Effects of Virtual Room Size and Objects on Relative Translation Gain Thresholds in Redirected Walking

Authors: Dooyoung Kim, Jinwook Kim, Jae-eun Shin, Boram Yoon, Jeongmi Lee, Woontack Woo

Abstract: This paper investigates how the size of virtual space and objects within it affect the threshold range of relative translation gains, a Redirected Walking (RDW) technique that scales the user's movement in virtual space in different ratios for the width and depth. While previous studies assert that a virtual room's size affects relative translation gain thresholds on account of the virtual horizon's location, additional research is needed to explore this assumption through a structured approach to visual perception in Virtual Reality (VR). We estimate the relative translation gain thresholds in six spatial conditions configured by three room sizes and the presence of virtual objects (3 X 2), which were set according to differing Angles of Declination (AoDs) between eye-gaze and the forward-gaze. Results show that both size and virtual objects significantly affect the threshold range, it being greater in the large-sized condition and furnished condition. This indicates that the effect of relative translation gains can be further increased by constructing a perceived virtual movable space that is even larger than the adjusted virtual movable space and placing objects in it. Our study can be applied to adjust virtual spaces in synchronizing heterogeneous spaces without coordinate distortion where real and virtual objects can be leveraged to create realistic mutual spaces.

Submitted 11 January, 2022; originally announced January 2022.

Comments: 10 pages, 8 figures, Accepted in 2022 IEEE Virtual Reality and 3D User Interfaces (VR)

arXiv:2112.00190 [pdf]

doi 10.1016/j.marpolbul.2022.113853

Is the use of Deep Learning and Artificial Intelligence an appropriate means to locate debris in the ocean without harming aquatic wildlife?

Authors: Zoe Moorton, Zeyneb Kurt, Wai Lok Woo

Abstract: With the global issue of plastic debris ever expanding, it is about time that the technology industry stepped in. This study aims to assess whether deep learning can successfully distinguish between marine life and man-made debris underwater. The aim is to find if we are safely able to clean up our oceans with Artificial Intelligence without disrupting the delicate balance of the aquatic ecosystems. The research explores the use of Convolutional Neural Networks from the perspective of protecting the ecosystem, rather than primarily collecting rubbish. We did this by building a custom-built, deep learning model, with an original database including 1,644 underwater images and used a binary classification to sort synthesised material from aquatic life. We concluded that although it is possible to safely distinguish between debris and life, further exploration with a larger database and stronger CNN structure has the potential for much more promising results.

Submitted 30 November, 2021; originally announced December 2021.

Comments: reference list is added/updated; sorry for causing any inconveniences. 3681 words, 14 pages

arXiv:2106.04966 [pdf, other]

Towards Explainable Abnormal Infant Movements Identification: A Body-part Based Prediction and Visualisation Framework

Authors: Kevin D. McCay, Edmond S. L. Ho, Dimitrios Sakkos, Wai Lok Woo, Claire Marcroft, Patricia Dulson, Nicholas D. Embleton

Abstract: Providing early diagnosis of cerebral palsy (CP) is key to enhancing the developmental outcomes for those affected. Diagnostic tools such as the General Movements Assessment (GMA), have produced promising results in early diagnosis, however these manual methods can be laborious. In this paper, we propose a new framework for the automated classification of infant body movements, based upon the GMA, which unlike previous methods, also incorporates a visualization framework to aid with interpretability. Our proposed framework segments extracted features to detect the presence of Fidgety Movements (FMs) associated with the GMA spatiotemporally. These features are then used to identify the body-parts with the greatest contribution towards a classification decision and highlight the related body-part segment providing visual feedback to the user. We quantitatively compare the proposed framework's classification performance with several other methods from the literature and qualitatively evaluate the visualization's veracity. Our experimental results show that the proposed method performs more robustly than comparable techniques in this setting whilst simultaneously providing relevant visual interpretability.

Submitted 9 June, 2021; originally announced June 2021.

Comments: Proceedings of the 2021 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), accepted, 2021

ACM Class: I.4.9; I.5.0; J.3; I.2.1

arXiv:1706.03458 [pdf, other]

Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model

Authors: Xingjian Shi, Zhihan Gao, Leonard Lausen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo

Abstract: With the goal of making high-resolution forecasts of regional rainfall, precipitation nowcasting has become an important and fundamental technology underlying various public services ranging from rainstorm warnings to flight safety. Recently, the Convolutional LSTM (ConvLSTM) model has been shown to outperform traditional optical flow based methods for precipitation nowcasting, suggesting that deep learning models have a huge potential for solving the problem. However, the convolutional recurrence structure in ConvLSTM-based models is location-invariant while natural motion and transformation (e.g., rotation) are location-variant in general. Furthermore, since deep-learning-based precipitation nowcasting is a newly emerging area, clear evaluation protocols have not yet been established. To address these problems, we propose both a new model and a benchmark for precipitation nowcasting. Specifically, we go beyond ConvLSTM and propose the Trajectory GRU (TrajGRU) model that can actively learn the location-variant structure for recurrent connections. Besides, we provide a benchmark that includes a real-world large-scale dataset from the Hong Kong Observatory, a new training loss, and a comprehensive evaluation protocol to facilitate future research and gauge the state of the art.

Submitted 5 October, 2017; v1 submitted 12 June, 2017; originally announced June 2017.

Comments: NIPS 2017 Spotlight

arXiv:1506.04214 [pdf, other]

Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting

Authors: Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo

Abstract: The goal of precipitation nowcasting is to predict the future rainfall intensity in a local region over a relatively short period of time. Very few previous studies have examined this crucial and challenging weather forecasting problem from the machine learning perspective. In this paper, we formulate precipitation nowcasting as a spatiotemporal sequence forecasting problem in which both the input and the prediction target are spatiotemporal sequences. By extending the fully connected LSTM (FC-LSTM) to have convolutional structures in both the input-to-state and state-to-state transitions, we propose the convolutional LSTM (ConvLSTM) and use it to build an end-to-end trainable model for the precipitation nowcasting problem. Experiments show that our ConvLSTM network captures spatiotemporal correlations better and consistently outperforms FC-LSTM and the state-of-the-art operational ROVER algorithm for precipitation nowcasting.

Submitted 19 September, 2015; v1 submitted 12 June, 2015; originally announced June 2015.

Showing 1–22 of 22 results for author: Woo, W