Search | arXiv e-print repository

Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs

Authors: Jialou Wang, Manli Zhu, Yulei Li, Honglei Li, Longzhi Yang, Wai Lok Woo

Abstract: Localization plays a crucial role in enhancing the practicality and precision of VQA systems. By enabling fine-grained identification and interaction with specific parts of an object, it significantly improves the system's ability to provide contextually relevant and spatially accurate responses, crucial for applications in dynamic environments like robotics and augmented reality. However, traditi… ▽ More Localization plays a crucial role in enhancing the practicality and precision of VQA systems. By enabling fine-grained identification and interaction with specific parts of an object, it significantly improves the system's ability to provide contextually relevant and spatially accurate responses, crucial for applications in dynamic environments like robotics and augmented reality. However, traditional systems face challenges in accurately map** objects within images to generate nuanced and spatially aware responses. In this work, we introduce "Detect2Interact", which addresses these challenges by introducing an advanced approach for fine-grained object visual key field detection. First, we use the segment anything model (SAM) to generate detailed spatial maps of objects in images. Next, we use Vision Studio to extract semantic object descriptions. Third, we employ GPT-4's common sense knowledge, bridging the gap between an object's semantics and its spatial map. As a result, Detect2Interact achieves consistent qualitative results on object key field detection across extensive test cases and outperforms the existing VQA system with object detection by providing a more reasonable and finer visual representation. △ Less

Submitted 1 April, 2024; originally announced April 2024.

Comments: Accepted to IEEE Intelligent Systems

arXiv:2403.18067 [pdf, other]

State of the art applications of deep learning within tracking and detecting marine debris: A survey

Authors: Zoe Moorton, Dr. Zeyneb Kurt, Dr. Wai Lok Woo

Abstract: Deep learning techniques have been explored within the marine litter problem for approximately 20 years but the majority of the research has developed rapidly in the last five years. We provide an in-depth, up to date, summary and analysis of 28 of the most recent and significant contributions of deep learning in marine debris. From cross referencing the research paper results, the YOLO family sig… ▽ More Deep learning techniques have been explored within the marine litter problem for approximately 20 years but the majority of the research has developed rapidly in the last five years. We provide an in-depth, up to date, summary and analysis of 28 of the most recent and significant contributions of deep learning in marine debris. From cross referencing the research paper results, the YOLO family significantly outperforms all other methods of object detection but there are many respected contributions to this field that have categorically agreed that a comprehensive database of underwater debris is not currently available for machine learning. Using a small dataset curated and labelled by us, we tested YOLOv5 on a binary classification task and found the accuracy was low and the rate of false positives was high; highlighting the importance of a comprehensive database. We conclude this survey with over 40 future research recommendations and open challenges. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: Review paper, 60 pages including references, 1 figure, 3 tables, 1 supplementary data

arXiv:2401.12648

Consistency Enhancement-Based Deep Multiview Clustering via Contrastive Learning

Authors: Hao Yang, Hua Mao, Wai Lok Woo, Jie Chen, Xi Peng

Abstract: Multiview clustering (MVC) segregates data samples into meaningful clusters by synthesizing information across multiple views. Moreover, deep learning-based methods have demonstrated their strong feature learning capabilities in MVC scenarios. However, effectively generalizing feature representations while maintaining consistency is still an intractable problem. In addition, most existing deep clu… ▽ More Multiview clustering (MVC) segregates data samples into meaningful clusters by synthesizing information across multiple views. Moreover, deep learning-based methods have demonstrated their strong feature learning capabilities in MVC scenarios. However, effectively generalizing feature representations while maintaining consistency is still an intractable problem. In addition, most existing deep clustering methods based on contrastive learning overlook the consistency of the clustering representations during the clustering process. In this paper, we show how the above problems can be overcome and propose a consistent enhancement-based deep MVC method via contrastive learning (CCEC). Specifically, semantic connection blocks are incorporated into a feature representation to preserve the consistent information among multiple views. Furthermore, the representation process for clustering is enhanced through spectral clustering, and the consistency across multiple views is improved. Experiments conducted on five datasets demonstrate the effectiveness and superiority of our method in comparison with the state-of-the-art (SOTA) methods. The code for this method can be accessed at https://anonymous.4open.science/r/CCEC-E84E/. △ Less

Submitted 21 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: There are multiple errors that need to be corrected, including some formulas and concept descriptions. We will re upload the paper after the modifications are completed

arXiv:2311.11821 [pdf, ps, other]

Cross-View Graph Consistency Learning for Invariant Graph Representations

Authors: Jie Chen, Zhiming Li, Hua Mao, Wai Lok Woo, Xi Peng

Abstract: Graph representation learning is fundamental for analyzing graph-structured data. Exploring invariant graph representations remains a challenge for most existing graph representation learning methods. In this paper, we propose a cross-view graph consistency learning (CGCL) method that learns invariant graph representations for link prediction. First, two complementary augmented views are derived f… ▽ More Graph representation learning is fundamental for analyzing graph-structured data. Exploring invariant graph representations remains a challenge for most existing graph representation learning methods. In this paper, we propose a cross-view graph consistency learning (CGCL) method that learns invariant graph representations for link prediction. First, two complementary augmented views are derived from an incomplete graph structure through a bidirectional graph structure augmentation scheme. This augmentation scheme mitigates the potential information loss that is commonly associated with various data augmentation techniques involving raw graph data, such as edge perturbation, node removal, and attribute masking. Second, we propose a CGCL model that can learn invariant graph representations. A cross-view training scheme is proposed to train the proposed CGCL model. This scheme attempts to maximize the consistency information between one augmented view and the graph structure reconstructed from the other augmented view. Furthermore, we offer a comprehensive theoretical CGCL analysis. This paper empirically and experimentally demonstrates the effectiveness of the proposed CGCL method, achieving competitive results on graph datasets in comparisons with several state-of-the-art algorithms. △ Less

Submitted 20 November, 2023; originally announced November 2023.

Comments: 8 pages

arXiv:2310.17158 [pdf, other]

CosmosDSR -- a methodology for automated detection and tracking of orbital debris using the Unscented Kalman Filter

Authors: Daniel S. Roll, Zeyneb Kurt, Wai Lok Woo

Abstract: The Kessler syndrome refers to the escalating space debris from frequent space activities, threatening future space exploration. Addressing this issue is vital. Several AI models, including Convolutional Neural Networks, Kernel Principal Component Analysis, and Model-Agnostic Meta- Learning have been assessed with various data types. Earlier studies highlighted the combination of the YOLO object d… ▽ More The Kessler syndrome refers to the escalating space debris from frequent space activities, threatening future space exploration. Addressing this issue is vital. Several AI models, including Convolutional Neural Networks, Kernel Principal Component Analysis, and Model-Agnostic Meta- Learning have been assessed with various data types. Earlier studies highlighted the combination of the YOLO object detector and a linear Kalman filter (LKF) for object detection and tracking. Advancing this, the current paper introduces a novel methodology for the Comprehensive Orbital Surveillance and Monitoring Of Space by Detecting Satellite Residuals (CosmosDSR) by combining YOLOv3 with an Unscented Kalman Filter (UKF) for tracking satellites in sequential images. Using the Spacecraft Recognition Leveraging Knowledge of Space Environment (SPARK) dataset for training and testing, the YOLOv3 precisely detected and classified all satellite categories (Mean Average Precision=97.18%, F1=0.95) with few errors (TP=4163, FP=209, FN=237). Both CosmosDSR and an implemented LKF used for comparison tracked satellites accurately for a mean squared error (MSE) and root mean squared error (RME) of MSE=2.83/RMSE=1.66 for UKF and MSE=2.84/RMSE=1.66 for LKF. The current study is limited to images generated in a space simulation environment, but the CosmosDSR methodology shows great potential in detecting and tracking satellites, paving the way for solutions to the Kessler syndrome. △ Less

Submitted 31 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: 7 figures, 15 pages inc refs

MSC Class: 68 ACM Class: I.2.6; K.3.2

arXiv:2309.07170 [pdf, other]

Overview of Human Activity Recognition Using Sensor Data

Authors: Rebeen Ali Hamad, Wai Lok Woo, Bo Wei, Longzhi Yang

Abstract: Human activity recognition (HAR) is an essential research field that has been used in different applications including home and workplace automation, security and surveillance as well as healthcare. Starting from conventional machine learning methods to the recently develo** deep learning techniques and the Internet of things, significant contributions have been shown in the HAR area in the last… ▽ More Human activity recognition (HAR) is an essential research field that has been used in different applications including home and workplace automation, security and surveillance as well as healthcare. Starting from conventional machine learning methods to the recently develo** deep learning techniques and the Internet of things, significant contributions have been shown in the HAR area in the last decade. Even though several review and survey studies have been published, there is a lack of sensor-based HAR overview studies focusing on summarising the usage of wearable sensors and smart home sensors data as well as applications of HAR and deep learning techniques. Hence, we overview sensor-based HAR, discuss several important applications that rely on HAR, and highlight the most common machine learning methods that have been used for HAR. Finally, several challenges of HAR are explored that should be addressed to further improve the robustness of HAR. △ Less

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2304.10769 [pdf, ps, other]

doi 10.1109/ICCV51070.2023.01536

Deep Multiview Clustering by Contrasting Cluster Assignments

Authors: Jie Chen, Hua Mao, Wai Lok Woo, Xi Peng

Abstract: Multiview clustering (MVC) aims to reveal the underlying structure of multiview data by categorizing data samples into clusters. Deep learning-based methods exhibit strong feature learning capabilities on large-scale datasets. For most existing deep MVC methods, exploring the invariant representations of multiple views is still an intractable problem. In this paper, we propose a cross-view contras… ▽ More Multiview clustering (MVC) aims to reveal the underlying structure of multiview data by categorizing data samples into clusters. Deep learning-based methods exhibit strong feature learning capabilities on large-scale datasets. For most existing deep MVC methods, exploring the invariant representations of multiple views is still an intractable problem. In this paper, we propose a cross-view contrastive learning (CVCL) method that learns view-invariant representations and produces clustering results by contrasting the cluster assignments among multiple views. Specifically, we first employ deep autoencoders to extract view-dependent features in the pretraining stage. Then, a cluster-level CVCL strategy is presented to explore consistent semantic label information among the multiple views in the fine-tuning stage. Thus, the proposed CVCL method is able to produce more discriminative cluster assignments by virtue of this learning strategy. Moreover, we provide a theoretical analysis of soft cluster assignment alignment. Extensive experimental results obtained on several datasets demonstrate that the proposed CVCL method outperforms several state-of-the-art approaches. △ Less

Submitted 10 August, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

Comments: 10 pages, 7 figures

Journal ref: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

arXiv:2301.00552 [pdf]

Neural source/sink phase connectivity in developmental dyslexia by means of interchannel causality

Authors: I. RodrÍguez-RodrÍguez, A. Ortiz, N. J. Gallego-Molina, M. A. Formoso, W. L. Woo

Abstract: While the brain connectivity network can inform the understanding and diagnosis of developmental dyslexia, its cause-effect relationships have not yet enough been examined. Employing electroencephalography signals and band-limited white noise stimulus at 4.8 Hz (prosodic-syllabic frequency), we measure the phase Granger causalities among channels to identify differences between dyslexic learners a… ▽ More While the brain connectivity network can inform the understanding and diagnosis of developmental dyslexia, its cause-effect relationships have not yet enough been examined. Employing electroencephalography signals and band-limited white noise stimulus at 4.8 Hz (prosodic-syllabic frequency), we measure the phase Granger causalities among channels to identify differences between dyslexic learners and controls, thereby proposing a method to calculate directional connectivity. As causal relationships run in both directions, we explore three scenarios, namely channels' activity as sources, as sinks, and in total. Our proposed method can be used for both classification and exploratory analysis. In all scenarios, we find confirmation of the established right-lateralized Theta sampling network anomaly, in line with the temporal sampling framework's assumption of oscillatory differences in the Theta and Gamma bands. Further, we show that this anomaly primarily occurs in the causal relationships of channels acting as sinks, where it is significantly more pronounced than when only total activity is observed. In the sink scenario, our classifier obtains 0.84 and 0.88 accuracy and 0.87 and 0.93 AUC for the Theta and Gamma bands, respectively. △ Less

Submitted 2 January, 2023; originally announced January 2023.

arXiv:2212.06834 [pdf]

Deep Neural Networks integrating genomics and histopathological images for predicting stages and survival time-to-event in colon cancer

Authors: Olalekan Ogundipe, Zeyneb Kurt, Wai Lok Woo

Abstract: There exists unexplained diverse variation within the predefined colon cancer stages using only features either from genomics or histopathological whole slide images as prognostic factors. Unraveling this variation will bring about improved in staging and treatment outcome, hence motivated by the advancement of Deep Neural Network libraries and different structures and factors within some genomic… ▽ More There exists unexplained diverse variation within the predefined colon cancer stages using only features either from genomics or histopathological whole slide images as prognostic factors. Unraveling this variation will bring about improved in staging and treatment outcome, hence motivated by the advancement of Deep Neural Network libraries and different structures and factors within some genomic dataset, we aggregate atypical patterns in histopathological images with diverse carcinogenic expression from mRNA, miRNA and DNA Methylation as an integrative input source into an ensemble deep neural network for colon cancer stages classification and samples stratification into low or high risk survival groups. The results of our Ensemble Deep Convolutional Neural Network model show an improved performance in stages classification on the integrated dataset. The fused input features return Area under curve Receiver Operating Characteristic curve (AUC ROC) of 0.95 compared with AUC ROC of 0.71 and 0.68 obtained when only genomics and images features are used for the stage's classification, respectively. Also, the extracted features were used to split the patients into low or high risk survival groups. Among the 2548 fused features, 1695 features showed a statistically significant survival probability differences between the two risk groups defined by the extracted features. △ Less

Submitted 13 December, 2022; originally announced December 2022.

Comments: 21 pages, 5 figures, 4 tables

arXiv:2204.13584 [pdf, ps, other]

Predicting Slee** Quality using Convolutional Neural Networks

Authors: Vidya Rohini Konanur Sathish, Wai Lok Woo, Edmond S. L. Ho

Abstract: Identifying sleep stages and patterns is an essential part of diagnosing and treating sleep disorders. With the advancement of smart technologies, sensor data related to slee** patterns can be captured easily. In this paper, we propose a Convolution Neural Network (CNN) architecture that improves the classification performance. In particular, we benchmark the classification performance from diff… ▽ More Identifying sleep stages and patterns is an essential part of diagnosing and treating sleep disorders. With the advancement of smart technologies, sensor data related to slee** patterns can be captured easily. In this paper, we propose a Convolution Neural Network (CNN) architecture that improves the classification performance. In particular, we benchmark the classification performance from different methods, including traditional machine learning methods such as Logistic Regression (LR), Decision Trees (DT), k-Nearest Neighbour (k-NN), Naive Bayes (NB) and Support Vector Machine (SVM), on 3 publicly available sleep datasets. The accuracy, sensitivity, specificity, precision, recall, and F-score are reported and will serve as a baseline to simulate the research in this direction in the future. △ Less

Submitted 24 April, 2022; originally announced April 2022.

ACM Class: I.2.10

arXiv:2203.17085 [pdf, other]

RobIn: A Robust Interpretable Deep Network for Schizophrenia Diagnosis

Authors: Daniel Organisciak, Hubert P. H. Shum, Ephraim Nwoye, Wai Lok Woo

Abstract: Schizophrenia is a severe mental health condition that requires a long and complicated diagnostic process. However, early diagnosis is vital to control symptoms. Deep learning has recently become a popular way to analyse and interpret medical data. Past attempts to use deep learning for schizophrenia diagnosis from brain-imaging data have shown promise but suffer from a large training-application… ▽ More Schizophrenia is a severe mental health condition that requires a long and complicated diagnostic process. However, early diagnosis is vital to control symptoms. Deep learning has recently become a popular way to analyse and interpret medical data. Past attempts to use deep learning for schizophrenia diagnosis from brain-imaging data have shown promise but suffer from a large training-application gap - it is difficult to apply lab research to the real world. We propose to reduce this training-application gap by focusing on readily accessible data. We collect a data set of psychiatric observations of patients based on DSM-5 criteria. Because similar data is already recorded in all mental health clinics that diagnose schizophrenia using DSM-5, our method could be easily integrated into current processes as a tool to assist clinicians, whilst abiding by formal diagnostic criteria. To facilitate real-world usage of our system, we show that it is interpretable and robust. Understanding how a machine learning tool reaches its diagnosis is essential to allow clinicians to trust that diagnosis. To interpret the framework, we fuse two complementary attention mechanisms, 'squeeze and excitation' and 'self-attention', to determine global attribute importance and attribute interactivity, respectively. The model uses these importance scores to make decisions. This allows clinicians to understand how a diagnosis was reached, improving trust in the model. Because machine learning models often struggle to generalise to data from different sources, we perform experiments with augmented test data to evaluate the model's applicability to the real world. We find that our model is more robust to perturbations, and should therefore perform better in a clinical setting. It achieves 98% accuracy with 10-fold cross-validation. △ Less

Submitted 31 March, 2022; originally announced March 2022.

arXiv:2201.05046 [pdf]

Flood Prediction and Analysis on the Relevance of Features using Explainable Artificial Intelligence

Authors: Sai Prasanth Kadiyala, Wai Lok Woo

Abstract: This paper presents flood prediction models for the state of Kerala in India by analyzing the monthly rainfall data and applying machine learning algorithms including Logistic Regression, K-Nearest Neighbors, Decision Trees, Random Forests, and Support Vector Machine. Although these models have shown high accuracy prediction of the occurrence of flood in a particular year, they do not quantitative… ▽ More This paper presents flood prediction models for the state of Kerala in India by analyzing the monthly rainfall data and applying machine learning algorithms including Logistic Regression, K-Nearest Neighbors, Decision Trees, Random Forests, and Support Vector Machine. Although these models have shown high accuracy prediction of the occurrence of flood in a particular year, they do not quantitatively and qualitatively explain the prediction decision. This paper shows how the background features are learned that contributed to the prediction decision and further extended to explain the inner workings with the development of explainable artificial intelligence modules. The obtained results have confirmed the validity of the findings uncovered by the explainer modules basing on the historical flood monthly rainfall data in Kerala. △ Less

Submitted 13 January, 2022; originally announced January 2022.

Comments: Proceedings of the 2nd Artificial Intelligence and Complex Systems Conference (AICSconf), accepted, 2021

ACM Class: I.2.6; I.2.1

arXiv:2112.00190 [pdf]

doi 10.1016/j.marpolbul.2022.113853

Is the use of Deep Learning and Artificial Intelligence an appropriate means to locate debris in the ocean without harming aquatic wildlife?

Authors: Zoe Moorton, Zeyneb Kurt, Wai Lok Woo

Abstract: With the global issue of plastic debris ever expanding, it is about time that the technology industry stepped in. This study aims to assess whether deep learning can successfully distinguish between marine life and man-made debris underwater. The aim is to find if we are safely able to clean up our oceans with Artificial Intelligence without disrupting the delicate balance of the aquatic ecosystem… ▽ More With the global issue of plastic debris ever expanding, it is about time that the technology industry stepped in. This study aims to assess whether deep learning can successfully distinguish between marine life and man-made debris underwater. The aim is to find if we are safely able to clean up our oceans with Artificial Intelligence without disrupting the delicate balance of the aquatic ecosystems. The research explores the use of Convolutional Neural Networks from the perspective of protecting the ecosystem, rather than primarily collecting rubbish. We did this by building a custom-built, deep learning model, with an original database including 1,644 underwater images and used a binary classification to sort synthesised material from aquatic life. We concluded that although it is possible to safely distinguish between debris and life, further exploration with a larger database and stronger CNN structure has the potential for much more promising results. △ Less

Submitted 30 November, 2021; originally announced December 2021.

Comments: reference list is added/updated; sorry for causing any inconveniences. 3681 words, 14 pages

arXiv:2106.04966 [pdf, other]

Towards Explainable Abnormal Infant Movements Identification: A Body-part Based Prediction and Visualisation Framework

Authors: Kevin D. McCay, Edmond S. L. Ho, Dimitrios Sakkos, Wai Lok Woo, Claire Marcroft, Patricia Dulson, Nicholas D. Embleton

Abstract: Providing early diagnosis of cerebral palsy (CP) is key to enhancing the developmental outcomes for those affected. Diagnostic tools such as the General Movements Assessment (GMA), have produced promising results in early diagnosis, however these manual methods can be laborious. In this paper, we propose a new framework for the automated classification of infant body movements, based upon the GM… ▽ More Providing early diagnosis of cerebral palsy (CP) is key to enhancing the developmental outcomes for those affected. Diagnostic tools such as the General Movements Assessment (GMA), have produced promising results in early diagnosis, however these manual methods can be laborious. In this paper, we propose a new framework for the automated classification of infant body movements, based upon the GMA, which unlike previous methods, also incorporates a visualization framework to aid with interpretability. Our proposed framework segments extracted features to detect the presence of Fidgety Movements (FMs) associated with the GMA spatiotemporally. These features are then used to identify the body-parts with the greatest contribution towards a classification decision and highlight the related body-part segment providing visual feedback to the user. We quantitatively compare the proposed framework's classification performance with several other methods from the literature and qualitatively evaluate the visualization's veracity. Our experimental results show that the proposed method performs more robustly than comparable techniques in this setting whilst simultaneously providing relevant visual interpretability. △ Less

Submitted 9 June, 2021; originally announced June 2021.

Comments: Proceedings of the 2021 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), accepted, 2021

ACM Class: I.4.9; I.5.0; J.3; I.2.1

Showing 1–14 of 14 results for author: Woo, W L