Search | arXiv e-print repository

Taking a PEEK into YOLOv5 for Satellite Component Recognition via Entropy-based Visual Explanations

Authors: Mackenzie J. Meni, Trupti Mahendrakar, Olivia D. M. Raney, Ryan T. White, Michael L. Mayo, Kevin Pilkiewicz

Abstract: The escalating risk of collisions and the accumulation of space debris in Low Earth Orbit (LEO) has reached critical concern due to the ever increasing number of spacecraft. Addressing this crisis, especially in dealing with non-cooperative and unidentified space debris, is of paramount importance. This paper contributes to efforts in enabling autonomous swarms of small chaser satellites for targe… ▽ More The escalating risk of collisions and the accumulation of space debris in Low Earth Orbit (LEO) has reached critical concern due to the ever increasing number of spacecraft. Addressing this crisis, especially in dealing with non-cooperative and unidentified space debris, is of paramount importance. This paper contributes to efforts in enabling autonomous swarms of small chaser satellites for target geometry determination and safe flight trajectory planning for proximity operations in LEO. Our research explores on-orbit use of the You Only Look Once v5 (YOLOv5) object detection model trained to detect satellite components. While this model has shown promise, its inherent lack of interpretability hinders human understanding, a critical aspect of validating algorithms for use in safety-critical missions. To analyze the decision processes, we introduce Probabilistic Explanations for Entropic Knowledge extraction (PEEK), a method that utilizes information theoretic analysis of the latent representations within the hidden layers of the model. Through both synthetic in hardware-in-the-loop experiments, PEEK illuminates the decision-making processes of the model, hel** identify its strengths, limitations and biases. △ Less

Submitted 25 November, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

arXiv:2308.14938 [pdf, other]

Entropy-based Guidance of Deep Neural Networks for Accelerated Convergence and Improved Performance

Authors: Mackenzie J. Meni, Ryan T. White, Michael Mayo, Kevin Pilkiewicz

Abstract: Neural networks have dramatically increased our capacity to learn from large, high-dimensional datasets across innumerable disciplines. However, their decisions are not easily interpretable, their computational costs are high, and building and training them are not straightforward processes. To add structure to these efforts, we derive new mathematical results to efficiently measure the changes in… ▽ More Neural networks have dramatically increased our capacity to learn from large, high-dimensional datasets across innumerable disciplines. However, their decisions are not easily interpretable, their computational costs are high, and building and training them are not straightforward processes. To add structure to these efforts, we derive new mathematical results to efficiently measure the changes in entropy as fully-connected and convolutional neural networks process data. By measuring the change in entropy as networks process data effectively, patterns critical to a well-performing network can be visualized and identified. Entropy-based loss terms are developed to improve dense and convolutional model accuracy and efficiency by promoting the ideal entropy patterns. Experiments in image compression, image classification, and image segmentation on benchmark datasets demonstrate these losses guide neural networks to learn rich latent data representations in fewer dimensions, converge in fewer training epochs, and achieve higher accuracy. △ Less

Submitted 3 July, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: 13 pages, 4 figures

arXiv:2305.09856 [pdf, other]

doi 10.1109/CLOUD60044.2023.00024

Keep It Simple: Fault Tolerance Evaluation of Federated Learning with Unreliable Clients

Authors: Victoria Huang, Shaleeza Sohail, Michael Mayo, Tania Lorido Botran, Mark Rodrigues, Chris Anderson, Melanie Ooi

Abstract: Federated learning (FL), as an emerging artificial intelligence (AI) approach, enables decentralized model training across multiple devices without exposing their local training data. FL has been increasingly gaining popularity in both academia and industry. While research works have been proposed to improve the fault tolerance of FL, the real impact of unreliable devices (e.g., drop** out, misc… ▽ More Federated learning (FL), as an emerging artificial intelligence (AI) approach, enables decentralized model training across multiple devices without exposing their local training data. FL has been increasingly gaining popularity in both academia and industry. While research works have been proposed to improve the fault tolerance of FL, the real impact of unreliable devices (e.g., drop** out, misconfiguration, poor data quality) in real-world applications is not fully investigated. We carefully chose two representative, real-world classification problems with a limited numbers of clients to better analyze FL fault tolerance. Contrary to the intuition, simple FL algorithms can perform surprisingly well in the presence of unreliable clients. △ Less

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2205.05831 [pdf, other]

Feature Extractor Stacking for Cross-domain Few-shot Learning

Authors: Hongyu Wang, Eibe Frank, Bernhard Pfahringer, Michael Mayo, Geoffrey Holmes

Abstract: Cross-domain few-shot learning (CDFSL) addresses learning problems where knowledge needs to be transferred from one or more source domains into an instance-scarce target domain with an explicitly different distribution. Recently published CDFSL methods generally construct a universal model that combines knowledge of multiple source domains into one feature extractor. This enables efficient inferen… ▽ More Cross-domain few-shot learning (CDFSL) addresses learning problems where knowledge needs to be transferred from one or more source domains into an instance-scarce target domain with an explicitly different distribution. Recently published CDFSL methods generally construct a universal model that combines knowledge of multiple source domains into one feature extractor. This enables efficient inference but necessitates re-computation of the extractor whenever a new source domain is added. Some of these methods are also incompatible with heterogeneous source domain extractor architectures. We propose feature extractor stacking (FES), a new CDFSL method for combining information from a collection of extractors, that can utilise heterogeneous pretrained extractors out of the box and does not maintain a universal model that needs to be re-computed when its extractor collection is updated. We present the basic FES algorithm, which is inspired by the classic stacked generalisation approach, and also introduce two variants: convolutional FES (ConFES) and regularised FES (ReFES). Given a target-domain task, these algorithms fine-tune each extractor independently, use cross-validation to extract training data for stacked generalisation from the support set, and learn a simple linear stacking classifier from this data. We evaluate our FES methods on the well-known Meta-Dataset benchmark, targeting image classification with convolutional neural networks, and show that they can achieve state-of-the-art performance. △ Less

Submitted 24 October, 2023; v1 submitted 11 May, 2022; originally announced May 2022.

arXiv:1901.10583 [pdf, other]

doi 10.1080/08839514.2020.1718343

Automatic end-to-end De-identification: Is high accuracy the only metric?

Authors: Vithya Yogarajan, Bernhard Pfahringer, Michael Mayo

Abstract: De-identification of electronic health records (EHR) is a vital step towards advancing health informatics research and maximising the use of available data. It is a two-step process where step one is the identification of protected health information (PHI), and step two is replacing such PHI with surrogates. Despite the recent advances in automatic de-identification of EHR, significant obstacles r… ▽ More De-identification of electronic health records (EHR) is a vital step towards advancing health informatics research and maximising the use of available data. It is a two-step process where step one is the identification of protected health information (PHI), and step two is replacing such PHI with surrogates. Despite the recent advances in automatic de-identification of EHR, significant obstacles remain if the abundant health data available are to be used to the full potential. Accuracy in de-identification could be considered a necessary, but not sufficient condition for the use of EHR without individual patient consent. We present here a comprehensive review of the progress to date, both the impressive successes in achieving high accuracy and the significant risks and challenges that remain. To best of our knowledge, this is the first paper to present a complete picture of end-to-end automatic de-identification. We review 18 recently published automatic de-identification systems -designed to de-identify EHR in the form of free text- to show the advancements made in improving the overall accuracy of the system, and in identifying individual PHI. We argue that despite the improvements in accuracy there remain challenges in surrogate generation and replacements of identified PHIs, and the risks posed to patient protection and privacy. △ Less

Submitted 27 January, 2019; originally announced January 2019.

Comments: 17 pages, 1 figure, 7 tables, review journal paper

Report number: 04-Feb-2020

Journal ref: Applied Artificial Intelligence, 2020

arXiv:1812.02627 [pdf]

Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Healthcare Records

Authors: Jeffrey Thompson, **xiang Hu, Dinesh Pal Mudaranthakam, David Streeter, Lisa Neums, Michele Park, Devin C. Koestler, Byron Gajewski, Matthew S. Mayo

Abstract: Objective: Electronic health records (EHR) represent a rich resource for conducting observational studies, supporting clinical trials, and more. However, much of the relevant information is stored in an unstructured format that makes it difficult to use. Natural language processing approaches that attempt to automatically classify the data depend on vectorization algorithms that impose structure o… ▽ More Objective: Electronic health records (EHR) represent a rich resource for conducting observational studies, supporting clinical trials, and more. However, much of the relevant information is stored in an unstructured format that makes it difficult to use. Natural language processing approaches that attempt to automatically classify the data depend on vectorization algorithms that impose structure on the text, but these algorithms were not designed for the unique characteristics of EHR. Here, we propose a new algorithm for structuring so-called free-text that may help researchers make better use of EHR. We call this method Relevant Word Order Vectorization (RWOV). Materials and Methods: As a proof-of-concept, we attempted to classify the hormone receptor status of breast cancer patients treated at the University of Kansas Medical Center during a recent year, from the unstructured text of pathology reports. Our approach attempts to account for the semi-structured way that healthcare providers often enter information. We compared this approach to the ngrams and word2vec methods. Results: Our approach resulted in the most consistently high accuracy, as measured by F1 score and area under the receiver operating characteristic curve (AUC). Discussion: Our results suggest that methods of structuring free text that take into account its context may show better performance, and that our approach is promising. Conclusion: By using a method that accounts for the fact that healthcare providers tend to use certain key words repetitively and that the order of these key words is important, we showed improved performance over methods that do not. △ Less

Submitted 6 December, 2018; originally announced December 2018.

arXiv:1810.06765 [pdf, other]

A survey of automatic de-identification of longitudinal clinical narratives

Authors: Vithya Yogarajan, Michael Mayo, Bernhard Pfahringer

Abstract: Use of medical data, also known as electronic health records, in research helps develop and advance medical science. However, protecting patient confidentiality and identity while using medical data for analysis is crucial. Medical data can be in the form of tabular structures (i.e. tables), free-form narratives, and images. This study focuses on medical data in the free form longitudinal text. De… ▽ More Use of medical data, also known as electronic health records, in research helps develop and advance medical science. However, protecting patient confidentiality and identity while using medical data for analysis is crucial. Medical data can be in the form of tabular structures (i.e. tables), free-form narratives, and images. This study focuses on medical data in the free form longitudinal text. De-identification of electronic health records provides the opportunity to use such data for research without it affecting patient privacy, and avoids the need for individual patient consent. In recent years there is increasing interest in develo** an accurate, robust and adaptable automatic de-identification system for electronic health records. This is mainly due to the dilemma between the availability of an abundance of health data, and the inability to use such data in research due to legal and ethical restrictions. De-identification tracks in competitions such as the 2014 i2b2 UTHealth and the 2016 CEGS N-GRID shared tasks have provided a great platform to advance this area. The primary reasons for this include the open source nature of the dataset and the fact that raw psychiatric data were used for 2016 competitions. This study focuses on noticeable trend changes in the techniques used in the development of automatic de-identification for longitudinal clinical narratives. More specifically, the shift from using conditional random fields (CRF) based systems only or rules (regular expressions, dictionary or combinations) based systems only, to hybrid models (combining CRF and rules), and more recently to deep learning based systems. We review the literature and results that arose from the 2014 and the 2016 competitions and discuss the outcomes of these systems. We also provide a list of research questions that emerged from this survey. △ Less

Submitted 15 October, 2018; originally announced October 2018.

arXiv:1807.03346 [pdf, other]

Using Swarm Optimization To Enhance Autoencoders Images

Authors: Maisa Doaud, Michael Mayo

Abstract: Autoencoders learn data representations through reconstruction. Robust training is the key factor affecting the quality of the learned representations and, consequently, the accuracy of the application that use them. Previous works suggested methods for deciding the optimal autoencoder configuration which allows for robust training. Nevertheless, improving the accuracy of a trained autoencoder has… ▽ More Autoencoders learn data representations through reconstruction. Robust training is the key factor affecting the quality of the learned representations and, consequently, the accuracy of the application that use them. Previous works suggested methods for deciding the optimal autoencoder configuration which allows for robust training. Nevertheless, improving the accuracy of a trained autoencoder has got limited, if no, attention. We propose a new approach that improves the accuracy of a trained autoencoders results and answers the following question, Given a trained autoencoder, a test image, and using a real-parameter optimizer, can we generate better quality reconstructed image version than the one generated by the autoencoder?. Our proposed approach combines both the decoder part of a trained Resitricted Boltman Machine-based autoencoder with the Competitive Swarm Optimization algorithm. Experiments show that it is possible to reconstruct images using trained decoder from randomly initialized representations. Results also show that our approach reconstructed better quality images than the autoencoder in most of the test cases. Indicating that, we can use the approach for improving the performance of a pre-trained autoencoder if it does not give satisfactory results. △ Less

Submitted 9 July, 2018; originally announced July 2018.

Comments: 14 pages,2 figures and 17 graphs, conference paper (MDAI2017) http://mdai.cat/mdai2017/mdai2017.usb.pdf

Journal ref: In The 14th International Conference on Modeling Decisions for Artificial Intelligence. USB Proceedings., pages 118-131, Kitakyushu, Japan (October 18 - 20, 2017), 2017

arXiv:1707.04943 [pdf, other]

Improving Naive Bayes for Regression with Optimised Artificial Surrogate Data

Authors: Michael Mayo, Eibe Frank

Abstract: Can we evolve better training data for machine learning algorithms? To investigate this question we use population-based optimisation algorithms to generate artificial surrogate training data for naive Bayes for regression. We demonstrate that the generalisation performance of naive Bayes for regression models is enhanced by training them on the artificial data as opposed to the real data. These r… ▽ More Can we evolve better training data for machine learning algorithms? To investigate this question we use population-based optimisation algorithms to generate artificial surrogate training data for naive Bayes for regression. We demonstrate that the generalisation performance of naive Bayes for regression models is enhanced by training them on the artificial data as opposed to the real data. These results are important for two reasons. Firstly, naive Bayes models are simple and interpretable but frequently underperform compared to more complex "black box" models, and therefore new methods of enhancing accuracy are called for. Secondly, the idea of using the real training data indirectly in the construction of the artificial training data, as opposed to directly for model training, is a novel twist on the usual machine learning paradigm. △ Less

Submitted 27 November, 2018; v1 submitted 16 July, 2017; originally announced July 2017.

arXiv:1608.06492 [pdf, other]

Multiple Infection Sources Identification with Provable Guarantees

Authors: Hung T. Nguyen, Preetam Ghosh, Michael L. Mayo, Thang N. Dinh

Abstract: Given an aftermath of a cascade in the network, i.e. a set $V_I$ of "infected" nodes after an epidemic outbreak or a propagation of rumors/worms/viruses, how can we infer the sources of the cascade? Answering this challenging question is critical for computer forensic, vulnerability analysis, and risk management. Despite recent interest towards this problem, most of existing works focus only on si… ▽ More Given an aftermath of a cascade in the network, i.e. a set $V_I$ of "infected" nodes after an epidemic outbreak or a propagation of rumors/worms/viruses, how can we infer the sources of the cascade? Answering this challenging question is critical for computer forensic, vulnerability analysis, and risk management. Despite recent interest towards this problem, most of existing works focus only on single source detection or simple network topologies, e.g. trees or grids. In this paper, we propose a new approach to identify infection sources by searching for a seed set $S$ that minimizes the \emph{symmetric difference} between the cascade from $S$ and $V_I$, the given set of infected nodes. Our major result is an approximation algorithm, called SISI, to identify infection sources \emph{without the prior knowledge on the number of source nodes}. SISI, to our best knowledge, is the first algorithm with \emph{provable guarantee} for the problem in general graphs. It returns a $\frac{2}{(1-ε)^2}Δ$-approximate solution with high probability, where $Δ$ denotes the maximum number of nodes in $V_I$ that may infect a single node in the network. Our experiments on real-world networks show the superiority of our approach and SISI in detecting true source(s), boosting the F1-measure from few percents, for the state-of-the-art NETSLEUTH, to approximately 50\%. △ Less

Submitted 23 August, 2016; originally announced August 2016.

Comments: in The 25th ACM International Conference on Information and Knowledge Management (CIKM 2016)

arXiv:1406.3287 [pdf]

A Clustering Analysis of Tweet Length and its Relation to Sentiment

Authors: Matthew Mayo

Abstract: Sentiment analysis of Twitter data is performed. The researcher has made the following contributions via this paper: (1) an innovative method for deriving sentiment score dictionaries using an existing sentiment dictionary as seed words is explored, and (2) an analysis of clustered tweet sentiment scores based on tweet length is performed. Sentiment analysis of Twitter data is performed. The researcher has made the following contributions via this paper: (1) an innovative method for deriving sentiment score dictionaries using an existing sentiment dictionary as seed words is explored, and (2) an analysis of clustered tweet sentiment scores based on tweet length is performed. △ Less

Submitted 18 February, 2015; v1 submitted 12 June, 2014; originally announced June 2014.

Comments: 6 pages

Showing 1–11 of 11 results for author: Mayo, M