-
Non-Uniform Spatial Alignment Errors in sUAS Imagery From Wide-Area Disasters
Authors:
Thomas Manzini,
Priyankari Perali,
Raisa Karnik,
Mihir Godbole,
Hasnat Abdullah,
Robin Murphy
Abstract:
This work presents the first quantitative study of alignment errors between small uncrewed aerial systems (sUAS) geospatial imagery and a priori building polygons and finds that alignment errors are non-uniform and irregular. The work also introduces a publicly available dataset of imagery, building polygons, and human-generated and curated adjustments that can be used to evaluate existing strateg…
▽ More
This work presents the first quantitative study of alignment errors between small uncrewed aerial systems (sUAS) geospatial imagery and a priori building polygons and finds that alignment errors are non-uniform and irregular. The work also introduces a publicly available dataset of imagery, building polygons, and human-generated and curated adjustments that can be used to evaluate existing strategies for aligning building polygons with sUAS imagery. There are no efforts that have aligned pre-existing spatial data with sUAS imagery, and thus, there is no clear state of practice. However, this effort and analysis show that the translational alignment errors present in this type of data, averaging 82px and an intersection over the union of 0.65, which would induce further errors and biases in downstream machine learning systems unless addressed. This study identifies and analyzes the translational alignment errors of 21,619 building polygons in fifty-one orthomosaic images, covering 16787.2 Acres (26.23 square miles), constructed from sUAS raw imagery from nine wide-area disasters (Hurricane Ian, Hurricane Harvey, Hurricane Michael, Hurricane Ida, Hurricane Idalia, Hurricane Laura, the Mayfield Tornado, the Musset Bayou Fire, and the Kilauea Eruption). The analysis finds no uniformity among the angle and distance metrics of the building polygon alignments as they present an average degree variance of 0.4 and an average pixel distance variance of 0.45. This work alerts the sUAS community to the problem of spatial alignment and that a simple linear transform, often used to align satellite imagery, will not be sufficient to align spatial data in sUAS orthomosaic imagery.
△ Less
Submitted 10 May, 2024;
originally announced May 2024.
-
Development and Testing of Retrieval Augmented Generation in Large Language Models -- A Case Study Report
Authors:
YuHe Ke,
Liyuan **,
Kabilan Elangovan,
Hairil Rizal Abdullah,
Nan Liu,
Alex Tiong Heng Sia,
Chai Rick Soh,
Joshua Yi Min Tung,
Jasmine Chiat Ling Ong,
Daniel Shu Wei Ting
Abstract:
Purpose: Large Language Models (LLMs) hold significant promise for medical applications. Retrieval Augmented Generation (RAG) emerges as a promising approach for customizing domain knowledge in LLMs. This case study presents the development and evaluation of an LLM-RAG pipeline tailored for healthcare, focusing specifically on preoperative medicine.
Methods: We developed an LLM-RAG model using 3…
▽ More
Purpose: Large Language Models (LLMs) hold significant promise for medical applications. Retrieval Augmented Generation (RAG) emerges as a promising approach for customizing domain knowledge in LLMs. This case study presents the development and evaluation of an LLM-RAG pipeline tailored for healthcare, focusing specifically on preoperative medicine.
Methods: We developed an LLM-RAG model using 35 preoperative guidelines and tested it against human-generated responses, with a total of 1260 responses evaluated. The RAG process involved converting clinical documents into text using Python-based frameworks like LangChain and Llamaindex, and processing these texts into chunks for embedding and retrieval. Vector storage techniques and selected embedding models to optimize data retrieval, using Pinecone for vector storage with a dimensionality of 1536 and cosine similarity for loss metrics. Human-generated answers, provided by junior doctors, were used as a comparison.
Results: The LLM-RAG model generated answers within an average of 15-20 seconds, significantly faster than the 10 minutes typically required by humans. Among the basic LLMs, GPT4.0 exhibited the best accuracy of 80.1%. This accuracy was further increased to 91.4% when the model was enhanced with RAG. Compared to the human-generated instructions, which had an accuracy of 86.3%, the performance of the GPT4.0 RAG model demonstrated non-inferiority (p=0.610).
Conclusions: In this case study, we demonstrated a LLM-RAG model for healthcare implementation. The pipeline shows the advantages of grounded knowledge, upgradability, and scalability as important aspects of healthcare LLM deployment.
△ Less
Submitted 29 January, 2024;
originally announced February 2024.
-
Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias
Authors:
Yu He Ke,
Rui Yang,
Sui An Lie,
Taylor Xin Yi Lim,
Hairil Rizal Abdullah,
Daniel Shu Wei Ting,
Nan Liu
Abstract:
Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field.
Objective: This study explores the role of large language models (LLMs) in mitigating these biases through the utilization of a multi-agent framework. We simulate the clinical decisi…
▽ More
Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field.
Objective: This study explores the role of large language models (LLMs) in mitigating these biases through the utilization of a multi-agent framework. We simulate the clinical decision-making processes through multi-agent conversation and evaluate its efficacy in improving diagnostic accuracy.
Methods: A total of 16 published and unpublished case reports where cognitive biases have resulted in misdiagnoses were identified from the literature. In the multi-agent framework, we leveraged GPT-4 to facilitate interactions among four simulated agents to replicate clinical team dynamics. Each agent has a distinct role: 1) To make the final diagnosis after considering the discussions, 2) The devil's advocate and correct confirmation and anchoring bias, 3) The tutor and facilitator of the discussion to reduce premature closure bias, and 4) To record and summarize the findings. A total of 80 simulations were evaluated for the accuracy of initial diagnosis, top differential diagnosis and final two differential diagnoses.
Results: In a total of 80 responses evaluating both initial and final diagnoses, the initial diagnosis had an accuracy of 0% (0/80), but following multi-agent discussions, the accuracy for the top differential diagnosis increased to 71.3% (57/80), and for the final two differential diagnoses, to 80.0% (64/80).
Conclusions: The framework demonstrated an ability to re-evaluate and correct misconceptions, even in scenarios with misleading initial investigations. The LLM-driven multi-agent conversation framework shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios.
△ Less
Submitted 12 May, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Cluster trajectory of SOFA score in predicting mortality in sepsis
Authors:
Yuhe Ke,
Matilda Swee Sun Tang,
Celestine Jia Ling Loh,
Hairil Rizal Abdullah,
Nicholas Brian Shannon
Abstract:
Objective: Sepsis is a life-threatening condition. Sequential Organ Failure Assessment (SOFA) score is commonly used to assess organ dysfunction and predict ICU mortality, but it is taken as a static measurement and fails to capture dynamic changes. This study aims to investigate the relationship between dynamic changes in SOFA scores over the first 72 hours of ICU admission and patient outcomes.…
▽ More
Objective: Sepsis is a life-threatening condition. Sequential Organ Failure Assessment (SOFA) score is commonly used to assess organ dysfunction and predict ICU mortality, but it is taken as a static measurement and fails to capture dynamic changes. This study aims to investigate the relationship between dynamic changes in SOFA scores over the first 72 hours of ICU admission and patient outcomes.
Design, setting, and participants: 3,253 patients in the Medical Information Mart for Intensive Care IV database who met the sepsis-3 criteria and were admitted from the emergency department with at least 72 hours of ICU admission and full-active resuscitation status were analysed. Group-based trajectory modelling with dynamic time war** and k-means clustering identified distinct trajectory patterns in dynamic SOFA scores. They were subsequently compared using Python.
Main outcome measures: Outcomes including hospital and ICU mortality, length of stay in hospital and ICU, and readmission during hospital stay, were collected. Discharge time from ICU to wards and cut-offs at 7-day and 14-day were taken.
Results: Four clusters were identified: A (consistently low SOFA scores), B (rapid increase followed by a decline in SOFA scores), C (higher baseline scores with gradual improvement), and D (persistently elevated scores). Cluster D had the longest ICU and hospital stays, highest ICU and hospital mortality. Discharge rates from ICU were similar for Clusters A and B, while Cluster C had initially comparable rates but a slower transition to ward.
Conclusion: Monitoring dynamic changes in SOFA score is valuable for assessing sepsis severity and treatment responsiveness.
△ Less
Submitted 23 November, 2023;
originally announced November 2023.
-
SynthEnsemble: A Fusion of CNN, Vision Transformer, and Hybrid Models for Multi-Label Chest X-Ray Classification
Authors:
S. M. Nabil Ashraf,
Md. Adyelullahil Mamun,
Hasnat Md. Abdullah,
Md. Golam Rabiul Alam
Abstract:
Chest X-rays are widely used to diagnose thoracic diseases, but the lack of detailed information about these abnormalities makes it challenging to develop accurate automated diagnosis systems, which is crucial for early detection and effective treatment. To address this challenge, we employed deep learning techniques to identify patterns in chest X-rays that correspond to different diseases. We co…
▽ More
Chest X-rays are widely used to diagnose thoracic diseases, but the lack of detailed information about these abnormalities makes it challenging to develop accurate automated diagnosis systems, which is crucial for early detection and effective treatment. To address this challenge, we employed deep learning techniques to identify patterns in chest X-rays that correspond to different diseases. We conducted experiments on the "ChestX-ray14" dataset using various pre-trained CNNs, transformers, hybrid(CNN+Transformer) models and classical models. The best individual model was the CoAtNet, which achieved an area under the receiver operating characteristic curve (AUROC) of 84.2%. By combining the predictions of all trained models using a weighted average ensemble where the weight of each model was determined using differential evolution, we further improved the AUROC to 85.4%, outperforming other state-of-the-art methods in this field. Our findings demonstrate the potential of deep learning techniques, particularly ensemble deep learning, for improving the accuracy of automatic diagnosis of thoracic diseases from chest X-rays. Code available at:https://github.com/syednabilashraf/SynthEnsemble
△ Less
Submitted 22 May, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Affective social anthropomorphic intelligent system
Authors:
Md. Adyelullahil Mamun,
Hasnat Md. Abdullah,
Md. Golam Rabiul Alam,
Muhammad Mehedi Hassan,
Md. Zia Uddin
Abstract:
Human conversational styles are measured by the sense of humor, personality, and tone of voice. These characteristics have become essential for conversational intelligent virtual assistants. However, most of the state-of-the-art intelligent virtual assistants (IVAs) are failed to interpret the affective semantics of human voices. This research proposes an anthropomorphic intelligent system that ca…
▽ More
Human conversational styles are measured by the sense of humor, personality, and tone of voice. These characteristics have become essential for conversational intelligent virtual assistants. However, most of the state-of-the-art intelligent virtual assistants (IVAs) are failed to interpret the affective semantics of human voices. This research proposes an anthropomorphic intelligent system that can hold a proper human-like conversation with emotion and personality. A voice style transfer method is also proposed to map the attributes of a specific emotion. Initially, the frequency domain data (Mel-Spectrogram) is created by converting the temporal audio wave data, which comprises discrete patterns for audio features such as notes, pitch, rhythm, and melody. A collateral CNN-Transformer-Encoder is used to predict seven different affective states from voice. The voice is also fed parallelly to the deep-speech, an RNN model that generates the text transcription from the spectrogram. Then the transcripted text is transferred to the multi-domain conversation agent using blended skill talk, transformer-based retrieve-and-generate generation strategy, and beam-search decoding, and an appropriate textual response is generated. The system learns an invertible map** of data to a latent space that can be manipulated and generates a Mel-spectrogram frame based on previous Mel-spectrogram frames to voice synthesize and style transfer. Finally, the waveform is generated using WaveGlow from the spectrogram. The outcomes of the studies we conducted on individual models were auspicious. Furthermore, users who interacted with the system provided positive feedback, demonstrating the system's effectiveness.
△ Less
Submitted 19 April, 2023;
originally announced April 2023.
-
P-Transformer: A Prompt-based Multimodal Transformer Architecture For Medical Tabular Data
Authors:
Yucheng Ruan,
Xiang Lan,
Daniel J. Tan,
Hairil Rizal Abdullah,
Mengling Feng
Abstract:
Medical tabular data, abundant in Electronic Health Records (EHRs), is a valuable resource for diverse medical tasks such as risk prediction. While deep learning approaches, particularly transformer-based models, have shown remarkable performance in tabular data prediction, there are still problems remained for existing work to be effectively adapted into medical domain, such as under-utilization…
▽ More
Medical tabular data, abundant in Electronic Health Records (EHRs), is a valuable resource for diverse medical tasks such as risk prediction. While deep learning approaches, particularly transformer-based models, have shown remarkable performance in tabular data prediction, there are still problems remained for existing work to be effectively adapted into medical domain, such as under-utilization of unstructured free-texts, limited exploration of textual information in structured data, and data corruption. To address these issues, we propose P-Transformer, a Prompt-based multimodal Transformer architecture designed specifically for medical tabular data. This framework consists two critical components: a tabular cell embedding generator and a tabular transformer. The former efficiently encodes diverse modalities from both structured and unstructured tabular data into a harmonized language semantic space with the help of pre-trained sentence encoder and medical prompts. The latter integrates cell representations to generate patient embeddings for various medical tasks. In comprehensive experiments on two real-world datasets for three medical tasks, P-Transformer demonstrated the improvements with 10.9%/11.0% on RMSE/MAE, 0.5%/2.2% on RMSE/MAE, and 1.6%/0.8% on BACC/AUROC compared to state-of-the-art (SOTA) baselines in predictability. Notably, the model exhibited strong resilience to data corruption in the structured data, particularly when the corruption rates are high.
△ Less
Submitted 9 January, 2024; v1 submitted 30 March, 2023;
originally announced March 2023.
-
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models
Authors:
Raphael Olivier,
Hadi Abdullah,
Bhiksha Raj
Abstract:
A targeted adversarial attack produces audio samples that can force an Automatic Speech Recognition (ASR) system to output attacker-chosen text. To exploit ASR models in real-world, black-box settings, an adversary can leverage the transferability property, i.e. that an adversarial sample produced for a proxy ASR can also fool a different remote ASR. However recent work has shown that transferabil…
▽ More
A targeted adversarial attack produces audio samples that can force an Automatic Speech Recognition (ASR) system to output attacker-chosen text. To exploit ASR models in real-world, black-box settings, an adversary can leverage the transferability property, i.e. that an adversarial sample produced for a proxy ASR can also fool a different remote ASR. However recent work has shown that transferability against large ASR models is very difficult. In this work, we show that modern ASR architectures, specifically ones based on Self-Supervised Learning, are in fact vulnerable to transferability. We successfully demonstrate this phenomenon by evaluating state-of-the-art self-supervised ASR models like Wav2Vec2, HuBERT, Data2Vec and WavLM. We show that with low-level additive noise achieving a 30dB Signal-Noise Ratio, we can achieve target transferability with up to 80% accuracy. Next, we 1) use an ablation study to show that Self-Supervised learning is the main cause of that phenomenon, and 2) we provide an explanation for this phenomenon. Through this we show that modern ASR architectures are uniquely vulnerable to adversarial security threats.
△ Less
Submitted 29 September, 2022; v1 submitted 17 September, 2022;
originally announced September 2022.
-
Attacks as Defenses: Designing Robust Audio CAPTCHAs Using Attacks on Automatic Speech Recognition Systems
Authors:
Hadi Abdullah,
Aditya Karlekar,
Saurabh Prasad,
Muhammad Sajidur Rahman,
Logan Blue,
Luke A. Bauer,
Vincent Bindschaedler,
Patrick Traynor
Abstract:
Audio CAPTCHAs are supposed to provide a strong defense for online resources; however, advances in speech-to-text mechanisms have rendered these defenses ineffective. Audio CAPTCHAs cannot simply be abandoned, as they are specifically named by the W3C as important enablers of accessibility. Accordingly, demonstrably more robust audio CAPTCHAs are important to the future of a secure and accessible…
▽ More
Audio CAPTCHAs are supposed to provide a strong defense for online resources; however, advances in speech-to-text mechanisms have rendered these defenses ineffective. Audio CAPTCHAs cannot simply be abandoned, as they are specifically named by the W3C as important enablers of accessibility. Accordingly, demonstrably more robust audio CAPTCHAs are important to the future of a secure and accessible Web. We look to recent literature on attacks on speech-to-text systems for inspiration for the construction of robust, principle-driven audio defenses. We begin by comparing 20 recent attack papers, classifying and measuring their suitability to serve as the basis of new "robust to transcription" but "easy for humans to understand" CAPTCHAs. After showing that none of these attacks alone are sufficient, we propose a new mechanism that is both comparatively intelligible (evaluated through a user study) and hard to automatically transcribe (i.e., $P({\rm transcription}) = 4 \times 10^{-5}$). Finally, we demonstrate that our audio samples have a high probability of being detected as CAPTCHAs when given to speech-to-text systems ($P({\rm evasion}) = 1.77 \times 10^{-4}$). In so doing, we not only demonstrate a CAPTCHA that is approximately four orders of magnitude more difficult to crack, but that such systems can be designed based on the insights gained from attack papers using the differences between the ways that humans and computers process audio.
△ Less
Submitted 10 March, 2022;
originally announced March 2022.
-
Beyond $L_p$ clip**: Equalization-based Psychoacoustic Attacks against ASRs
Authors:
Hadi Abdullah,
Muhammad Sajidur Rahman,
Christian Peeters,
Cassidy Gibson,
Washington Garcia,
Vincent Bindschaedler,
Thomas Shrimpton,
Patrick Traynor
Abstract:
Automatic Speech Recognition (ASR) systems convert speech into text and can be placed into two broad categories: traditional and fully end-to-end. Both types have been shown to be vulnerable to adversarial audio examples that sound benign to the human ear but force the ASR to produce malicious transcriptions. Of these attacks, only the "psychoacoustic" attacks can create examples with relatively i…
▽ More
Automatic Speech Recognition (ASR) systems convert speech into text and can be placed into two broad categories: traditional and fully end-to-end. Both types have been shown to be vulnerable to adversarial audio examples that sound benign to the human ear but force the ASR to produce malicious transcriptions. Of these attacks, only the "psychoacoustic" attacks can create examples with relatively imperceptible perturbations, as they leverage the knowledge of the human auditory system. Unfortunately, existing psychoacoustic attacks can only be applied against traditional models, and are obsolete against the newer, fully end-to-end ASRs. In this paper, we propose an equalization-based psychoacoustic attack that can exploit both traditional and fully end-to-end ASRs. We successfully demonstrate our attack against real-world ASRs that include DeepSpeech and Wav2Letter. Moreover, we employ a user study to verify that our method creates low audible distortion. Specifically, 80 of the 100 participants voted in favor of all our attack audio samples as less noisier than the existing state-of-the-art attack. Through this, we demonstrate both types of existing ASR pipelines can be exploited with minimum degradation to attack audio quality.
△ Less
Submitted 25 October, 2021;
originally announced October 2021.
-
AutoScore-Imbalance: An interpretable machine learning tool for development of clinical scores with rare events data
Authors:
Han Yuan,
Feng Xie,
Marcus Eng Hock Ong,
Yilin Ning,
Marcel Lucas Chee,
Seyed Ehsan Saffari,
Hairil Rizal Abdullah,
Benjamin Alan Goldstein,
Bibhas Chakraborty,
Nan Liu
Abstract:
Background: Medical decision-making impacts both individual and public health. Clinical scores are commonly used among a wide variety of decision-making models for determining the degree of disease deterioration at the bedside. AutoScore was proposed as a useful clinical score generator based on machine learning and a generalized linear model. Its current framework, however, still leaves room for…
▽ More
Background: Medical decision-making impacts both individual and public health. Clinical scores are commonly used among a wide variety of decision-making models for determining the degree of disease deterioration at the bedside. AutoScore was proposed as a useful clinical score generator based on machine learning and a generalized linear model. Its current framework, however, still leaves room for improvement when addressing unbalanced data of rare events. Methods: Using machine intelligence approaches, we developed AutoScore-Imbalance, which comprises three components: training dataset optimization, sample weight optimization, and adjusted AutoScore. All scoring models were evaluated on the basis of their area under the curve (AUC) in the receiver operating characteristic analysis and balanced accuracy (i.e., mean value of sensitivity and specificity). By utilizing a publicly accessible dataset from Beth Israel Deaconess Medical Center, we assessed the proposed model and baseline approaches in the prediction of inpatient mortality. Results: AutoScore-Imbalance outperformed baselines in terms of AUC and balanced accuracy. The nine-variable AutoScore-Imbalance sub-model achieved the highest AUC of 0.786 (0.732-0.839) while the eleven-variable original AutoScore obtained an AUC of 0.723 (0.663-0.783), and the logistic regression with 21 variables obtained an AUC of 0.743 (0.685-0.800). The AutoScore-Imbalance sub-model (using down-sampling algorithm) yielded an AUC of 0. 0.771 (0.718-0.823) with only five variables, demonstrating a good balance between performance and variable sparsity. Conclusions: The AutoScore-Imbalance tool has the potential to be applied to highly unbalanced datasets to gain further insight into rare medical events and to facilitate real-world clinical decision-making.
△ Less
Submitted 13 July, 2021;
originally announced July 2021.
-
SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification Systems
Authors:
Hadi Abdullah,
Kevin Warren,
Vincent Bindschaedler,
Nicolas Papernot,
Patrick Traynor
Abstract:
Speech and speaker recognition systems are employed in a variety of applications, from personal assistants to telephony surveillance and biometric authentication. The wide deployment of these systems has been made possible by the improved accuracy in neural networks. Like other systems based on neural networks, recent research has demonstrated that speech and speaker recognition systems are vulner…
▽ More
Speech and speaker recognition systems are employed in a variety of applications, from personal assistants to telephony surveillance and biometric authentication. The wide deployment of these systems has been made possible by the improved accuracy in neural networks. Like other systems based on neural networks, recent research has demonstrated that speech and speaker recognition systems are vulnerable to attacks using manipulated inputs. However, as we demonstrate in this paper, the end-to-end architecture of speech and speaker systems and the nature of their inputs make attacks and defenses against them substantially different than those in the image space. We demonstrate this first by systematizing existing research in this space and providing a taxonomy through which the community can evaluate future work. We then demonstrate experimentally that attacks against these models almost universally fail to transfer. In so doing, we argue that substantial additional work is required to provide adequate mitigations in this space.
△ Less
Submitted 21 July, 2020; v1 submitted 13 July, 2020;
originally announced July 2020.
-
SmartCoAuth: Smart-Contract privacy preservation mechanism on querying sensitive records in the cloud
Authors:
Muhammed Siraj,
Mohd. Izuan Hafez Hj. Ninggal,
Nur Izura Udzir,
Muhammad Daniel Hafiz Abdullah,
Aziah Asmawi
Abstract:
Sensitive records stored in the cloud such as healthcare records, private conversation and credit card information are targets of hackers and privacy abuse. Current information and record management systems have difficulties achieving privacy protection of such sensitive records in a secure, transparent, decentralized and trustless environment. The Blockchain technology is a nascent and a promisin…
▽ More
Sensitive records stored in the cloud such as healthcare records, private conversation and credit card information are targets of hackers and privacy abuse. Current information and record management systems have difficulties achieving privacy protection of such sensitive records in a secure, transparent, decentralized and trustless environment. The Blockchain technology is a nascent and a promising technology that facilitates data sharing and access in a secure, decentralized and trustless environment. The technology enables the use of smart contracts that can be leveraged to complement existing traditional systems to achieve security objectives that were never possible before. In this paper, we propose a framework based on Blockchain technology to enable privacy-preservation in a secured, decentralized, transparent and trustless environment. We name our framework SmartCoAuth. It is based on Ethereum Smart Contract functions as the secure, decentralized, transparent authentication and authorization mechanism in the framework. It also enables tamper-proof auditing of access to the protected records. We analysed how SmartCoAuth could be integrated into a cloud application to provide reliable privacy-preservation among stakeholders of healthcare records stored in the cloud. The proposed framework provides a satisfactory level of data utility and privacy preservation.
△ Less
Submitted 6 April, 2020;
originally announced April 2020.
-
Hear "No Evil", See "Kenansville": Efficient and Transferable Black-Box Attacks on Speech Recognition and Voice Identification Systems
Authors:
Hadi Abdullah,
Muhammad Sajidur Rahman,
Washington Garcia,
Logan Blue,
Kevin Warren,
Anurag Swarnim Yadav,
Tom Shrimpton,
Patrick Traynor
Abstract:
Automatic speech recognition and voice identification systems are being deployed in a wide array of applications, from providing control mechanisms to devices lacking traditional interfaces, to the automatic transcription of conversations and authentication of users. Many of these applications have significant security and privacy considerations. We develop attacks that force mistranscription and…
▽ More
Automatic speech recognition and voice identification systems are being deployed in a wide array of applications, from providing control mechanisms to devices lacking traditional interfaces, to the automatic transcription of conversations and authentication of users. Many of these applications have significant security and privacy considerations. We develop attacks that force mistranscription and misidentification in state of the art systems, with minimal impact on human comprehension. Processing pipelines for modern systems are comprised of signal preprocessing and feature extraction steps, whose output is fed to a machine-learned model. Prior work has focused on the models, using white-box knowledge to tailor model-specific attacks. We focus on the pipeline stages before the models, which (unlike the models) are quite similar across systems. As such, our attacks are black-box and transferable, and demonstrably achieve mistranscription and misidentification rates as high as 100% by modifying only a few frames of audio. We perform a study via Amazon Mechanical Turk demonstrating that there is no statistically significant difference between human perception of regular and perturbed audio. Our findings suggest that models may learn aspects of speech that are generally not perceived by human subjects, but that are crucial for model accuracy. We also find that certain English language phonemes (in particular, vowels) are significantly more susceptible to our attack. We show that the attacks are effective when mounted over cellular networks, where signals are subject to degradation due to transcoding, jitter, and packet loss.
△ Less
Submitted 11 October, 2019;
originally announced October 2019.
-
Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems
Authors:
Hadi Abdullah,
Washington Garcia,
Christian Peeters,
Patrick Traynor,
Kevin R. B. Butler,
Joseph Wilson
Abstract:
Voice Processing Systems (VPSes), now widely deployed, have been made significantly more accurate through the application of recent advances in machine learning. However, adversarial machine learning has similarly advanced and has been used to demonstrate that VPSes are vulnerable to the injection of hidden commands - audio obscured by noise that is correctly recognized by a VPS but not by human b…
▽ More
Voice Processing Systems (VPSes), now widely deployed, have been made significantly more accurate through the application of recent advances in machine learning. However, adversarial machine learning has similarly advanced and has been used to demonstrate that VPSes are vulnerable to the injection of hidden commands - audio obscured by noise that is correctly recognized by a VPS but not by human beings. Such attacks, though, are often highly dependent on white-box knowledge of a specific machine learning model and limited to specific microphones and speakers, making their use across different acoustic hardware platforms (and thus their practicality) limited. In this paper, we break these dependencies and make hidden command attacks more practical through model-agnostic (blackbox) attacks, which exploit knowledge of the signal processing algorithms commonly used by VPSes to generate the data fed into machine learning systems. Specifically, we exploit the fact that multiple source audio samples have similar feature vectors when transformed by acoustic feature extraction algorithms (e.g., FFTs). We develop four classes of perturbations that create unintelligible audio and test them against 12 machine learning models, including 7 proprietary models (e.g., Google Speech API, Bing Speech API, IBM Speech API, Azure Speaker API, etc), and demonstrate successful attacks against all targets. Moreover, we successfully use our maliciously generated audio samples in multiple hardware configurations, demonstrating effectiveness across both models and real systems. In so doing, we demonstrate that domain-specific knowledge of audio signal processing represents a practical means of generating successful hidden voice command attacks.
△ Less
Submitted 18 March, 2019;
originally announced April 2019.
-
Testability Measurement Model for Object Oriented Design (TMMOOD)
Authors:
M. H. Khan Abdullah,
Reena Srivastava
Abstract:
Measuring testability early in the development life cycle especially at design phase is a criterion of crucial importance to software designers, developers, quality controllers and practitioners. However, most of the mechanism available for testability measurement may be used in the later phases of development life cycle. Early estimation of testability, absolutely at design phase helps designers…
▽ More
Measuring testability early in the development life cycle especially at design phase is a criterion of crucial importance to software designers, developers, quality controllers and practitioners. However, most of the mechanism available for testability measurement may be used in the later phases of development life cycle. Early estimation of testability, absolutely at design phase helps designers to improve their designs before the coding starts. Practitioners regularly advocate that testability should be planned early in design phase. Testability measurement early in design phase is greatly emphasized in this study; hence, considered significant for the delivery of quality software. As a result, it extensively reduces rework during and after implementation, as well as facilitate for design effective test plans, better project and resource planning in a practical manner, with a focus on the design phase. An effort has been put forth in this paper to recognize the key factors contributing in testability measurement at design phase. Additionally, testability measurement model is developed to quantify software testability at design phase. Furthermore, the relationship of Testability with these factors has been tested and justified with the help of statistical measures. The developed model has been validated using experimental tryout. Finally, it incorporates the empirical validation of the testability measurement model as the authors most important contribution.
△ Less
Submitted 18 March, 2015;
originally announced March 2015.
-
Edge direction matrixes-based local binar patterns descriptor for shape pattern recognition
Authors:
Mohammed A. Talab,
Siti Norul Huda Sheikh Abdullah,
Bilal Bataineh
Abstract:
Shapes and texture image recognition usage is an essential branch of pattern recognition. It is made up of techniques that aim at extracting information from images via human knowledge and works. Local Binary Pattern (LBP) ensures encoding global and local information and scaling invariance by introducing a look-up table to reflect the uniformity structure of an object. However, edge direction mat…
▽ More
Shapes and texture image recognition usage is an essential branch of pattern recognition. It is made up of techniques that aim at extracting information from images via human knowledge and works. Local Binary Pattern (LBP) ensures encoding global and local information and scaling invariance by introducing a look-up table to reflect the uniformity structure of an object. However, edge direction matrixes (EDMS) only apply global invariant descriptor which employs first and secondary order relationships. The main idea behind this methodology is the need of improved recognition capabilities, a goal achieved by the combinative use of these descriptors. This collaboration aims to make use of the major advantages each one presents, by simultaneously complementing each other, in order to elevate their weak points. By using multiple classifier approaches such as random forest and multi-layer perceptron neural network, the proposed combinative descriptor are compared with the state of the art combinative methods based on Gray-Level Co-occurrence matrix (GLCM with EDMS), LBP and moment invariant on four benchmark dataset MPEG-7 CE-Shape-1, KTH-TIPS image, Enghlishfnt and Arabic calligraphy . The experiments have shown the superiority of the introduced descriptor over the GLCM with EDMS, LBP and moment invariants and other well-known descriptor such as Scale Invariant Feature Transform from the literature.
△ Less
Submitted 26 November, 2014;
originally announced November 2014.
-
Efficient Distance Computation Algorithm between Nearly Intersected Objects Using Dynamic Pivot Point in Virtual Environment Application
Authors:
Hamzah Asyrani Sulaiman,
Abdullah Bade,
Mohd Harun Abdullah
Abstract:
Finding nearly accurate distance between two or more nearly intersecting three-dimensional (3D) objects is vital especially for collision determination such as in virtual surgeon simulation and real-time car crash simulation. Instead of performing broad phase collision detection, we need to check for accuracy of detection by running narrow phase collision detection. One of the important elements f…
▽ More
Finding nearly accurate distance between two or more nearly intersecting three-dimensional (3D) objects is vital especially for collision determination such as in virtual surgeon simulation and real-time car crash simulation. Instead of performing broad phase collision detection, we need to check for accuracy of detection by running narrow phase collision detection. One of the important elements for narrow phase collision detection is to determine the precise distance between two or more nearly intersecting objects or polygons in order to prepare the area for potential colliding. Distance computation plays important roles in determine the exact point of contact between two or more nearly intersecting polygons where the preparation for collision detection is determined at the earlier stage. In this paper, we describes our current works of determining the distance between objects using dynamic pivot point that will be used as reference point to reduce the complexity searching for potential point of contacts. By using Axis-Aligned Bounding Box for each polygon, we calculate a dynamic pivot point that will become our reference point to determine the potential candidates for distance computation. The test our finding distance will be simplified by using our method instead of performing unneeded operations. Our method provides faster solution than the previous method where it helps to determine the point of contact efficiently and faster than the other method.
△ Less
Submitted 16 October, 2014;
originally announced October 2014.
-
Traffic Engineering Based on Effective Envelope Algorithm on Novel Resource Reservation Method over Mobile Internet Protocol Version 6
Authors:
Reza Malekian,
Abdul Hanan Abdullah
Abstract:
The first decade of the 21st century has seen tremendous improvements in mobile internet and its technologies. The high traffic volume of services such as video conference and other real-time traffic applications are imposing a great challenge on networks. In the meantime, demand for the use of mobile devices in computation and communication such as smart phones, personal digital assistants, and m…
▽ More
The first decade of the 21st century has seen tremendous improvements in mobile internet and its technologies. The high traffic volume of services such as video conference and other real-time traffic applications are imposing a great challenge on networks. In the meantime, demand for the use of mobile devices in computation and communication such as smart phones, personal digital assistants, and mobile-enabled laptops has grown rapidly. These services have driven the demand for increasing and guaranteing bandwidth requirements in the network. A direction of this paper is in the case of resource reservation protocol (RSVP) over mobile IPv6 networks. There are numbers of proposed solutions for RSVP and quality of service provision over mobile IPv6 networks, but most of them using advanced resource reservation. In this paper, we propose a mathematical model to determine maximum end-to-end delay bound through intermediate routers along the network. These bounds are sent back to the home agent for further processing. Once the home agent receives maximum end-to-end delay bounds, it calculates cumulative bound and compares this bound with the desired application end-to-end delay bound to make final decision on resource reservation. This approach improves network resource utilization.
△ Less
Submitted 22 November, 2012;
originally announced November 2012.
-
Cryptanalysis and Improvements on Some Graph-based Authentication Schemes
Authors:
Mohammad Eftekhari,
Herish Abdullah
Abstract:
In 2010, Grigoriev and Shpilrain, introduced some graph-based authentication schemes. We present a cryptanalysis of some of these protocols, and introduce some new schemes to fix the problems.
In 2010, Grigoriev and Shpilrain, introduced some graph-based authentication schemes. We present a cryptanalysis of some of these protocols, and introduce some new schemes to fix the problems.
△ Less
Submitted 27 September, 2012;
originally announced September 2012.
-
Improving Overhead Computation and pre-processing Time for Grid Scheduling System
Authors:
Asgarali Bouyer,
Mohammad Javad hoseyni,
Abdul Hanan Abdullah
Abstract:
Computational Grid is enormous environments with heterogeneous resources and stable infrastructures among other Internet-based computing systems. However, the managing of resources in such systems has its special problems. Scheduler systems need to get last information about participant nodes from information centers for the purpose of firmly job scheduling. In this paper, we focus on online updat…
▽ More
Computational Grid is enormous environments with heterogeneous resources and stable infrastructures among other Internet-based computing systems. However, the managing of resources in such systems has its special problems. Scheduler systems need to get last information about participant nodes from information centers for the purpose of firmly job scheduling. In this paper, we focus on online updating resource information centers with processed and provided data based on the assumed hierarchical model. A hybrid knowledge extraction method has been used to classifying grid nodes based on prediction of jobs' features. An affirmative point of this research is that scheduler systems don't waste extra time for getting up-to-date information of grid nodes. The experimental result shows the advantages of our approach compared to other conservative methods, especially due to its ability to predict the behavior of nodes based on comprehensive data tables on each node.
△ Less
Submitted 6 May, 2010;
originally announced May 2010.