-
Spatial-Temporal Anomaly Detection for Sensor Attacks in Autonomous Vehicles
Authors:
Martin Higgins,
Devki Jha,
David Wallom
Abstract:
Time-of-flight (ToF) distance measurement devices such as ultrasonics, LiDAR and radar are widely used in autonomous vehicles for environmental perception, navigation and assisted braking control. Despite their importance in making safer driving decisions, these devices are vulnerable to multiple attack types including spoofing, triggering and false data injection. When these attacks are successful they can compromise the security of autonomous vehicles, leading to severe consequences for the driver, nearby vehicles and pedestrians. To handle these attacks and protect the measurement devices, we propose a spatial-temporal anomaly detection model, STAnDS, which combines a residual error spatial detector with time-based expected change detection. This approach is evaluated using a simulated quantitative environment and the results show that STAnDS is effective at detecting multiple attack types.
Submitted 15 December, 2022;
originally announced December 2022.
-
Set Norm and Equivariant Skip Connections: Putting the Deep in Deep Sets
Authors:
Lily H. Zhang,
Veronica Tozzo,
John M. Higgins,
Rajesh Ranganath
Abstract:
Permutation invariant neural networks are a promising tool for making predictions from sets. However, we show that existing permutation invariant architectures, Deep Sets and Set Transformer, can suffer from vanishing or exploding gradients when they are deep. Additionally, layer norm, the normalization of choice in Set Transformer, can hurt performance by removing information useful for prediction. To address these issues, we introduce the clean path principle for equivariant residual connections and develop set norm, a normalization tailored for sets. With these, we build Deep Sets++ and Set Transformer++, models that reach high depths with comparable or better performance than their original counterparts on a diverse suite of tasks. We additionally introduce Flow-RBC, a new single-cell dataset and real-world application of permutation invariant prediction. We open-source our data and code here: https://github.com/rajesh-lab/deep_permutation_invariant.
Submitted 13 July, 2022; v1 submitted 23 June, 2022;
originally announced June 2022.
-
Automated speech tools for helping communities process restricted-access corpora for language revival efforts
Authors:
Nay San,
Martijn Bartelds,
Tolúlopé Ògúnrèmí,
Alison Mount,
Ruben Thompson,
Michael Higgins,
Roy Barker,
Jane Simpson,
Dan Jurafsky
Abstract:
Many archival recordings of speech from endangered languages remain unannotated and inaccessible to community members and language learning programs. One bottleneck is the time-intensive nature of annotation. An even narrower bottleneck occurs for recordings with access constraints, such as language that must be vetted or filtered by authorised community members before annotation can begin. We propose a privacy-preserving workflow to widen both bottlenecks for recordings where speech in the endangered language is intermixed with a more widely-used language such as English for meta-linguistic commentary and questions (e.g. What is the word for 'tree'?). We integrate voice activity detection (VAD), spoken language identification (SLI), and automatic speech recognition (ASR) to transcribe the metalinguistic content, which an authorised person can quickly scan to triage recordings that can be annotated by people with lower levels of access. We report work-in-progress processing 136 hours of archival audio containing a mix of English and Muruwari. Our collaborative work with the Muruwari custodian of the archival materials shows that this workflow reduces metalanguage transcription time by 20% even given only minimal amounts of annotated training data: 10 utterances per language for SLI and, for ASR, at most 39 minutes and possibly as little as 39 seconds.
Submitted 24 April, 2022; v1 submitted 14 April, 2022;
originally announced April 2022.
-
Actionable Conversational Quality Indicators for Improving Task-Oriented Dialog Systems
Authors:
Michael Higgins,
Dominic Widdows,
Chris Brew,
Gwen Christian,
Andrew Maurer,
Matthew Dunn,
Sujit Mathi,
Akshay Hazare,
George Bonev,
Beth Ann Hockey,
Kristen Howell,
Joe Bradley
Abstract:
Automatic dialog systems have become a mainstream part of online customer service. Many such systems are built, maintained, and improved by customer service specialists, rather than dialog systems engineers and computer programmers. As conversations between people and machines become commonplace, it is critical to understand what is working, what is not, and what actions can be taken to reduce the frequency of inappropriate system responses. These analyses and recommendations need to be presented in terms that directly reflect the user experience rather than the internal dialog processing.
This paper introduces and explains the use of Actionable Conversational Quality Indicators (ACQIs), which are used both to recognize parts of dialogs that can be improved, and to recommend how to improve them. This combines benefits of previous approaches, some of which have focused on producing dialog quality scoring while others have sought to categorize the types of errors the dialog system is making.
We demonstrate the effectiveness of using ACQIs on LivePerson internal dialog systems used in commercial customer service applications, and on the publicly available CMU LEGOv2 conversational dataset (Raux et al. 2005). We report on the annotation and analysis of conversational datasets showing which ACQIs are important to fix in various situations.
The annotated datasets are then used to build a predictive model which uses a turn-based vector embedding of the message texts and achieves a 79% weighted average F1-measure at the task of finding the correct ACQI for a given conversation. We predict that if such a model worked perfectly, the range of potential improvement actions a bot-builder must consider at each turn could be reduced by an average of 81%.
Submitted 22 September, 2021;
originally announced September 2021.
-
SeMA: Extending and Analyzing Storyboards to Develop Secure Android Apps
Authors:
Joydeep Mitra,
Venkatesh-Prasad Ranganath,
Torben Amtoft,
Mike Higgins
Abstract:
Mobile apps provide various critical services, such as banking, communication, and healthcare. To this end, they have access to our personal information and have the ability to perform actions on our behalf. Hence, securing mobile apps is crucial to ensuring the privacy and safety of their users.
Recent research efforts have focused on developing solutions to secure mobile ecosystems (i.e., app platforms, apps, and app stores), specifically in the context of detecting vulnerabilities in Android apps. Despite this attention, known vulnerabilities are often found in mobile apps, which can be exploited by malicious apps to harm the user. Further, fixing vulnerabilities after developing an app has downsides in terms of time, resources, user inconvenience, and information loss.
In an attempt to address this concern, we have developed SeMA, a mobile app development methodology that builds on existing mobile app design artifacts such as storyboards. With SeMA, security is a first-class citizen in an app's design -- app designers and developers can collaborate to specify and reason about the security properties of an app at an abstract level without being distracted by implementation level details. Our realization of SeMA using Android Studio tooling demonstrates the methodology is complementary to existing design and development practices. An evaluation of the effectiveness of SeMA shows the methodology can detect and help prevent 49 vulnerabilities known to occur in Android apps. Further, a usability study of the methodology involving ten real-world developers shows the methodology is likely to reduce the development time and help developers uncover and prevent known vulnerabilities while designing apps.
Submitted 10 March, 2024; v1 submitted 27 January, 2020;
originally announced January 2020.
-
A new method for quantifying network cyclic structure to improve community detection
Authors:
Behnaz Moradi-Jamei,
Heman Shakeri,
Pietro Poggi-Corradini,
Michael J. Higgins
Abstract:
A distinguishing property of communities in networks is that cycles are more prevalent within communities than across communities. Thus, the detection of these communities may be aided through the incorporation of measures of the local "richness" of the cyclic structure. In this paper, we introduce renewal non-backtracking random walks (RNBRW) as a way of quantifying this structure. RNBRW gives a weight to each edge equal to the probability that a non-backtracking random walk completes a cycle with that edge. Hence, edges with larger weights may be thought of as more important to the formation of cycles. Of note, since separate random walks can be performed in parallel, RNBRW weights can be estimated very quickly, even for large graphs. We give simulation results showing that pre-weighting edges through RNBRW may substantially improve the performance of common community detection algorithms. Our results suggest that RNBRW is especially efficient for the challenging case of detecting communities in sparse graphs.
Submitted 11 October, 2019; v1 submitted 2 October, 2019;
originally announced October 2019.
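The edge-weighting idea in the abstract above can be illustrated with a small Monte Carlo sketch. This is not the authors' implementation: the function name `rnbrw_weights`, the adjacency-dict input, the choice to start each walk on a uniformly random directed edge, and the rule that a walk stops the first time it steps onto an already visited node are all assumptions made for illustration. Each walk credits the edge that closed its cycle, and an edge's weight is the fraction of completed walks it closed.

```python
import random
from collections import defaultdict


def rnbrw_weights(adj, n_walks=10000, seed=0):
    """Estimate, per undirected edge, the fraction of walks whose cycle it closes.

    adj: dict mapping each node to a list of its neighbours (undirected graph).
    Illustrative sketch only; stopping and restart rules are assumptions.
    """
    rng = random.Random(seed)
    # Every directed edge is a possible starting step for a walk.
    directed = [(u, v) for u in adj for v in adj[u]]
    counts = defaultdict(int)
    completed = 0
    for _ in range(n_walks):
        prev, cur = rng.choice(directed)
        visited = {prev, cur}
        while True:
            # Non-backtracking: never step straight back to the previous node.
            options = [w for w in adj[cur] if w != prev]
            if not options:
                break  # dead end: the walk is abandoned without a cycle
            nxt = rng.choice(options)
            if nxt in visited:
                # The edge (cur, nxt) completes a cycle; credit it and restart.
                counts[frozenset((cur, nxt))] += 1
                completed += 1
                break
            visited.add(nxt)
            prev, cur = cur, nxt
    if completed == 0:
        return {}
    return {edge: c / completed for edge, c in counts.items()}


# Two triangles joined by a single bridge edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
weights = rnbrw_weights(adj, n_walks=2000, seed=1)
```

In this toy graph the bridge lies on no cycle, so it never receives a count, while the triangle edges share all of the weight. Because each walk is independent, the loop over walks could be split across workers, which is the parallelism the abstract refers to.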
-
Hybridized Threshold Clustering for Massive Data
Authors:
Jianmei Luo,
ChandraVyas Annakula,
Aruna Sai Kannamareddy,
Jasjeet S. Sekhon,
William Henry Hsu,
Michael Higgins
Abstract:
As the size $n$ of datasets becomes massive, many commonly used clustering algorithms (for example, $k$-means or hierarchical agglomerative clustering (HAC)) require prohibitive computational cost and memory. In this paper, we propose a solution to these clustering problems by extending threshold clustering (TC) to problems of instance selection. TC is a recently developed clustering algorithm designed to partition data into many small clusters in linearithmic time (on average). Our proposed clustering method is as follows. First, TC is performed and clusters are reduced into single "prototype" points. Then, TC is applied repeatedly on these prototype points until sufficient data reduction has been obtained. Finally, a more sophisticated clustering algorithm is applied to the reduced prototype points, thereby obtaining a clustering on all $n$ data points. This entire procedure for clustering is called iterative hybridized threshold clustering (IHTC). Through simulation results and by applying our methodology on several real datasets, we show that IHTC combined with $k$-means or HAC substantially reduces the run time and memory usage of the original clustering algorithms while still preserving their performance. Additionally, IHTC helps prevent singular data points from being overfit by clustering algorithms.
Submitted 5 July, 2019;
originally announced July 2019.
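The reduce-then-cluster procedure described in the abstract above can be sketched as follows. Several pieces are stand-ins and should be read as assumptions: `greedy_groups` is a simple quadratic-time nearest-neighbour grouping used in place of threshold clustering (it is not the linearithmic TC algorithm), prototypes are unweighted centroids, and `split_by_x` is a hypothetical placeholder for the "more sophisticated" final clusterer such as $k$-means or HAC.

```python
import math


def centroid(points):
    """Unweighted mean of a list of equal-length coordinate tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def greedy_groups(points, t):
    """Stand-in for TC: greedily form groups of at least t nearby points.

    Returns a list of index lists. Quadratic time; for illustration only.
    """
    unassigned = list(range(len(points)))
    groups = []
    while len(unassigned) >= t:
        seed = unassigned.pop(0)
        # Take the t - 1 unassigned points closest to the seed.
        unassigned.sort(key=lambda i: math.dist(points[seed], points[i]))
        groups.append([seed] + unassigned[:t - 1])
        unassigned = unassigned[t - 1:]
    for i in unassigned:  # fewer than t leftovers: attach to the nearest group
        if groups:
            nearest = min(
                groups,
                key=lambda g: math.dist(points[i], centroid([points[j] for j in g])),
            )
            nearest.append(i)
        else:
            groups.append([i])
    return groups


def ihtc(points, t, iterations, final_clusterer):
    """Shrink the data by repeated grouping, cluster the prototypes, map labels back."""
    membership = list(range(len(points)))  # original point -> current prototype
    prototypes = list(points)
    for _ in range(iterations):
        groups = greedy_groups(prototypes, t)
        owner = {i: g for g, members in enumerate(groups) for i in members}
        membership = [owner[m] for m in membership]
        prototypes = [centroid([prototypes[i] for i in members]) for members in groups]
    proto_labels = final_clusterer(prototypes)
    return [proto_labels[m] for m in membership]


def split_by_x(prototypes):
    """Hypothetical placeholder for the final clusterer: split on the first coordinate."""
    return [0 if p[0] < 5 else 1 for p in prototypes]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ihtc(points, t=2, iterations=2, final_clusterer=split_by_x)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With `t=2` and two iterations, the eight points shrink to four and then two prototypes, so the final clusterer only ever sees two points. Every original point still receives a label through the `membership` map.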
-
A new method for quantifying network cyclic structure to improve community detection
Authors:
Behnaz Moradi,
Heman Shakeri,
Pietro Poggi-Corradini,
Michael Higgins
Abstract:
A distinguishing property of communities in networks is that cycles are more prevalent within communities than across communities. Thus, the detection of these communities may be aided through the incorporation of measures of the local "richness" of the cyclic structure. In this paper, we introduce renewal non-backtracking random walks (RNBRW) as a way of quantifying this structure. RNBRW gives a weight to each edge equal to the probability that a non-backtracking random walk completes a cycle with that edge. Hence, edges with larger weights may be thought of as more important to the formation of cycles. Of note, since separate random walks can be performed in parallel, RNBRW weights can be estimated very quickly, even for large graphs. We give simulation results showing that pre-weighting edges through RNBRW may substantially improve the performance of common community detection algorithms. Our results suggest that RNBRW is especially efficient for the challenging case of detecting communities in sparse graphs.
Submitted 18 October, 2019; v1 submitted 18 May, 2018;
originally announced May 2018.
-
This robot stinks! Differences between perceived mistreatment of robot and computer partners
Authors:
Zachary Carlson,
Louise Lemmon,
MacCallister Higgins,
David Frank,
David Feil-Seifer
Abstract:
Robots (and computers) are increasingly being used in scenarios where they interact socially with people. How people react to these agents is telling about the perceived animacy of such agents. Mistreatment of robots (or computers) by co-workers might provoke such telling reactions. The purpose of this study was to discover if people perceived mistreatment directed towards a robot any differently than toward a computer. This will provide some understanding of how people perceive robots in collaborative social settings.
We conducted a between-subjects study with 80 participants. Participants worked cooperatively with either a robot or a computer which acted as the "recorder" for the group. A confederate either acted aggressively or neutrally towards the "recorder." We hypothesized that people would not socially accept mistreatment towards an agent that they felt was intelligent and similar to themselves; that participants would perceive the robot as more similar in appearance and emotional capability to themselves than a computer; and would observe more mistreatment. The final results supported our hypothesis; the participants observed mistreatment in the robot, but not the computer. Participants felt significantly more sympathetic towards the robot and also believed that it was much more emotionally capable.
Submitted 1 November, 2017;
originally announced November 2017.
-
SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine
Authors:
Matthew Dunn,
Levent Sagun,
Mike Higgins,
V. Ugur Guney,
Volkan Cirik,
Kyunghyun Cho
Abstract:
We publicly release a new large-scale dataset, called SearchQA, for machine comprehension, or question-answering. Unlike recently released datasets, such as DeepMind CNN/DailyMail and SQuAD, the proposed SearchQA was constructed to reflect a full pipeline of general question-answering. That is, we start not from an existing article and generate a question-answer pair, but start from an existing question-answer pair, crawled from J! Archive, and augment it with text snippets retrieved by Google. Following this approach, we built SearchQA, which consists of more than 140k question-answer pairs with each pair having 49.6 snippets on average. Each question-answer-context tuple of the SearchQA comes with additional meta-data such as the snippet's URL, which we believe will be valuable resources for future research. We conduct human evaluation as well as test two baseline methods, one simple word selection and the other deep learning based, on the SearchQA. We show that there is a meaningful gap between the human and machine performances. This suggests that the proposed dataset could well serve as a benchmark for question-answering.
Submitted 11 June, 2017; v1 submitted 17 April, 2017;
originally announced April 2017.
-
NAIVE: A Method for Representing Uncertainty and Temporal Relationships in an Automated Reasoner
Authors:
Michael C. Higgins
Abstract:
This paper describes NAIVE, a low-level knowledge representation language and inferencing process. NAIVE has been designed for reasoning about nondeterministic dynamic systems like those found in medicine. Knowledge is represented in a graph structure consisting of nodes, which correspond to the variables describing the system of interest, and arcs, which correspond to the procedures used to infer the value of a variable from the values of other variables. The value of a variable can be determined at an instant in time, over a time interval or for a series of times. Information about the value of a variable is expressed as a probability density function which quantifies the likelihood of each possible value. The inferencing process uses these probability density functions to propagate uncertainty. NAIVE has been used to develop medical knowledge bases including over 100 variables.
Submitted 27 March, 2013;
originally announced April 2013.