-
Evaluation of Temporal Change in IR Test Collections
Authors:
Jüri Keller,
Timo Breuer,
Philipp Schaer
Abstract:
Information retrieval systems have been evaluated using the Cranfield paradigm for many years. This paradigm allows a systematic, fair, and reproducible evaluation of different retrieval methods in fixed experimental environments. However, real-world retrieval systems must cope with dynamic environments and temporal changes that affect the document collection, topical trends, and the individual us…
▽ More
Information retrieval systems have been evaluated using the Cranfield paradigm for many years. This paradigm allows a systematic, fair, and reproducible evaluation of different retrieval methods in fixed experimental environments. However, real-world retrieval systems must cope with dynamic environments and temporal changes that affect the document collection, topical trends, and the individual user's perception of what is considered relevant. Yet, the temporal dimension in IR evaluations is still understudied.
To this end, this work investigates how the temporal generalizability of effectiveness evaluations can be assessed. As a conceptual model, we generalize Cranfield-type experiments to the temporal context by classifying the change in the essential components according to the create, update, and delete operations of persistent storage known from CRUD. From the different types of change different evaluation scenarios are derived and it is outlined what they imply. Based on these scenarios, renowned state-of-the-art retrieval systems are tested and it is investigated how the retrieval effectiveness changes on different levels of granularity.
We show that the proposed measures can be well adapted to describe the changes in the retrieval results. The experiments conducted confirm that the retrieval effectiveness strongly depends on the evaluation scenario investigated. We find that not only the average retrieval performance of single systems but also the relative system performance are strongly affected by the components that change and to what extent these components changed.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Under pressure: learning-based analog gauge reading in the wild
Authors:
Maurits Reitsma,
Julian Keller,
Kenneth Blomqvist,
Roland Siegwart
Abstract:
We propose an interpretable framework for reading analog gauges that is deployable on real world robotic systems. Our framework splits the reading task into distinct steps, such that we can detect potential failures at each step. Our system needs no prior knowledge of the type of gauge or the range of the scale and is able to extract the units used. We show that our gauge reading algorithm is able…
▽ More
We propose an interpretable framework for reading analog gauges that is deployable on real world robotic systems. Our framework splits the reading task into distinct steps, such that we can detect potential failures at each step. Our system needs no prior knowledge of the type of gauge or the range of the scale and is able to extract the units used. We show that our gauge reading algorithm is able to extract readings with a relative reading error of less than 2%.
△ Less
Submitted 12 April, 2024;
originally announced April 2024.
-
Preliminary Results of a Scientometric Analysis of the German Information Retrieval Community 2020-2023
Authors:
Philipp Schaer,
Svetlana Myshkina,
Jüri Keller
Abstract:
The German Information Retrieval community is located in two different sub-fields: Information and computer science. There are no current studies that investigate these communities on a scientometric level. Available studies only focus on the information scientific part of the community. We generated a data set of 401 recent IR-related publications extracted from six core IR conferences from a mai…
▽ More
The German Information Retrieval community is located in two different sub-fields: Information and computer science. There are no current studies that investigate these communities on a scientometric level. Available studies only focus on the information scientific part of the community. We generated a data set of 401 recent IR-related publications extracted from six core IR conferences from a mainly computer scientific background. We analyze this data set at the institutional and researcher level. The data set is publicly released, and we also demonstrate a map** use case.
△ Less
Submitted 11 October, 2023;
originally announced October 2023.
-
Evaluating Temporal Persistence Using Replicability Measures
Authors:
Jüri Keller,
Timo Breuer,
Philipp Schaer
Abstract:
In real-world Information Retrieval (IR) experiments, the Evaluation Environment (EE) is exposed to constant change. Documents are added, removed, or updated, and the information need and the search behavior of users is evolving. Simultaneously, IR systems are expected to retain a consistent quality. The LongEval Lab seeks to investigate the longitudinal persistence of IR systems, and in this work…
▽ More
In real-world Information Retrieval (IR) experiments, the Evaluation Environment (EE) is exposed to constant change. Documents are added, removed, or updated, and the information need and the search behavior of users is evolving. Simultaneously, IR systems are expected to retain a consistent quality. The LongEval Lab seeks to investigate the longitudinal persistence of IR systems, and in this work, we describe our participation. We submitted runs of five advanced retrieval systems, namely a Reciprocal Rank Fusion (RRF) approach, ColBERT, monoT5, Doc2Query, and E5, to both sub-tasks. Further, we cast the longitudinal evaluation as a replicability study to better understand the temporal change observed. As a result, we quantify the persistence of the submitted runs and see great potential in this evaluation method.
△ Less
Submitted 21 August, 2023;
originally announced August 2023.
-
Multi-Robot Multi-Room Exploration with Geometric Cue Extraction and Circular Decomposition
Authors:
Seungchan Kim,
Micah Corah,
John Keller,
Graeme Best,
Sebastian Scherer
Abstract:
This work proposes an autonomous multi-robot exploration pipeline that coordinates the behaviors of robots in an indoor environment composed of multiple rooms. Contrary to simple frontier-based exploration approaches, we aim to enable robots to methodically explore and observe an unknown set of rooms in a structured building, kee** track of which rooms are already explored and sharing this infor…
▽ More
This work proposes an autonomous multi-robot exploration pipeline that coordinates the behaviors of robots in an indoor environment composed of multiple rooms. Contrary to simple frontier-based exploration approaches, we aim to enable robots to methodically explore and observe an unknown set of rooms in a structured building, kee** track of which rooms are already explored and sharing this information among robots to coordinate their behaviors in a distributed manner. To this end, we propose (1) a geometric cue extraction method that processes 3D point cloud data and detects the locations of potential cues such as doors and rooms, (2) a circular decomposition for free spaces used for target assignment. Using these two components, our pipeline effectively assigns tasks among robots, and enables a methodical exploration of rooms. We evaluate the performance of our pipeline using a team of up to 3 aerial robots, and show that our method outperforms the baseline by 33.4% in simulation and 26.4% in real-world experiments.
△ Less
Submitted 4 December, 2023; v1 submitted 27 July, 2023;
originally announced July 2023.
-
SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments
Authors:
Shibo Zhao,
Yuanjun Gao,
Tianhao Wu,
Damanpreet Singh,
Rushan Jiang,
Haoxiang Sun,
Mansi Sarawata,
Yuheng Qiu,
Warren Whittaker,
Ian Higgins,
Yi Du,
Shaoshu Su,
Can Xu,
John Keller,
Jay Karhade,
Lucas Nogueira,
Sourojit Saha,
Ji Zhang,
Wenshan Wang,
Chen Wang,
Sebastian Scherer
Abstract:
Simultaneous localization and map** (SLAM) is a fundamental task for numerous applications such as autonomous navigation and exploration. Despite many SLAM datasets have been released, current SLAM solutions still struggle to have sustained and resilient performance. One major issue is the absence of high-quality datasets including diverse all-weather conditions and a reliable metric for assessi…
▽ More
Simultaneous localization and map** (SLAM) is a fundamental task for numerous applications such as autonomous navigation and exploration. Despite many SLAM datasets have been released, current SLAM solutions still struggle to have sustained and resilient performance. One major issue is the absence of high-quality datasets including diverse all-weather conditions and a reliable metric for assessing robustness. This limitation significantly restricts the scalability and generalizability of SLAM technologies, impacting their development, validation, and deployment. To address this problem, we present SubT-MRS, an extremely challenging real-world dataset designed to push SLAM towards all-weather environments to pursue the most robust SLAM performance. It contains multi-degraded environments including over 30 diverse scenes such as structureless corridors, varying lighting conditions, and perceptual obscurants like smoke and dust; multimodal sensors such as LiDAR, fisheye camera, IMU, and thermal camera; and multiple locomotions like aerial, legged, and wheeled robots. We develop accuracy and robustness evaluation tracks for SLAM and introduced novel robustness metrics. Comprehensive studies are performed, revealing new observations, challenges, and opportunities for future research.
△ Less
Submitted 29 May, 2024; v1 submitted 14 July, 2023;
originally announced July 2023.
-
Pegasus Simulator: An Isaac Sim Framework for Multiple Aerial Vehicles Simulation
Authors:
Marcelo Jacinto,
João Pinto,
Jay Patrikar,
John Keller,
Rita Cunha,
Sebastian Scherer,
António Pascoal
Abstract:
Develo** and testing novel control and motion planning algorithms for aerial vehicles can be a challenging task, with the robotics community relying more than ever on 3D simulation technologies to evaluate the performance of new algorithms in a variety of conditions and environments. In this work, we introduce the Pegasus Simulator, a modular framework implemented as an NVIDIA Isaac Sim extensio…
▽ More
Develo** and testing novel control and motion planning algorithms for aerial vehicles can be a challenging task, with the robotics community relying more than ever on 3D simulation technologies to evaluate the performance of new algorithms in a variety of conditions and environments. In this work, we introduce the Pegasus Simulator, a modular framework implemented as an NVIDIA Isaac Sim extension that enables real-time simulation of multiple multirotor vehicles in photo-realistic environments, while providing out-of-the-box integration with the widely adopted PX4-Autopilot and ROS2 through its modular implementation and intuitive graphical user interface. To demonstrate some of its capabilities, a nonlinear controller was implemented and simulation results for two drones performing aggressive flight maneuvers are presented. Code and documentation for this framework are also provided as supplementary material.
△ Less
Submitted 15 April, 2024; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Automated Statement Extraction from Press Briefings
Authors:
Jüri Keller,
Meik Bittkowski,
Philipp Schaer
Abstract:
Scientific press briefings are a valuable information source. They consist of alternating expert speeches, questions from the audience and their answers. Therefore, they can contribute to scientific and fact-based media coverage. Even though press briefings are highly informative, extracting statements relevant to individual journalistic tasks is challenging and time-consuming. To support this tas…
▽ More
Scientific press briefings are a valuable information source. They consist of alternating expert speeches, questions from the audience and their answers. Therefore, they can contribute to scientific and fact-based media coverage. Even though press briefings are highly informative, extracting statements relevant to individual journalistic tasks is challenging and time-consuming. To support this task, an automated statement extraction system is proposed. Claims are used as the main feature to identify statements in press briefing transcripts. The statement extraction task is formulated as a four-step procedure. First, the press briefings are split into sentences and passages, then claim sentences are identified through sequence classification. Subsequently, topics are detected, and the sentences are filtered to improve the coherence and assess the length of the statements.
The results indicate that claim detection can be used to identify statements in press briefings. While many statements can be extracted automatically with this system, they are not always as coherent as needed to be understood without context and may need further review by knowledgeable persons.
△ Less
Submitted 24 February, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
-
DYST (Did You See That?): An Amplified Covert Channel That Points To Previously Seen Data
Authors:
Steffen Wendzel,
Tobias Schmidbauer,
Sebastian Zillien,
Jörg Keller
Abstract:
Covert channels are stealthy communication channels that enable manifold adversary and legitimate scenarios, ranging from malware communications to the exchange of confidential information by journalists and censorship circumvention. We introduce a new class of covert channels that we call history covert channels. We further present a new paradigm: covert channel amplification. All covert channels…
▽ More
Covert channels are stealthy communication channels that enable manifold adversary and legitimate scenarios, ranging from malware communications to the exchange of confidential information by journalists and censorship circumvention. We introduce a new class of covert channels that we call history covert channels. We further present a new paradigm: covert channel amplification. All covert channels described until now need to craft seemingly legitimate flows or need to modify third-party flows, mimicking unsuspicious behavior. In contrast, history covert channels can communicate by pointing to unaltered legitimate traffic created by regular network nodes. Only a negligible fraction of the covert communication process requires the transfer of covert information by the covert channel's sender. This information can be sent through different protocols/channels. Our approach allows an amplification of the covert channel's message size, i.e., minimizing the fraction of actually transferred secret data by a covert channel's sender in relation to the overall secret data being exchanged. Further, we extend the current taxonomy for covert channels to show how history channels can be categorized. We describe multiple scenarios in which history covert channels can be realized, analyze the characteristics of these channels, and show how their configuration can be optimized.
△ Less
Submitted 7 June, 2024; v1 submitted 22 December, 2022;
originally announced December 2022.
-
Evaluating Research Dataset Recommendations in a Living Lab
Authors:
Jüri Keller,
Leon Paul Mondrian Munz
Abstract:
The search for research datasets is as important as laborious. Due to the importance of the choice of research data in further research, this decision must be made carefully. Additionally, because of the growing amounts of data in almost all areas, research data is already a central artifact in empirical sciences. Consequentially, research dataset recommendations can beneficially supplement scient…
▽ More
The search for research datasets is as important as laborious. Due to the importance of the choice of research data in further research, this decision must be made carefully. Additionally, because of the growing amounts of data in almost all areas, research data is already a central artifact in empirical sciences. Consequentially, research dataset recommendations can beneficially supplement scientific publication searches. We formulated the recommendation task as a retrieval problem by focussing on broad similarities between research datasets and scientific publications. In a multistage approach, initial recommendations were retrieved by the BM25 ranking function and dynamic queries. Subsequently, the initial ranking was re-ranked utilizing click feedback and document embeddings. The proposed system was evaluated live on real user interaction data using the STELLA infrastructure in the LiLAS Lab at CLEF 2021. Our experimental system could efficiently be fine-tuned before the live evaluation by pre-testing the system with a pseudo test collection based on prior user interaction data from the live system. The results indicate that the experimental system outperforms the other participating systems.
△ Less
Submitted 30 September, 2022; v1 submitted 29 September, 2022;
originally announced September 2022.
-
Histogram Layers for Synthetic Aperture Sonar Imagery
Authors:
Joshua Peeples,
Alina Zare,
Jeffrey Dale,
James Keller
Abstract:
Synthetic aperture sonar (SAS) imagery is crucial for several applications, including target recognition and environmental segmentation. Deep learning models have led to much success in SAS analysis; however, the features extracted by these approaches may not be suitable for capturing certain textural information. To address this problem, we present a novel application of histogram layers on SAS i…
▽ More
Synthetic aperture sonar (SAS) imagery is crucial for several applications, including target recognition and environmental segmentation. Deep learning models have led to much success in SAS analysis; however, the features extracted by these approaches may not be suitable for capturing certain textural information. To address this problem, we present a novel application of histogram layers on SAS imagery. The addition of histogram layer(s) within the deep learning models improved performance by incorporating statistical texture information on both synthetic and real-world datasets.
△ Less
Submitted 8 September, 2022;
originally announced September 2022.
-
ir_metadata: An Extensible Metadata Schema for IR Experiments
Authors:
Timo Breuer,
Jüri Keller,
Philipp Schaer
Abstract:
The information retrieval (IR) community has a strong tradition of making the computational artifacts and resources available for future reuse, allowing the validation of experimental results. Besides the actual test collections, the underlying run files are often hosted in data archives as part of conferences like TREC, CLEF, or NTCIR. Unfortunately, the run data itself does not provide much info…
▽ More
The information retrieval (IR) community has a strong tradition of making the computational artifacts and resources available for future reuse, allowing the validation of experimental results. Besides the actual test collections, the underlying run files are often hosted in data archives as part of conferences like TREC, CLEF, or NTCIR. Unfortunately, the run data itself does not provide much information about the underlying experiment. For instance, the single run file is not of much use without the context of the shared task's website or the run data archive. In other domains, like the social sciences, it is good practice to annotate research data with metadata. In this work, we introduce ir_metadata - an extensible metadata schema for TREC run files based on the PRIMAD model. We propose to align the metadata annotations to PRIMAD, which considers components of computational experiments that can affect reproducibility. Furthermore, we outline important components and information that should be reported in the metadata and give evidence from the literature. To demonstrate the usefulness of these metadata annotations, we implement new features in repro_eval that support the outlined metadata schema for the use case of reproducibility studies. Additionally, we curate a dataset with run files derived from experiments with different instantiations of PRIMAD components and annotate these with the corresponding metadata. In the experiments, we cover reproducibility experiments that are identified by the metadata and classified by PRIMAD. With this work, we enable IR researchers to annotate TREC run files and improve the reuse value of experimental artifacts even further.
△ Less
Submitted 18 July, 2022;
originally announced July 2022.
-
Off-Policy Evaluation with Online Adaptation for Robot Exploration in Challenging Environments
Authors:
Yafei Hu,
Junyi Geng,
Chen Wang,
John Keller,
Sebastian Scherer
Abstract:
Autonomous exploration has many important applications. However, classic information gain-based or frontier-based exploration only relies on the robot current state to determine the immediate exploration goal, which lacks the capability of predicting the value of future states and thus leads to inefficient exploration decisions. This paper presents a method to learn how "good" states are, measured…
▽ More
Autonomous exploration has many important applications. However, classic information gain-based or frontier-based exploration only relies on the robot current state to determine the immediate exploration goal, which lacks the capability of predicting the value of future states and thus leads to inefficient exploration decisions. This paper presents a method to learn how "good" states are, measured by the state value function, to provide a guidance for robot exploration in real-world challenging environments. We formulate our work as an off-policy evaluation (OPE) problem for robot exploration (OPERE). It consists of offline Monte-Carlo training on real-world data and performs Temporal Difference (TD) online adaptation to optimize the trained value estimator. We also design an intrinsic reward function based on sensor information coverage to enable the robot to gain more information with sparse extrinsic rewards. Results show that our method enables the robot to predict the value of future states so as to better guide robot exploration. The proposed algorithm achieves better prediction and exploration performance compared with the state-of-the-arts. To the best of our knowledge, this work for the first time demonstrates value function prediction on real-world dataset for robot exploration in challenging subterranean and urban environments. More details and demo videos can be found at https://jeffreyyh.github.io/opere/.
△ Less
Submitted 24 May, 2023; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Possibilistic Fuzzy Local Information C-Means with Automated Feature Selection for Seafloor Segmentation
Authors:
Joshua Peeples,
Daniel Suen,
Alina Zare,
James Keller
Abstract:
The Possibilistic Fuzzy Local Information C-Means (PFLICM) method is presented as a technique to segment side-look synthetic aperture sonar (SAS) imagery into distinct regions of the sea-floor. In this work, we investigate and present the results of an automated feature selection approach for SAS image segmentation. The chosen features and resulting segmentation from the image will be assessed bas…
▽ More
The Possibilistic Fuzzy Local Information C-Means (PFLICM) method is presented as a technique to segment side-look synthetic aperture sonar (SAS) imagery into distinct regions of the sea-floor. In this work, we investigate and present the results of an automated feature selection approach for SAS image segmentation. The chosen features and resulting segmentation from the image will be assessed based on a select quantitative clustering validity criterion and the subset of the features that reach a desired threshold will be used for the segmentation process.
△ Less
Submitted 14 October, 2021;
originally announced October 2021.
-
A Revised Taxonomy of Steganography Embedding Patterns
Authors:
Steffen Wendzel,
Luca Caviglione,
Wojciech Mazurczyk,
Aleksandra Mileva,
Jana Dittmann,
Christian Krätzer,
Kevin Lamshöft,
Claus Vielhauer,
Laura Hartmann,
Jörg Keller,
Tom Neubert
Abstract:
Steganography embraces several hiding techniques which spawn across multiple domains. However, the related terminology is not unified among the different domains, such as digital media steganography, text steganography, cyber-physical systems steganography, network steganography (network covert channels), local covert channels, and out-of-band covert channels. To cope with this, a prime attempt ha…
▽ More
Steganography embraces several hiding techniques which spawn across multiple domains. However, the related terminology is not unified among the different domains, such as digital media steganography, text steganography, cyber-physical systems steganography, network steganography (network covert channels), local covert channels, and out-of-band covert channels. To cope with this, a prime attempt has been done in 2015, with the introduction of the so-called hiding patterns, which allow to describe hiding techniques in a more abstract manner. Despite significant enhancements, the main limitation of such a taxonomy is that it only considers the case of network steganography.
Therefore, this paper reviews both the terminology and the taxonomy of hiding patterns as to make them more general. Specifically, hiding patterns are split into those that describe the embedding and the representation of hidden data within the cover object.
As a first research action, we focus on embedding hiding patterns and we show how they can be applied to multiple domains of steganography instead of being limited to the network scenario. Additionally, we exemplify representation patterns using network steganography. Our pattern collection is available under https://patterns.ztt.hs-worms.de.
△ Less
Submitted 16 June, 2021;
originally announced June 2021.
-
Particle-Based Assembly Using Precise Global Control
Authors:
Jakob Keller,
Christian Rieck,
Christian Scheffer,
Arne Schmidt
Abstract:
In micro- and nano-scale systems, particles can be moved by using an external force like gravity or a magnetic field. In the presence of adhesive particles that can attach to each other, the challenge is to decide whether a shape is constructible. Previous work provides a class of shapes for which constructibility can be decided efficiently when particles move maximally into the same direction ind…
▽ More
In micro- and nano-scale systems, particles can be moved by using an external force like gravity or a magnetic field. In the presence of adhesive particles that can attach to each other, the challenge is to decide whether a shape is constructible. Previous work provides a class of shapes for which constructibility can be decided efficiently when particles move maximally into the same direction induced by a global signal.
In this paper we consider the single step model, i.e., a model in which each particle moves one unit step into the given direction. We restrict the assembly process such that at each single time step actually one particle is added to and moved within the workspace. We prove that deciding constructibility is NP-complete for three-dimensional shapes, and that a maximum constructible shape can be approximated. The same approximation algorithm applies for 2D. We further present linear-time algorithms to decide whether or not a tree-shape in 2D or 3D is constructible. Scaling a shape yields constructibility; in particular we show that the $2$-scaled copy of every non-degenerate polyomino is constructible. In the three-dimensional setting we show that the $3$-scaled copy of every non-degenerate polycube is constructible.
△ Less
Submitted 15 June, 2022; v1 submitted 12 May, 2021;
originally announced May 2021.
-
Graph-Based Topological Exploration Planning in Large-Scale 3D Environments
Authors:
Fan Yang,
Dung-Han Lee,
John Keller,
Sebastian Scherer
Abstract:
Currently, state-of-the-art exploration methods maintain high-resolution map representations in order to optimize exploration goals in each step that maximizes information gain. However, during exploring, those "optimal" selections could quickly become obsolete due to the influx of new information, especially in large-scale environments, and result in high-frequency re-planning that hinders the ov…
▽ More
Currently, state-of-the-art exploration methods maintain high-resolution map representations in order to optimize exploration goals in each step that maximizes information gain. However, during exploring, those "optimal" selections could quickly become obsolete due to the influx of new information, especially in large-scale environments, and result in high-frequency re-planning that hinders the overall exploration efficiency. In this paper, we propose a graph-based topological planning framework, building a sparse topological map in three-dimensional (3D) space to guide exploration steps with high-level intents so as to render consistent exploration maneuvers. Specifically, this work presents a novel method to estimate 3D space's geometry with convex polyhedrons. Then, the geometry information is utilized to group space into distinctive regions. And those regions are added as nodes into the topological map, directing the exploration process. We compared our method with the state-of-the-art in simulated environments. The proposed method achieves higher space coverage and outperforms exploration efficiency by more than 40% during experiments. Finally, a field experiment was conducted to further evaluate the applicability of our method to empower efficient and robust exploration in real-world environments.
△ Less
Submitted 31 March, 2021;
originally announced March 2021.
-
The Weakly-Labeled Rand Index
Authors:
Dylan Stewart,
Anna Hampton,
Alina Zare,
Jeff Dale,
James Keller
Abstract:
Synthetic Aperture Sonar (SAS) surveys produce imagery with large regions of transition between seabed types. Due to these regions, it is difficult to label and segment the imagery and, furthermore, challenging to score the image segmentations appropriately. While there are many approaches to quantify performance in standard crisp segmentation schemes, drawing hard boundaries in remote sensing ima…
▽ More
Synthetic Aperture Sonar (SAS) surveys produce imagery with large regions of transition between seabed types. Due to these regions, it is difficult to label and segment the imagery and, furthermore, challenging to score the image segmentations appropriately. While there are many approaches to quantify performance in standard crisp segmentation schemes, drawing hard boundaries in remote sensing imagery where gradients and regions of uncertainty exist is inappropriate. These cases warrant weak labels and an associated appropriate scoring approach. In this paper, a labeling approach and associated modified version of the Rand index for weakly-labeled data is introduced to address these issues. Results are evaluated with the new index and compared to traditional segmentation evaluation methods. Experimental results on a SAS data set containing must-link and cannot-link labels show that our Weakly-Labeled Rand index scores segmentations appropriately in reference to qualitative performance and is more suitable than traditional quantitative metrics for scoring weakly-labeled data.
△ Less
Submitted 8 March, 2021; v1 submitted 8 March, 2021;
originally announced March 2021.
-
Countering Adaptive Network Covert Communication with Dynamic Wardens
Authors:
Wojciech Mazurczyk,
Steffen Wendzel,
Mehdi Chourib,
Jörg Keller
Abstract:
Network covert channels are hidden communication channels in computer networks. They influence several factors of the cybersecurity economy. For instance, by improving the stealthiness of botnet communications, they aid and preserve the value of darknet botnet sales. Covert channels can also be used to secretly exfiltrate confidential data out of organizations, potentially resulting in loss of mar…
▽ More
Network covert channels are hidden communication channels in computer networks. They influence several factors of the cybersecurity economy. For instance, by improving the stealthiness of botnet communications, they aid and preserve the value of darknet botnet sales. Covert channels can also be used to secretly exfiltrate confidential data out of organizations, potentially resulting in loss of market/research advantage. Considering the above, efforts are needed to develop effective countermeasures against such threats. Thus in this paper, based on the introduced novel warden taxonomy, we present and evaluate a new concept of a dynamic warden. Its main novelty lies in the modification of the warden's behavior over time, making it difficult for the adaptive covert communication parties to infer its strategy and perform a successful hidden data exchange. Obtained experimental results indicate the effectiveness of the proposed approach.
△ Less
Submitted 28 February, 2021;
originally announced March 2021.
-
Explainable Systematic Analysis for Synthetic Aperture Sonar Imagery
Authors:
Sarah Walker,
Joshua Peeples,
Jeff Dale,
James Keller,
Alina Zare
Abstract:
In this work, we present an in-depth and systematic analysis using tools such as local interpretable model-agnostic explanations (LIME) (arXiv:1602.04938) and divergence measures to analyze what changes lead to improvement in performance in fine tuned models for synthetic aperture sonar (SAS) data. We examine the sensitivity to factors in the fine tuning process such as class imbalance. Our findin…
▽ More
In this work, we present an in-depth and systematic analysis using tools such as local interpretable model-agnostic explanations (LIME) (arXiv:1602.04938) and divergence measures to analyze what changes lead to improvement in performance in fine tuned models for synthetic aperture sonar (SAS) data. We examine the sensitivity to factors in the fine tuning process such as class imbalance. Our findings show not only an improvement in seafloor texture classification, but also provide greater insight into what features play critical roles in improving performance as well as a knowledge of the importance of balanced data for fine tuning deep learning models for seafloor classification in SAS imagery.
△ Less
Submitted 16 March, 2021; v1 submitted 6 January, 2021;
originally announced January 2021.
-
Divergence Regulated Encoder Network for Joint Dimensionality Reduction and Classification
Authors:
Joshua Peeples,
Sarah Walker,
Connor McCurley,
Alina Zare,
James Keller,
Weihuang Xu
Abstract:
Feature representation is an important aspect of remote-sensing based image classification. While deep convolutional neural networks are able to effectively amalgamate information, large numbers of parameters often make learned features inscrutable and difficult to transfer to alternative models. In order to better represent statistical texture information for remote-sensing image classification,…
▽ More
Feature representation is an important aspect of remote-sensing based image classification. While deep convolutional neural networks are able to effectively amalgamate information, large numbers of parameters often make learned features inscrutable and difficult to transfer to alternative models. In order to better represent statistical texture information for remote-sensing image classification, in this paper, we investigate performing joint dimensionality reduction and classification using a novel histogram neural network. Motivated by a popular dimensionality reduction approach, t-Distributed Stochastic Neighbor Embedding (t-SNE), our proposed method incorporates a classification loss computed on samples in a low-dimensional embedding space. We compare the learned sample embeddings against coordinates found by t-SNE in terms of classification accuracy and qualitative assessment. We also explore use of various divergence measures in the t-SNE objective. The proposed method has several advantages such as readily embedding out-of-sample points and reducing feature dimensionality while retaining class discriminability. Our results show that the proposed approach maintains and/or improves classification performance and reveals characteristics of features produced by neural networks that may be helpful for other applications.
△ Less
Submitted 3 March, 2022; v1 submitted 31 December, 2020;
originally announced December 2020.
-
StreamSoNG: A Soft Streaming Classification Approach
Authors:
Wenlong Wu,
James M. Keller,
Jeffrey Dale,
James C. Bezdek
Abstract:
Examining most streaming clustering algorithms leads to the understanding that they are actually incremental classification models. They model existing and newly discovered structures via summary information that we call footprints. Incoming data is normally assigned a crisp label (into one of the structures) and that structure's footprint is incrementally updated. There is no reason that these as…
▽ More
Examining most streaming clustering algorithms leads to the understanding that they are actually incremental classification models. They model existing and newly discovered structures via summary information that we call footprints. Incoming data is normally assigned a crisp label (into one of the structures) and that structure's footprint is incrementally updated. There is no reason that these assignments need to be crisp. In this paper, we propose a new streaming classification algorithm that uses Neural Gas prototypes as footprints and produces a possibilistic label vector (of typicalities) for each incoming vector. These typicalities are generated by a modified possibilistic k-nearest neighbor algorithm. The approach is tested on synthetic and real image datasets. We compare our approach to three other streaming classifiers based on the Adaptive Random Forest, Very Fast Decision Rules, and the DenStream algorithm with excellent results.
△ Less
Submitted 13 July, 2021; v1 submitted 1 October, 2020;
originally announced October 2020.
-
Influence of Incremental Constraints on Energy Consumption and Static Scheduling Time for Moldable Tasks with Deadline
Authors:
Jörg Keller,
Sebastian Litzinger
Abstract:
Static scheduling of independent, moldable tasks on parallel machines with frequency scaling comprises decisions on core allocation, assignment, frequency scaling and ordering, to meet a deadline and minimize energy consumption. Constraining some of these decisions reduces the solution space, i.e. may increase energy consumption, but may also reduce scheduling time or give the chance to tackle lar…
▽ More
Static scheduling of independent, moldable tasks on parallel machines with frequency scaling comprises decisions on core allocation, assignment, frequency scaling and ordering, to meet a deadline and minimize energy consumption. Constraining some of these decisions reduces the solution space, i.e. may increase energy consumption, but may also reduce scheduling time or give the chance to tackle larger task sets. We investigate the influence of different constraints that lead from an unrestricted scheduler via two intermediate steps to the crown scheduler, by presenting integer linear programs for all four schedulers. We compare scheduling time and energy consumption for a benchmark suite of synthetic task sets of different sizes. Our results indicate that the final step towards the crown scheduler -- the execution order constraint -- is responsible for faster scheduling when task sets are small, and lower energy consumption when we deal with large task sets.
△ Less
Submitted 19 June, 2020;
originally announced June 2020.
-
Extending the Morphological Hit-or-Miss Transform to Deep Neural Networks
Authors:
Muhammad Aminul Islam,
Bryce Murray,
Andrew Buck,
Derek T. Anderson,
Grant Scott,
Mihail Popescu,
James Keller
Abstract:
While most deep learning architectures are built on convolution, alternative foundations like morphology are being explored for purposes like interpretability and its connection to the analysis and processing of geometric structures. The morphological hit-or-miss operation has the advantage that it takes into account both foreground and background information when evaluating target shape in an ima…
▽ More
While most deep learning architectures are built on convolution, alternative foundations like morphology are being explored for purposes like interpretability and its connection to the analysis and processing of geometric structures. The morphological hit-or-miss operation has the advantage that it takes into account both foreground and background information when evaluating target shape in an image. Herein, we identify limitations in existing hit-or-miss neural definitions and we formulate an optimization problem to learn the transform relative to deeper architectures. To this end, we model the semantically important condition that the intersection of the hit and miss structuring elements (SEs) should be empty and we present a way to express Don't Care (DNC), which is important for denoting regions of an SE that are not relevant to detecting a target pattern. Our analysis shows that convolution, in fact, acts like a hit-miss transform through semantic interpretation of its filter differences. On these premises, we introduce an extension that outperforms conventional convolution on benchmark data. Quantitative experiments are provided on synthetic and benchmark data, showing that the direct encoding hit-or-miss transform provides better interpretability on learned shapes consistent with objects whereas our morphologically inspired generalized convolution yields higher classification accuracy. Last, qualitative hit and miss filter visualizations are provided relative to single morphological layer.
△ Less
Submitted 27 September, 2020; v1 submitted 4 December, 2019;
originally announced December 2019.
-
A Stereo Algorithm for Thin Obstacles and Reflective Objects
Authors:
John Keller,
Sebastian Scherer
Abstract:
Stereo cameras are a popular choice for obstacle avoidance for outdoor lighweight, low-cost robotics applications. However, they are unable to sense thin and reflective objects well. Currently, many algorithms are tuned to perform well on indoor scenes like the Middlebury dataset. When navigating outdoors, reflective objects, like windows and glass, and thin obstacles, like wires, are not well han…
▽ More
Stereo cameras are a popular choice for obstacle avoidance for outdoor lighweight, low-cost robotics applications. However, they are unable to sense thin and reflective objects well. Currently, many algorithms are tuned to perform well on indoor scenes like the Middlebury dataset. When navigating outdoors, reflective objects, like windows and glass, and thin obstacles, like wires, are not well handled by most stereo disparity algorithms. Reflections, repeating patterns and objects parallel to the cameras' baseline causes mismatches between image pairs which leads to bad disparity estimates. Thin obstacles are difficult for many sliding window based disparity methods to detect because they do not take up large portions of the pixels in the sliding window. We use a trinocular camera setup and micropolarizer camera capable of detecting reflective objects to overcome these issues. We present a hierarchical disparity algorithm that reduces noise, separately identify wires using semantic object triangulation in three images, and use information about the polarization of light to estimate the disparity of reflective objects. We evaluate our approach on outdoor data that we collected. Our method contained an average of 9.27% of bad pixels compared to a typical stereo algorithm's 18.4% of bad pixels in scenes containing reflective objects. Our trinocular and semantic wire disparity methods detected 53% of wire pixels, whereas a typical two camera stereo algorithm detected 5%.
△ Less
Submitted 3 October, 2019;
originally announced October 2019.
-
Enabling Explainable Fusion in Deep Learning with Fuzzy Integral Neural Networks
Authors:
Muhammad Aminul Islam,
Derek T. Anderson,
Anthony J. Pinar,
Timothy C. Havens,
Grant Scott,
James M. Keller
Abstract:
Information fusion is an essential part of numerous engineering systems and biological functions, e.g., human cognition. Fusion occurs at many levels, ranging from the low-level combination of signals to the high-level aggregation of heterogeneous decision-making processes. While the last decade has witnessed an explosion of research in deep learning, fusion in neural networks has not observed the…
▽ More
Information fusion is an essential part of numerous engineering systems and biological functions, e.g., human cognition. Fusion occurs at many levels, ranging from the low-level combination of signals to the high-level aggregation of heterogeneous decision-making processes. While the last decade has witnessed an explosion of research in deep learning, fusion in neural networks has not observed the same revolution. Specifically, most neural fusion approaches are ad hoc, are not understood, are distributed versus localized, and/or explainability is low (if present at all). Herein, we prove that the fuzzy Choquet integral (ChI), a powerful nonlinear aggregation function, can be represented as a multi-layer network, referred to hereafter as ChIMP. We also put forth an improved ChIMP (iChIMP) that leads to a stochastic gradient descent-based optimization in light of the exponential number of ChI inequality constraints. An additional benefit of ChIMP/iChIMP is that it enables eXplainable AI (XAI). Synthetic validation experiments are provided and iChIMP is applied to the fusion of a set of heterogeneous architecture deep models in remote sensing. We show an improvement in model accuracy and our previously established XAI indices shed light on the quality of our data, model, and its decisions.
△ Less
Submitted 10 May, 2019;
originally announced May 2019.
-
ARCHANGEL: Tamper-proofing Video Archives using Temporal Content Hashes on the Blockchain
Authors:
Tu Bui,
Daniel Cooper,
John Collomosse,
Mark Bell,
Alex Green,
John Sheridan,
Jez Higgins,
Arindra Das,
Jared Keller,
Olivier Thereaux,
Alan Brown
Abstract:
We present ARCHANGEL; a novel distributed ledger based system for assuring the long-term integrity of digital video archives. First, we describe a novel deep network architecture for computing compact temporal content hashes (TCHs) from audio-visual streams with durations of minutes or hours. Our TCHs are sensitive to accidental or malicious content modification (tampering) but invariant to the co…
▽ More
We present ARCHANGEL; a novel distributed ledger based system for assuring the long-term integrity of digital video archives. First, we describe a novel deep network architecture for computing compact temporal content hashes (TCHs) from audio-visual streams with durations of minutes or hours. Our TCHs are sensitive to accidental or malicious content modification (tampering) but invariant to the codec used to encode the video. This is necessary due to the curatorial requirement for archives to format shift video over time to ensure future accessibility. Second, we describe how the TCHs (and the models used to derive them) are secured via a proof-of-authority blockchain distributed across multiple independent archives. We report on the efficacy of ARCHANGEL within the context of a trial deployment in which the national government archives of the United Kingdom, Estonia and Norway participated.
△ Less
Submitted 26 April, 2019;
originally announced April 2019.
-
MAVNet: an Effective Semantic Segmentation Micro-Network for MAV-based Tasks
Authors:
Ty Nguyen,
Shreyas S. Shivakumar,
Ian D. Miller,
James Keller,
Elijah S. Lee,
Alex Zhou,
Tolga Ozaslan,
Giuseppe Loianno,
Joseph H. Harwood,
Jennifer Wozencraft,
Camillo J. Taylor,
Vijay Kumar
Abstract:
Real-time semantic image segmentation on platforms subject to size, weight and power (SWaP) constraints is a key area of interest for air surveillance and inspection. In this work, we propose MAVNet: a small, light-weight, deep neural network for real-time semantic segmentation on micro Aerial Vehicles (MAVs). MAVNet, inspired by ERFNet, features 400 times fewer parameters and achieves comparable…
▽ More
Real-time semantic image segmentation on platforms subject to size, weight and power (SWaP) constraints is a key area of interest for air surveillance and inspection. In this work, we propose MAVNet: a small, light-weight, deep neural network for real-time semantic segmentation on micro Aerial Vehicles (MAVs). MAVNet, inspired by ERFNet, features 400 times fewer parameters and achieves comparable performance with some reference models in empirical experiments. Our model achieves a trade-off between speed and accuracy, achieving up to 48 FPS on an NVIDIA 1080Ti and 9 FPS on the NVIDIA Jetson Xavier when processing high resolution imagery. Additionally, we provide two novel datasets that represent challenges in semantic segmentation for real-time MAV tracking and infrastructure inspection tasks and verify MAVNet on these datasets. Our algorithm and datasets are made publicly available.
△ Less
Submitted 8 June, 2019; v1 submitted 3 April, 2019;
originally announced April 2019.
-
Comparison of Possibilistic Fuzzy Local Information C-Means and Possibilistic K-Nearest Neighbors for Synthetic Aperture Sonar Image Segmentation
Authors:
Joshua Peeples,
Matthew Cook,
Daniel Suen,
Alina Zare,
James Keller
Abstract:
Synthetic aperture sonar (SAS) imagery can generate high resolution images of the seafloor. Thus, segmentation algorithms can be used to partition the images into different seafloor environments. In this paper, we compare two possibilistic segmentation approaches. Possibilistic approaches allow for the ability to detect novel or outlier environments as well as well known classes. The Possibilistic…
▽ More
Synthetic aperture sonar (SAS) imagery can generate high resolution images of the seafloor. Thus, segmentation algorithms can be used to partition the images into different seafloor environments. In this paper, we compare two possibilistic segmentation approaches. Possibilistic approaches allow for the ability to detect novel or outlier environments as well as well known classes. The Possibilistic Fuzzy Local Information C-Means (PFLICM) algorithm has been previously applied to segment SAS imagery. Additionally, the Possibilistic K-Nearest Neighbors (PKNN) algorithm has been used in other domains such as landmine detection and hyperspectral imagery. In this paper, we compare the segmentation performance of a semi-supervised approach using PFLICM and a supervised method using Possibilistic K-NN. We include final segmentation results on multiple SAS images and a quantitative assessment of each algorithm.
△ Less
Submitted 1 April, 2019;
originally announced April 2019.
-
U-Net for MAV-based Penstock Inspection: an Investigation of Focal Loss in Multi-class Segmentation for Corrosion Identification
Authors:
Ty Nguyen,
Tolga Ozaslan,
Ian D. Miller,
James Keller,
Giuseppe Loianno,
Camillo J. Taylor,
Daniel D. Lee,
Vijay Kumar,
Joseph H. Harwood,
Jennifer Wozencraft
Abstract:
Periodical inspection and maintenance of critical infrastructure such as dams, penstocks, and locks are of significant importance to prevent catastrophic failures. Conventional manual inspection methods require inspectors to climb along a penstock to spot corrosion, rust and crack formation which is unsafe, labor-intensive, and requires intensive training. This work presents an alternative approac…
▽ More
Periodical inspection and maintenance of critical infrastructure such as dams, penstocks, and locks are of significant importance to prevent catastrophic failures. Conventional manual inspection methods require inspectors to climb along a penstock to spot corrosion, rust and crack formation which is unsafe, labor-intensive, and requires intensive training. This work presents an alternative approach using a Micro Aerial Vehicle (MAV) that autonomously flies to collect imagery which is then fed into a pretrained deep-learning model to identify corrosion. Our simplified U-Net trained with less than 40 image samples can do inference at 12 fps on a single GPU. We analyze different loss functions to solve the class imbalance problem, followed by a discussion on choosing proper metrics and weights for object classes. Results obtained with the dataset collected from Center Hill Dam, TN show that focal loss function, combined with a proper set of class weights yield better segmentation results than the base loss, Softmax cross entropy. Our method can be used in combination with planning algorithm to offer a complete, safe and cost-efficient solution to autonomous infrastructure inspection.
△ Less
Submitted 18 September, 2018;
originally announced September 2018.
-
Possibilistic Fuzzy Local Information C-Means for Sonar Image Segmentation
Authors:
Alina Zare,
Nicholas Young,
Daniel Suen,
Thomas Nabelek,
Aquila Galusha,
James Keller
Abstract:
Side-look synthetic aperture sonar (SAS) can produce very high quality images of the sea-floor. When viewing this imagery, a human observer can often easily identify various sea-floor textures such as sand ripple, hard-packed sand, sea grass and rock. In this paper, we present the Possibilistic Fuzzy Local Information C-Means (PFLICM) approach to segment SAS imagery into sea-floor regions that exh…
▽ More
Side-look synthetic aperture sonar (SAS) can produce very high quality images of the sea-floor. When viewing this imagery, a human observer can often easily identify various sea-floor textures such as sand ripple, hard-packed sand, sea grass and rock. In this paper, we present the Possibilistic Fuzzy Local Information C-Means (PFLICM) approach to segment SAS imagery into sea-floor regions that exhibit these various natural textures. The proposed PFLICM method incorporates fuzzy and possibilistic clustering methods and leverages (local) spatial information to perform soft segmentation. Results are shown on several SAS scenes and compared to alternative segmentation approaches.
△ Less
Submitted 28 September, 2017;
originally announced September 2017.
-
Diverse Large-Scale ITS Dataset Created from Continuous Learning for Real-Time Vehicle Detection
Authors:
Justin A. Eichel,
Akshaya Mishra,
Nicholas Miller,
Nicholas Jankovic,
Mohan A. Thomas,
Tyler Abbott,
Douglas Swanson,
Joel Keller
Abstract:
In traffic engineering, vehicle detectors are trained on limited datasets resulting in poor accuracy when deployed in real world applications. Annotating large-scale high quality datasets is challenging. Typically, these datasets have limited diversity; they do not reflect the real-world operating environment. There is a need for a large-scale, cloud based positive and negative mining (PNM) proces…
▽ More
In traffic engineering, vehicle detectors are trained on limited datasets resulting in poor accuracy when deployed in real world applications. Annotating large-scale high quality datasets is challenging. Typically, these datasets have limited diversity; they do not reflect the real-world operating environment. There is a need for a large-scale, cloud based positive and negative mining (PNM) process and a large-scale learning and evaluation system for the application of traffic event detection. The proposed positive and negative mining process addresses the quality of crowd sourced ground truth data through machine learning review and human feedback mechanisms. The proposed learning and evaluation system uses a distributed cloud computing framework to handle data-scaling issues associated with large numbers of samples and a high-dimensional feature space. The system is trained using AdaBoost on $1,000,000$ Haar-like features extracted from $70,000$ annotated video frames. The trained real-time vehicle detector achieves an accuracy of at least $95\%$ for $1/2$ and about $78\%$ for $19/20$ of the time when tested on approximately $7,500,000$ video frames. At the end of 2015, the dataset is expect to have over one billion annotated video frames.
△ Less
Submitted 7 October, 2015;
originally announced October 2015.
-
A Case Study on Covert Channel Establishment via Software Caches in High-Assurance Computing Systems
Authors:
Wolfgang Schmidt,
Michael Hanspach,
Jörg Keller
Abstract:
Covert channels can be utilized to secretly deliver information from high privileged processes to low privileged processes in the context of a high-assurance computing system. In this case study, we investigate the possibility of covert channel establishment via software caches in the context of a framework for component-based operating systems. While component-based operating systems offer securi…
▽ More
Covert channels can be utilized to secretly deliver information from high privileged processes to low privileged processes in the context of a high-assurance computing system. In this case study, we investigate the possibility of covert channel establishment via software caches in the context of a framework for component-based operating systems. While component-based operating systems offer security through the encapsulation of system service processes, complete isolation of these processes is not reasonably feasible. This limitation is practically demonstrated with our concept of a specific covert timing channel based on file system caching. The stability of the covert channel is evaluated and a methodology to disrupt the covert channel transmission is presented. While these kinds of attacks are not limited to high-assurance computing systems, our study practically demonstrates that even security-focused computing systems with a minimal trusted computing base are vulnerable for such kinds of attacks and careful design decisions are necessary for secure operating system architectures.
△ Less
Submitted 21 August, 2015;
originally announced August 2015.
-
Micro protocol engineering for unstructured carriers: On the embedding of steganographic control protocols into audio transmissions
Authors:
Matthias Naumann,
Steffen Wendzel,
Wojciech Mazurczyk,
Jörg Keller
Abstract:
Network steganography conceals the transfer of sensitive information within unobtrusive data in computer networks. So-called micro protocols are communication protocols placed within the payload of a network steganographic transfer. They enrich this transfer with features such as reliability, dynamic overlay routing, or performance optimization --- just to mention a few. We present different desig…
▽ More
Network steganography conceals the transfer of sensitive information within unobtrusive data in computer networks. So-called micro protocols are communication protocols placed within the payload of a network steganographic transfer. They enrich this transfer with features such as reliability, dynamic overlay routing, or performance optimization --- just to mention a few. We present different design approaches for the embedding of hidden channels with micro protocols in digitized audio signals under consideration of different requirements. On the basis of experimental results, our design approaches are compared, and introduced into a protocol engineering approach for micro protocols.
△ Less
Submitted 28 May, 2015;
originally announced May 2015.
-
A Taxonomy for Attack Patterns on Information Flows in Component-Based Operating Systems
Authors:
Michael Hanspach,
Jörg Keller
Abstract:
We present a taxonomy and an algebra for attack patterns on component-based operating systems. In a multilevel security scenario, where isolation of partitions containing data at different security classifications is the primary security goal and security breaches are mainly defined as undesired disclosure or modification of classified data, strict control of information flows is the ultimate goal…
▽ More
We present a taxonomy and an algebra for attack patterns on component-based operating systems. In a multilevel security scenario, where isolation of partitions containing data at different security classifications is the primary security goal and security breaches are mainly defined as undesired disclosure or modification of classified data, strict control of information flows is the ultimate goal. In order to prevent undesired information flows, we provide a classification of information flow types in a component-based operating system and, by this, possible patterns to attack the system. The systematic consideration of informations flows reveals a specific type of operating system covert channel, the covert physical channel, which connects two former isolated partitions by emitting physical signals into the computer's environment and receiving them at another interface.
△ Less
Submitted 5 March, 2014;
originally announced March 2014.
-
Towards a Scalable Dynamic Spatial Database System
Authors:
Joaquín Keller,
Raluca Diaconu,
Mathieu Valero
Abstract:
With the rise of GPS-enabled smartphones and other similar mobile devices, massive amounts of location data are available. However, no scalable solutions for soft real-time spatial queries on large sets of moving objects have yet emerged. In this paper we explore and measure the limits of actual algorithms and implementations regarding different application scenarios. And finally we propose a nove…
▽ More
With the rise of GPS-enabled smartphones and other similar mobile devices, massive amounts of location data are available. However, no scalable solutions for soft real-time spatial queries on large sets of moving objects have yet emerged. In this paper we explore and measure the limits of actual algorithms and implementations regarding different application scenarios. And finally we propose a novel distributed architecture to solve the scalability issues.
△ Less
Submitted 19 November, 2012;
originally announced November 2012.
-
A structural analysis of the A5/1 state transition graph
Authors:
Andreas Beckmann,
Jaroslaw Fedorowicz,
Jörg Keller,
Ulrich Meyer
Abstract:
We describe efficient algorithms to analyze the cycle structure of the graph induced by the state transition function of the A5/1 stream cipher used in GSM mobile phones and report on the results of the implementation. The analysis is performed in five steps utilizing HPC clusters, GPGPU and external memory computation. A great reduction of this huge state transition graph of 2^64 nodes is achieve…
▽ More
We describe efficient algorithms to analyze the cycle structure of the graph induced by the state transition function of the A5/1 stream cipher used in GSM mobile phones and report on the results of the implementation. The analysis is performed in five steps utilizing HPC clusters, GPGPU and external memory computation. A great reduction of this huge state transition graph of 2^64 nodes is achieved by focusing on special nodes in the first step and removing leaf nodes that can be detected with limited effort in the second step. This step does not break the overall structure of the graph and keeps at least one node on every cycle. In the third step the nodes of the reduced graph are connected by weighted edges. Since the number of nodes is still huge an efficient bitslice approach is presented that is implemented with NVIDIA's CUDA framework and executed on several GPUs concurrently. An external memory algorithm based on the STXXL library and its parallel pipelining feature further reduces the graph in the fourth step. The result is a graph containing only cycles that can be further analyzed in internal memory to count the number and size of the cycles. This full analysis which previously would take months can now be completed within a few days and allows to present structural results for the full graph for the first time. The structure of the A5/1 graph deviates notably from the theoretical results for random map**s.
△ Less
Submitted 23 October, 2012;
originally announced October 2012.
-
Evaluation of Authors and Journals
Authors:
Joseph B. Keller
Abstract:
A method is presented for evaluating authors on the basis of citations. It assigns to each author a citation score which depends upon the number of times he is cited, and upon the scores of the citers. The scores are found to be the components of an eigenvector of a normalized citation matrix. The same method can be applied to citation of journals by other journals, to evaluating teams in a leag…
▽ More
A method is presented for evaluating authors on the basis of citations. It assigns to each author a citation score which depends upon the number of times he is cited, and upon the scores of the citers. The scores are found to be the components of an eigenvector of a normalized citation matrix. The same method can be applied to citation of journals by other journals, to evaluating teams in a league [1], etc.
△ Less
Submitted 5 October, 2008;
originally announced October 2008.
-
When being Weak is Brave: Privacy in Recommender Systems
Authors:
Naren Ramakrishnan,
Benjamin J. Keller,
Batul J. Mirza,
Ananth Y. Grama,
George Karypis
Abstract:
We explore the conflict between personalization and privacy that arises from the existence of weak ties. A weak tie is an unexpected connection that provides serendipitous recommendations. However, information about weak ties could be used in conjunction with other sources of data to uncover identities and reveal other personal information. In this article, we use a graph-theoretic model to stud…
▽ More
We explore the conflict between personalization and privacy that arises from the existence of weak ties. A weak tie is an unexpected connection that provides serendipitous recommendations. However, information about weak ties could be used in conjunction with other sources of data to uncover identities and reveal other personal information. In this article, we use a graph-theoretic model to study the benefit and risk from weak ties.
△ Less
Submitted 18 May, 2001;
originally announced May 2001.
-
Evaluating Recommendation Algorithms by Graph Analysis
Authors:
Batul J. Mirza,
Benjamin J. Keller,
Naren Ramakrishnan
Abstract:
We present a novel framework for evaluating recommendation algorithms in terms of the `jumps' that they make to connect people to artifacts. This approach emphasizes reachability via an algorithm within the implicit graph structure underlying a recommender dataset, and serves as a complement to evaluation in terms of predictive accuracy. The framework allows us to consider questions relating alg…
▽ More
We present a novel framework for evaluating recommendation algorithms in terms of the `jumps' that they make to connect people to artifacts. This approach emphasizes reachability via an algorithm within the implicit graph structure underlying a recommender dataset, and serves as a complement to evaluation in terms of predictive accuracy. The framework allows us to consider questions relating algorithmic parameters to properties of the datasets. For instance, given a particular algorithm `jump,' what is the average path length from a person to an artifact? Or, what choices of minimum ratings and jumps maintain a connected graph? We illustrate the approach with a common jump called the `hammock' using movie recommender datasets.
△ Less
Submitted 3 April, 2001;
originally announced April 2001.