-
Reproducibility study of FairAC
Authors:
Gijs de Jong,
Macha J. Meijer,
Derck W. E. Prinzhorn,
Harold Ruiter
Abstract:
This work aims to reproduce the findings of the paper "Fair Attribute Completion on Graph with Missing Attributes" written by Guo, Chu, and Li arXiv:2302.12977 by investigating the claims made in the paper. This paper suggests that the results of the original paper are reproducible and thus, the claims hold. However, the claim that FairAC is a generic framework for many downstream tasks is very broad and could therefore only be partially tested. Moreover, we show that FairAC is generalizable to various datasets and sensitive attributes and show evidence that the improvement in group fairness of the FairAC framework does not come at the expense of individual fairness. Lastly, the codebase of FairAC has been refactored and is now easily applicable for various datasets and models.
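Group fairness in node-classification work of this kind is usually reported as a statistical parity gap and an equal opportunity gap. The sketch below computes both for binary predictions and a binary sensitive attribute. It is illustrative only: the function names and toy data are hypothetical, it assumes each sensitive group (and each group's positive class) is non-empty, and it is not taken from the FairAC codebase.

```python
from typing import Sequence


def statistical_parity_difference(y_pred: Sequence[int], s: Sequence[int]) -> float:
    """|P(y_hat=1 | s=0) - P(y_hat=1 | s=1)|; assumes both groups are non-empty."""
    rates = []
    for group in (0, 1):
        preds = [p for p, g in zip(y_pred, s) if g == group]
        rates.append(sum(preds) / len(preds))
    return abs(rates[0] - rates[1])


def equal_opportunity_difference(
    y_true: Sequence[int], y_pred: Sequence[int], s: Sequence[int]
) -> float:
    """|P(y_hat=1 | y=1, s=0) - P(y_hat=1 | y=1, s=1)|: the gap in true-positive rates."""
    tprs = []
    for group in (0, 1):
        preds = [p for t, p, g in zip(y_true, y_pred, s) if g == group and t == 1]
        tprs.append(sum(preds) / len(preds))
    return abs(tprs[0] - tprs[1])


# Hypothetical toy labels, predictions and sensitive attribute.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]
print(statistical_parity_difference(y_pred, s), equal_opportunity_difference(y_true, y_pred, s))
```

Here both groups receive positive predictions at the same rate, so the parity gap is zero, while their true-positive rates differ; this is why the two metrics are usually reported together.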
Submitted 10 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Situational Graphs for Robotic First Responders: an application to dismantling drug labs
Authors:
W. J. Meijer,
A. C. Kemmeren,
J. M. van Bruggen,
T. Haije,
J. E. Fransman,
J. D. van Mil
Abstract:
In this work, we support experts in the safety domain with safer dismantling of drug labs, by deploying robots for the initial inspection. Being able to act on the discovered environment is key to enabling this (semi-)autonomous inspection, e.g. to open doors or take a closer look at suspicious items. Our approach addresses this with a novel environmental representation, the Behavior-Oriented Situational Graph, where we extend the classical situational graph by merging a perception-driven backbone with prior actionable knowledge via a situational affordance schema. Linking situations to robot behaviors facilitates both autonomous mission planning and situational understanding of the operator. Planning over the graph is easier and faster, since it directly incorporates actionable information, which is critical for online mission systems. Moreover, the representation allows the human operator to seamlessly transition between different levels of autonomy of the robot, from remote control to behavior execution to full autonomous exploration. We test the effectiveness of our approach in a real-world drug lab scenario at a Dutch police training facility using a mobile Spot robot and use the results to iterate on the system design.
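To make the idea of "linking situations to robot behaviors" concrete, here is a minimal sketch under heavy assumptions: a perception-style graph whose nodes carry a situation type, an affordance schema mapping situation types to behaviors, and a breadth-first planner that finds the nearest node offering a requested behavior. The schema entries, node names, and the `SituationalGraph` class are all hypothetical illustrations, not the authors' representation or planner.

```python
from collections import deque

# Hypothetical affordance schema: situation type -> behaviors that can act on it.
# The situation types and behavior names are illustrative, not taken from the paper.
AFFORDANCES = {
    "corridor": ["traverse"],
    "closed_door": ["open_door"],
    "suspicious_item": ["inspect_closeup"],
}


class SituationalGraph:
    """Perception-driven backbone (nodes + adjacency) annotated with situation types."""

    def __init__(self):
        self.situation = {}  # node id -> situation type reported by perception
        self.edges = {}      # node id -> set of neighbouring node ids

    def add_node(self, node, situation):
        self.situation[node] = situation
        self.edges.setdefault(node, set())

    def connect(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def behaviors(self, node):
        """Actionable options attached to a node through the affordance schema."""
        return AFFORDANCES.get(self.situation[node], [])

    def plan(self, start, goal_behavior):
        """Breadth-first search for the nearest node offering goal_behavior.

        Returns (path, behavior), or None if no reachable node affords it.
        """
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            node = path[-1]
            if goal_behavior in self.behaviors(node):
                return path, goal_behavior
            for nxt in sorted(self.edges[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None


g = SituationalGraph()
g.add_node("entrance", "corridor")
g.add_node("hall", "corridor")
g.add_node("door1", "closed_door")
g.add_node("table", "suspicious_item")
g.connect("entrance", "hall")
g.connect("hall", "door1")
g.connect("hall", "table")
print(g.plan("entrance", "inspect_closeup"))
```

Because actionable options are stored on the nodes themselves, the planner only has to search for a node whose behaviors include the goal, rather than reasoning from raw geometry.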
Submitted 26 April, 2024;
originally announced April 2024.
-
Majorana Demonstrator Data Release for AI/ML Applications
Authors:
I. J. Arnquist,
F. T. Avignone III,
A. S. Barabash,
C. J. Barton,
K. H. Bhimani,
E. Blalock,
B. Bos,
M. Busch,
M. Buuck,
T. S. Caldwell,
Y. -D. Chan,
C. D. Christofferson,
P. -H. Chu,
M. L. Clark,
C. Cuesta,
J. A. Detwiler,
Yu. Efremenko,
H. Ejiri,
S. R. Elliott,
N. Fuad,
G. K. Giovanetti,
M. P. Green,
J. Gruszko,
I. S. Guinn,
V. E. Guiseppe
, et al. (35 additional authors not shown)
Abstract:
The enclosed data release consists of a subset of the calibration data from the Majorana Demonstrator experiment. Each Majorana event is accompanied by raw Germanium detector waveforms, pulse shape discrimination cuts, and calibrated final energies, all shared in an HDF5 file format along with relevant metadata. This release is specifically designed to support the training and testing of Artificial Intelligence (AI) and Machine Learning (ML) algorithms on our data. This document is structured as follows. Section I provides an overview of the dataset's content and format; Section II outlines the location of this dataset and the method for accessing it; Section III presents the NPML Machine Learning Challenge associated with this dataset; Section IV contains a disclaimer from the Majorana collaboration regarding the use of this dataset; Appendix A contains technical details of this data release. Please direct questions about the material provided within this release to [email protected] (A. Li).
Submitted 14 September, 2023; v1 submitted 21 August, 2023;
originally announced August 2023.
-
Interpretable Boosted Decision Tree Analysis for the Majorana Demonstrator
Authors:
I. J. Arnquist,
F. T. Avignone III,
A. S. Barabash,
C. J. Barton,
K. H. Bhimani,
E. Blalock,
B. Bos,
M. Busch,
M. Buuck,
T. S. Caldwell,
Y. -D. Chan,
C. D. Christofferson,
P. -H. Chu,
M. L. Clark,
C. Cuesta,
J. A. Detwiler,
Yu. Efremenko,
S. R. Elliott,
G. K. Giovanetti,
M. P. Green,
J. Gruszko,
I. S. Guinn,
V. E. Guiseppe,
C. R. Haufe,
R. Henning
, et al. (30 additional authors not shown)
Abstract:
The Majorana Demonstrator is a leading experiment searching for neutrinoless double-beta decay with high purity germanium (HPGe) detectors. Machine learning provides a new way to maximize the amount of information extracted from these detectors, but its data-driven nature makes it less interpretable than traditional analysis. An interpretability study reveals the machine's decision-making logic, allowing us to learn from the machine and feed the insights back into the traditional analysis. In this work, we present the first machine learning analysis of data from the Majorana Demonstrator; this is also the first interpretable machine learning analysis of any germanium detector experiment. Two gradient boosted decision tree models are trained on the data, and a game-theory-based model interpretability study is conducted to understand the origin of their classification power. By learning from data, this analysis recognizes correlations among reconstruction parameters that further enhance background rejection performance. By learning from the machine, this analysis reveals the importance of new background categories, which in turn benefits the standard Majorana analysis. This model is highly compatible with next-generation germanium detector experiments like LEGEND since it can be trained simultaneously on a large number of detectors.
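Game-theoretic feature attributions are typically Shapley values: each feature's average marginal contribution over all coalitions of the other features. The sketch below computes them exactly by enumerating coalitions. It uses a hypothetical three-feature toy score (features `a`, `b`, `c` are placeholders, not Majorana reconstruction parameters) with features outside a coalition set to a baseline; it does not reproduce the tree-specific algorithms that practical libraries use for boosted trees, and exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values.

    phi_i = sum over S subset of N minus {i} of |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S)).
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                total += weight * (value(coalition | {f}) - value(coalition))
        phi[f] = total
    return phi


# Hypothetical toy model: linear terms plus an a*c interaction.
def toy_score(x):
    return 2 * x["a"] + 1 * x["b"] + 3 * x["a"] * x["c"]


instance = {"a": 1, "b": 1, "c": 1}
baseline = {"a": 0, "b": 0, "c": 0}


def v(coalition):
    """Model output with features outside the coalition replaced by the baseline."""
    x = {k: (instance[k] if k in coalition else baseline[k]) for k in instance}
    return toy_score(x)


phi = shapley_values(["a", "b", "c"], v)
print(phi)
```

The interaction term is split evenly between `a` and `c`, and the attributions sum to the difference between the full prediction and the baseline prediction, which is what makes this kind of study useful for tracing where classification power comes from.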
Submitted 15 February, 2023; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Document Embedding for Scientific Articles: Efficacy of Word Embeddings vs TFIDF
Authors:
H. J. Meijer,
J. Truong,
R. Karimi
Abstract:
Over the last few years, neural-network-derived word embeddings have become popular in the natural language processing literature. Studies conducted have mostly focused on the quality and application of word embeddings trained on publicly available corpora such as Wikipedia or other news and social media sources. However, these studies are limited to generic text and thus lack technical and scientific nuances such as domain-specific vocabulary, abbreviations, or scientific formulas, which are commonly used in an academic context. This research focuses on the performance of word embeddings applied to a large-scale academic corpus. More specifically, we compare the quality and efficiency of trained word embeddings to TFIDF representations in modeling the content of scientific articles. We use a word2vec skip-gram model trained on titles and abstracts of about 70 million scientific articles. Furthermore, we have developed a benchmark to evaluate content models in a scientific context. The benchmark is based on a categorization task that matches articles to journals for about 1.3 million articles published in 2017. Our results show that content models based on word embeddings are better for titles (short text) while TFIDF works better for abstracts (longer text). However, the slight improvement of TFIDF for longer text comes at the expense of 3.7 times higher memory requirements as well as up to 184 times higher computation times, which may make it inefficient for online applications. In addition, we have created a 2-dimensional visualization of the journals modeled via embeddings to qualitatively inspect the embedding model. This graph shows useful insights and can be used to find competitive journals or gaps to propose new journals.
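The two content models being compared can be sketched side by side: a sparse TF-IDF vector whose dimension grows with the vocabulary, and a dense document vector of fixed dimension built from word vectors. This is a minimal sketch under stated assumptions: it uses raw term frequency times idf = log(N / df), and it builds the embedding-based document vector by simple averaging, which is one common choice but not necessarily how the authors aggregated their word2vec vectors. The toy documents and the embedding table are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one sparse {term: weight} dict per document.

    Weight = raw term frequency * log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine_sparse(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either has zero norm)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_embedding(doc, embeddings):
    """Average the vectors of in-vocabulary tokens; all zeros if none are known."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[t] for t in doc if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


docs = [["graph", "model"], ["graph", "neural"], ["protein", "folding"]]
tfidf = tfidf_vectors(docs)
print(cosine_sparse(tfidf[0], tfidf[1]), cosine_sparse(tfidf[0], tfidf[2]))

# Hypothetical 2-dimensional embedding table standing in for trained word2vec vectors.
embeddings = {"graph": [1.0, 2.0], "model": [3.0, 4.0]}
print(mean_embedding(["graph", "model"], embeddings))
```

The sparse vectors only share weight on overlapping terms, so documents with no common vocabulary score zero, whereas the dense representation has a fixed size regardless of vocabulary, which is in line with the memory and speed trade-off described above.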
Submitted 11 July, 2021;
originally announced July 2021.
-
A shallow residual neural network to predict the visual cortex response
Authors:
Anne-Ruth José Meijer,
Arnoud Visser
Abstract:
Understanding how the visual cortex of the human brain really works is still an open problem for science today. A better understanding of natural intelligence could also benefit object-recognition algorithms based on convolutional neural networks. In this paper we demonstrate the value of using a shallow residual neural network for this task. The benefit of this approach is that earlier stages of the network can be accurately trained, which allows us to add more layers at the earlier stage. With this additional layer, the prediction of the visual brain activity improves from $10.4\%$ (block 1) to $15.53\%$ (last fully connected layer). By training the network for more than 10 epochs, this improvement can become even larger.
Submitted 27 June, 2019;
originally announced June 2019.
-
MattockFS; Page-cache and access-control concerns in asynchronous message-based forensic frameworks on the Linux platform
Authors:
Rob J Meijer
Abstract:
In this dissertation, the feasibility of creating a page-cache efficient storage and messaging solution with integrity-geared access control for a scalable forensic framework is researched. The Open Computer Forensics Architecture (OCFA), a lab-side scalable computer forensics framework, introduced the concept of a message-passing-concurrency based forensic framework. Since then, the amount of per-investigation data to be processed in a lab environment has continued to grow significantly, while available RAM and CPU processing power, combined with the prohibitive cost and limited capacity of SSD solutions, have shifted processing from being largely CPU constrained to being much more IO constrained. OCFA suffered from several page-cache-miss related performance issues that have grown more significant as a result of this shift. In the light of anti-forensics and general issues related to process integrity, OCFA did not leverage the power of its message-passing based design to address integrity concerns.
The main purpose of this dissertation is to analyze and evaluate a number of page-cache friendly technologies that could contribute to the creation of a computer forensics lab-geared scalable message-passing-concurrency based forensic framework with a significantly reduced quantity of page-cache-miss induced spurious IO operations, taking into account integrity related issues.
Provenance logs from historic investigations conducted using the Open Computer Forensics Architecture were thoroughly analyzed in this study, during which several bottlenecks and design flaws in OCFA were identified. A number of strategies were devised to address these bottlenecks in future computer forensic frameworks. Finally, the most prominent page-cache related strategies were consolidated with access-control measures into a user-space file-system and low-level API prototype.
Submitted 1 March, 2017;
originally announced March 2017.
-
Symbolic Reachability Analysis of B through ProB and LTSmin
Authors:
Jens Bendisposto,
Philipp Koerner,
Michael Leuschel,
Jeroen Meijer,
Jaco van de Pol,
Helen Treharne,
Jorden Whitefield
Abstract:
We present a symbolic reachability analysis approach for B that can provide a significant speedup over traditional explicit state model checking. The symbolic analysis is implemented by linking ProB to LTSmin, a high-performance language independent model checker. The link is achieved via LTSmin's PINS interface, allowing ProB to benefit from LTSmin's analysis algorithms, while only writing a few hundred lines of glue-code, along with a bridge between ProB and C using ZeroMQ. ProB supports model checking of several formal specification languages such as B, Event-B, Z and TLA. Our experiments are based on a wide variety of B-Method and Event-B models to demonstrate the efficiency of the new link. Among the tested categories are state space generation and deadlock detection; but action detection and invariant checking are also feasible in principle. In many cases we observe speedups of several orders of magnitude. We also compare the results with other approaches for improving model checking, such as partial order reduction or symmetry reduction. We thus provide a new scalable, symbolic analysis algorithm for the B-Method and Event-B, along with a platform to integrate other model checking improvements via LTSmin in the future.
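The core of symbolic reachability is a set-based fixpoint: repeatedly add the image of the newly reached states until nothing new appears, with the next-state relation partitioned into transition groups that each touch only a few state-vector slots. The sketch below illustrates that structure under explicit assumptions: plain Python sets stand in for the decision diagrams LTSmin actually uses, the group format is a hypothetical simplification of a partitioned next-state interface (not LTSmin's PINS C API), and the two-counter model is a toy. Caching each group's relation on projected short vectors mirrors why few calls to the language front end (here, ProB) are needed.

```python
from typing import Callable, Iterator, List, Set, Tuple

State = Tuple[int, ...]
# A transition group: the state-vector slots it depends on, plus a function
# mapping the projected (short) vector to its successor short vectors.
Group = Tuple[Tuple[int, ...], Callable[[Tuple[int, ...]], List[Tuple[int, ...]]]]


def memoize_groups(groups: List[Group]) -> List[Group]:
    """Cache each group's short-vector relation so the (expensive) language
    front end is queried once per distinct projection."""
    cached = []
    for idx, step in groups:
        table = {}

        def lookup(proj, step=step, table=table):
            if proj not in table:
                table[proj] = list(step(proj))
            return table[proj]

        cached.append((idx, lookup))
    return cached


def successors(state: State, groups: List[Group]) -> Iterator[State]:
    """Apply every group to the projection of state and write results back."""
    for idx, step in groups:
        proj = tuple(state[i] for i in idx)
        for new in step(proj):
            succ = list(state)
            for i, val in zip(idx, new):
                succ[i] = val
            yield tuple(succ)


def reachable(init: State, groups: List[Group]) -> Set[State]:
    """Breadth-first fixpoint: add the image of the frontier until nothing new appears."""
    visited = {init}
    frontier = {init}
    while frontier:
        image = {t for s in frontier for t in successors(s, groups)}
        frontier = image - visited
        visited |= frontier
    return visited


def deadlocks(states: Set[State], groups: List[Group]) -> Set[State]:
    """States in which no transition group is enabled."""
    return {s for s in states if next(successors(s, groups), None) is None}


# Toy model: x0 counts up to 2; x1 may only increase while it is below x0.
groups = memoize_groups([
    ((0,), lambda p: [(p[0] + 1,)] if p[0] < 2 else []),
    ((0, 1), lambda p: [(p[0], p[1] + 1)] if p[1] < p[0] else []),
])
states = reachable((0, 0), groups)
print(sorted(states), deadlocks(states, groups))
```

State space generation and deadlock detection, the two categories tested in the abstract, correspond here to `reachable` and `deadlocks`.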
Submitted 14 March, 2016;
originally announced March 2016.
-
Bandwidth and Wavefront Reduction for Static Variable Ordering in Symbolic Model Checking
Authors:
Jeroen Meijer,
Jaco van de Pol
Abstract:
We demonstrate the applicability of bandwidth and wavefront reduction algorithms to static variable ordering. In symbolic model checking, event locality plays a major role in time and memory usage. For example, in Petri nets event locality can be captured by dependency matrices, where nonzero entries indicate whether a transition modifies a place. The quality of event locality has been expressed as a metric called (weighted) event span. The bandwidth of a matrix is a metric indicating the distance of nonzero elements to the diagonal. Wavefront is a metric indicating the degree of nonzeros on one end of the diagonal of the matrix. Bandwidth and wavefront are well-studied metrics used in sparse matrix solvers.
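The relation between bandwidth and span can be checked on a small example. The sketch assumes a square 0/1 dependency matrix and simple definitions: bandwidth is the largest |i - j| over nonzero entries, a row's extent is its last nonzero column minus its first, and unweighted event span sums (extent + 1) over rows. The paper's exact normalization and weighting may differ, and wavefront is not computed here. Since every nonzero in row i lies in columns i - b to i + b, each row extent is at most 2b, which is the intuition behind the bound stated in the next paragraph.

```python
def bandwidth(matrix):
    """Largest |i - j| over nonzero entries a_ij (0 for an all-zero matrix)."""
    return max(
        (abs(i - j) for i, row in enumerate(matrix) for j, a in enumerate(row) if a),
        default=0,
    )


def row_extents(matrix):
    """For each row with nonzeros: last nonzero column minus first nonzero column."""
    extents = []
    for row in matrix:
        cols = [j for j, a in enumerate(row) if a]
        if cols:
            extents.append(cols[-1] - cols[0])
    return extents


def event_span(matrix):
    """Unweighted event span: sum over rows of (last - first + 1)."""
    return sum(e + 1 for e in row_extents(matrix))


# Hypothetical 4x4 dependency pattern.
M = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
]
b = bandwidth(M)
print(b, row_extents(M), event_span(M))
assert all(e <= 2 * b for e in row_extents(M))  # each row extent is at most 2 * bandwidth
```

Reordering rows and columns to shrink the bandwidth therefore also caps how far each row's nonzeros can spread, which is why bandwidth reduction is a reasonable proxy for good variable orders.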
In this work we prove that span is limited by twice the bandwidth of a matrix. This observation makes bandwidth reduction algorithms useful for obtaining good variable orders. One major issue we address is that the reduction algorithms can only be applied to symmetric matrices, while the dependency matrices are asymmetric. We show that the Sloan algorithm executed on the total graph of the adjacency graph gives the best variable orders. Practically, we demonstrate that our work allows calling standard sparse matrix operations in Boost and ViennaCL, computing very good static variable orders in milliseconds. Future work is promising, because a whole new spectrum of off-the-shelf algorithms, including metaheuristic ones, becomes available for variable ordering.
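A minimal illustration of symmetrize-then-reorder follows. Note two substitutions: it symmetrizes with the simple pattern of A OR A^T rather than the total graph construction the abstract describes, and it uses reverse Cuthill-McKee, a different classic bandwidth-reduction algorithm, instead of the Sloan algorithm the authors found best. Starting each component from a minimum-degree node is a simplifying heuristic; production implementations usually search for a pseudo-peripheral start node. All function names are hypothetical.

```python
from collections import deque


def adjacency_from_matrix(matrix):
    """Symmetrize a square 0/1 pattern: i ~ j iff a_ij or a_ji is nonzero (i != j)."""
    n = len(matrix)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(n):
            if i != j and (matrix[i][j] or matrix[j][i]):
                adj[i].add(j)
    return adj


def reverse_cuthill_mckee(adj):
    """Reverse Cuthill-McKee ordering of a symmetric graph (node -> neighbour set).

    Each connected component is started from a minimum-degree node.
    """
    degree = {v: len(adj[v]) for v in adj}
    order, visited = [], set()
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Enqueue unvisited neighbours by increasing degree.
            for w in sorted(adj[v] - visited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    return order[::-1]


def bandwidth_under(order, adj):
    """Bandwidth of the symmetric matrix after renumbering nodes by position in order."""
    pos = {v: k for k, v in enumerate(order)}
    return max((abs(pos[u] - pos[w]) for u in adj for w in adj[u]), default=0)


# A path 0-3-5-1-4-2 whose natural numbering has a wide band.
adj = {v: set() for v in range(6)}
for a, b in [(0, 3), (3, 5), (5, 1), (1, 4), (4, 2)]:
    adj[a].add(b)
    adj[b].add(a)
order = reverse_cuthill_mckee(adj)
print(bandwidth_under(list(range(6)), adj), bandwidth_under(order, adj))
```

On this path graph the natural numbering has bandwidth 4, while the computed order lays the path out consecutively with bandwidth 1.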
Submitted 27 November, 2015;
originally announced November 2015.