Search | arXiv e-print repository

Interactive Simulations of Backdoors in Neural Networks

Abstract: This work addresses the problem of planting and defending cryptographic-based backdoors in artificial intelligence (AI) models. The motivation comes from our lack of understanding and the implications of using cryptographic techniques for planting undetectable backdoors under theoretical assumptions in the large AI model systems deployed in practice. Our approach is based on designing a web-based… ▽ More This work addresses the problem of planting and defending cryptographic-based backdoors in artificial intelligence (AI) models. The motivation comes from our lack of understanding and the implications of using cryptographic techniques for planting undetectable backdoors under theoretical assumptions in the large AI model systems deployed in practice. Our approach is based on designing a web-based simulation playground that enables planting, activating, and defending cryptographic backdoors in neural networks (NN). Simulations of planting and activating backdoors are enabled for two scenarios: in the extension of NN model architecture to support digital signature verification and in the modified architectural block for non-linear operators. Simulations of backdoor defense against backdoors are available based on proximity analysis and provide a playground for a game of planting and defending against backdoors. The simulations are available at https://pages.nist.gov/nn-calculator △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 13 pages, 7 figures, 1 Table

MSC Class: 68 ACM Class: I.2; I.6; E.3

arXiv:2401.13023 [pdf]

Enabling Global Image Data Sharing in the Life Sciences

Authors: Peter Bajcsy, Sreenivas Bhattiprolu, Katy Boerner, Beth A Cimini, Lucy Collinson, Jan Ellenberg, Reto Fiolka, Maryellen Giger, Wojtek Goscinski, Matthew Hartley, Nathan Hotaling, Rick Horwitz, Florian Jug, Anna Kreshuk, Emma Lundberg, Aastha Mathur, Kedar Narayan, Shuichi Onami, Anne L. Plant, Fred Prior, Jason Swedlow, Adam Taylor, Antje Keppler

Abstract: Coordinated collaboration is essential to realize the added value of and infrastructure requirements for global image data sharing in the life sciences. In this White Paper, we take a first step at presenting some of the most common use cases as well as critical/emerging use cases of (including the use of artificial intelligence for) biological and medical image data, which would benefit tremendou… ▽ More Coordinated collaboration is essential to realize the added value of and infrastructure requirements for global image data sharing in the life sciences. In this White Paper, we take a first step at presenting some of the most common use cases as well as critical/emerging use cases of (including the use of artificial intelligence for) biological and medical image data, which would benefit tremendously from better frameworks for sharing (including technical, resourcing, legal, and ethical aspects). In the second half of this paper, we paint an ideal world scenario for how global image data sharing could work and benefit all life sciences and beyond. As this is still a long way off, we conclude by suggesting several concrete measures directed toward our institutions, existing imaging communities and data initiatives, and national funders, as well as publishers. Our vision is that within the next ten years, most researchers in the world will be able to make their datasets openly available and use quality image data of interest to them for their research and benefit. This paper is published in parallel with a companion White Paper entitled Harmonizing the Generation and Pre-publication Stewardship of FAIR Image Data, which addresses challenges and opportunities related to producing well-documented and high-quality image data that is ready to be shared. The driving goal is to address remaining challenges and democratize access to everyday practices and tools for a spectrum of biomedical researchers, regardless of their expertise, access to resources, and geographical location. △ Less

Submitted 2 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: This manuscript (arXiv:2401.13023) is published with a closely related companion entitled, Harmonizing the Generation and Pre-publication Stewardship of FAIR Image Data, which can be found at the following link, arXiv:2401.13022

arXiv:2212.06576 [pdf, other]

AI Model Utilization Measurements For Finding Class Encoding Patterns

Authors: Peter Bajcsy, Antonio Cardone, Chenyi Ling, Philippe Dessauw, Michael Majurski, Tim Blattner, Derek Juba, Walid Keyrouz

Abstract: This work addresses the problems of (a) designing utilization measurements of trained artificial intelligence (AI) models and (b) explaining how training data are encoded in AI models based on those measurements. The problems are motivated by the lack of explainability of AI models in security and safety critical applications, such as the use of AI models for classification of traffic signs in sel… ▽ More This work addresses the problems of (a) designing utilization measurements of trained artificial intelligence (AI) models and (b) explaining how training data are encoded in AI models based on those measurements. The problems are motivated by the lack of explainability of AI models in security and safety critical applications, such as the use of AI models for classification of traffic signs in self-driving cars. We approach the problems by introducing theoretical underpinnings of AI model utilization measurement and understanding patterns in utilization-based class encodings of traffic signs at the level of computation graphs (AI models), subgraphs, and graph nodes. Conceptually, utilization is defined at each graph node (computation unit) of an AI model based on the number and distribution of unique outputs in the space of all possible outputs (tensor-states). In this work, utilization measurements are extracted from AI models, which include poisoned and clean AI models. In contrast to clean AI models, the poisoned AI models were trained with traffic sign images containing systematic, physically realizable, traffic sign modifications (i.e., triggers) to change a correct class label to another label in a presence of such a trigger. We analyze class encodings of such clean and poisoned AI models, and conclude with implications for trojan injection and detection. △ Less

Submitted 11 December, 2022; originally announced December 2022.

Comments: 45 pages, 29 figures, 7 tables

arXiv:2101.12016 [pdf, other]

Baseline Pruning-Based Approach to Trojan Detection in Neural Networks

Authors: Peter Bajcsy, Michael Majurski

Abstract: This paper addresses the problem of detecting trojans in neural networks (NNs) by analyzing systematically pruned NN models. Our pruning-based approach consists of three main steps. First, detect any deviations from the reference look-up tables of model file sizes and model graphs. Next, measure the accuracy of a set of systematically pruned NN models following multiple pruning schemas. Finally, c… ▽ More This paper addresses the problem of detecting trojans in neural networks (NNs) by analyzing systematically pruned NN models. Our pruning-based approach consists of three main steps. First, detect any deviations from the reference look-up tables of model file sizes and model graphs. Next, measure the accuracy of a set of systematically pruned NN models following multiple pruning schemas. Finally, classify a NN model as clean or poisoned by applying a map** between accuracy measurements and NN model labels. This work outlines a theoretical and experimental framework for finding the optimal map** over a large search space of pruning parameters. Based on our experiments using Round 1 and Round 2 TrojAI Challenge datasets, the approach achieves average classification accuracy of 69.73 % and 82.41% respectively with an average processing time of less than 60 s per model. For both datasets random guessing would produce 50% classification accuracy. Reference model graphs and source code are available from GitHub. △ Less

Submitted 9 February, 2021; v1 submitted 22 January, 2021; originally announced January 2021.

Comments: The funding for all authors was provided by IARPA: IARPA-20001-D2020-2007180011

arXiv:2101.09153 [pdf]

doi 10.1111/jmi.13041

QUAREP-LiMi: A community-driven initiative to establish guidelines for quality assessment and reproducibility for instruments and images in light microscopy

Authors: Glyn Nelson, Ulrike Boehm, Steve Bagley, Peter Bajcsy, Johanna Bischof, Claire M Brown, Aurelien Dauphin, Ian M Dobbie, John E Eriksson, Orestis Faklaris, Julia Fernandez-Rodriguez, Alexia Ferrand, Laurent Gelman, Ali Gheisari, Hella Hartmann, Christian Kukat, Alex Laude, Miso Mitkovski, Sebastian Munck, Alison J North, Tobias M Rasse, Ute Resch-Genger, Lucas C Schuetz, Arne Seitz, Caterina Strambio-De-Castillia , et al. (75 additional authors not shown)

Abstract: In April 2020, the QUality Assessment and REProducibility for Instruments and Images in Light Microscopy (QUAREP-LiMi) initiative was formed. This initiative comprises imaging scientists from academia and industry who share a common interest in achieving a better understanding of the performance and limitations of microscopes and improved quality control (QC) in light microscopy. The ultimate goal… ▽ More In April 2020, the QUality Assessment and REProducibility for Instruments and Images in Light Microscopy (QUAREP-LiMi) initiative was formed. This initiative comprises imaging scientists from academia and industry who share a common interest in achieving a better understanding of the performance and limitations of microscopes and improved quality control (QC) in light microscopy. The ultimate goal of the QUAREP-LiMi initiative is to establish a set of common QC standards, guidelines, metadata models, and tools, including detailed protocols, with the ultimate aim of improving reproducible advances in scientific research. This White Paper 1) summarizes the major obstacles identified in the field that motivated the launch of the QUAREP-LiMi initiative; 2) identifies the urgent need to address these obstacles in a grassroots manner, through a community of stakeholders including, researchers, imaging scientists, bioimage analysts, bioimage informatics developers, corporate partners, funding agencies, standards organizations, scientific publishers, and observers of such; 3) outlines the current actions of the QUAREP-LiMi initiative, and 4) proposes future steps that can be taken to improve the dissemination and acceptance of the proposed guidelines to manage QC. To summarize, the principal goal of the QUAREP-LiMi initiative is to improve the overall quality and reproducibility of light microscope image data by introducing broadly accepted standard practices and accurately captured image data metrics. △ Less

Submitted 27 January, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

Comments: 17 pages, 3 figures, shortened abstract, Co-Lead Authors: Glyn Nelson and Ulrike Boehm, Corresponding author: Roland Nitschke

Journal ref: J. Microsc. 2021;1-18

arXiv:2006.03707 [pdf, other]

Scientific Calculator for Designing Trojan Detectors in Neural Networks

Authors: Peter Bajcsy, Nicholas J. Schaub, Michael Majurski

Abstract: This work presents a web-based interactive neural network (NN) calculator and a NN inefficiency measurement that has been investigated for the purpose of detecting trojans embedded in NN models. This NN Calculator is designed on top of TensorFlow Playground with in-memory storage of data and NN graphs plus coefficients. It is "like a scientific calculator" with analytical, visualization, and outpu… ▽ More This work presents a web-based interactive neural network (NN) calculator and a NN inefficiency measurement that has been investigated for the purpose of detecting trojans embedded in NN models. This NN Calculator is designed on top of TensorFlow Playground with in-memory storage of data and NN graphs plus coefficients. It is "like a scientific calculator" with analytical, visualization, and output operations performed on training datasets and NN architectures. The prototype is aaccessible at https://pages.nist.gov/nn-calculator. The analytical capabilities include a novel measurement of NN inefficiency using modified Kullback-Liebler (KL) divergence applied to histograms of NN model states, as well as a quantification of the sensitivity to variables related to data and NNs. Both NN Calculator and KL divergence are used to devise a trojan detector approach for a variety of trojan embeddings. Experimental results document desirable properties of the KL divergence measurement with respect to NN architectures and dataset perturbations, as well as inferences about embedded trojans. △ Less

Submitted 24 September, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

Comments: Presented at AAAI FSS-20: Artificial Intelligence in Government and Public Sector, Washington, DC, USA

arXiv:1910.11370 [pdf]

A perspective on Microscopy Metadata: data provenance and quality control

Authors: Maximiliaan Huisman, Mathias Hammer, Alex Rigano, Ulrike Boehm, James J. Chambers, Nathalie Gaudreault, Alison J. North, Jaime A. Pimentel, Damir Sudar, Peter Bajcsy, Claire M. Brown, Alexander D. Corbett, Orestis Faklaris, Judith Lacoste, Alex Laude, Glyn Nelson, Roland Nitschke, David Grunwald, Caterina Strambio-De-Castillia

Abstract: The application of microscopy in biomedical research has come a long way since Antonie van Leeuwenhoek discovered unicellular organisms. Countless innovations have positioned light microscopy as a cornerstone of modern biology and a method of choice for connecting omics datasets to their biological and clinical correlates. Still, regardless of how convincing published imaging data looks, it does n… ▽ More The application of microscopy in biomedical research has come a long way since Antonie van Leeuwenhoek discovered unicellular organisms. Countless innovations have positioned light microscopy as a cornerstone of modern biology and a method of choice for connecting omics datasets to their biological and clinical correlates. Still, regardless of how convincing published imaging data looks, it does not always convey meaningful information about the conditions in which it was acquired, processed, and analyzed. Adequate record-kee**, reporting, and quality control are therefore essential to ensure experimental rigor and data fidelity, allow experiments to be reproducibly repeated, and promote the proper evaluation, interpretation, comparison, and re-use. To this end, microscopy images should be accompanied by complete descriptions detailing experimental procedures, biological samples, microscope hardware specifications, image acquisition parameters, and image analysis procedures, as well as metrics accounting for instrument performance and calibration. However, universal, community-accepted Microscopy Metadata standards and reporting specifications that would result in Findable Accessible Interoperable and Reproducible (FAIR) microscopy data have not yet been established. To understand this shortcoming and to propose a way forward, here we provide an overview of the nature of microscopy metadata and its importance for fostering data quality, reproducibility, scientific rigor, and sharing value in light microscopy. The proposal for tiered Microscopy Metadata Specifications that extend the OME Data Model put forth by the 4D Nucleome Initiative and by Bioimaging North America [1-3] as well as a suite of three complementary and interoperable tools are being developed to facilitate the process of image data documentation and are presented in related manuscripts [4-6]. △ Less

Submitted 31 May, 2021; v1 submitted 24 October, 2019; originally announced October 2019.

arXiv:0902.0744 [pdf]

Embedding Data within Knowledge Spaces

Authors: James D. Myers, Joe Futrelle, Jeff Gaynor, Joel Plutchak, Peter Bajcsy, Jason Kastner, Kailash Kotwani, Jong Sung Lee, Luigi Marini, Rob Kooper, Robert E. McGrath, Terry McLaren, Alejandro Rodriguez, Yong Liu

Abstract: The promise of e-Science will only be realized when data is discoverable, accessible, and comprehensible within distributed teams, across disciplines, and over the long-term--without reliance on out-of-band (non-digital) means. We have developed the open-source Tupelo semantic content management framework and are employing it to manage a wide range of e-Science entities (including data, document… ▽ More The promise of e-Science will only be realized when data is discoverable, accessible, and comprehensible within distributed teams, across disciplines, and over the long-term--without reliance on out-of-band (non-digital) means. We have developed the open-source Tupelo semantic content management framework and are employing it to manage a wide range of e-Science entities (including data, documents, workflows, people, and projects) and a broad range of metadata (including provenance, social networks, geospatial relationships, temporal relations, and domain descriptions). Tupelo couples the use of global identifiers and resource description framework (RDF) statements with an aggregatable content repository model to provide a unified space for securely managing distributed heterogeneous content and relationships. △ Less

Submitted 4 February, 2009; originally announced February 2009.

Comments: 10 pages with 1 figure. Corrected incorrect transliteration in abstract

ACM Class: H.3.5; H.5.3; I.2.4

Showing 1–8 of 8 results for author: Bajcsy, P