Search | arXiv e-print repository

doi 10.1016/j.ebiom.2024.105006

Decentralised, Collaborative, and Privacy-preserving Machine Learning for Multi-Hospital Data

Authors: Congyu Fang, Adam Dziedzic, Lin Zhang, Laura Oliva, Amol Verma, Fahad Razak, Nicolas Papernot, Bo Wang

Abstract: Machine Learning (ML) has demonstrated its great potential on medical data analysis. Large datasets collected from diverse sources and settings are essential for ML models in healthcare to achieve better accuracy and generalizability. Sharing data across different healthcare institutions is challenging because of complex and varying privacy and regulatory requirements. Hence, it is hard but crucia… ▽ More Machine Learning (ML) has demonstrated its great potential on medical data analysis. Large datasets collected from diverse sources and settings are essential for ML models in healthcare to achieve better accuracy and generalizability. Sharing data across different healthcare institutions is challenging because of complex and varying privacy and regulatory requirements. Hence, it is hard but crucial to allow multiple parties to collaboratively train an ML model leveraging the private datasets available at each party without the need for direct sharing of those datasets or compromising the privacy of the datasets through collaboration. In this paper, we address this challenge by proposing Decentralized, Collaborative, and Privacy-preserving ML for Multi-Hospital Data (DeCaPH). It offers the following key benefits: (1) it allows different parties to collaboratively train an ML model without transferring their private datasets; (2) it safeguards patient privacy by limiting the potential privacy leakage arising from any contents shared across the parties during the training process; and (3) it facilitates the ML model training without relying on a centralized server. We demonstrate the generalizability and power of DeCaPH on three distinct tasks using real-world distributed medical datasets: patient mortality prediction using electronic health records, cell-type classification using single-cell human genomes, and pathology identification using chest radiology images. We demonstrate that the ML models trained with DeCaPH framework have an improved utility-privacy trade-off, showing it enables the models to have good performance while preserving the privacy of the training data points. In addition, the ML models trained with DeCaPH framework in general outperform those trained solely with the private datasets from individual parties, showing that DeCaPH enhances the model generalizability. △ Less

Submitted 28 April, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: page 6 and 12, typos corrected. Results unchanged

Journal ref: eBioMedicine, vol. 101, p. 105006, 2024

arXiv:2202.00765 [pdf, other]

doi 10.1109/LRA.2022.3142905

A Model for Multi-View Residual Covariances based on Perspective Deformation

Authors: Alejandro Fontan, Laura Oliva, Javier Civera, Rudolph Triebel

Abstract: In this work, we derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups. The core of our approach is the formulation of the residual covariances as a combination of geometric and photometric noise sources. And our key novel contribution is the derivation of a term modelling how local 2D patches suffer from perspective deformation when imaging 3D surfa… ▽ More In this work, we derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups. The core of our approach is the formulation of the residual covariances as a combination of geometric and photometric noise sources. And our key novel contribution is the derivation of a term modelling how local 2D patches suffer from perspective deformation when imaging 3D surfaces around a point. Together, these add up to an efficient and general formulation which not only improves the accuracy of both feature-based and direct methods, but can also be used to estimate more accurate measures of the state entropy and hence better founded point visibility thresholds. We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment, improving their accuracy with a negligible overhead. △ Less

Submitted 1 February, 2022; originally announced February 2022.

arXiv:2004.05248 [pdf, other]

Experiences and Lessons Learned Creating and Validating Concept Inventories for Cybersecurity

Authors: Alan T. Sherman, Geoffrey L. Herman, Linda Oliva, Peter A. H. Peterson, Enis Golaszewski, Seth Poulsen, Travis Scheponik, Akshita Gorti

Abstract: We reflect on our ongoing journey in the educational Cybersecurity Assessment Tools (CATS) Project to create two concept inventories for cybersecurity. We identify key steps in this journey and important questions we faced. We explain the decisions we made and discuss the consequences of those decisions, highlighting what worked well and what might have gone better. The CATS Project is creating… ▽ More We reflect on our ongoing journey in the educational Cybersecurity Assessment Tools (CATS) Project to create two concept inventories for cybersecurity. We identify key steps in this journey and important questions we faced. We explain the decisions we made and discuss the consequences of those decisions, highlighting what worked well and what might have gone better. The CATS Project is creating and validating two concept inventories---conceptual tests of understanding---that can be used to measure the effectiveness of various approaches to teaching and learning cybersecurity. The Cybersecurity Concept Inventory (CCI) is for students who have recently completed any first course in cybersecurity; the Cybersecurity Curriculum Assessment (CCA) is for students who have recently completed an undergraduate major or track in cybersecurity. Each assessment tool comprises 25 multiple-choice questions (MCQs) of various difficulties that target the same five core concepts, but the CCA assumes greater technical background. Key steps include defining project scope, identifying the core concepts, uncovering student misconceptions, creating scenarios, drafting question stems, develo** distractor answer choices, generating educational materials, performing expert reviews, recruiting student subjects, organizing workshops, building community acceptance, forming a team and nurturing collaboration, adopting tools, and obtaining and using funding. Creating effective MCQs is difficult and time-consuming, and cybersecurity presents special challenges. Because cybersecurity issues are often subtle, where the adversarial model and details matter greatly, it is challenging to construct MCQs for which there is exactly one best but non-obvious answer. We hope that our experiences and lessons learned may help others create more effective concept inventories and assessments in STEM. △ Less

Submitted 10 April, 2020; originally announced April 2020.

Comments: Invited paper for the 2020 National Cyber Summit, June 2-4, 2020, in Huntsville, AL

arXiv:1909.04230 [pdf]

Investigating Crowdsourcing to Generate Distractors for Multiple-Choice Assessments

Authors: Travis Scheponik, Enis Golaszewski, Geoffrey Herman, Spencer Offenberger, Linda Oliva, Peter A. H. Peterson, Alan T. Sherman

Abstract: We present and analyze results from a pilot study that explores how crowdsourcing can be used in the process of generating distractors (incorrect answer choices) in multiple-choice concept inventories (conceptual tests of understanding). To our knowledge, we are the first to propose and study this approach. Using Amazon Mechanical Turk, we collected approximately 180 open-ended responses to severa… ▽ More We present and analyze results from a pilot study that explores how crowdsourcing can be used in the process of generating distractors (incorrect answer choices) in multiple-choice concept inventories (conceptual tests of understanding). To our knowledge, we are the first to propose and study this approach. Using Amazon Mechanical Turk, we collected approximately 180 open-ended responses to several question stems from the Cybersecurity Concept Inventory of the Cybersecurity Assessment Tools Project and from the Digital Logic Concept Inventory. We generated preliminary distractors by filtering responses, grou** similar responses, selecting the four most frequent groups, and refining a representative distractor for each of these groups. We analyzed our data in two ways. First, we compared the responses and resulting distractors with those from the aforementioned inventories. Second, we obtained feedback from Amazon Mechanical Turk on the resulting new draft test items (including distractors) from additional subjects. Challenges in using crowdsourcing include controlling the selection of subjects and filtering out responses that do not reflect genuine effort. Despite these challenges, our results suggest that crowdsourcing can be a very useful tool in generating effective distractors (attractive to subjects who do not understand the targeted concept). Our results also suggest that this method is faster, easier, and cheaper than is the traditional method of having one or more experts draft distractors, and building on talk-aloud interviews with subjects to uncover their misconceptions. Our results are significant because generating effective distractors is one of the most difficult steps in creating multiple-choice assessments. △ Less

Submitted 9 September, 2019; originally announced September 2019.

arXiv:1901.09286 [pdf, ps, other]

The CATS Hackathon: Creating and Refining Test Items for Cybersecurity Concept Inventories

Authors: Alan T. Sherman, Linda Oliva, Enis Golaszewski, Dhananjay Phatak, Travis Scheponik, Geoffrey L. Herman, Dong San Choi, Spencer E. Offenberger, Peter Peterson, Josiah Dykstra, Gregory V. Bard, Ankur Chattopadhyay, Filipo Sharevski, Rakesh Verma, Ryan Vrecenar

Abstract: For two days in February 2018, 17 cybersecurity educators and professionals from government and industry met in a "hackathon" to refine existing draft multiple-choice test items, and to create new ones, for a Cybersecurity Concept Inventory (CCI) and Cybersecurity Curriculum Assessment (CCA) being developed as part of the Cybersecurity Assessment Tools (CATS) Project. We report on the results of t… ▽ More For two days in February 2018, 17 cybersecurity educators and professionals from government and industry met in a "hackathon" to refine existing draft multiple-choice test items, and to create new ones, for a Cybersecurity Concept Inventory (CCI) and Cybersecurity Curriculum Assessment (CCA) being developed as part of the Cybersecurity Assessment Tools (CATS) Project. We report on the results of the CATS Hackathon, discussing the methods we used to develop test items, highlighting the evolution of a sample test item through this process, and offering suggestions to others who may wish to organize similar hackathons. Each test item embodies a scenario, question stem, and five answer choices. During the Hackathon, participants organized into teams to (1) Generate new scenarios and question stems, (2) Extend CCI items into CCA items, and generate new answer choices for new scenarios and stems, and (3) Review and refine draft CCA test items. The CATS Project provides rigorous evidence-based instruments for assessing and evaluating educational practices; these instruments can help identify pedagogies and content that are effective in teaching cybersecurity. The CCI measures how well students understand basic concepts in cybersecurity---especially adversarial thinking---after a first course in the field. The CCA measures how well students understand core concepts after completing a full cybersecurity curriculum. △ Less

Submitted 26 January, 2019; originally announced January 2019.

Comments: Submitted to IEEE Secuirty & Privacy

arXiv:1811.04794 [pdf, other]

The SFS Summer Research Study at UMBC: Project-Based Learning Inspires Cybersecurity Students

Authors: Alan Sherman, Enis Golaszewski, Edward LaFemina, Ethan Goldschen, Mohammed Khan, Lauren Mundy, Mykah Rather, Bryan Solis, Wubnyonga Tete, Edwin Valdez, Brian Weber, Damian Doyle, Casey O'Brien, Linda Oliva, Joseph Roundy, Jack Suess

Abstract: May 30-June 2, 2017, Scholarship for Service (SFS) scholars at the University of Maryland, Baltimore County (UMBC) analyzed the security of a targeted aspect of the UMBC computer systems. During this hands-on study, with complete access to source code, students identified vulnerabilities, devised and implemented exploits, and suggested mitigations. As part of a pioneering program at UMBC to extend… ▽ More May 30-June 2, 2017, Scholarship for Service (SFS) scholars at the University of Maryland, Baltimore County (UMBC) analyzed the security of a targeted aspect of the UMBC computer systems. During this hands-on study, with complete access to source code, students identified vulnerabilities, devised and implemented exploits, and suggested mitigations. As part of a pioneering program at UMBC to extend SFS scholarships to community colleges, the study helped initiate six students from two nearby community colleges, who transferred to UMBC in fall 2017 to complete their four-year degrees in computer science and information systems. The study examined the security of a set of "NetAdmin" custom scripts that enable UMBC faculty and staff to open the UMBC firewall to allow external access to machines they control for research purposes. Students discovered vulnerabilities stemming from weak architectural design, record overflow, and failure to sanitize inputs properly. For example, they implemented a record-overflow and code-injection exploit that exfiltrated the vital API key of the UMBC firewall. This report summarizes student activities and findings, and reflects on lessons learned for students, educators, and system administrators. Our students found the collaborative experience inspirational, students and educators appreciated the authentic case study, and IT administrators gained access to future employees and received free recommendations for improving the security of their systems. We hope that other universities can benefit from our motivational and educational strategy of teaming educators and system administrators to engage students in active project-based learning centering on focused questions about their university computer systems. △ Less

Submitted 12 November, 2018; originally announced November 2018.

Comments: Full-length report with 18 pages, 4 figures

arXiv:1706.05092 [pdf, ps, other]

Creating a Cybersecurity Concept Inventory: A Status Report on the CATS Project

Authors: Alan T. Sherman, Linda Oliva, David DeLatte, Enis Golaszewski, Michael Neary, Konstantinos Patsourakos, Dhananjay Phatak, Travis Scheponik, Geoffrey L. Herman, Julia Thompson

Abstract: We report on the status of our Cybersecurity Assessment Tools (CATS) project that is creating and validating a concept inventory for cybersecurity, which assesses the quality of instruction of any first course in cybersecurity. In fall 2014, we carried out a Delphi process that identified core concepts of cybersecurity. In spring 2016, we interviewed twenty-six students to uncover their understand… ▽ More We report on the status of our Cybersecurity Assessment Tools (CATS) project that is creating and validating a concept inventory for cybersecurity, which assesses the quality of instruction of any first course in cybersecurity. In fall 2014, we carried out a Delphi process that identified core concepts of cybersecurity. In spring 2016, we interviewed twenty-six students to uncover their understandings and misconceptions about these concepts. In fall 2016, we generated our first assessment tool--a draft Cybersecurity Concept Inventory (CCI), comprising approximately thirty multiple-choice questions. Each question targets a concept; incorrect answers are based on observed misconceptions from the interviews. This year we are validating the draft CCI using cognitive interviews, expert reviews, and psychometric testing. In this paper, we highlight our progress to date in develo** the CCI. The CATS project provides infrastructure for a rigorous evidence-based improvement of cybersecurity education. The CCI permits comparisons of different instructional methods by assessing how well students learned the core concepts of the field (especially adversarial thinking), where instructional methods refer to how material is taught (e.g., lab-based, case-studies, collaborative, competitions, gaming). Specifically, the CCI is a tool that will enable researchers to scientifically quantify and measure the effect of their approaches to, and interventions in, cybersecurity education. △ Less

Submitted 15 June, 2017; originally announced June 2017.

Comments: Appears in the proceedings of the 2017 National Cyber Summit (June 6--8, 2017, Huntsville, AL)

arXiv:1703.08859 [pdf, ps, other]

The INSuRE Project: CAE-Rs Collaborate to Engage Students in Cybersecurity Research

Authors: Alan Sherman, M. Dark, A. Chan, R. Chong, T. Morris, L. Oliva, J. Springer, B. Thuraisingham, C. Vatcher, R. Verma, S. Wetzel

Abstract: Since fall 2012, several National Centers of Academic Excellence in Cyber Defense Research (CAE-Rs) fielded a collaborative course to engage students in solving applied cybersecurity research problems. We describe our experiences with this Information Security Research and Education (INSuRE) research collaborative. We explain how we conducted our project-based research course, give examples of stu… ▽ More Since fall 2012, several National Centers of Academic Excellence in Cyber Defense Research (CAE-Rs) fielded a collaborative course to engage students in solving applied cybersecurity research problems. We describe our experiences with this Information Security Research and Education (INSuRE) research collaborative. We explain how we conducted our project-based research course, give examples of student projects, and discuss the outcomes and lessons learned. △ Less

Submitted 26 March, 2017; originally announced March 2017.

Comments: A shorter version of this paper has been submitted to IEEE Security and Privacy

Showing 1–8 of 8 results for author: Oliva, L