-
A framework for develo** a knowledge management platform
Authors:
Marie Lisandra Zepeda Mendoza,
Sonali Agarwal,
James A. Blackshaw,
Vanesa Bol,
Audrey Fazzi,
Filippo Fiorini,
Amy Louise Foreman,
Nancy George,
Brett R. Johnson,
Brian Martin,
Dave McComb,
Euphemia Mutasa-Gottgens,
Helen Parkinson,
Martin Romacker,
Rolf Russell,
Valérien Ségard,
Shawn Zheng Kai Tan,
Wei Kheng Teh,
F. P. Winstanley,
Benedict Wong,
Adrian M. Smith
Abstract:
Knowledge management (KM) involves collecting, organizing, storing, and disseminating information to improve decision-making, innovation, and performance. Implementing KM at scale has become essential for organizations to effectively leverage vast accessible data. This paper is a compilation of concepts that emerged from KM workshops hosted by EMBL-EBI, attended by SMEs and industry. We provide gu…
▽ More
Knowledge management (KM) involves collecting, organizing, storing, and disseminating information to improve decision-making, innovation, and performance. Implementing KM at scale has become essential for organizations to effectively leverage vast accessible data. This paper is a compilation of concepts that emerged from KM workshops hosted by EMBL-EBI, attended by SMEs and industry. We provide guidance on envisioning, executing, evaluating, and evolving knowledge management platforms. We emphasize essential considerations such as setting knowledge domain boundaries and measuring success, as well as the importance of making knowledge accessible for downstream applications and non-computational users and highlights necessary personal and organizational skills for success. We stress the importance of collaboration and the need for convergence on shared principles and commitment to provide or seek resources to advance KM. The community is invited to join the journey of KM and contribute to the advancement of the field by applying and improving on the guidelines described.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Cycling on the Freeway: The Perilous State of Open Source Neuroscience Software
Authors:
Britta U. Westner,
Daniel R. McCloy,
Eric Larson,
Alexandre Gramfort,
Daniel S. Katz,
Arfon M. Smith,
invited co-signees
Abstract:
Most scientists need software to perform their research (Barker et al., 2020; Carver et al., 2022; Hettrick, 2014; Hettrick et al., 2014; Switters and Osimo, 2019), and neuroscientists are no exception. Whether we work with reaction times, electrophysiological signals, or magnetic resonance imaging data, we rely on software to acquire, analyze, and statistically evaluate the raw data we obtain - o…
▽ More
Most scientists need software to perform their research (Barker et al., 2020; Carver et al., 2022; Hettrick, 2014; Hettrick et al., 2014; Switters and Osimo, 2019), and neuroscientists are no exception. Whether we work with reaction times, electrophysiological signals, or magnetic resonance imaging data, we rely on software to acquire, analyze, and statistically evaluate the raw data we obtain - or to generate such data if we work with simulations. In recent years there has been a shift toward relying on free, open-source scientific software (FOSSS) for neuroscience data analysis (Poldrack et al., 2019), in line with the broader open science movement in academia (McKiernan et al., 2016) and wider industry trends (Eghbal, 2016). Importantly, FOSSS is typically developed by working scientists (not professional software developers) which sets up a precarious situation given the nature of the typical academic workplace (wherein academics, especially in their early careers, are on short and fixed term contracts). In this paper, we will argue that the existing ecosystem of neuroscientific open source software is brittle, and discuss why and how the neuroscience community needs to come together to ensure a healthy growth of our software landscape to the benefit of all.
△ Less
Submitted 28 March, 2024;
originally announced March 2024.
-
Modeling Disease Progression in Mild Cognitive Impairment and Alzheimer's Disease with Digital Twins
Authors:
Daniele Bertolini,
Anton D. Loukianov,
Aaron M. Smith,
David Li-Bland,
Yannick Pouliot,
Jonathan R. Walsh,
Charles K. Fisher
Abstract:
Alzheimer's Disease (AD) is a neurodegenerative disease that affects subjects in a broad range of severity and is assessed in clinical trials with multiple cognitive and functional instruments. As clinical trials in AD increasingly focus on earlier stages of the disease, especially Mild Cognitive Impairment (MCI), the ability to model subject outcomes across the disease spectrum is extremely impor…
▽ More
Alzheimer's Disease (AD) is a neurodegenerative disease that affects subjects in a broad range of severity and is assessed in clinical trials with multiple cognitive and functional instruments. As clinical trials in AD increasingly focus on earlier stages of the disease, especially Mild Cognitive Impairment (MCI), the ability to model subject outcomes across the disease spectrum is extremely important. We use unsupervised machine learning models called Conditional Restricted Boltzmann Machines (CRBMs) to create Digital Twins of AD subjects. Digital Twins are simulated clinical records that share baseline data with actual subjects and comprehensively model their outcomes under standard-of-care. The CRBMs are trained on a large set of records from subjects in observational studies and the placebo arms of clinical trials across the AD spectrum. These data exhibit a challenging, but common, patchwork of measured and missing observations across subjects in the dataset, and we present a novel model architecture designed to learn effectively from it. We evaluate performance against a held-out test dataset and show how Digital Twins simultaneously capture the progression of a number of key endpoints in clinical trials across a broad spectrum of disease severity, including MCI and mild-to-moderate AD.
△ Less
Submitted 24 December, 2020;
originally announced December 2020.
-
Reconfigurable Integrated Optical Interferometer Network-Based Physically Unclonable Function
Authors:
A. Matthew Smith,
H S. Jacinto
Abstract:
In this article we describe the characteristics of a large integrated linear optical device containing Mach-Zehnder interferometers and describe its potential use as a physically unclonable function. We propose that any tunable interferometric device of practical scale will be intrinsically unclonable and will possess an inherent randomness that can be useful for many practical applications. The d…
▽ More
In this article we describe the characteristics of a large integrated linear optical device containing Mach-Zehnder interferometers and describe its potential use as a physically unclonable function. We propose that any tunable interferometric device of practical scale will be intrinsically unclonable and will possess an inherent randomness that can be useful for many practical applications. The device under test has the additional use-case as a general-purpose photonic manipulation tool, with various applications based on the experimental results of our prototype. Once our tunable interferometric device is set to work as a physically unclonable function, we find that there are approximately 6.85x10E35 challenge-response pairs, where each challenge can be quickly reconfigured by tuning the interferometer array for subsequent challenges.
△ Less
Submitted 18 December, 2020;
originally announced December 2020.
-
Utilizing a Fully Optical and Reconfigurable PUF as a Quantum Authentication Mechanism
Authors:
H S. Jacinto,
A. Matthew Smith
Abstract:
In this work the novel usage of a physically unclonable function composed of a network of Mach-Zehnder interferometers for authentication tasks is described. The physically unclonable function hardware is completely reconfigurable, allowing for a large number of seemingly independent devices to be utilized, thus imitating a large array of single-response physically unclonable functions. It is prop…
▽ More
In this work the novel usage of a physically unclonable function composed of a network of Mach-Zehnder interferometers for authentication tasks is described. The physically unclonable function hardware is completely reconfigurable, allowing for a large number of seemingly independent devices to be utilized, thus imitating a large array of single-response physically unclonable functions. It is proposed that any reconfigurable array of Mach-Zehnder interferometers can be used as an authentication mechanism, not only for physical objects, but for information transmitted both classically and quantumly. The proposed use-case for a fully-optical physically unclonable function, designed with reconfigurable hardware, is to authenticate messages between a trusted and possibly untrusted party; verifying that the messages received are generated by the holder of the authentic device.
△ Less
Submitted 18 December, 2020;
originally announced December 2020.
-
Generating Digital Twins with Multiple Sclerosis Using Probabilistic Neural Networks
Authors:
Jonathan R. Walsh,
Aaron M. Smith,
Yannick Pouliot,
David Li-Bland,
Anton Loukianov,
Charles K. Fisher
Abstract:
Multiple Sclerosis (MS) is a neurodegenerative disorder characterized by a complex set of clinical assessments. We use an unsupervised machine learning model called a Conditional Restricted Boltzmann Machine (CRBM) to learn the relationships between covariates commonly used to characterize subjects and their disease progression in MS clinical trials. A CRBM is capable of generating digital twins,…
▽ More
Multiple Sclerosis (MS) is a neurodegenerative disorder characterized by a complex set of clinical assessments. We use an unsupervised machine learning model called a Conditional Restricted Boltzmann Machine (CRBM) to learn the relationships between covariates commonly used to characterize subjects and their disease progression in MS clinical trials. A CRBM is capable of generating digital twins, which are simulated subjects having the same baseline data as actual subjects. Digital twins allow for subject-level statistical analyses of disease progression. The CRBM is trained using data from 2395 subjects enrolled in the placebo arms of clinical trials across the three primary subtypes of MS. We discuss how CRBMs are trained and show that digital twins generated by the model are statistically indistinguishable from their actual subject counterparts along a number of measures.
△ Less
Submitted 19 April, 2020; v1 submitted 3 February, 2020;
originally announced February 2020.
-
Algorithms and Statistical Models for Scientific Discovery in the Petabyte Era
Authors:
Brian Nord,
Andrew J. Connolly,
Jamie Kinney,
Jeremy Kubica,
Gautaum Narayan,
Joshua E. G. Peek,
Chad Schafer,
Erik J. Tollerud,
Camille Avestruz,
G. Jogesh Babu,
Simon Birrer,
Douglas Burke,
João Caldeira,
Douglas A. Caldwell,
Joleen K. Carlberg,
Yen-Chi Chen,
Chuanfei Dong,
Eric D. Feigelson,
V. Zach Golkhou,
Vinay Kashyap,
T. S. Li,
Thomas Loredo,
Luisa Lucie-Smith,
Kaisey S. Mandel,
J. R. Martínez-Galarza
, et al. (13 additional authors not shown)
Abstract:
The field of astronomy has arrived at a turning point in terms of size and complexity of both datasets and scientific collaboration. Commensurately, algorithms and statistical models have begun to adapt --- e.g., via the onset of artificial intelligence --- which itself presents new challenges and opportunities for growth. This white paper aims to offer guidance and ideas for how we can evolve our…
▽ More
The field of astronomy has arrived at a turning point in terms of size and complexity of both datasets and scientific collaboration. Commensurately, algorithms and statistical models have begun to adapt --- e.g., via the onset of artificial intelligence --- which itself presents new challenges and opportunities for growth. This white paper aims to offer guidance and ideas for how we can evolve our technical and collaborative frameworks to promote efficient algorithmic development and take advantage of opportunities for scientific discovery in the petabyte era. We discuss challenges for discovery in large and complex data sets; challenges and requirements for the next stage of development of statistical methodologies and algorithmic tool sets; how we might change our paradigms of collaboration and education; and the ethical implications of scientists' contributions to widely applicable algorithms and computational modeling. We start with six distinct recommendations that are supported by the commentary following them. This white paper is related to a larger corpus of effort that has taken place within and around the Petabytes to Science Workshops (https://petabytestoscience.github.io/).
△ Less
Submitted 4 November, 2019;
originally announced November 2019.
-
The demise of the filesystem and multi level service architecture
Authors:
William O'Mullane,
Niall Gaffney,
Frossie Economou,
Arfon M. Smith,
J. Ross Thomson,
Tim Jenness
Abstract:
Many astronomy data centres still work on filesystems. Industry has moved on; current practice in computing infrastructure is to achieve Big Data scalability using object stores rather than POSIX file systems. This presents us with opportunities for portability and reuse of software underlying processing and archive systems but it also causes problems for legacy implementations in current data cen…
▽ More
Many astronomy data centres still work on filesystems. Industry has moved on; current practice in computing infrastructure is to achieve Big Data scalability using object stores rather than POSIX file systems. This presents us with opportunities for portability and reuse of software underlying processing and archive systems but it also causes problems for legacy implementations in current data centers.
△ Less
Submitted 31 July, 2019; v1 submitted 26 July, 2019;
originally announced July 2019.
-
Taking the Scenic Route: Automatic Exploration for Videogames
Authors:
Ze** Zhan,
Batu Aytemiz,
Adam M. Smith
Abstract:
Machine playtesting tools and game moment search engines require exposure to the diversity of a game's state space if they are to report on or index the most interesting moments of possible play. Meanwhile, mobile app distribution services would like to quickly determine if a freshly-uploaded game is fit to be published. Having access to a semantic map of reachable states in the game would enable…
▽ More
Machine playtesting tools and game moment search engines require exposure to the diversity of a game's state space if they are to report on or index the most interesting moments of possible play. Meanwhile, mobile app distribution services would like to quickly determine if a freshly-uploaded game is fit to be published. Having access to a semantic map of reachable states in the game would enable efficient inference in these applications. However, human gameplay data is expensive to acquire relative to the coverage of a game that it provides. We show that off-the-shelf automatic exploration strategies can explore with an effectiveness comparable to human gameplay on the same timescale. We contribute generic methods for quantifying exploration quality as a function of time and demonstrate our metric on several elementary techniques and human players on a collection of commercial games sampled from multiple game platforms (from Atari 2600 to Nintendo 64). Emphasizing the diversity of states reached and the semantic map extracted, this work makes productive contrast with the focus on finding a behavior policy or optimizing game score used in most automatic game playing research.
△ Less
Submitted 7 December, 2018;
originally announced December 2018.
-
Addressing the Fundamental Tension of PCGML with Discriminative Learning
Authors:
Isaac Karth,
Adam M. Smith
Abstract:
Procedural content generation via machine learning (PCGML) is typically framed as the task of fitting a generative model to full-scale examples of a desired content distribution. This approach presents a fundamental tension: the more design effort expended to produce detailed training examples for sha** a generator, the lower the return on investment from applying PCGML in the first place. In re…
▽ More
Procedural content generation via machine learning (PCGML) is typically framed as the task of fitting a generative model to full-scale examples of a desired content distribution. This approach presents a fundamental tension: the more design effort expended to produce detailed training examples for sha** a generator, the lower the return on investment from applying PCGML in the first place. In response, we propose the use of discriminative models (which capture the validity of a design rather the distribution of the content) trained on positive and negative examples. Through a modest modification of WaveFunctionCollapse, a commercially-adopted PCG approach that we characterize as using elementary machine learning, we demonstrate a new mode of control for learning-based generators. We demonstrate how an artist might craft a focused set of additional positive and negative examples by critique of the generator's previous outputs. This interaction mode bridges PCGML with mixed-initiative design assistance tools by working with a machine to define a space of valid designs rather than just one new design.
△ Less
Submitted 10 September, 2018;
originally announced September 2018.
-
Deep learning for comprehensive forecasting of Alzheimer's Disease progression
Authors:
Charles K. Fisher,
Aaron M. Smith,
Jonathan R. Walsh,
the Coalition Against Major Diseases
Abstract:
Most approaches to machine learning from electronic health data can only predict a single endpoint. Here, we present an alternative that uses unsupervised deep learning to simulate detailed patient trajectories. We use data comprising 18-month trajectories of 44 clinical variables from 1908 patients with Mild Cognitive Impairment or Alzheimer's Disease to train a model for personalized forecasting…
▽ More
Most approaches to machine learning from electronic health data can only predict a single endpoint. Here, we present an alternative that uses unsupervised deep learning to simulate detailed patient trajectories. We use data comprising 18-month trajectories of 44 clinical variables from 1908 patients with Mild Cognitive Impairment or Alzheimer's Disease to train a model for personalized forecasting of disease progression. We simulate synthetic patient data including the evolution of each sub-component of cognitive exams, laboratory tests, and their associations with baseline clinical characteristics, generating both predictions and their confidence intervals. Our unsupervised model predicts changes in total ADAS-Cog scores with the same accuracy as specifically trained supervised models and identifies sub-components associated with word recall as predictive of progression. The ability to simultaneously simulate dozens of patient characteristics is a crucial step towards personalized medicine for Alzheimer's Disease.
△ Less
Submitted 7 November, 2018; v1 submitted 10 July, 2018;
originally announced July 2018.
-
Boltzmann Encoded Adversarial Machines
Authors:
Charles K. Fisher,
Aaron M. Smith,
Jonathan R. Walsh
Abstract:
Restricted Boltzmann Machines (RBMs) are a class of generative neural network that are typically trained to maximize a log-likelihood objective function. We argue that likelihood-based training strategies may fail because the objective does not sufficiently penalize models that place a high probability in regions where the training data distribution has low probability. To overcome this problem, w…
▽ More
Restricted Boltzmann Machines (RBMs) are a class of generative neural network that are typically trained to maximize a log-likelihood objective function. We argue that likelihood-based training strategies may fail because the objective does not sufficiently penalize models that place a high probability in regions where the training data distribution has low probability. To overcome this problem, we introduce Boltzmann Encoded Adversarial Machines (BEAMs). A BEAM is an RBM trained against an adversary that uses the hidden layer activations of the RBM to discriminate between the training data and the probability distribution generated by the model. We present experiments demonstrating that BEAMs outperform RBMs and GANs on multiple benchmarks.
△ Less
Submitted 23 April, 2018;
originally announced April 2018.
-
Journal of Open Source Software (JOSS): design and first-year review
Authors:
Arfon M Smith,
Kyle E Niemeyer,
Daniel S Katz,
Lorena A Barba,
George Githinji,
Melissa Gymrek,
Kathryn D Huff,
Christopher R Madan,
Abigail Cabunoc Mayes,
Kevin M Moerman,
Pjotr Prins,
Karthik Ram,
Ariel Rokem,
Tracy K Teal,
Roman Valls Guimera,
Jacob T Vanderplas
Abstract:
This article describes the motivation, design, and progress of the Journal of Open Source Software (JOSS). JOSS is a free and open-access journal that publishes articles describing research software. It has the dual goals of improving the quality of the software submitted and providing a mechanism for research software developers to receive credit. While designed to work within the current merit s…
▽ More
This article describes the motivation, design, and progress of the Journal of Open Source Software (JOSS). JOSS is a free and open-access journal that publishes articles describing research software. It has the dual goals of improving the quality of the software submitted and providing a mechanism for research software developers to receive credit. While designed to work within the current merit system of science, JOSS addresses the dearth of rewards for key contributions to science made in the form of software. JOSS publishes articles that encapsulate scholarship contained in the software itself, and its rigorous peer review targets the software components: functionality, documentation, tests, continuous integration, and the license. A JOSS article contains an abstract describing the purpose and functionality of the software, references, and a link to the software archive. The article is the entry point of a JOSS submission, which encompasses the full set of software artifacts. Submission and review proceed in the open, on GitHub. Editors, reviewers, and authors work collaboratively and openly. Unlike other journals, JOSS does not reject articles requiring major revision; while not yet accepted, articles remain visible and under review until the authors make adequate changes (or withdraw, if unable to meet requirements). Once an article is accepted, JOSS gives it a DOI, deposits its metadata in Crossref, and the article can begin collecting citations on indexers like Google Scholar and other services. Authors retain copyright of their JOSS article, releasing it under a Creative Commons Attribution 4.0 International License. In its first year, starting in May 2016, JOSS published 111 articles, with more than 40 additional articles under review. JOSS is a sponsored project of the nonprofit organization NumFOCUS and is an affiliate of the Open Source Initiative.
△ Less
Submitted 24 January, 2018; v1 submitted 7 July, 2017;
originally announced July 2017.
-
The challenge and promise of software citation for credit, identification, discovery, and reuse
Authors:
Kyle E. Niemeyer,
Arfon M. Smith,
Daniel S. Katz
Abstract:
In this article, we present the challenge of software citation as a method to ensure credit for and identification, discovery, and reuse of software in scientific and engineering research. We discuss related work and key challenges/research directions, including suggestions for metadata necessary for software citation.
In this article, we present the challenge of software citation as a method to ensure credit for and identification, discovery, and reuse of software in scientific and engineering research. We discuss related work and key challenges/research directions, including suggestions for metadata necessary for software citation.
△ Less
Submitted 6 July, 2016; v1 submitted 18 January, 2016;
originally announced January 2016.
-
Multidefender Security Games
Authors:
Jian Lou,
Andrew M. Smith,
Yevgeniy Vorobeychik
Abstract:
Stackelberg security game models and associated computational tools have seen deployment in a number of high-consequence security settings, such as LAX canine patrols and Federal Air Marshal Service. These models focus on isolated systems with only one defender, despite being part of a more complex system with multiple players. Furthermore, many real systems such as transportation networks and the…
▽ More
Stackelberg security game models and associated computational tools have seen deployment in a number of high-consequence security settings, such as LAX canine patrols and Federal Air Marshal Service. These models focus on isolated systems with only one defender, despite being part of a more complex system with multiple players. Furthermore, many real systems such as transportation networks and the power grid exhibit interdependencies between targets and, consequently, between decision makers jointly charged with protecting them. To understand such multidefender strategic interactions present in security, we investigate game theoretic models of security games with multiple defenders. Unlike most prior analysis, we focus on the situations in which each defender must protect multiple targets, so that even a single defender's best response decision is, in general, highly non-trivial. We start with an analytical investigation of multidefender security games with independent targets, offering an equilibrium and price-of-anarchy analysis of three models with increasing generality. In all models, we find that defenders have the incentive to over-protect targets, at times significantly. Additionally, in the simpler models, we find that the price of anarchy is unbounded, linearly increasing both in the number of defenders and the number of targets per defender. Considering interdependencies among targets, we develop a novel mixed-integer linear programming formulation to compute a defender's best response, and make use of this formulation in approximating Nash equilibria of the game. We apply this approach towards computational strategic analysis of several models of networks representing interdependencies, including real-world power networks. Our analysis shows how network structure and the probability of failure spread determine the propensity of defenders to over- or under-invest in security.
△ Less
Submitted 28 May, 2015;
originally announced May 2015.
-
Characterizing short-term stability for Boolean networks over any distribution of transfer functions
Authors:
C. Seshadhri,
Andrew M. Smith,
Yevgeniy Vorobeychik,
Jackson Mayo,
Robert C. Armstrong
Abstract:
We present a characterization of short-term stability of random Boolean networks under \emph{arbitrary} distributions of transfer functions. Given any distribution of transfer functions for a random Boolean network, we present a formula that decides whether short-term chaos (damage spreading) will happen. We provide a formal proof for this formula, and empirically show that its predictions are acc…
▽ More
We present a characterization of short-term stability of random Boolean networks under \emph{arbitrary} distributions of transfer functions. Given any distribution of transfer functions for a random Boolean network, we present a formula that decides whether short-term chaos (damage spreading) will happen. We provide a formal proof for this formula, and empirically show that its predictions are accurate. Previous work only works for special cases of balanced families. It has been observed that these characterizations fail for unbalanced families, yet such families are widespread in real biological networks.
△ Less
Submitted 15 September, 2014;
originally announced September 2014.
-
Implementing Transitive Credit with JSON-LD
Authors:
Daniel S. Katz,
Arfon M. Smith
Abstract:
Science and engineering research increasingly relies on activities that facilitate research but are not currently rewarded or recognized, such as: data sharing; develo** common data resources, software and methodologies; and annotating data and publications. To promote and advance these activities, we must develop mechanisms for assigning credit, facilitate the appropriate attribution of researc…
▽ More
Science and engineering research increasingly relies on activities that facilitate research but are not currently rewarded or recognized, such as: data sharing; develo** common data resources, software and methodologies; and annotating data and publications. To promote and advance these activities, we must develop mechanisms for assigning credit, facilitate the appropriate attribution of research outcomes, devise incentives for activities that facilitate research, and allocate funds to maximize return on investment. In this article, we focus on addressing the issue of assigning credit for both direct and indirect contributions, specifically by using JSON-LD to implement a prototype transitive credit system.
△ Less
Submitted 2 September, 2014; v1 submitted 18 July, 2014;
originally announced July 2014.