Search | arXiv e-print repository

Why Larp?! A Synthesis Paper on Live Action Roleplay in Relation to HCI Research and Practice

Authors: Karin Johansson, Raquel Breejon Robinson, Jon Back, Sarah Lynne Bowman, James Fey, Elena Márquez Segura, Annika Waern, Katherine Isbister

Abstract: Live action roleplay (larp) has a wide range of applications, and can be relevant in relation to HCI. While there has been research about larp in relation to topics such as embodied interaction, playfulness and futuring published in HCI venues since the early 2000s, there is not yet a compilation of this knowledge. In this paper, we synthesise knowledge about larp and larp-adjacent work within the domain of HCI. We present a practitioner overview from an expert group of larp researchers, the results of a literature review, and highlight particular larp research exemplars which all work together to showcase the diverse set of ways that larp can be utilised in relation to HCI topics and research. This paper identifies the need for further discussions toward establishing best practices for utilising larp in relation to HCI research, as well as advocating for increased engagement with larps outside academia.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2404.01848 [pdf, other]

doi 10.1145/3613905.3644063

"That's Not Good Science!": An Argument for the Thoughtful Use of Formative Situations in Research through Design

Authors: Raquel B Robinson, Anya Osborne, Chen Ji, James Collin Fey, Ella Dagan, Katherine Isbister

Abstract: Most currently accepted approaches to evaluating Research through Design (RtD) presume that design prototypes are finalized and ready for robust testing in laboratory or in-the-wild settings. However, it is also valuable to assess designs at intermediate phases with mid-fidelity prototypes, not just to inform an ongoing design process, but also to glean knowledge of broader use to the research community. We propose 'formative situations' as a frame for examining mid-fidelity prototypes-in-process in this way. We articulate a set of criteria to help the community better assess the rigor of formative situations, in the service of opening conversation about establishing formative situations as a valuable contribution type within the RtD community.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 8 pages, 1 figure

arXiv:2401.06421 [pdf]

Uncertainty quantification for probabilistic machine learning in earth observation using conformal prediction

Authors: Geethen Singh, Glenn Moncrieff, Zander Venter, Kerry Cawse-Nicholson, Jasper Slingsby, Tamara B Robinson

Abstract: Unreliable predictions can occur when using artificial intelligence (AI) systems, with negative consequences for downstream applications, particularly when employed for decision-making. Conformal prediction provides a model-agnostic framework for uncertainty quantification that can be applied to any dataset, irrespective of its distribution, post hoc. In contrast to other pixel-level uncertainty quantification methods, conformal prediction operates without requiring access to the underlying model and training dataset, concurrently offering statistically valid and informative prediction regions, all while maintaining computational efficiency. In response to the increased need to report uncertainty alongside point predictions, we bring attention to the promise of conformal prediction within the domain of Earth Observation (EO) applications. To accomplish this, we assess the current state of uncertainty quantification in the EO domain and find that only 20% of the reviewed Google Earth Engine (GEE) datasets incorporated a degree of uncertainty information, with unreliable methods prevalent. Next, we introduce modules that seamlessly integrate into existing GEE predictive modelling workflows and demonstrate the application of these tools for datasets spanning local to global scales, including the Dynamic World and Global Ecosystem Dynamics Investigation (GEDI) datasets. These case studies encompass regression and classification tasks, featuring both traditional and deep learning-based workflows.
Subsequently, we discuss the opportunities arising from the use of conformal prediction in EO. We anticipate that the increased availability of easy-to-use implementations of conformal predictors, such as those provided here, will drive wider adoption of rigorous uncertainty quantification in EO, thereby enhancing the reliability of uses such as operational monitoring and decision making.

Submitted 12 January, 2024; originally announced January 2024.
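
As a rough illustration of the conformal prediction idea named in the abstract above, the following is a minimal sketch of split conformal prediction for regression. It is not the authors' GEE modules: the function names, the residual values, and the choice of absolute residuals as the nonconformity score are all assumptions made for this example.

```python
import math


def conformal_quantile(cal_residuals, alpha):
    """Return the residual threshold giving roughly (1 - alpha) coverage.

    cal_residuals: absolute errors |y_true - y_pred| on a held-out
    calibration set (hypothetical data, not from the paper).
    """
    n = len(cal_residuals)
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th
    # smallest residual rather than the plain empirical quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to support this coverage level.
        return math.inf
    return sorted(cal_residuals)[k - 1]


def prediction_interval(point_prediction, q):
    """Symmetric interval around a point prediction."""
    return point_prediction - q, point_prediction + q


# Hypothetical calibration residuals from any already-trained model.
residuals = [0.4, 1.1, 0.2, 0.9, 0.5, 1.6, 0.3, 0.7, 0.8, 1.0]
q = conformal_quantile(residuals, alpha=0.2)
low, high = prediction_interval(12.0, q)
```

The sketch only needs model outputs on a calibration set, which is the sense in which the approach is post hoc and model-agnostic; the coverage guarantee relies on calibration and test points being exchangeable.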

arXiv:2310.16169 [pdf, other]

A Bayesian model calibration framework for stochastic compartmental models with both time-varying and time-invariant parameters

Authors: Brandon Robinson, Philippe Bisaillon, Jodi D. Edwards, Tetyana Kendzerska, Mohammad Khalil, Dominique Poirel, Abhijit Sarkar

Abstract: We consider state and parameter estimation for compartmental models having both time-varying and time-invariant parameters. Though the described Bayesian computational framework is general, we look at a specific application to the susceptible-infectious-removed (SIR) model which describes a basic mechanism for the spread of infectious diseases through a system of coupled nonlinear differential equations. The SIR model consists of three states, namely, the three compartments, and two parameters which control the coupling among the states. The deterministic SIR model with time-invariant parameters has been shown to be overly simplistic for modelling the complex long-term dynamics of disease transmission. Recognizing that certain model parameters will naturally vary in time due to seasonal trends, non-pharmaceutical interventions, and other random effects, the estimation procedure must systematically permit these time-varying effects to be captured, without unduly introducing artificial dynamics into the system. To this end, we leverage the robustness of the Markov Chain Monte Carlo (MCMC) algorithm for the estimation of time-invariant parameters alongside nonlinear filters for the joint estimation of the system state and time-varying parameters. We demonstrate performance of the framework by first considering a series of examples using synthetic data, followed by an exposition on public health data collected in the province of Ontario.

Submitted 4 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.
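
To make the "three states, two parameters" description in the abstract above concrete, here is a minimal sketch of the deterministic SIR model with time-invariant parameters, integrated with a forward Euler step. The integrator, the parameter names `beta` and `gamma`, and all numeric values are assumptions chosen for illustration; this is not the paper's stochastic model or its estimation framework.

```python
def sir_step(s, i, r, beta, gamma, dt):
    """Advance the SIR compartments by one forward Euler step.

    beta  : transmission rate (couples S and I)
    gamma : removal rate (couples I and R)
    """
    n = s + i + r
    new_infections = beta * s * i / n * dt
    new_removals = gamma * i * dt
    return (s - new_infections,
            i + new_infections - new_removals,
            r + new_removals)


def simulate(s0, i0, r0, beta, gamma, dt, steps):
    """Return the list of (S, I, R) states, including the initial one."""
    trajectory = [(s0, i0, r0)]
    for _ in range(steps):
        trajectory.append(sir_step(*trajectory[-1], beta, gamma, dt))
    return trajectory


# Hypothetical values: 1000 people, 10 initially infectious.
trajectory = simulate(990.0, 10.0, 0.0, beta=0.3, gamma=0.1, dt=0.1, steps=100)
```

Because each step only moves people between compartments, the total population stays constant, which is a quick sanity check on any implementation.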

arXiv:2310.15614 [pdf, other]

Sparse Bayesian neural networks for regression: Tackling overfitting and computational challenges in uncertainty quantification

Authors: Nastaran Dabiran, Brandon Robinson, Rimple Sandhu, Mohammad Khalil, Dominique Poirel, Abhijit Sarkar

Abstract: Neural networks (NNs) are primarily developed within the frequentist statistical framework. Nevertheless, frequentist NNs lack the capability to provide uncertainties in the predictions, and hence their robustness cannot be adequately assessed. Conversely, the Bayesian neural networks (BNNs) naturally offer predictive uncertainty by applying Bayes' theorem. However, their computational requirements pose significant challenges. Moreover, both frequentist NNs and BNNs suffer from overfitting issues when dealing with noisy and sparse data, which render their predictions unwieldy away from the available data space. To address both these problems simultaneously, we leverage insights from a hierarchical setting in which the parameter priors are conditional on hyperparameters to construct a BNN by applying a semi-analytical framework known as nonlinear sparse Bayesian learning (NSBL). We call our network sparse Bayesian neural network (SBNN) which aims to address the practical and computational issues associated with BNNs. Simultaneously, imposing a sparsity-inducing prior encourages the automatic pruning of redundant parameters based on the automatic relevance determination (ARD) concept. This process involves removing redundant parameters by optimally selecting the precision of the parameters' prior probability density functions (pdfs), resulting in a tractable treatment for overfitting.
To demonstrate the benefits of the SBNN algorithm, the study presents an illustrative regression problem and compares the results of a BNN using standard Bayesian inference, hierarchical Bayesian inference, and a BNN equipped with the proposed algorithm. Subsequently, we demonstrate the importance of considering the full parameter posterior by comparing the results with those obtained using the Laplace approximation with and without NSBL.

Submitted 25 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.14749 [pdf, other]

Exploring hierarchical framework of nonlinear sparse Bayesian learning algorithm through numerical investigations

Authors: Nastaran Dabiran, Brandon Robinson, Rimple Sandhu, Mohammad Khalil, Chris L. Pettit, Dominique Poirel, Abhijit Sarkar

Abstract: Sparse Bayesian learning (SBL) has been extensively utilized in data-driven modeling to combat the issue of overfitting. While SBL excels in linear-in-parameter models, its direct applicability is limited in models where observations possess nonlinear relationships with unknown parameters. Recently, a semi-analytical Bayesian framework known as nonlinear sparse Bayesian learning (NSBL) was introduced by the authors to induce sparsity among model parameters during the Bayesian inversion of nonlinear-in-parameter models. NSBL relies on optimally selecting the hyperparameters of sparsity-inducing Gaussian priors. It is inherently an approximate method since the uncertainty in the hyperparameter posterior is disregarded as we instead seek the maximum a posteriori (MAP) estimate of the hyperparameters (type-II MAP estimate). This paper aims to investigate the hierarchical structure that forms the basis of NSBL and validate its accuracy through a comparison with a one-level hierarchical Bayesian inference as a benchmark in the context of three numerical experiments: (i) a benchmark linear regression example with Gaussian prior and Gaussian likelihood, (ii) the same regression problem with a highly non-Gaussian prior, and (iii) an example of a dynamical system with a non-Gaussian prior and a highly non-Gaussian likelihood function, to explore the performance of the algorithm in these new settings.
Through these numerical examples, it can be shown that NSBL is well-suited for physics-based models as it can be readily applied to models with non-Gaussian prior distributions and non-Gaussian likelihood functions. Moreover, we illustrate the accuracy of the NSBL algorithm as an approximation to the one-level hierarchical Bayesian inference and its ability to reduce the computational cost while adequately exploring the parameter posteriors.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2307.06392 [pdf, other]

Deep learning-based Segmentation of Rabbit fetal skull with limited and sub-optimal annotations

Authors: Rajath Soans, Alexa Gleason, Tosha Shah, Corey Miller, Barbara Robinson, Kimberly Brannen, Antong Chen

Abstract: In this paper, we propose a deep learning-based method to segment the skeletal structures in the micro-CT images of Dutch-Belted rabbit fetuses which can assist in the assessment of drug-induced skeletal abnormalities as a required study in developmental and reproductive toxicology (DART). Our strategy leverages sub-optimal segmentation labels of 22 skull bones from 26 micro-CT volumes and maps them to 250 unlabeled volumes on which a deep CNN-based segmentation model is trained. In the experiments, our model was able to achieve an average Dice Similarity Coefficient (DSC) of 0.89 across all bones on the testing set, and 14 out of the 26 skull bones reached average DSC >0.93. Our next steps are segmenting the whole body followed by developing a model to classify abnormalities.

Submitted 24 May, 2023; originally announced July 2023.

Comments: Accepted short paper - MIDL 2023
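
The Dice Similarity Coefficient reported in the abstract above is a standard overlap measure, DSC = 2|A ∩ B| / (|A| + |B|), for a predicted mask A and a reference mask B. The following is a minimal sketch of that formula for binary masks; the function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for this example, not details taken from the paper.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask):
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    total = pred.sum() + true.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    overlap = np.logical_and(pred, true).sum()
    return float(2.0 * overlap / total)


# Toy 1-D masks: one voxel overlaps, |A| = 2, |B| = 1, so DSC = 2/3.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

For a multi-bone segmentation, one would typically compute this per label and then average, which is one plausible reading of the "average DSC across all bones" figure.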

arXiv:2305.17300 [pdf, other]

Exploiting Large Neuroimaging Datasets to Create Connectome-Constrained Approaches for more Robust, Efficient, and Adaptable Artificial Intelligence

Authors: Erik C. Johnson, Brian S. Robinson, Gautam K. Vallabha, Justin Joyce, Jordan K. Matelsky, Raphael Norman-Tenazas, Isaac Western, Marisel Villafañe-Delgado, Martha Cervantes, Michael S. Robinette, Arun V. Reddy, Lindsey Kitchell, Patricia K. Rivlin, Elizabeth P. Reilly, Nathan Drenkow, Matthew J. Roos, I-Jeng Wang, Brock A. Wester, William R. Gray-Roncal, Joan A. Hoffmann

Abstract: Despite the progress in deep learning networks, efficient learning at the edge (enabling adaptable, low-complexity machine learning solutions) remains a critical need for defense and commercial applications. We envision a pipeline to utilize large neuroimaging datasets, including maps of the brain which capture neuron and synapse connectivity, to improve machine learning approaches. We have pursued different approaches within this pipeline structure. First, as a demonstration of data-driven discovery, the team has developed a technique for discovery of repeated subcircuits, or motifs. These were incorporated into a neural architecture search approach to evolve network architectures. Second, we have conducted analysis of the heading direction circuit in the fruit fly, which performs fusion of visual and angular velocity features, to explore augmenting existing computational models with new insight. Our team discovered a novel pattern of connectivity, implemented a new model, and demonstrated sensor fusion on a robotic platform. Third, the team analyzed circuitry for memory formation in the fruit fly connectome, enabling the design of a novel generative replay approach. Finally, the team has begun analysis of connectivity in mammalian cortex to explore potential improvements to transformer networks. These constraints increased network robustness on the most challenging examples in the CIFAR-10-C computer vision robustness benchmark task, while reducing learnable attention parameters by over an order of magnitude.
Taken together, these results demonstrate multiple potential approaches to utilize insight from neural systems for developing robust and efficient machine learning techniques.

Submitted 26 May, 2023; originally announced May 2023.

Comments: 11 pages, 4 figures

arXiv:2303.09647 [pdf, other]

Anomaly Search Over Many Sequences With Switching Costs

Authors: Matthew Ubl, Benjamin D. Robinson, Matthew T. Hale

Abstract: This paper considers the quickest search problem to identify anomalies among large numbers of data streams. These streams can model, for example, disjoint regions monitored by a mobile robot. A particular challenge is a version of the problem in which the experimenter must suffer a cost each time the data stream being sampled changes, such as the time the robot must spend moving between regions. In this paper, we propose an algorithm which accounts for switching costs by varying a confidence threshold that governs when the algorithm switches to a new data stream. Our main contributions are easily computable approximations for both the optimal value of this threshold and the optimal value of the parameter that determines when a stream must be re-sampled. Further, we empirically show (i) a uniform improvement for switching costs of interest and (ii) roughly equivalent performance for small switching costs when comparing to the closest available algorithm.

Submitted 16 March, 2023; originally announced March 2023.

Comments: 6 pages, 4 figures

arXiv:2210.11476 [pdf, other]

Encoding nonlinear and unsteady aerodynamics of limit cycle oscillations using nonlinear sparse Bayesian learning

Authors: Rimple Sandhu, Brandon Robinson, Mohammad Khalil, Chris L. Pettit, Dominique Poirel, Abhijit Sarkar

Abstract: This paper investigates the applicability of a recently-proposed nonlinear sparse Bayesian learning (NSBL) algorithm to identify and estimate the complex aerodynamics of limit cycle oscillations. NSBL provides a semi-analytical framework for determining the data-optimal sparse model nested within a (potentially) over-parameterized model. This is particularly relevant to nonlinear dynamical systems where modelling approaches involve the use of physics-based and data-driven components. In such cases, the data-driven components, where analytical descriptions of the physical processes are not readily available, are often prone to overfitting, meaning that the empirical aspects of these models will often involve the calibration of an unnecessarily large number of parameters. While it may be possible to fit the data well, this can become an issue when using these models for predictions in regimes that are different from those where the data was recorded. In view of this, it is desirable to not only calibrate the model parameters, but also to identify the optimal compromise between data-fit and model complexity. In this paper, this is achieved for an aeroelastic system where the structural dynamics are well-known and described by a differential equation model, coupled with a semi-empirical aerodynamic model for laminar separation flutter resulting in low-amplitude limit cycle oscillations.
For the purpose of illustrating the benefit of the algorithm, in this paper, we use synthetic data to demonstrate the ability of the algorithm to correctly identify the optimal model and model parameters, given a known data-generating model. The synthetic data are generated from a forward simulation of a known differential equation model with parameters selected so as to mimic the dynamics observed in wind-tunnel experiments.

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2210.08730 [pdf, other]

Robust Bayesian state and parameter estimation framework for stochastic dynamical systems with combined time-varying and time-invariant parameters

Authors: Philippe Bisaillon, Brandon Robinson, Mohammad Khalil, Chris L. Pettit, Dominique Poirel, Abhijit Sarkar

Abstract: We consider state and parameter estimation for a dynamical system having both time-varying and time-invariant parameters. It has been shown that the robustness of the Markov Chain Monte Carlo (MCMC) algorithm for estimating time-invariant parameters alongside nonlinear filters for state estimation provided more reliable estimates than the estimates obtained solely using nonlinear filters for combined state and parameter estimation. In a similar fashion, we adopt the extended Kalman filter (EKF) for state estimation and the estimation of the time-varying system parameters, but reserve the task of estimating time-invariant parameters to the MCMC algorithm. In a standard method, we augment the state vector to include the original states of the system and the subset of the parameters that are time-varying. Each time-varying parameter is perturbed by a white noise process, and we treat the strength of this artificial noise as an additional time-invariant parameter to be estimated by MCMC, circumventing the need for manual tuning. Conventionally, both time-varying and time-invariant parameters are appended in the state vector, and thus for the purpose of estimation, both are free to vary in time. However, allowing time-invariant system parameters to vary in time introduces artificial dynamics into the system, which we avoid by treating these time-invariant parameters as static and estimating them using MCMC.
Furthermore, by estimating the time-invariant parameters by MCMC, the augmented state is smaller and the nonlinearity in the ensuing state space model will tend to be weaker than in the conventional approach. We illustrate the above-described approach for a simple dynamical system in which some model parameters are time-varying, while the remaining parameters are time-invariant.

Submitted 16 October, 2022; originally announced October 2022.

arXiv:2209.05245 [pdf, other]

Continual learning benefits from multiple sleep mechanisms: NREM, REM, and Synaptic Downscaling

Authors: Brian S. Robinson, Clare W. Lau, Alexander New, Shane M. Nichols, Erik C. Johnson, Michael Wolmetz, William G. Coon

Abstract: Learning new tasks and skills in succession without losing prior learning (i.e., catastrophic forgetting) is a computational challenge for both artificial and biological neural networks, yet artificial systems struggle to achieve parity with their biological analogues. Mammalian brains employ numerous neural operations in support of continual learning during sleep. These are ripe for artificial adaptation. Here, we investigate how modeling three distinct components of mammalian sleep together affects continual learning in artificial neural networks: (1) a veridical memory replay process observed during non-rapid eye movement (NREM) sleep; (2) a generative memory replay process linked to REM sleep; and (3) a synaptic downscaling process which has been proposed to tune signal-to-noise ratios and support neural upkeep. We find benefits from the inclusion of all three sleep components when evaluating performance on a continual learning CIFAR-100 image classification benchmark. Maximum accuracy improved during training and catastrophic forgetting was reduced during later tasks. While some catastrophic forgetting persisted over the course of network training, higher levels of synaptic downscaling led to better retention of early tasks and further facilitated the recovery of early task accuracy during subsequent training. One key takeaway is that there is a trade-off at hand when considering the level of synaptic downscaling to use - more aggressive downscaling better protects early tasks, but less downscaling enhances the ability to learn new tasks.
Intermediate levels can strike a balance with the highest overall accuracies during training. Overall, our results both provide insight into how to adapt sleep components to enhance artificial continual learning systems and highlight areas for future neuroscientific sleep research to further such systems.

Submitted 9 September, 2022; originally announced September 2022.

Comments: 9 pages, 12 figures, code available upon reasonable request. Corresponding author: William G. Coon ([email protected])

arXiv:2208.01839 [pdf, other]

Scalable Computational Algorithms for Geo-spatial Covid-19 Spread in High Performance Computing

Authors: Sudhi P. V., Victorita Dolean, Pierre Jolivet, Brandon Robinson, Jodi D. Edwards, Tetyana Kendzerska, Abhijit Sarkar

Abstract: A nonlinear partial differential equation (PDE) based compartmental model of COVID-19 provides a continuous trace of infection over space and time. Finer resolutions in the spatial discretization, the inclusion of additional model compartments and model stratifications based on clinically relevant categories contribute to an increase in the number of unknowns to the order of millions. We adopt a parallel scalable solver allowing faster solutions for these high fidelity models. The solver combines domain decomposition and algebraic multigrid preconditioners at multiple levels to achieve the desired strong and weak scalability. As a numerical illustration of this general methodology, a five-compartment susceptible-exposed-infected-recovered-deceased (SEIRD) model of COVID-19 is used to demonstrate the scalability and effectiveness of the proposed solver for a large geographical domain (Southern Ontario). It is possible to predict the infections up to three months for a system size of 92 million (using 1780 processes) within 7 hours, saving months of computational effort needed for conventional solvers.

Submitted 3 August, 2022; originally announced August 2022.

arXiv:1907.08325 [pdf, other]

Scalable Topological Data Analysis and Visualization for Evaluating Data-Driven Models in Scientific Applications

Authors: Shusen Liu, Di Wang, Dan Maljovec, Rushil Anirudh, Jayaraman J. Thiagarajan, Sam Ade Jacobs, Brian C. Van Essen, David Hysom, Jae-Seung Yeom, Jim Gaffney, Luc Peterson, Peter B. Robinson, Harsh Bhatia, Valerio Pascucci, Brian K. Spears, Peer-Timo Bremer

Abstract: With the rapid adoption of machine learning techniques for large-scale applications in science and engineering comes the convergence of two grand challenges in visualization. First, the utilization of black box models (e.g., deep neural networks) calls for advanced techniques in exploring and interpreting model behaviors. Second, the rapid growth in computing has produced enormous datasets that require techniques that can handle millions or more samples. Although some solutions to these interpretability challenges have been proposed, they typically do not scale beyond thousands of samples, nor do they provide the high-level intuition scientists are looking for. Here, we present the first scalable solution to explore and analyze high-dimensional functions often encountered in the scientific data analysis pipeline. By combining a new streaming neighborhood graph construction, the corresponding topology computation, and a novel data aggregation scheme, namely topology aware datacubes, we enable interactive exploration of both the topological and the geometric aspect of high-dimensional data. Following two use cases from high-energy-density (HED) physics and computational biology, we demonstrate how these capabilities have led to crucial new insights in both applications.

Submitted 18 July, 2019; originally announced July 2019.

arXiv:1905.08674 [pdf]

Software Citation Implementation Challenges

Authors: Daniel S. Katz, Daina Bouquin, Neil P. Chue Hong, Jessica Hausman, Catherine Jones, Daniel Chivvis, Tim Clark, Mercè Crosas, Stephan Druskat, Martin Fenner, Tom Gillespie, Alejandra Gonzalez-Beltran, Morane Gruenpeter, Ted Habermann, Robert Haines, Melissa Harrison, Edwin Henneken, Lorraine Hwang, Matthew B. Jones, Alastair A. Kelly, David N. Kennedy, Katrin Leinweber, Fernando Rios, Carly B. Robinson, Ilian Todorov, et al. (2 additional authors not shown)

Abstract: The main output of the FORCE11 Software Citation working group (https://www.force11.org/group/software-citation-working-group) was a paper on software citation principles (https://doi.org/10.7717/peerj-cs.86) published in September 2016. This paper laid out a set of six high-level principles for software citation (importance, credit and attribution, unique identification, persistence, accessibility, and specificity) and discussed how they could be used to implement software citation in the scholarly community. In a series of talks and other activities, we have promoted software citation using these increasingly accepted principles. At the time the initial paper was published, we also provided guidance and examples on how to make software citable, though we now realize there are unresolved problems with that guidance. The purpose of this document is to provide an explanation of current issues impacting scholarly attribution of research software, organize updated implementation guidance, and identify where best practices and solutions are still needed.

Submitted 21 May, 2019; originally announced May 2019.

Showing 1–15 of 15 results for author: Robinson, B