Ten simple rules for teaching sustainable software engineering
Authors:
Kit Gallagher,
Richard Creswell,
Ben Lambert,
Martin Robinson,
Chon Lok Lei,
Gary R. Mirams,
David J. Gavaghan
Abstract:
Computational methods and associated software implementations are central to every field of scientific investigation. Modern biological research, particularly within systems biology, has relied heavily on the development of software tools to process and organize increasingly large datasets, simulate complex mechanistic models, provide tools for the analysis and management of data, and visualize and organize outputs. However, developing high-quality research software requires scientists to develop a host of software development skills, and teaching these skills to students is challenging. There has been a growing importance placed on ensuring reproducibility and good development practices in computational research. However, less attention has been devoted to informing the specific teaching strategies which are effective at nurturing in researchers the complex skillset required to produce high-quality software that, increasingly, is required to underpin both academic and industrial biomedical research. Recent articles in the Ten Simple Rules collection have discussed the teaching of foundational computer science and coding techniques to biology students. We advance this discussion by describing the specific steps for effectively teaching the necessary skills scientists need to develop sustainable software packages which are fit for (re-)use in academic research or more widely. Although our advice is likely to be applicable to all students and researchers hoping to improve their software development skills, our guidelines are directed towards an audience of students that have some programming literacy but little formal training in software development or engineering, typical of early doctoral students. These practices are also applicable outside of doctoral training environments, and we believe they should form a key part of postgraduate training schemes more generally in the life sciences.
Submitted 7 February, 2024;
originally announced February 2024.
Probabilistic Inference on Noisy Time Series (PINTS)
Authors:
Michael Clerx,
Martin Robinson,
Ben Lambert,
Chon Lok Lei,
Sanmitra Ghosh,
Gary R. Mirams,
David J. Gavaghan
Abstract:
Time series models are ubiquitous in science, arising in any situation where researchers seek to understand how a system's behaviour changes over time. A key problem in time series modelling is inference: determining properties of the underlying system based on observed time series. For both statistical and mechanistic models, inference involves finding parameter values, or distributions of parameter values, for which model outputs are consistent with observations. A wide variety of inference techniques are available and different approaches are suitable for different classes of problems. This variety presents a challenge for researchers, who may not have the resources or expertise to implement and experiment with these methods. PINTS (Probabilistic Inference on Noisy Time Series - https://github.com/pints-team/pints) is an open-source (BSD 3-clause license) Python library that provides researchers with a broad suite of non-linear optimisation and sampling methods. It allows users to wrap a model and data in a transparent and straightforward interface, which can then be used with custom or pre-defined error measures for optimisation, or with likelihood functions for Bayesian inference or maximum-likelihood estimation. Derivative-free optimisation algorithms - which work without harder-to-obtain gradient information - are included, as well as inference algorithms such as adaptive Markov chain Monte Carlo and nested sampling which estimate distributions over parameter values. By making these statistical techniques available in an open and easy-to-use framework, PINTS brings the power of modern statistical techniques to a wider scientific audience.
Submitted 18 December, 2018;
originally announced December 2018.
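The workflow described in the PINTS abstract (wrap a model and data, attach an error measure, then run a derivative-free optimiser) can be illustrated with a minimal standard-library sketch. This is not the PINTS API: the class and function names (`ExponentialDecayModel`, `SumOfSquaresError`, `compass_search`), the decay model, the synthetic data and the choice of compass (pattern) search are all assumptions made purely for illustration.

```python
import math
import random


class ExponentialDecayModel:
    """Hypothetical forward model: y(t) = a * exp(-b * t)."""

    def simulate(self, parameters, times):
        a, b = parameters
        return [a * math.exp(-b * t) for t in times]


class SumOfSquaresError:
    """Hypothetical error measure wrapping a model and observed data."""

    def __init__(self, model, times, values):
        self.model = model
        self.times = times
        self.values = values

    def __call__(self, parameters):
        simulated = self.model.simulate(parameters, self.times)
        return sum((s - v) ** 2 for s, v in zip(simulated, self.values))


def compass_search(error, x0, step=0.5, tol=1e-6, max_iter=10000):
    """Derivative-free compass search: probe +/- step along each axis,
    keep any improvement, and halve the step when nothing improves."""
    x = list(x0)
    fx = error(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                f_trial = error(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    improved = True
        if not improved:
            step *= 0.5
    return x, fx


# Synthetic noisy time series (assumed true parameters a=5.0, b=0.3).
rng = random.Random(1)
times = [0.2 * i for i in range(50)]
true_parameters = [5.0, 0.3]
model = ExponentialDecayModel()
values = [y + rng.gauss(0.0, 0.05)
          for y in model.simulate(true_parameters, times)]

# Wrap model and data in an error measure, then minimise it.
error = SumOfSquaresError(model, times, values)
best, best_error = compass_search(error, x0=[1.0, 1.0])
print("estimated parameters:", best)
```

The optimiser only ever calls `error(parameters)`, so no gradient of the model is needed; swapping in a different model or error measure leaves the search code unchanged, which is the separation of concerns the abstract describes.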