-
ARTEMIS emulator: exploring the effect of cosmology and galaxy formation physics on Milky Way-mass haloes and their satellites
Authors:
Shaun T. Brown,
Azadeh Fattahi,
Ian G. McCarthy,
Andreea S. Font,
Kyle A. Oman,
Alexander H. Riley
Abstract:
We present the new ARTEMIS emulator suite of high resolution (baryon mass of $2.23 \times 10^{4}$ $h^{-1}$M$_{\odot}$) zoom-in simulations of Milky Way mass systems. Here, three haloes from the original ARTEMIS sample have been rerun multiple times, systematically varying parameters for the stellar feedback model, the density threshold for star formation, the reionisation redshift and the assumed warm dark matter (WDM) particle mass (assuming a thermal relic). From these simulations emulators are trained for a wide range of statistics that allow for fast predictions at combinations of parameters not originally sampled, running in $\sim 1$ms (a factor of $\sim 10^{11}$ faster than the simulations). In this paper we explore the dependence of the central haloes' stellar mass on the varied parameters, finding the stellar feedback parameters to be the most important. When constraining the parameters to match the present-day stellar mass-halo mass relation inferred from abundance matching we find that there is a strong degeneracy in the stellar feedback parameters, corresponding to a freedom in formation time of the stellar component for a fixed halo assembly history. We additionally explore the dependence of the satellite stellar mass function, where it is found that variations in stellar feedback, the reionisation redshift and the WDM mass all have a significant effect. The presented emulators are a powerful tool that allows for fundamentally new ways of analysing and interpreting cosmological hydrodynamic simulations. Crucially, they allow the free (subgrid) parameters to be varied and marginalised over, leading to more robust constraints and predictions.
Submitted 18 March, 2024;
originally announced March 2024.
-
Creating a Discipline-specific Commons for Infectious Disease Epidemiology
Authors:
Michael M. Wagner,
William Hogan,
John Levander,
Adam Darr,
Matt Diller,
Max Sibilla,
Alexander T. Loiacono,
Terence Sperringer, Jr.,
Shawn T. Brown
Abstract:
Objective: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software, but receive assistance in improving their interoperability. Materials and Methods: We represented 586 datasets, 54 software, and 24 data formats in OWL 2 and then used logical queries to infer potentially interoperable combinations of software and datasets, as well as statistics about the FAIRness of the collection. We represented the objects in DATS 2.2 and a software metadata schema of our own design. We used these representations as the basis for the Content, Search, FAIR-o-meter, and Workflow pages that constitute the MIDAS Digital Commons. Results: Interoperability was limited by lack of standardization of input and output formats of software. When formats existed, they were human-readable specifications (22/24; 92%); only 3 formats (13%) had machine-readable specifications. Nevertheless, logical search of a triple store based on named data formats was able to identify scores of potentially interoperable combinations of software and datasets. Discussion: We improved the findability and availability of a sample of software and datasets and developed metrics for assessing interoperability. The barriers to interoperability included poor documentation of software input/output formats and little attention to standardization of most types of data in this field. Conclusion: Centralizing and formalizing the representation of digital objects within a commons promotes FAIRness, enables its measurement over time and the identification of potentially interoperable combinations of data and software.
Submitted 12 November, 2023;
originally announced November 2023.
-
brainlife.io: A decentralized and open source cloud platform to support neuroscience research
Authors:
Soichi Hayashi,
Bradley A. Caron,
Anibal Sólon Heinsfeld,
Sophia Vinci-Booher,
Brent McPherson,
Daniel N. Bullock,
Giulia Bertò,
Guiomar Niso,
Sandra Hanekamp,
Daniel Levitas,
Kimberly Ray,
Anne MacKenzie,
Lindsey Kitchell,
Josiah K. Leong,
Filipi Nascimento-Silva,
Serge Koudoro,
Hanna Willis,
Jasleen K. Jolly,
Derek Pisner,
Taylor R. Zuidema,
Jan W. Kurzawski,
Kyriaki Mikellidou,
Aurore Bussalb,
Christopher Rorden,
Conner Victory,
et al. (39 additional authors not shown)
Abstract:
Neuroscience research has expanded dramatically over the past 30 years by advancing standardization and tool development to support rigor and transparency. Consequently, the complexity of the data pipeline has also increased, hindering access to FAIR (Findable, Accessible, Interoperable, and Reusable) data analysis for portions of the worldwide research community. brainlife.io was developed to reduce these burdens and democratize modern neuroscience research across institutions and career levels. Using community software and hardware infrastructure, the platform provides open-source data standardization, management, visualization, and processing and simplifies the data pipeline. brainlife.io automatically tracks the provenance history of thousands of data objects, supporting simplicity, efficiency, and transparency in neuroscience research. Here brainlife.io's technology and data services are described and evaluated for validity, reliability, reproducibility, replicability, and scientific utility. Using data from 4 modalities and 3,200 participants, we demonstrate that brainlife.io's services produce outputs that adhere to best practices in modern neuroscience research.
Submitted 11 August, 2023; v1 submitted 3 June, 2023;
originally announced June 2023.
-
Intrinsic alignments of the extended radio continuum emission of galaxies in the EAGLE simulations
Authors:
Alexander D. Hill,
Robert A. Crain,
Ian G. McCarthy,
Shaun T. Brown
Abstract:
We present measurements of the intrinsic alignments (IAs) of the star-forming gas of galaxies in the EAGLE simulations. Radio continuum imaging of this gas enables cosmic shear measurements complementary to optical surveys. We measure the orientation of star-forming gas with respect to the direction to, and orientation of, neighbouring galaxies. Star-forming gas exhibits a preferentially radial orientation-direction alignment that is a decreasing function of galaxy pair separation, but remains significant to $\gtrsim 1$ Mpc at $z=0$. The alignment is qualitatively similar to that exhibited by the stars, but is weaker at fixed separation. Pairs of galaxies hosted by more massive subhaloes exhibit stronger alignment at fixed separation, but the strong alignment of close pairs is dominated by ${\sim}L^\star$ galaxies and their satellites. At fixed comoving separation, the radial alignment is stronger at higher redshift. The orientation-orientation alignment is consistent with random at all separations, despite subhaloes exhibiting preferential parallel minor axis alignment. The weaker IA of star-forming gas than for stars stems from the former's tendency to be less well aligned with the dark matter structure of galaxies than the latter, and implies that the systematic uncertainty due to IA may be less severe in radio continuum weak lensing surveys than in optical counterparts. Alignment models equating the orientation of star-forming gas discs to that of stellar discs or the DM structure of host subhaloes will therefore overestimate the impact of IAs on radio continuum cosmic shear measurements.
Submitted 12 January, 2022;
originally announced January 2022.
-
Towards a universal model for the density profiles of dark matter haloes
Authors:
Shaun T. Brown,
Ian G. McCarthy,
Sam G. Stafford,
Andreea S. Font
Abstract:
It is well established from cosmological simulations that dark matter haloes are not precisely self-similar and an additional parameter, beyond their concentration, is required to accurately describe their spherically-averaged mass density profiles. We present, for the first time, a model to consistently predict both halo concentration, $c$, and this additional `shape' parameter, $α$, for a halo of given mass and redshift for a specified cosmology. Following recent studies, we recast the dependency on mass, redshift, and cosmology to a dependence on `peak height'. We show that, when adopting the standard definition of peak height, which employs the so-called spherical top hat (STH) window function, the concentration--peak height relation has a strong residual dependence on cosmology (i.e., it is not uniquely determined by peak height), whereas the $α$--peak height relation is approximately universal when employing the STH window function. Given the freedom in the choice of window function, we explore a simple modification of the STH function, constraining its form so that it produces universal relations for concentration and $α$ as a function of peak height using a large suite of cosmological simulations. It is found that universal relations for the two density profile parameters can indeed be derived and that these parameters are set by the linear power spectrum, $P(k)$, filtered on different scales. We show that the results of this work generalise to any (reasonable) combination of $P(k)$ and background expansion history, $H(z)$, resulting in accurate predictions of the density profiles of dark matter haloes for a wide range of cosmologies.
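For orientation, the two profile parameters and the peak height referred to in this abstract can be written out as follows. This is a sketch under stated assumptions, not the paper's own notation: the abstract does not name the profile, so the Einasto form is assumed here, and the symbols $\rho_{-2}$, $r_{-2}$, $r_{\Delta}$, $\delta_{\rm c}$, $\sigma$ and $W$ are standard conventions introduced for illustration.

```latex
% Assumed Einasto form for the spherically-averaged density profile;
% r_{-2} is the radius where the logarithmic slope equals -2.
\rho(r) = \rho_{-2}\,
  \exp\left\{ -\frac{2}{\alpha}
  \left[ \left(\frac{r}{r_{-2}}\right)^{\alpha} - 1 \right] \right\},
\qquad
c \equiv \frac{r_{\Delta}}{r_{-2}}

% Peak height: critical overdensity over the rms linear fluctuation
% on the mass scale M at redshift z.
\nu \equiv \frac{\delta_{\rm c}}{\sigma(M, z)},
\qquad
\sigma^{2}(R, z) = \frac{1}{2\pi^{2}}
  \int_{0}^{\infty} P(k, z)\, \left| W(kR) \right|^{2} k^{2}\, \mathrm{d}k

% Spherical top hat (STH) window function in Fourier space.
W_{\rm STH}(kR) = \frac{3\left[\sin(kR) - kR\cos(kR)\right]}{(kR)^{3}}
```

Here $r_{\Delta}$ denotes whichever halo boundary radius is adopted, and $R$ is the Lagrangian radius enclosing mass $M$ at the mean matter density. The modified window function that the abstract describes is not reproduced, since its form is not given in the text above.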
Submitted 22 November, 2021; v1 submitted 4 October, 2021;
originally announced October 2021.
-
Testing extensions to LCDM on small scales with forthcoming cosmic shear surveys
Authors:
Sam G. Stafford,
Ian G McCarthy,
Juliana Kwan,
Shaun T. Brown,
Andreea S. Font,
Andrew Robertson
Abstract:
We investigate the constraining power of forthcoming Stage-IV weak lensing surveys (Euclid, LSST, and NGRST) for extensions to the LCDM model on small scales, via their impact on the cosmic shear power spectrum. We use high-resolution cosmological simulations to calculate how warm dark matter (WDM), self-interacting dark matter (SIDM) and a running of the spectral index affect the non-linear matter power spectrum, P(k), as a function of scale and redshift. We evaluate the cosmological constraining power using synthetic weak lensing observations derived from these power spectra and that take into account the anticipated source densities, shape noise and cosmic variance errors of upcoming surveys. We show that upcoming Stage-IV surveys will be able to place useful, independent constraints on both WDM models (ruling out models with a particle mass of < 0.5 keV) and SIDM models (ruling out models with a velocity-independent cross-section of > 10 cm^2 g^-1) through their effects on the small-scale cosmic shear power spectrum. Similarly, they will be able to strongly constrain cosmologies with a running spectral index. Finally, we explore the error associated with the cosmic shear cross-spectrum between tomographic bins, finding that it can be significantly affected by Poisson noise (the standard assumption is that the Poisson noise cancels between tomographic bins). We provide a new analytic form for the error on the cross-spectrum which accurately captures this effect.
Submitted 24 September, 2021;
originally announced September 2021.
-
Quenching of satellite galaxies of Milky Way analogues: reconciling theory and observations
Authors:
Andreea S. Font,
Ian G. McCarthy,
Vasily Belokurov,
Shaun T. Brown,
Sam G. Stafford
Abstract:
The vast majority of low-mass satellite galaxies around the Milky Way and M31 appear virtually devoid of cool gas and show no signs of recent or ongoing star formation. Cosmological simulations demonstrate that such quenching is expected and is due to the harsh environmental conditions that satellites face when joining the Local Group (LG). However, recent observations of Milky Way analogues in the SAGA survey present a very different picture, showing the majority of observed satellites to be actively forming stars, calling into question the realism of current simulations and the typicality of the LG. Here we use the ARTEMIS suite of high-resolution cosmological hydrodynamical simulations to carry out a careful comparison with observations of dwarf satellites in the LG, SAGA, and the Local Volume (LV) survey. We show that differences between SAGA and the LG and LV surveys, as well as between SAGA and the ARTEMIS simulations, can be strongly reduced by considering differences in the host mass distributions and (more importantly) observational selection effects, specifically that low-mass satellites which have only recently been accreted are more likely to be star-forming, have a higher optical surface brightness, and are therefore more likely to be included in the SAGA survey. This picture is confirmed using data from the deeper LV survey, which shows pronounced quenching at low masses, in accordance with the predictions of LCDM-based simulations.
Submitted 19 January, 2022; v1 submitted 13 September, 2021;
originally announced September 2021.
-
Lightning generation in moist convective clouds and constraints on the water abundance in Jupiter
Authors:
Yury S. Aglyamov,
Jonathan Lunine,
Heidi N. Becker,
Tristan Guillot,
Seran G. Gibbard,
Sushil Atreya,
Scott J. Bolton,
Steven Levin,
Shannon T. Brown,
Michael H. Wong
Abstract:
Recent Juno observations have greatly extended the temporal and spatial coverage of lightning detection on Jupiter. We use these data to constrain a model of moist convection and lightning generation in Jupiter's atmosphere, and derive a roughly solar abundance of water at the base of the water cloud. Shallow lightning, observed by Juno (Becker et al., 2020, Nature, 584, 55-58) and defined as flashes originating at altitudes corresponding to pressure less than 2 bars, is reproduced, as is lightning at a deeper range of pressures, including those below the water cloud base. It is found that the generation of lightning requires ammonia to stabilize liquid water at altitudes corresponding to sub-freezing temperatures. We find a range of local water abundances in which lightning is possible, including subsolar values of water--consistent with other determinations of deep oxygen abundance.
Submitted 28 January, 2021;
originally announced January 2021.
-
Informing dark matter direct detection limits with the ARTEMIS simulations
Authors:
Robert Poole-McKenzie,
Andreea S. Font,
Billy Boxer,
Ian G. McCarthy,
Sergey Burdin,
Sam G. Stafford,
Shaun T. Brown
Abstract:
Dark matter (DM) direct detection experiments aim to place constraints on the DM--nucleon scattering cross-section and the DM particle mass. These constraints depend sensitively on the assumed local DM density and velocity distribution function. While astrophysical observations can inform the former (in a model-dependent way), the latter is not directly accessible with observations. Here we use the high-resolution ARTEMIS cosmological hydrodynamical simulation suite of 42 Milky Way-mass halos to explore the spatial and kinematical distributions of the DM in the solar neighbourhood, and we examine how these quantities are influenced by substructures, baryons, the presence of dark discs, as well as general halo-to-halo scatter (cosmic variance). We also explore the accuracy of the standard Maxwellian approach for modelling the velocity distribution function. We find significant halo-to-halo scatter in the density and velocity functions which, if propagated through the standard halo model for predicting the DM detection limits, implies a significant scatter about the typically quoted limit. We also show that, in general, the Maxwellian approximation works relatively well for simulations that include the important gravitational effects of baryons, but is less accurate for collisionless (DM-only) simulations. Given the significant halo-to-halo scatter in quantities relevant for DM direct detection, we advocate propagating this source of uncertainty through in order to derive conservative DM detection limits.
Submitted 24 September, 2020; v1 submitted 26 June, 2020;
originally announced June 2020.
-
Connecting the structure of dark matter haloes to the primordial power spectrum
Authors:
Shaun T. Brown,
Ian G. McCarthy,
Benedikt Diemer,
Andreea S. Font,
Sam G. Stafford,
Simon Pfeifer
Abstract:
A large body of work based on collisionless cosmological N-body simulations going back over two decades has advanced the idea that collapsed dark matter haloes have simple and approximately universal forms for their mass density and pseudo-phase space density (PPSD) distributions. However, a general consensus on the physical origin of these results has not yet been reached. In the present study, we explore to what extent the apparent universality of these forms holds when we vary the initial conditions (i.e., the primordial power spectrum of density fluctuations) away from the standard CMB-normalised case, but still within the context of LCDM with a fixed expansion history. Using simulations that vary the initial amplitude and shape, we show that the structure of dark matter haloes retains a clear memory of the initial conditions. Specifically, increasing (lowering) the amplitude of fluctuations increases (decreases) the concentration of haloes and, if pushed far enough, the density profiles deviate strongly from the NFW form that is a good approximation for the CMB-normalised case, although an Einasto form works well. Rather than being universal, the slope of the PPSD (or pseudo-entropy) profile steepens (flattens) with increasing (decreasing) power spectrum amplitude and can exhibit a strong halo mass dependence. Our results therefore indicate that the previously identified universality of the structure of dark matter haloes is mostly a consequence of adopting a narrow range of (CMB-normalised) initial conditions for the simulations. Our new suite provides a useful test-bench against which physical models for the origin of halo structure can be validated.
Submitted 9 June, 2020; v1 submitted 26 May, 2020;
originally announced May 2020.
-
The BAHAMAS project: Effects of dynamical dark energy on large-scale structure
Authors:
Simon Pfeifer,
Ian G. McCarthy,
Sam G. Stafford,
Shaun T. Brown,
Andreea S. Font,
Juliana Kwan,
Jaime Salcido,
Joop Schaye
Abstract:
In this work we consider the impact of spatially-uniform but time-varying dark energy (or `dynamical dark energy', DDE) on large-scale structure in a spatially flat universe, using large cosmological hydrodynamical simulations that form part of the BAHAMAS project. As DDE changes the expansion history of the universe, it impacts the growth of structure. We explore variations in DDE that are constrained to be consistent with the cosmic microwave background. We find that DDE can affect the clustering of matter and haloes at the ~10% level (suppressing it for so-called `freezing' models, while enhancing it for `thawing' models), which should be distinguishable with upcoming large-scale structure surveys. DDE cosmologies can also enhance or suppress the halo mass function (with respect to LCDM) over a wide range of halo masses. The internal properties of haloes are minimally affected by changes in DDE, however. Finally, we show that the impact of baryons and associated feedback processes is largely independent of the change in cosmology and that these processes can be modelled separately to typically better than a few percent accuracy.
Submitted 27 July, 2020; v1 submitted 16 April, 2020;
originally announced April 2020.
-
Exploring extensions to the standard cosmological model and the impact of baryons on small scales
Authors:
Sam G. Stafford,
Shaun T. Brown,
Ian G. McCarthy,
Andreea S. Font,
Andrew Robertson,
Robert Poole-Mckenzie
Abstract:
It has been claimed that the standard model of cosmology (LCDM) cannot easily account for a number of observations on relatively small scales, motivating extensions to the standard model. Here we introduce a new suite of cosmological simulations that systematically explores three plausible extensions: warm dark matter, self-interacting dark matter, and a running of the scalar spectral index of density fluctuations. Current observational constraints are used to specify the additional parameters that come with these extensions. We examine a large range of observable metrics on small scales, including the halo mass function, density and circular velocity profiles, the abundance of satellite subhaloes, and halo concentrations. For any given metric, significant degeneracies can be present between the extensions. In detail, however, the different extensions have quantitatively distinct mass and radial dependencies, suggesting that a multi-probe approach over a range of scales can be used to break the degeneracies. We also demonstrate that the relative effects on the radial density profiles in the different extensions (compared to the standard model) are converged down to significantly smaller radii than are the absolute profiles. We compare the derived cosmological trends with the impact of baryonic physics using the EAGLE and ARTEMIS simulations. Significant degeneracies are also present between baryonic physics and cosmological variations (with both having similar magnitude effects on some observables). Given the inherent uncertainties both in the modelling of galaxy formation physics and extensions to LCDM, a systematic and simultaneous exploration of both is strongly warranted.
Submitted 9 July, 2020; v1 submitted 8 April, 2020;
originally announced April 2020.
-
The ARTEMIS simulations: stellar haloes of Milky Way-mass galaxies
Authors:
Andreea S. Font,
Ian G. McCarthy,
Robert Poole-Mckenzie,
Sam G. Stafford,
Shaun T. Brown,
Joop Schaye,
Robert A. Crain,
Tom Theuns,
Matthieu Schaller
Abstract:
We introduce the ARTEMIS simulations, a new set of 42 zoomed-in, high-resolution (baryon particle mass of ~ 2x10^4 Msun/h), hydrodynamical simulations of galaxies residing in haloes of Milky Way mass, simulated with the EAGLE galaxy formation code with re-calibrated stellar feedback. In this study, we analyse the structure of stellar haloes, specifically the mass density, surface brightness, metallicity, colour and age radial profiles, finding generally very good agreement with recent observations of local galaxies. The stellar density profiles are well fitted by broken power laws, with inner slopes of ~ -3, outer slopes of ~ -4 and break radii that are typically ~ 20-40 kpc. The break radii generally mark the transition between in situ formation and accretion-driven formation of the halo. The metallicity, colour and age profiles show mild large-scale gradients, particularly when spherically-averaged or viewed along the major axes. Along the minor axes, however, the profiles are nearly flat, in agreement with observations. Overall, the structural properties can be understood by two factors: that in situ stars dominate the inner regions and that they reside in a spatially-flattened distribution that is aligned with the disc. Observations targeting both the major and minor axes of galaxies are thus required to obtain a complete picture of stellar haloes.
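As a rough illustration of the broken power law quoted in this abstract, the sketch below evaluates a two-slope profile that is continuous at the break radius. The function name, the normalisation `rho_break`, and the sharp (unsmoothed) break are assumptions made for illustration; the default slopes and the 30 kpc break in the usage lines simply echo the approximate values quoted above, and none of this is the authors' fitting code.

```python
import numpy as np


def broken_power_law(r, rho_break, r_break, inner_slope=-3.0, outer_slope=-4.0):
    """Two-slope power-law density profile, continuous at r_break.

    r           : radius or array of radii (same units as r_break, e.g. kpc)
    rho_break   : density at the break radius (hypothetical normalisation)
    r_break     : break radius
    inner_slope : logarithmic slope for r < r_break
    outer_slope : logarithmic slope for r >= r_break
    """
    r = np.asarray(r, dtype=float)
    # Pick the slope on each side of the break; both branches give
    # rho_break at r = r_break, so the profile is continuous there.
    slope = np.where(r < r_break, inner_slope, outer_slope)
    return rho_break * (r / r_break) ** slope


# Example: a break at 30 kpc with unit density at the break.
radii_kpc = np.array([15.0, 30.0, 60.0])
densities = broken_power_law(radii_kpc, rho_break=1.0, r_break=30.0)
```

Halving the radius inside the break raises the density by a factor of 8 (slope -3), while doubling it outside the break lowers the density by a factor of 16 (slope -4).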
Submitted 12 August, 2020; v1 submitted 4 April, 2020;
originally announced April 2020.
-
Deploying large fixed file datasets with SquashFS and Singularity
Authors:
Pierre Rioux,
Gregory Kiar,
Alexandre Hutton,
Alan C. Evans,
Shawn T. Brown
Abstract:
Shared high-performance computing (HPC) platforms, such as those provided by XSEDE and Compute Canada, enable researchers to carry out large-scale computational experiments at a fraction of the cost of the cloud. Most systems require the use of distributed filesystems (e.g. Lustre) for providing a highly multi-user, large capacity storage environment. These suffer performance penalties as the number of files increases due to network contention and metadata performance. We demonstrate how a combination of two technologies, Singularity and SquashFS, can help developers, integrators, architects, and scientists deploy large datasets (O(10M) files) on these shared systems with minimal performance limitations. The proposed integration enables more efficient access and indexing than normal file-based dataset installations, while providing transparent file access to users and processes. Furthermore, the approach does not require administrative privileges on the target system. While the examples studied here have been taken from the field of neuroimaging, the technologies adopted are not specific to that field. Currently, this solution is limited to read-only datasets. We propose the adoption of this technology for the consumption and dissemination of community datasets across shared computing resources.
Submitted 14 February, 2020;
originally announced February 2020.
-
Performance benefits of Intel(R) Optane(TM) DC persistent memory for the parallel processing of large neuroimaging data
Authors:
Valerie Hayot-Sasson,
Shawn T Brown,
Tristan Glatard
Abstract:
Open-access neuroimaging datasets have reached petabyte scale, and continue to grow. The ability to leverage the entirety of these datasets is limited to a restricted number of labs with both the capacity and infrastructure to process the data. Whereas Big Data engines have significantly reduced application performance penalties with respect to data movement, their applied strategies (e.g. data locality, in-memory computing and lazy evaluation) are not necessarily practical within neuroimaging workflows where intermediary results may need to be materialized to shared storage for post-processing analysis. In this paper we evaluate the performance advantage brought by Intel(R) Optane(TM) DC persistent memory for the processing of large neuroimaging datasets using the two available configuration modes: Memory mode and App Direct mode. We employ a synthetic algorithm on the 76 GiB and 603 GiB BigBrain, as well as apply a standard neuroimaging application on the Consortium for Reliability and Reproducibility (CoRR) dataset using 25 and 96 parallel processes in both cases. Our results show that the performance of applications leveraging persistent memory is superior to that of other storage devices, with the exception of DRAM. This is the case in both Memory and App Direct mode and irrespective of the amount of data and parallelism. Furthermore, persistent memory in App Direct mode is believed to benefit from the use of DRAM as a cache for writing when output data is significantly smaller than available memory. We believe the use of persistent memory will be beneficial to both neuroimaging applications running on HPC or visualization of large, high-resolution images.
Submitted 26 December, 2019;
originally announced December 2019.
-
Comparing Perturbation Models for Evaluating Stability of Neuroimaging Pipelines
Authors:
Gregory Kiar,
Pablo de Oliveira Castro,
Pierre Rioux,
Eric Petit,
Shawn T. Brown,
Alan C. Evans,
Tristan Glatard
Abstract:
A lack of software reproducibility has become increasingly apparent in the last several years, calling into question the validity of scientific findings affected by published tools. Reproducibility issues may have numerous sources of error, including the underlying numerical stability of algorithms and implementations employed. Various forms of instability have been observed in neuroimaging, including across operating system versions, minor noise injections, and implementation of theoretically equivalent algorithms. In this paper we explore the effect of various perturbation methods on a typical neuroimaging pipeline through the use of i) targeted noise injections, ii) Monte Carlo Arithmetic, and iii) varying operating systems to identify the quality and severity of their impact. The work presented here demonstrates that even low order computational models such as the connectome estimation pipeline that we used are susceptible to noise. This suggests that stability is a relevant axis upon which tools should be compared, developed, or improved, alongside more commonly considered axes such as accuracy/biological feasibility or performance. The heterogeneity observed across participants clearly illustrates that stability is a property of not just the data or tools independently, but their interaction. Characterization of stability should therefore be evaluated for specific analyses and performed on a representative set of subjects for consideration in subsequent statistical testing. Additionally, identifying how this relationship scales to higher-order models is an exciting next step which will be explored. Finally, the joint application of perturbation methods with post-processing approaches such as bagging or signal normalization may lead to the development of more numerically stable analyses while maintaining sensitivity to meaningful variation.
Submitted 22 April, 2020; v1 submitted 28 August, 2019;
originally announced August 2019.
-
Performance Evaluation of Big Data Processing Strategies for Neuroimaging
Authors:
Valérie Hayot-Sasson,
Shawn T Brown,
Tristan Glatard
Abstract:
Neuroimaging datasets are rapidly growing in size as a result of advancements in image acquisition methods, open-science and data sharing. However, the adoption of Big Data processing strategies by neuroimaging processing engines remains limited. Here, we evaluate three Big Data processing strategies (in-memory computing, data locality and lazy evaluation) on typical neuroimaging use cases, represented by the BigBrain dataset. We contrast these various strategies using Apache Spark and Nipype as our representative Big Data and neuroimaging processing engines, on Dell EMC's Top-500 cluster. Big Data thresholds were modelled by comparing the data-write rate of the application to the filesystem bandwidth and number of concurrent processes. This model acknowledges the fact that page caching provided by the Linux kernel is critical to the performance of Big Data applications. Results show that in-memory computing alone speeds up executions by a factor of up to 1.6, whereas when combined with data locality, this factor reaches 5.3. Lazy evaluation strategies were found to increase the likelihood of cache hits, further improving processing time. Such important speed-up values are likely to be observed on typical image processing operations performed on images of size larger than 75GB. A ballpark speculation from our model showed that in-memory computing alone will not speed up current functional MRI analyses unless coupled with data locality and processing around 280 subjects concurrently. Furthermore, we observe that emulating in-memory computing using in-memory file systems (tmpfs) does not reach the performance of an in-memory engine, presumably due to swapping to disk and the lack of data cleanup. We conclude that Big Data processing strategies are worth developing for neuroimaging applications.
Submitted 2 April, 2019; v1 submitted 16 December, 2018;
originally announced December 2018.
-
A Serverless Tool for Platform Agnostic Computational Experiment Management
Authors:
Gregory Kiar,
Shawn T Brown,
Tristan Glatard,
Alan C Evans
Abstract:
Neuroscience has been carried into the domain of big data and high performance computing (HPC) on the backs of initiatives in data collection and increasingly compute-intensive tools. While managing HPC experiments requires considerable technical acumen, platforms and standards have been developed to ease this burden on scientists. While web-portals make resources widely accessible, data organizations such as the Brain Imaging Data Structure and tool description languages such as Boutiques provide researchers with a foothold to tackle these problems using their own datasets, pipelines, and environments. While these standards lower the barrier to adoption of HPC and cloud systems for neuroscience applications, they still require the consolidation of disparate domain-specific knowledge. We present Clowdr, a lightweight tool to launch experiments on HPC systems and clouds, record rich execution records, and enable the accessible sharing of experimental summaries and results. Clowdr uniquely sits between web platforms and bare-metal applications for experiment management by preserving the flexibility of do-it-yourself solutions while providing a low barrier for developing, deploying and disseminating neuroscientific analysis.
Submitted 2 September, 2018;
originally announced September 2018.
-
Advanced Techniques for Scientific Programming and Collaborative Development of Open Source Software Packages at the International Centre for Theoretical Physics (ICTP)
Authors:
Ivan Girotto,
Axel Kohlmeyer,
David Grellscheid,
Shawn T. Brown
Abstract:
A large number of computational scientific research projects make use of open source software packages. However, the development process of such tools frequently differs from conventional software development; partly because of the nature of research, where the problems being addressed are not always fully understood; partly because the majority of the development is often carried out by scientists with limited experience and exposure to best practices of software engineering. Often the software development suffers from the pressure to publish scientific results and from the fact that credit for software development is limited in comparison. Fundamental components of software engineering like modular and reusable design, validation, documentation, and software integration as well as effective maintenance and user support tend to be disregarded due to lack of resources and qualified specialists. Thus innovative developments are often hindered by steep learning curves required to master development for legacy software packages full of ad hoc solutions. The growing complexity of research, however, requires suitable and maintainable computational tools, resulting in a widening gap between the potential users (often growing in number) and contributors to the development of such a package. In this paper we share our experiences aiming to improve the situation by training particularly young scientists, through disseminating our own experiences at contributing to open source software packages and practicing key components of software engineering adapted for scientists and scientific software development. Specifically we summarize the outcome of the Workshop in Advanced Techniques for Scientific Programming and Collaborative Development of Open Source Software Packages run at the Abdus Salam International Centre for Theoretical Physics in March 2013, and discuss our conclusions for future efforts.
Submitted 6 September, 2013;
originally announced September 2013.
-
Weak Lensing by Large-Scale Structure with the FIRST Radio Survey
Authors:
A. Refregier,
S. T. Brown,
M. Kamionkowski,
D. J. Helfand,
C. M. Cress,
A. Babul,
R. Becker,
R. L. White
Abstract:
The coherent image distortions induced by weak gravitational lensing can be used to measure the power spectrum of density inhomogeneities in the universe. We present our on-going effort to detect this effect with the FIRST radio survey, which currently contains about 400,000 sources over 4,200 square degrees, and thus provides a unique resource for this purpose. We discuss the sensitivity of our measurement in the context of various cosmological models. We then discuss the crucial issue of systematic effects, the most serious of which are source fragmentation, image-noise correlation, and VLA-beam anisotropy. After accounting for these effects, we expect our experiment to yield a detection, or at least a tight upper limit, for the weak lensing power spectrum on 0.2-20 degree scales.
Submitted 1 October, 1998;
originally announced October 1998.
-
Effect of Correlated Noise on Source Shape Parameters and Weak Lensing Measurements
Authors:
A. Refregier,
S. T. Brown
Abstract:
The measurement of shape parameters of sources in astronomical images is usually performed by assuming that the underlying noise is uncorrelated. Spatial noise correlation is however present in practice due to various observational effects and can affect source shape parameters. This effect is particularly important for measurements of weak gravitational lensing, for which the sought image distortions are typically of the order of only 1%. We compute the effect of correlated noise on two-dimensional Gaussian fits in full generality. The noise properties are naturally quantified by the noise autocorrelation function (ACF), which is easily measured in practice. We compute the resulting bias on the mean, variance and covariance of the source parameters, and the induced correlation between the shapes of neighboring sources. We show that these biases are of second order in the inverse signal-to-noise ratio of the source, and could thus be overlooked if bright stars are used to monitor systematic distortions. Radio interferometric surveys are particularly prone to this effect because of the long-range pixel correlations produced by the Fourier inversion involved in their image construction. As a concrete application, we consider the search for weak lensing by large-scale structure with the FIRST radio survey. We measure the noise ACF for a FIRST coadded field, and compute the resulting ellipticity correlation function induced by the noise. In comparison with the weak-lensing signal expected in CDM models, the noise correlation effect is important on small angular scales, but is negligible for source separations greater than about 1 arcmin. We also discuss how noise correlation can affect weak-lensing studies with optical surveys.
Submitted 24 March, 1998;
originally announced March 1998.