-
Investigating the validity of structure learning algorithms in identifying risk factors for intervention in patients with diabetes
Authors:
Sheresh Zahoor,
Anthony C. Constantinou,
Tim M Curtis,
Mohammed Hasanuzzaman
Abstract:
Diabetes, a pervasive and enduring health challenge, imposes significant global implications on health, financial healthcare systems, and societal well-being. This study undertakes a comprehensive exploration of various structural learning algorithms to discern causal pathways amongst potential risk factors influencing diabetes progression. The methodology involves the application of these algorithms to relevant diabetes data, followed by the conversion of their output graphs into Causal Bayesian Networks (CBNs), enabling predictive analysis and the evaluation of discrepancies in the effect of hypothetical interventions within our context-specific case study.
This study highlights the substantial impact of algorithm selection on intervention outcomes. To consolidate insights from diverse algorithms, we employ a model-averaging technique that helps us obtain a unique causal model for diabetes derived from a varied set of structural learning algorithms. We also investigate how each of those individual graphs, as well as the average graph, compare to the structures elicited by a domain expert who categorised graph edges into high confidence, moderate, and low confidence types, leading into three individual graphs corresponding to the three levels of confidence.
The resulting causal model and data are made available online, and serve as a valuable resource and a guide for informed decision-making by healthcare practitioners, offering a comprehensive understanding of the interactions between relevant risk factors and the effect of hypothetical interventions. Therefore, this research not only contributes to the academic discussion on diabetes, but also provides practical guidance for healthcare professionals in developing efficient intervention and risk management strategies.
Submitted 21 March, 2024;
originally announced March 2024.
-
Black-Box Access is Insufficient for Rigorous AI Audits
Authors:
Stephen Casper,
Carson Ezell,
Charlotte Siegmann,
Noam Kolt,
Taylor Lynn Curtis,
Benjamin Bucknall,
Andreas Haupt,
Kevin Wei,
Jérémy Scheurer,
Marius Hobbhahn,
Lee Sharkey,
Satyapriya Krishna,
Marvin Von Hagen,
Silas Alberti,
Alan Chan,
Qinyi Sun,
Michael Gerovitch,
David Bau,
Max Tegmark,
David Krueger,
Dylan Hadfield-Menell
Abstract:
External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to training and deployment information (e.g., methodology, code, documentation, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.
Submitted 29 May, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
A Further Study of Linux Kernel Hugepages on A64FX with FLASH, an Astrophysical Simulation Code
Authors:
Catherine Feldman,
Smeet Chheda,
Alan C. Calder,
Eva Siegmann,
John Dey,
Tony Curtis,
Robert J. Harrison
Abstract:
We present an expanded study of the performance of FLASH when using Linux Kernel Hugepages on Ookami, an HPE Apollo 80 A64FX platform. FLASH is a multi-scale, multi-physics simulation code written principally in modern Fortran and makes use of the PARAMESH library to manage a block-structured adaptive mesh. Our initial study used only the Fujitsu compiler to utilize standard hugepages (hp), but further investigation allowed us to utilize hp for multiple compilers by linking to the Fujitsu library libmpg and transparent hugepages (thp) by enabling it at the node level. By comparing the results of hardware counters and in-code timers, we found that hp and thp do not significantly impact the runtime performance of FLASH. Interestingly, there is a significant reduction in the TLB misses, differences in cache and memory access counters, and strange behavior is observed when using thp.
Submitted 8 September, 2023;
originally announced September 2023.
-
Comparing OpenMP Implementations With Applications Across A64FX Platforms
Authors:
Benjamin Michalowicz,
Eric Raut,
Yan Kang,
Tony Curtis,
Barbara Chapman,
Dossay Oryspayev
Abstract:
The development of the A64FX processor by Fujitsu has created a massive innovation in High-Performance Computing and the birth of Fugaku: the current world's fastest supercomputer. A variety of tools are used to analyze the run-times and performances of several applications, and in particular, how these applications scale on the A64FX processor. We examine the performance and behavior of applications through OpenMP scaling and how their performance differs across different compilers on the new Ookami cluster at Stony Brook University as well as the Fugaku supercomputer at RIKEN in Japan.
Submitted 21 July, 2021;
originally announced July 2021.
-
A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant
Authors:
Naomi E. Hannaford,
Sarah E. Heaps,
Tom M. W. Nye,
Thomas P. Curtis,
Ben Allen,
Andrew Golightly,
Darren J. Wilkinson
Abstract:
Proper function of a wastewater treatment plant (WWTP) relies on maintaining a delicate balance between a multitude of competing microorganisms. Gaining a detailed understanding of the complex network of interactions therein is essential to maximising not only current operational efficiencies, but also for the effective design of new treatment technologies. Metagenomics offers an insight into these dynamic systems through the analysis of the microbial DNA sequences present. Unique taxa are inferred through sequence clustering to form operational taxonomic units (OTUs), with per-taxa abundance estimates obtained from corresponding sequence counts. The data in this study comprise weekly OTU counts from an activated sludge (AS) tank of a WWTP. To model the OTU dynamics, we develop a Bayesian hierarchical vector autoregressive model, which is a linear approximation to the commonly used generalised Lotka-Volterra (gLV) model. To tackle the high dimensionality and sparsity of the data, they are first clustered into 12 "bins" using a seasonal phase-based approach. The autoregressive coefficient matrix is assumed to be sparse, so we explore different shrinkage priors by analysing simulated data sets before selecting the regularised horseshoe prior for the biological application. We find that ammonia and chemical oxygen demand have a positive relationship with several bins and pH has a positive relationship with one bin. These results are supported by findings in the biological literature. We identify several negative interactions, which suggests OTUs in different bins may be competing for resources and that these relationships are complex. We also identify two positive interactions. Although simpler than a gLV model, our vector autoregression offers valuable insight into the microbial dynamics of the WWTP.
Submitted 1 July, 2021;
originally announced July 2021.
-
Comparing the behavior of OpenMP Implementations with various Applications on two different Fujitsu A64FX platforms
Authors:
Benjamin Michalowicz,
Eric Raut,
Yan Kang,
Tony Curtis,
Barbara Chapman,
Dossay Oryspayev
Abstract:
The development of the A64FX processor by Fujitsu has been a massive innovation in vectorized processors and led to Fugaku: the current world's fastest supercomputer. We use a variety of tools to analyze the behavior and performance of several OpenMP applications with different compilers, and how these applications scale on the different A64FX processors on clusters at Stony Brook University and RIKEN.
Submitted 17 June, 2021;
originally announced June 2021.
-
Ookami: Deployment and Initial Experiences
Authors:
Andrew Burford,
Alan C. Calder,
David Carlson,
Barbara Chapman,
Firat Coşkun,
Tony Curtis,
Catherine Feldman,
Robert J. Harrison,
Yan Kang,
Benjamin Michalowicz,
Eric Raut,
Eva Siegmann,
Daniel G. Wood,
Robert L. Deleon,
Mathew Jones,
Nikolay A. Simakov,
Joseph P. White,
Dossay Oryspayev
Abstract:
Ookami is a computer technology testbed supported by the United States National Science Foundation. It provides researchers with access to the A64FX processor developed by Fujitsu in collaboration with RIKΞN for the Japanese path to exascale computing, as deployed in Fugaku, the fastest computer in the world. By focusing on crucial architectural details, the ARM-based, multi-core, 512-bit SIMD-vector processor with ultrahigh-bandwidth memory promises to retain familiar and successful programming models while achieving very high performance for a wide range of applications. We review relevant technology and system details, and the main body of the paper focuses on initial experiences with the hardware and software ecosystem for micro-benchmarks, mini-apps, and full applications, and starts to answer questions about where such technologies fit into the NSF ecosystem.
Submitted 16 June, 2021;
originally announced June 2021.
-
Integrating Theory and Experiment to Explain the Breakdown of Population Synchrony in a Complex Microbial Community
Authors:
Emma J. Bowen,
Todd L. Parsons,
Thomas P. Curtis,
Joshua B. Plotkin,
Christopher Quince
Abstract:
We consider the extension of the 'Moran effect', where correlated noise generates synchrony between isolated single species populations, to the study of synchrony between populations embedded in multi-species communities. In laboratory experiments on complex microbial communities, comprising both predators (protozoa) and prey (bacteria), we observe synchrony in abundances between isolated replicates. A breakdown in synchrony occurs for both predator and prey as the reactor dilution rate increases, which corresponds to both an increased rate of input of external resources and an increased effective mortality through washout. The breakdown is more rapid, however, for the lower trophic level. We can explain this phenomenon using a mathematical framework for determining synchrony between populations in multi-species communities at equilibrium. We assume that there are multiple sources of environmental noise with different degrees of correlation that affect the individual species population dynamics differently. The deterministic dynamics can then influence the degree of synchrony between species in different communities. In the case of a stable equilibrium, community synchrony is controlled by the eigenvalue with smallest negative real part. Intuitively, fluctuations are minimally damped in this direction. We show that the experimental observations are consistent with this framework but only for multiplicative noise.
Submitted 15 January, 2016;
originally announced January 2016.
-
Modelling Computational Resources for Next Generation Sequencing Bioinformatics Analysis of 16S rRNA Samples
Authors:
Matthew J. Wade,
Thomas P. Curtis,
Russell J. Davenport
Abstract:
In the rapidly evolving domain of next generation sequencing and bioinformatics analysis, data generation is one aspect that is increasing at a concomitant rate. The burden associated with processing large amounts of sequencing data has emphasised the need to allocate sufficient computing resources to complete analyses in the shortest possible time with manageable and predictable costs. A novel method for predicting time to completion for a popular bioinformatics software (QIIME) was developed using key variables characteristic of the input data assumed to impact processing time. Multiple Linear Regression models were developed to determine run time for two denoising algorithms and a general bioinformatics pipeline. The models were able to accurately predict clock time for denoising sequences from a naturally assembled community dataset, but not an artificial community. Speedup and efficiency tests for AmpliconNoise also highlighted that caution was needed when allocating resources for parallel processing of data. Accurate modelling of computational processing time using easily measurable predictors can assist NGS analysts in determining resource requirements for bioinformatics software and pipelines. Whilst demonstrated on a specific group of scripts, the methodology can be extended to encompass other packages running on multiple architectures, either in parallel or sequentially.
Submitted 10 March, 2015;
originally announced March 2015.
-
A Bayesian Nonparametric System Reliability Model which Integrates Multiple Sources of Lifetime Information
Authors:
Richard L. Warr,
Jeremy M. Meyer,
Jackson T. Curtis
Abstract:
We present a Bayesian nonparametric system reliability model which scales well and provides a great deal of flexibility in modeling. The Bayesian approach naturally handles the disparate amounts of component and subsystem data that may exist. However, traditional Bayesian reliability models are quite computationally complex, relying on MCMC techniques. Our approach utilizes the conjugate properties of the beta-Stacy process, which is the fundamental building block of our model. These individual models are linked together using a method of moments estimation approach. This model is computationally fast, allows for right-censored data, and is used for estimating and predicting system reliability.
Submitted 21 March, 2022; v1 submitted 13 December, 2014;
originally announced December 2014.