-
DeepLINK-T: deep learning inference for time series data using knockoffs and LSTM
Authors:
Wenxuan Zuo,
Zifan Zhu,
Yuxuan Du,
Yi-Chun Yeh,
Jed A. Fuhrman,
Jinchi Lv,
Yingying Fan,
Fengzhu Sun
Abstract:
High-dimensional longitudinal time series data is prevalent across various real-world applications. Many such applications can be modeled as regression problems with high-dimensional time series covariates. Deep learning has been a popular and powerful tool for fitting these regression models. Yet, the development of interpretable and reproducible deep-learning models is challenging and remains underexplored. This study introduces a novel method, Deep Learning Inference using Knockoffs for Time series data (DeepLINK-T), focusing on the selection of significant time series variables in regression while controlling the false discovery rate (FDR) at a predetermined level. DeepLINK-T combines deep learning with knockoff inference to control FDR in feature selection for time series models, accommodating a wide variety of feature distributions. It addresses dependencies across time and features by leveraging a time-varying latent factor structure in time series covariates. Three key ingredients for DeepLINK-T are 1) a Long Short-Term Memory (LSTM) autoencoder for generating time series knockoff variables, 2) an LSTM prediction network using both original and knockoff variables, and 3) the application of the knockoffs framework for variable selection with FDR control. Extensive simulation studies have been conducted to evaluate DeepLINK-T's performance, showing its capability to control FDR effectively while demonstrating superior feature selection power for high-dimensional longitudinal time series data compared to its non-time series counterpart. DeepLINK-T is further applied to three metagenomic data sets, validating its practical utility and effectiveness, and underscoring its potential in real-world applications.
Submitted 5 April, 2024;
originally announced April 2024.
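The third ingredient named in the abstract above, variable selection with FDR control through the knockoffs framework, can be illustrated with the standard knockoff+ threshold. This is a minimal sketch under stated assumptions, not the DeepLINK-T implementation: the abstract does not say how the importance statistics are computed, so the vector `w` (one statistic per feature, large and positive when the original feature looks more important than its knockoff) and the function names are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target FDR level q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns np.inf when no candidate satisfies the bound.
    """
    w = np.asarray(w, dtype=float)
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        # Negative statistics estimate the number of false positives.
        fdp_hat = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")


def knockoff_select(w, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    w = np.asarray(w, dtype=float)
    t = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= t)


# Hypothetical statistics for ten features (assumed values for illustration).
w_example = [5, 4, 3, 2.5, 2, -1, 0.5, -0.2, 6, 7]
selected = knockoff_select(w_example, q=0.2)
```

With these assumed statistics the threshold is 2.0, so the seven features with statistics of at least 2 are selected; a smaller `q` makes the rule more conservative.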
-
Enabling End-to-End Secure Federated Learning in Biomedical Research on Heterogeneous Computing Environments with APPFLx
Authors:
Trung-Hieu Hoang,
Jordan Fuhrman,
Ravi Madduri,
Miao Li,
Pranshu Chaturvedi,
Zilinghan Li,
Kibaek Kim,
Minseok Ryu,
Ryan Chard,
E. A. Huerta,
Maryellen Giger
Abstract:
Facilitating large-scale, cross-institutional collaboration in biomedical machine learning projects requires a trustworthy and resilient federated learning (FL) environment to ensure that sensitive information such as protected health information is kept confidential. In this work, we introduce APPFLx, a low-code FL framework that enables the easy setup, configuration, and running of FL experiments across organizational and administrative boundaries while providing secure end-to-end communication, privacy-preserving functionality, and identity management. APPFLx is completely agnostic to the underlying computational infrastructure of participating clients. We demonstrate the capability of APPFLx as an easy-to-use framework for accelerating biomedical studies across institutions and healthcare systems while maintaining the protection of private medical data in two case studies: (1) predicting participant age from electrocardiogram (ECG) waveforms, and (2) detecting COVID-19 disease from chest radiographs. These experiments were performed securely across heterogeneous compute resources, including a mixture of on-premise high-performance computing and cloud computing, and highlight the role of federated learning in improving model generalizability and performance when aggregating data from multiple healthcare systems. Finally, we demonstrate that APPFLx serves as a convenient and easy-to-use framework for accelerating biomedical studies across institutions and healthcare systems while maintaining the protection of private medical data.
Submitted 14 December, 2023;
originally announced December 2023.
-
APPFLx: Providing Privacy-Preserving Cross-Silo Federated Learning as a Service
Authors:
Zilinghan Li,
Shilan He,
Pranshu Chaturvedi,
Trung-Hieu Hoang,
Minseok Ryu,
E. A. Huerta,
Volodymyr Kindratenko,
Jordan Fuhrman,
Maryellen Giger,
Ryan Chard,
Kibaek Kim,
Ravi Madduri
Abstract:
Cross-silo privacy-preserving federated learning (PPFL) is a powerful tool to collaboratively train robust and generalized machine learning (ML) models without sharing sensitive (e.g., healthcare or financial) local data. To ease and accelerate the adoption of PPFL, we introduce APPFLx, a ready-to-use platform that provides privacy-preserving cross-silo federated learning as a service. APPFLx employs Globus authentication to allow users to easily and securely invite trustworthy collaborators for PPFL, implements several synchronous and asynchronous FL algorithms, streamlines the FL experiment launch process, and enables tracking and visualizing the life cycle of FL experiments, allowing domain experts and ML practitioners to easily orchestrate and evaluate cross-silo FL under one platform. APPFLx is available online at https://appflx.link
Submitted 17 August, 2023;
originally announced August 2023.
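The abstract above mentions synchronous FL algorithms without naming them. As a generic illustration only, the sketch below shows one synchronous aggregation round in the style of federated averaging (FedAvg), where the server combines client model parameters weighted by local sample counts. It does not use or reproduce the APPFLx API; the function name, the list-of-arrays model representation, and the example numbers are all assumptions.

```python
import numpy as np


def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of client model parameters.

    client_weights: list (one item per client) of lists of NumPy arrays,
                    one array per model layer, with matching shapes.
    client_sizes:   number of local training samples held by each client.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()  # each client's share of the total data
    n_layers = len(client_weights[0])
    return [
        sum(c * client[layer] for c, client in zip(coeffs, client_weights))
        for layer in range(n_layers)
    ]


# Two hypothetical clients with a single-layer model; only parameters,
# never raw local data, reach the aggregation step.
client_a = [np.array([1.0, 2.0])]
client_b = [np.array([3.0, 4.0])]
global_model = federated_average([client_a, client_b], client_sizes=[1, 3])
```

Here the second client holds three times as much data, so the aggregated layer is pulled toward its parameters: [2.5, 3.5].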
-
First Flight Performance of the Micro-X Microcalorimeter X-Ray Sounding Rocket
Authors:
Joseph S. Adams,
Robert Baker,
Simon R. Bandler,
Noemie Bastidon,
Daniel Castro,
Meredith E. Danowski,
William B. Doriese,
Megan E. Eckart,
Enectali Figueroa-Feliciano,
Joshua Fuhrman,
David C. Goldfinger,
Sarah N. T. Heine,
Gene Hilton,
Antonia J. F. Hubbard,
Daniel Jardin,
Richard L. Kelley,
Caroline A. Kilbourne,
Steven W. Leman,
Renee E. Manzagol-Harwood,
Dan McCammon,
Philip H. H. Oakley,
Takashi Okajima,
Frederick Scott Porter,
Carl D. Reintsema,
John Rutherford
, et al. (6 additional authors not shown)
Abstract:
The flight of the Micro-X sounding rocket on July 22, 2018 marked the first operation of Transition-Edge Sensors and their SQUID readouts in space. The instrument combines the microcalorimeter array with an imaging mirror to take high-resolution spectra from extended X-ray sources. The first flight target was the Cassiopeia A Supernova Remnant. While a rocket pointing malfunction led to no time on-target, data from the flight was used to evaluate the performance of the instrument and demonstrate the flight viability of the payload. The instrument successfully achieved a stable cryogenic environment, executed all flight operations, and observed X-rays from the on-board calibration source. The flight environment did not significantly affect the performance of the detectors compared to ground operation. The flight provided an invaluable test of the impact of external magnetic fields and the instrument configuration on detector performance. This flight provides a milestone in the flight readiness of these detector and readout technologies, both of which have been selected for future X-ray observatories.
Submitted 22 December, 2022;
originally announced December 2022.
-
Micro-X Sounding Rocket Payload Re-flight Progress
Authors:
J. S. Adams,
S. R. Bandler,
N. Bastidon,
M. E. Eckart,
E. Figueroa-Feliciano,
J. Fuhrman,
D. C. Goldfinger,
A. J. F. Hubbard,
D. Jardin,
R. L. Kelley,
C. A. Kilbourne,
R. E. Manzagol-Harwood,
D. McCammon,
T. Okajima,
F. S. Porter,
C. D. Reintsema,
S. J. Smith
Abstract:
Micro-X is an X-ray sounding rocket payload that had its first flight on July 22, 2018. The goals of the first flight were to operate a transition edge sensor (TES) X-ray microcalorimeter array in space and take a high-resolution spectrum of the Cassiopeia A supernova remnant. The first flight was considered a partial success. The array and its time-division multiplexing readout system were successfully operated in space, but due to a failure in the attitude control system, no time on-target was acquired. A re-flight has been scheduled for summer 2022. Since the first flight, modifications have been made to the detector systems to improve noise and reduce the susceptibility to magnetic fields. The three-stage SQUID circuit, NIST MUX06a, has been replaced by a two-stage SQUID circuit, NIST MUX18b. The initial laboratory results for the new detector system will be presented in this paper.
Submitted 12 November, 2021;
originally announced November 2021.
-
Modeling a Three-Stage SQUID System in Space with the First Micro-X Sounding Rocket Flight
Authors:
J. S. Adams,
S. R. Bandler,
N. Bastidon,
M. E. Eckart,
E. Figueroa-Feliciano,
J. Fuhrman,
D. C. Goldfinger,
A. J. F. Hubbard,
D. Jardin,
R. L. Kelley,
C. A. Kilbourne,
R. E. Manzagol-Harwood,
D. McCammon,
T. Okajima,
F. S. Porter,
C. D. Reintsema,
S. J. Smith
Abstract:
The Micro-X sounding rocket is a NASA-funded X-ray telescope payload that completed its first flight on July 22, 2018. This event marked the first operation of Transition Edge Sensors (TESs) and their SQUID-based multiplexing readout system in space. Unfortunately, due to an ACS pointing failure, the rocket was spinning during its five-minute observation period and no scientific data was collected. However, data collected from the internal calibration source marked a partial success for the payload and offers a unique opportunity to study the response of TESs and SQUIDs in space. Of particular interest is the magnetic field response of the NIST MUX06a SQUID readout system to tumbling through Earth's magnetic field. We present a model to explain the baseline response of the SQUIDs, which led to a subset of pixels failing to "lock" for the full observational period. Future flights of the Micro-X rocket will include the NIST MUX18b SQUID system with dramatically reduced magnetic susceptibility.
Submitted 11 November, 2021;
originally announced November 2021.
-
First operation of Transition-Edge Sensors in space with the Micro-X sounding rocket
Authors:
J. S. Adams,
R. Baker,
S. R. Bandler,
N. Bastidon,
M. E. Danowski,
W. B. Doriese,
M. E. Eckart,
E. Figueroa-Feliciano,
J. Fuhrman,
D. C. Goldfinger,
S. N. T. Heine,
G. C. Hilton,
A. J. F. Hubbard,
D. Jardin,
R. L. Kelley,
C. A. Kilbourne,
R. E. Manzagol-Harwood,
D. McCammon,
T. Okajima,
F. S. Porter,
C. D. Reintsema,
P. Serlemitsos,
S. J. Smith,
P. Wikus
Abstract:
With its first flight in 2018, Micro-X became the first program to fly Transition-Edge Sensors and their SQUID readouts in space. The science goal was a high-resolution, spatially resolved X-ray spectrum of the Cassiopeia A Supernova Remnant. While a rocket pointing error led to no time on target, the data was used to demonstrate the flight performance of the instrument. The detectors observed X-rays from the on-board calibration source, but a susceptibility to external magnetic fields limited their livetime. Accounting for this, no change was observed in detector response between ground operation and flight operation. This paper provides an overview of the first flight performance and focuses on the upgrades made in preparation for reflight. The largest changes have been upgrading the SQUIDs to mitigate magnetic susceptibility, synchronizing the clocks on the digital electronics to minimize beat frequencies, and replacing the mounts between the cryostat and the rocket skin to improve mechanical integrity. As the first flight performance was consistent with performance on the ground, reaching the instrument goals in the laboratory is considered a strong predictor of future flight performance.
Submitted 3 March, 2021;
originally announced March 2021.
-
The role of negative emissions in meeting China's 2060 carbon neutrality goal
Authors:
Jay Fuhrman,
Andres F. Clarens,
Haewon McJeon,
Pralit Patel,
Scott C. Doney,
William M. Shobe,
Shreekar Pradhan
Abstract:
China's pledge to reach carbon neutrality before 2060 is an ambitious goal and could provide the world with much-needed leadership on how to limit warming to 1.5 °C above pre-industrial levels by the end of the century. But the pathways that would achieve net zero by 2060 are still unclear, including the role of negative emissions technologies (NETs). We use the Global Change Analysis Model to simulate how negative emissions technologies, in general, and direct air capture (DAC) in particular, could contribute to China's meeting this target. Our results show that negative emissions could play a large role, offsetting on the order of 3 GtCO2 per year from difficult-to-mitigate sectors such as freight transportation and heavy industry. This includes up to a 1.6 GtCO2 per year contribution from DAC, constituting up to 60% of total projected negative emissions in China. But DAC, like bioenergy with carbon capture and storage and afforestation, has not yet been demonstrated at anything approaching the scales required to meaningfully contribute to climate mitigation. Deploying NETs at these scales will have widespread impacts on financial systems and natural resources such as water, land, and energy in China.
Submitted 19 April, 2021; v1 submitted 13 October, 2020;
originally announced October 2020.
-
Identifying viruses from metagenomic data by deep learning
Authors:
Jie Ren,
Kai Song,
Chao Deng,
Nathan A. Ahlgren,
Jed A. Fuhrman,
Yi Li,
Xiaohui Xie,
Fengzhu Sun
Abstract:
The recent development of metagenomic sequencing makes it possible to sequence microbial genomes including viruses in an environmental sample. Identifying viral sequences from metagenomic data is critical for downstream virus analyses. The existing reference-based and gene homology-based methods are not efficient in identifying unknown viruses or short viral sequences. Here we have developed a reference-free and alignment-free machine learning method, DeepVirFinder, for predicting viral sequences in metagenomic data using deep learning techniques. DeepVirFinder was trained based on a large number of viral sequences discovered before May 2015. Evaluated on the sequences after that date, DeepVirFinder outperformed the state-of-the-art method VirFinder at all contig lengths. Enlarging the training data by adding millions of purified viral sequences from environmental metavirome samples significantly improves the accuracy for predicting under-represented viruses. Applying DeepVirFinder to real human gut metagenomic samples from patients with colorectal carcinoma (CRC) identified 51,138 viral sequences belonging to 175 bins. Ten bins were associated with the cancer status, indicating their potential use for non-invasive diagnosis of CRC. In summary, DeepVirFinder greatly improved the precision and recall rates of viral identification, and it will significantly accelerate the discovery rate of viruses.
Submitted 20 June, 2018;
originally announced June 2018.
-
COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment, and paired-end read LinkAge
Authors:
Yang Young Lu,
Ting Chen,
Jed A. Fuhrman,
Fengzhu Sun
Abstract:
The advent of next-generation sequencing (NGS) technologies enables researchers to sequence complex microbial communities directly from the environment. Since assembly typically produces only genome fragments, also known as contigs, instead of entire genomes, it is crucial to group them into operational taxonomic units (OTUs) for further taxonomic profiling and downstream functional analysis. OTU clustering is also referred to as binning. We present COCACOLA, a general framework that automatically bins contigs into OTUs based upon sequence composition and coverage across multiple samples.
The effectiveness of COCACOLA is demonstrated in both simulated and real datasets in comparison to state-of-the-art binning approaches such as CONCOCT, GroopM, MaxBin and MetaBAT. The superior performance of COCACOLA relies on two aspects. One is employing $L_{1}$ distance instead of Euclidean distance for better taxonomic identification during initialization. More importantly, COCACOLA takes advantage of both hard clustering and soft clustering by sparsity regularization.
In addition, the COCACOLA framework seamlessly incorporates customized knowledge to improve binning accuracy. In our study, we have investigated two types of additional knowledge, the co-alignment to reference genomes and linkage of contigs provided by paired-end reads, as well as the ensemble of both. We find that both co-alignment and linkage information further improve binning in the majority of cases. COCACOLA is scalable and faster than CONCOCT, GroopM, MaxBin and MetaBAT.
The software is available at https://github.com/younglululu/COCACOLA
Submitted 8 April, 2016;
originally announced April 2016.
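The COCACOLA abstract above contrasts $L_{1}$ distance with Euclidean distance for comparing contigs during initialization. The sketch below only illustrates how the two metrics differ on a pair of normalized feature vectors; it is not the COCACOLA code, and the four-component profiles are made-up values standing in for whatever composition and coverage features the method actually uses.

```python
import numpy as np


def l1_distance(p, q):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return float(np.sum(np.abs(np.asarray(p) - np.asarray(q))))


def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences."""
    return float(np.sqrt(np.sum((np.asarray(p) - np.asarray(q)) ** 2)))


# Hypothetical normalized profiles for two contigs (each sums to 1).
contig_a = [0.5, 0.5, 0.0, 0.0]
contig_b = [0.25, 0.25, 0.25, 0.25]

d_l1 = l1_distance(contig_a, contig_b)          # 4 * 0.25 = 1.0
d_l2 = euclidean_distance(contig_a, contig_b)   # sqrt(4 * 0.0625) = 0.5
```

Because every coordinate difference contributes linearly to the $L_{1}$ distance, many small differences spread across a profile add up, whereas the Euclidean distance squares them and so weights them less relative to a few large differences.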