ACES: Automatic Cohort Extraction System for Event-Stream Datasets

Justin Xu
University of Oxford
[email protected]
Jack Gallifant
Massachusetts Institute of Technology
[email protected]
Alistair E. W. Johnson
Independent Scientist
[email protected]
Matthew B. A. McDermott
Harvard Medical School
[email protected]
Abstract

Reproducibility remains a significant challenge in machine learning (ML) for healthcare. In this field, datasets, model pipelines, and even task/cohort definitions are often private, leading to a significant barrier in sharing, iterating, and understanding ML results on electronic health record (EHR) datasets. In this paper, we address a significant part of this problem by introducing the Automatic Cohort Extraction System for Event-Stream Datasets (ACES). This tool is designed to simultaneously simplify the development of task/cohorts for ML in healthcare and enable the reproduction of these cohorts, both at an exact level for single datasets and at a conceptual level across datasets. To accomplish this, ACES provides (1) a highly intuitive and expressive configuration language for defining both dataset-specific concepts and dataset-agnostic inclusion/exclusion criteria, and (2) a pipeline to automatically extract patient records that meet these defined criteria from real-world data. ACES can be automatically applied to any dataset in either the Medical Event Data Standard (MEDS) or EventStreamGPT (ESGPT) formats, or to any dataset for which the necessary task-specific predicates can be extracted in an event-stream form. ACES has the potential to significantly lower the barrier to entry for defining ML tasks, redefine the way researchers interact with EHR datasets, and significantly improve the state of reproducibility for ML studies in this modality. ACES is available at https://github.com/justin13601/aces.

1 Introduction

Machine learning (ML) for healthcare suffers from a severe and systemic reproducibility crisis [9]. This challenge is further exacerbated by the need to maintain private and secure datasets, but even with public datasets, ML pipelines are not reliably reproducible from published papers alone. For instance, in numerous attempts to reproduce ML for healthcare studies using the MIMIC-III dataset [6], Johnson et al. found that more than half the time, the cohorts described in the studies could not be reliably reconstructed [5].

This burden in reproducing the basic task and problem definitions in ML for healthcare studies is profoundly detrimental. Beyond the obvious concerns it raises around the robustness of reported results and their readiness for deployment, our inability to reliably define shared, canonical, reproducible task definitions limits our capacity to perform meaningful model comparisons during methodological development. This is particularly notable in settings where not all researchers have mutual access to all datasets, as is common in healthcare. Given the critical role that open benchmarks play in the advancement of ML methods [23, 17, 19], this deficit directly translates to a significant barrier in our ability, as a research community, to effectively experiment, iterate, and develop new ML methodologies in the healthcare space.

Given the clear importance of this problem, the research community has naturally explored a number of prospective solutions. These can be largely categorized into two areas: (1) leveraging existing common data models (CDMs) to define reproducible task cohorts only for datasets within these schemas, and (2) defining static benchmarking tasks on individual public datasets. Both of these areas have generated numerous high-impact works. For example, in the area of CDM-driven tools, systems such as the ATLAS tool [13] for OHDSI’s OMOP CDM [12], i2b2’s query tool for the i2b2 CDM [11], and institution-specific tools such as the Stanford Research Repository (STARR) OMOP system and the NIHR Infections in Oxfordshire Database (IORD) cohort discovery platform have all been used to drive numerous new lines of inquiry. Unfortunately, these tools are also extremely limited in that they can only apply to the specific CDM or institutional data warehouse for which they have been defined. Further, because many of these CDMs have seen limited adoption in healthcare’s high-capacity, deep learning ecosystems, they are particularly ill-suited for task and cohort extraction within the deep learning communities. Conversely, public static benchmarks [3, 8, 20] over datasets such as MIMIC-IV [4] or eICU [15] have also been extremely impactful for the ML community. However, they are all tied to only a single dataset or a small set of datasets and tasks. Given the highly dynamic nature of clinical data and healthcare requirements, this is insufficient for the benchmarking and reproducibility needs faced by the ML for healthcare community.

When considering these existing solutions alongside the realities of healthcare data access and methodological development, it is clear that they are insufficient for three key reasons:

1. The Need for Interoperability The limited public datasets and only partially used CDMs cannot capture the diverse clinical populations, needs, and model capacities necessary for tangible ML progress in healthcare. To address this, systems for automated task extraction must be meaningfully interoperable across both public and private datasets with diverse input schemas.

2. The Need for Flexibility A single, static benchmark cannot encompass the variety of clinical tasks relevant to clinicians and informaticians. As existing tools (with only limited interfaces to define queries of per-set vocabularies) may struggle to generalize to new clinical tasks, ideal solutions must be flexible enough to accommodate a myriad of new task definitions, criteria formats, and disease or deployment areas.

3. The Need for Accessibility, Usability, & Applicability in Deep Learning Workflows While many existing tools offer no-code interfaces (e.g., web tools to build queries) that are essential for less technical audiences, integrating such tools with deep learning workflows can prove challenging. Deep learning systems are often run in a semi-programmatic manner on siloed, private computational clusters where researchers have minimal control. Hence, existing tools can be highly inconvenient in these settings. Instead, successful tools must provide both a Python interface and a command-line interface (CLI) that are easy for deep learning researchers to use, alongside shareable and readable configuration files that specify task definitions in a manner that can be readily ported across datasets and environments.

Our Solution: Automatic Cohort Extraction System for Event-Stream Datasets

In this work, we solve these problems with the Automatic Cohort Extraction System for Event-Stream Datasets (ACES). ACES offers a simple, expressive, and shareable configuration file language for task definitions and a reliable command-line- and Python-accessible tool for extracting labeled task dataframes. Task definitions in ACES are naturally separated into simple dataset-specific event predicates and dataset-agnostic inclusion/exclusion criteria, thereby permitting the same task to be used in a conceptually identical manner across diverse datasets. This approach not only enhances reproducibility but also facilitates community collaboration on task definitions, inclusion/exclusion criteria, and evaluation metrics for specific clinical use cases (Figure 1).

In contrast to prior task-definition systems, such as OHDSI’s ATLAS tool, ACES makes minimal assumptions about the input data structure or source vocabularies. In particular, ACES can be run on any dataset, provided the necessary task-specific predicates can be pre-extracted in an event-stream format (e.g., in a dataframe with columns for patient ID, timestamp, and event predicate values). It can further be run from raw data directly for any dataset in the relatively low-level and flexible Medical Event Data Standard (MEDS) [1] or EventStreamGPT (ESGPT) [10] formats. To demonstrate the utility and flexibility of ACES, we also release a collection of task definitions (Section 4) based on prior ML for healthcare works at both the dataset-agnostic criteria level and with dataset-specific predicates for the widely-used MIMIC-IV dataset [4].

In sum, ACES makes three key contributions:

1. ACES defines a shareable, simple, and flexible task configuration language that can define diverse sets of prediction tasks for ML in healthcare on any event-stream dataset.

2. ACES provides an easy-to-use utility to automatically extract these tasks from diverse sources of real-world, structured, and longitudinal electronic health record (EHR) data.

3. Through these advancements, ACES is poised to significantly advance the state of reproducibility, interoperability, and effective development of ML methods for healthcare.

The rest of this work is organized as follows. First, in Section 2, we describe ACES in more depth, beginning by illustrating its key concepts using a running example with the ACES CLI. Next, in Section 3, we discuss the ACES design principles and briefly overview its core algorithm. Subsequently, in Section 4, we demonstrate the use of ACES on diverse problem areas over real-world data, before discussing the limitations and future roadmap of ACES in Section 5 and offering concluding thoughts in Section 6.

Figure 1: ML task cohort extraction process with ACES and without ACES. Predicates are dataset-specific concepts that are needed to conceptually capture a machine learning task. Windows are temporal segments on a patient’s health record and are dataset-agnostic, as they are defined relative to the predicates. This distinction allows researchers to share the more complex task logic that is independent of datasets, enabling conceptual reproducibility for ML tasks in healthcare.

2 Automatic Cohort Extraction System for Event-Stream Datasets (ACES)

In this section, we introduce ACES, a novel automated task and cohort definition and extraction system that fills the key gaps in interoperability, flexibility, and accessibility left by the existing tools outlined in Section 1. To use ACES and extract a cohort for downstream ML tasks, a user only needs to do the following simple steps:

1. Install ACES: A fully functional version of ACES is pushed to PyPI, and any user can easily install it by simply running pip install es-aces. All dependencies are automatically set up with no further actions needed by the user.

2. Define Dataset: A dataset in a permitted format, such as MEDS, ESGPT, or as direct predicates, is required. More information on the data formats is available in Section 2.1.

3. Define Task: A task configuration file is required to define the task that the user wishes to extract. This configuration language is simple, clear, yet flexible, permitting users to rapidly share and iterate over task definitions for their clinical settings. Configuration specification is given in Section 2.2 and the Appendix.

4. Run the ACES CLI: ACES can be directly run from the command-line using:

$ aces-cli cohort_name="$TASK" data.path="$DATA_PATH"

The available command-line arguments are described in Section 2.3. ACES can also be used as a direct Python import, as described in Section 2.5.

5. Get Outputs: The resulting output from ACES is a single unified dataframe with all valid patient instances extracted according to task specifications. Users can subsequently leverage the returned columns with original patient identifiers and health record timestamps for downstream ML tasks and benchmark creation. ACES also returns additional columns for the task label, as well as summaries of each of the predicates over windows of the patient record. Further information is available in Section 2.4 and in the example output displayed in Figure 3.

Critically, after only these five simple steps, a user can immediately, reproducibly extract a full cohort from their source dataset that matches their task definition, and begin using this task for downstream machine learning or informatics applications. Note: Due to space constraints, full technical details on all aspects of ACES and the precise details of the recursive algorithm used to extract task cohorts given a configuration file are limited to the Appendix and the online ACES documentation, available here: https://eventstreamaces.readthedocs.io/en/latest/.

2.1 Dataset Configurations

ACES is extremely flexible and can handle different data formats, including data.standard=meds [1], data.standard=esgpt [10], or data.standard=direct, where event-stream predicate views are pre-extracted by the user from any dataset schema. Note that other formats can be converted into these formats, such as OHDSI OMOP through the MEDS OMOP ETL (https://github.com/Medical-Event-Data-Standard/meds_etl/tree/main). The use of direct predicates for extraction from datasets in a format that ACES does not natively support still significantly reduces the burden on users. Simply creating predicates is much less cumbersome than either fully converting the dataset via a CDM in order to use existing tools like ATLAS or i2b2, or performing the entire task extraction from scratch by writing in-house querying code. This demonstrates the significant improvement in utility that ACES brings across diverse data schemas compared to existing tools.
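As a rough illustration of what a pre-extracted predicate view might look like, the sketch below builds a small table with one row per patient and timestamp and one count column per predicate. The raw events, the source codes, and the column names (`subject_id`, `timestamp`) are assumptions made for this example, and the snippet is plain Python rather than ACES code.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events: (subject_id, timestamp, source code).
RAW_EVENTS = [
    (1, datetime(2020, 1, 1, 8), "ADMISSION"),
    (1, datetime(2020, 1, 1, 8), "LAB"),
    (1, datetime(2020, 1, 4, 12), "DISCHARGE"),
    (2, datetime(2021, 5, 2, 9), "ADMISSION"),
    (2, datetime(2021, 5, 3, 23), "DEATH"),
]

# Dataset-specific predicates: predicate name -> matching source codes.
PREDICATES = {
    "admission": {"ADMISSION"},
    "discharge": {"DISCHARGE"},
    "death": {"DEATH"},
}


def to_predicate_rows(raw_events, predicates):
    """Return one row per (subject_id, timestamp) with a count per predicate."""
    counts = defaultdict(lambda: dict.fromkeys(predicates, 0))
    for subject_id, timestamp, code in raw_events:
        row = counts[(subject_id, timestamp)]
        for name, codes in predicates.items():
            if code in codes:
                row[name] += 1
    return [
        {"subject_id": sid, "timestamp": ts, **row}
        for (sid, ts), row in sorted(counts.items())
    ]


rows = to_predicate_rows(RAW_EVENTS, PREDICATES)
```

Only the mapping in `PREDICATES` depends on the source dataset; everything downstream can operate on the predicate columns alone.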

2.2 Task Configurations

In ACES, tasks are specified through configuration files that define a collection of dataset-specific event predicates, which are simple functions evaluated on individual events within a structured medical dataset. Additionally, task criteria are defined in a dataset-agnostic manner through a collection of interrelated windows, which specify regions of a patient’s record and are constrained by certain relationships. Please see Figure 2 for an example of a task configuration.

Figure 2: Example of a configuration file for the binary prediction of in-hospital mortality 48 hours after admission. References to predicates and windows are italicized and bolded, respectively. (A) Dataset-specific task predicates. These concepts are needed to conceptually capture this task and are used as constraints and boundaries for windows of the patient record. For instance, in this example, the value of “ADMISSION” denotes a hospital admission event in the external source dataset. (B) A window of the task specifying the task inputs for downstream models. Suppose we’d like to use all historic patient data up to and including 24 hours past the admission. An arbitrary criterion requiring more than 5 records can then be placed on this window to ensure that the extracted cohort contains sufficient input data (at least 5 events). (C) The trigger events for the task, which are hospital admissions as we’d like to make a mortality prediction for each admission. (D) A window of the task specifying a gap in the patient timeline. Suppose we’d like to set a minimum length of admission for our cohort (e.g., 48 hours). A temporal constraint (minimum window duration) of 48 hours could then be set to represent this requirement. (E) A window of the task specifying the task target, which is set from the end of the gap window in (D) to the immediately subsequent “discharge” or “death” predicate. This creates our binary classes for the task (i.e., discharge = 0, death = 1). All windows are interrelated on the patient timeline, as shown by how each window references another in the configuration file.
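The window logic described in Figure 2 can be sketched in plain Python for a single admission. This is an illustrative reading of the caption, not the ACES implementation: the function name, the event representation, the inclusive window bounds, and the choice to treat the input criterion as "at least 5 events" are all assumptions.

```python
from datetime import datetime, timedelta

OUTCOMES = ("discharge", "death")


def label_admission(events, admission_time, min_input_events=5):
    """Return 1 (death), 0 (discharge), or None if the admission is excluded.

    `events` is a time-sorted list of (timestamp, predicate_name) pairs for
    one patient; `admission_time` is the trigger event (C).
    """
    input_end = admission_time + timedelta(hours=24)
    gap_end = admission_time + timedelta(hours=48)

    # (B) Input window: all history up to and including 24 hours past admission.
    if sum(ts <= input_end for ts, _ in events) < min_input_events:
        return None

    # (D) Gap window: the admission must last at least 48 hours, so no
    # discharge or death may occur before the gap ends.
    if any(admission_time < ts <= gap_end and name in OUTCOMES for ts, name in events):
        return None

    # (E) Target window: the first discharge or death after the gap sets the label.
    for ts, name in events:
        if ts > gap_end and name in OUTCOMES:
            return int(name == "death")
    return None  # no outcome observed, so no label can be assigned


t0 = datetime(2020, 1, 1)
history = [
    (t0 - timedelta(days=2), "lab"),
    (t0 - timedelta(days=1), "lab"),
    (t0, "admission"),
    (t0 + timedelta(hours=2), "lab"),
    (t0 + timedelta(hours=10), "lab"),
]
died = history + [(t0 + timedelta(hours=72), "death")]
went_home = history + [(t0 + timedelta(hours=60), "discharge")]
short_stay = history + [(t0 + timedelta(hours=36), "discharge")]
```

Here `died` and `went_home` yield labels, while `short_stay` is excluded because the discharge falls inside the 48-hour gap.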

2.3 Command-Line Interface Arguments

Hydra Arguments

The Hydra framework [21] enhances the CLI by enabling flexible run configurations and argument parsing for cohort extractions. For instance, specific arguments are required to define the external source dataset for data loading. The data.standard argument can be set to meds, esgpt, or direct, corresponding to the three supported ACES formats. Depending on the chosen format, either the path to the data file (for MEDS or direct predicates) or the path to the dataset directory (for ESGPT) must be specified to indicate the external source data from which ACES will extract the cohort. Additionally, cohort_dir and cohort_name are essential for locating and loading the task configuration file, as well as for saving results and logging operational data.
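For scripted runs, the arguments named above can be assembled programmatically. The helper below is a hypothetical convenience function, not part of ACES; it only uses argument names that appear in this paper (`cohort_name`, `cohort_dir`, `data.standard`, `data.path`), and the example values are placeholders. Whether `data.path` is the right key for every data standard is an assumption here.

```python
import shlex

SUPPORTED_STANDARDS = {"meds", "esgpt", "direct"}


def build_aces_command(cohort_name, cohort_dir, data_standard, data_path):
    """Assemble an aces-cli argument list from Hydra-style key=value overrides."""
    if data_standard not in SUPPORTED_STANDARDS:
        raise ValueError(f"Unsupported data.standard: {data_standard!r}")
    overrides = {
        "cohort_name": cohort_name,
        "cohort_dir": cohort_dir,
        "data.standard": data_standard,
        "data.path": data_path,  # data file (MEDS, direct) or dataset directory (ESGPT)
    }
    return ["aces-cli", *(f"{key}={value}" for key, value in overrides.items())]


cmd = build_aces_command(
    cohort_name="in_hospital_mortality",      # hypothetical task name
    cohort_dir="/directory/to/task/config",   # placeholder path
    data_standard="meds",
    data_path="/path/to/data.parquet",        # placeholder path
)
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the result safe to pass to `subprocess.run` without shell quoting concerns.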

Scaling to Large Datasets

For users dealing with large datasets, ACES can also be run over a collection of sharded files, extracting and storing the matching cohort for each shard individually in matching file paths. This can greatly increase computational efficiency by facilitating the processing of different shards in parallel via Hydra’s multi-run launchers (e.g., https://hydra.cc/docs/1.0/plugins/joblib_launcher/).

    $ aces-cli \
        --multirun \
        cohort_name="<task_config_name>" \
        cohort_dir="/directory/to/task/config/" \
        data=sharded \
        data.standard=meds \
        data.root="/directory/to/dataset/shards/" \
        "data.shard=$(expand_shards <folder>/<num>)" # Sweeps over shards
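The idea behind sharded extraction is that each shard can be processed independently and written to an output path that mirrors its input path. The toy sketch below illustrates this with a thread pool; in ACES the parallelism is handled by Hydra's launchers, and the shard contents, the per-shard criterion, and the `cohort/` output prefix here are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: shard name -> {subject_id: number of events}.
SHARDS = {
    "train/0": {101: 7, 102: 3},
    "train/1": {201: 9},
    "held_out/0": {301: 2, 302: 5},
}


def extract_shard(name, min_events=5):
    """Stand-in for one per-shard extraction: keep patients with enough events."""
    cohort = sorted(pid for pid, n in SHARDS[name].items() if n >= min_events)
    return f"cohort/{name}.parquet", cohort  # output path mirrors the shard path


def extract_all(shard_names, max_workers=4):
    """Run the per-shard extraction independently for every shard."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(extract_shard, shard_names))


results = extract_all(list(SHARDS))
```

Because no shard depends on another, the same pattern carries over to process pools or cluster job arrays.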

2.4 Extraction Output

Finally, with a dataset configured for predicates and a task configuration, ACES will execute the extraction for the cohort and return a table where each row is a valid instance as per the criteria defined in the configuration file. Hence, each instance can be included in our cohort used for the downstream ML task. At the most basic level, the table contains the patient identifiers of our cohort, a user-defined timestamp that indexes prediction time, and a task label derived from a user-specified predicate. In addition, for each of the interrelated windows, a start and end timestamp is provided to segment the patient record, along with a summary of the number of predicates evaluated in said window. Please see Figure 3 for a sample output table.

Figure 3: Sample output table from ACES. Continuing the example from Figure 2, we obtain this table after executing ACES extraction. The subject_id, index_timestamp, and label form the foundation of the defined cohort and are what the user would need, at a minimum, to obtain the necessary data for downstream tasks. In addition to these three columns, there exists a column for each of the defined windows in the task configuration, with its value being a structured object containing information about that particular window, such as the start and end times, as well as a summary of predicate counts. As seen in the values of this table, the timestamp of the end of the input window is always 24 hours after the trigger timestamp for each sample (row). Additionally, the window itself contains more than 5 of _ANY_EVENT—exactly matching our previously defined task configuration.
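To make the shape of this output concrete, the sketch below models two result rows as plain dictionaries and checks the properties described in the caption. The nested `input` dictionary stands in for the per-window structured object; its field names (`start`, `end`) and the `trigger` column are assumptions for illustration, as are all of the values.

```python
from datetime import datetime, timedelta

# Hypothetical output rows, one per valid admission.
rows = [
    {
        "subject_id": 1,
        "trigger": datetime(2020, 1, 1, 8),
        "index_timestamp": datetime(2020, 1, 2, 8),
        "label": 0,
        "input": {"start": None, "end": datetime(2020, 1, 2, 8), "_ANY_EVENT": 12},
    },
    {
        "subject_id": 2,
        "trigger": datetime(2021, 5, 2, 9),
        "index_timestamp": datetime(2021, 5, 3, 9),
        "label": 1,
        "input": {"start": None, "end": datetime(2021, 5, 3, 9), "_ANY_EVENT": 8},
    },
]


def matches_config(row):
    """Check that the input window ends 24 hours after the trigger and has more than 5 events."""
    window = row["input"]
    return (
        window["end"] - row["trigger"] == timedelta(hours=24)
        and window["_ANY_EVENT"] > 5
    )


# The three core columns are all a downstream pipeline strictly needs.
examples = [(r["subject_id"], r["index_timestamp"], r["label"]) for r in rows]
prevalence = sum(r["label"] for r in rows) / len(rows)
```

A downstream model would use all data for `subject_id` up to `index_timestamp` as input and `label` as the prediction target.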

2.5 Python Usage

In addition to the command-line tool, we also provide a Python interface to allow researchers to easily leverage ACES for cohort extraction in their code pipelines. A full tutorial is provided at https://eventstreamaces.readthedocs.io/en/latest/notebooks/tutorial.html.

3 ACES Technical Design

Key Design Principles

ACES is built with the following key design principles in mind:

1. Structured Medical Data as Event-Stream Data: ACES views structured medical data as “event-stream” data, enabling the extraction of meaningful tasks and cohorts from continuous patient records.

2. Compatibility with Any Dataset: ACES is designed to be compatible with any dataset, provided the necessary dataset-specific predicates are extracted and formatted as an event-stream. This step is automated for any dataset that can be expressed in the MEDS [1] or ESGPT [10] formats, including datasets that can be converted into these formats with existing ETLs, such as OMOP datasets through the MEDS OMOP ETL.

3. Separation of Configuration Components: ACES clearly separates dataset-specific configuration components from dataset-agnostic configuration components, ensuring flexibility and ease-of-use.

4. Simple Configuration Language: ACES leverages a simple configuration language to maximize portability and facilitate community feedback and iteration on ML tasks and environments.

5. Parallelization for Computational Efficiency: ACES relies on parallelization over patient shards to handle large datasets efficiently, ensuring scalability and performance.

Algorithm Overview

The key principle behind the ACES algorithm is that we decompose the task of determining whether a section of a patient’s record meets certain criteria specified in a configuration file into a series of recursive analyses. These analyses evaluate whether specific windows, defined relative to the realizations of other windows in the patient record, meet their criteria. By imposing only minimal constraints on the structure of the task configuration file, we can ensure that tasks can always be extracted using this recursive approach, thus forming the foundation of the entire ACES system. For more details, please see the Appendix.
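The recursive decomposition described above can be illustrated with a toy example. The sketch below is not the ACES algorithm, whose details are given in the Appendix; it assumes a simplified setting in which windows form a tree, each window is anchored at the end of its parent, times are expressed in hours, and the only criterion is a minimum event count. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Window:
    name: str
    offset_start: float  # hours relative to the anchor
    offset_end: float    # hours relative to the anchor
    min_events: int = 0
    children: list = field(default_factory=list)


def satisfies(window, anchor, event_times):
    """Recursively check a window and every window defined relative to it."""
    start, end = anchor + window.offset_start, anchor + window.offset_end
    n_events = sum(start <= t <= end for t in event_times)
    if n_events < window.min_events:
        return False
    # Child windows are anchored at the end of this realized window.
    return all(satisfies(child, end, event_times) for child in window.children)


target = Window("target", 0, 1000, min_events=1)
gap = Window("gap", 0, 24, children=[target])
root = Window("input", -1000, 24, min_events=3, children=[gap])

# Trigger at hour 0; event times are in hours relative to the trigger.
has_outcome = satisfies(root, 0.0, [-50, 0, 5, 60])
no_outcome = satisfies(root, 0.0, [-50, 0, 5])
```

Each call only needs the realized bounds of its parent, which is what allows the overall check to be expressed as a recursion over the window tree.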

4 Using ACES: A Repository of Task Configuration Examples

To demonstrate the flexibility and utility of ACES, we define and publicly release the task configurations described in Table 1, both with dataset-agnostic criteria and with dataset-specific predicate realizations based on the MEDS version of the public MIMIC-IV dataset [4]. Additional usage profiles, tasks, and areas, as well as details of the task cohorts extracted from MIMIC-IV, can be found in the Appendix. These various tasks have been previously studied, and ACES will facilitate their conceptual reproducibility to encourage benchmarking efforts and robustness in ML for healthcare.

Task Name | Description
readmission/30d | Predict hospital readmission within 30 days of discharge.
mortality/post_hospital_discharge/30d | Predict mortality within 30 days of discharge.
mortality/in_hospital/first_24h | Predict mortality within a hospital admission using the first 24 hours of data from that admission.
mortality/in_hospital/first_48h | Predict mortality within a hospital admission using the first 48 hours of data from that admission.
mortality/in_icu/first_24h | Predict mortality within an ICU admission using the first 24 hours of data from that admission.
mortality/in_icu/first_48h | Predict mortality within an ICU admission using the first 48 hours of data from that admission.
Table 1: A collection of sample configuration files for various common tasks on MIMIC-IV [4]. These tasks can be easily generalized to other datasets, such as eICU [15] or other private intensive care unit (ICU) and inpatient datasets. Please see the Appendix for further examples.

5 Discussion

Additional Related Work

In addition to the existing tools discussed in Section 1, there are several other areas of related work relevant to ACES. Firstly, ACES could be directly connected with and support various health CDMs, such as OMOP, FHIR, PCORNET, and i2b2 [12, 2, 14, 11]. These models provide already-accepted standardized frameworks for organizing and analyzing healthcare data, and integrating ACES directly with them, rather than through ETLs, could greatly enhance its utility and interoperability. Static benchmarks that provide standardized datasets, evaluation metrics, and baseline methods for a range of clinical problems, such as YAIB [20], Harutyunyan et al.’s multitask learning clinical prediction benchmarks [3], and EHR-PT [8], can also be directly integrated with ACES to facilitate robust ML in healthcare. Lastly, ACES can be used in conjunction with various health data management tools, such as ESGPT [10], TemporAI [18], PyHealth [22], and OMOPLearn [7]. These tools offer functionalities for pre-processing, managing, and analyzing health data for downstream tasks, and integrating ACES with them directly can streamline ML workflows.

Beyond healthcare, ACES is applicable to data from a variety of other domains, such as financial, climate, or social media data. In essence, ACES could be used for any structured, longitudinal data that can be reformatted as an event-stream. This versatility makes ACES a powerful tool for extracting and analyzing complex event-based datasets across different fields.

Limitations & Future Roadmap

ACES has several key limitations that can be addressed in future work. Firstly, while already very expressive, ACES’s task configuration language is currently still relatively limited. Expressing more complex kinds of predicates, window aggregations, labeling functions, and criteria would expand the scope of ACES significantly. Similarly, the configuration language can also be made more user-friendly, such as by allowing users to specify a list of values that all establish a single predicate directly, rather than having to define multiple predicates for each value. ACES is also very well poised to capture more complex patterns of task and cohort relationships, including prescribed systems of case/control matching, automated analyses, or propensity re-weighting over excluded populations. It would also be possible to enable users to nest ACES configuration files to leverage extracted task labels as new predicates in more complex ACES extraction processes. Finally, with the standardization that ACES offers, new opportunities for human-data interaction are also made available, such as via a natural language interface to define ACES predicates or configuration files and, thus, to extract downstream tasks, patient cohorts, or derived datasets in a code-free manner. Preliminary analyses of the viability of this process with current large language models show significant promise, though much future work remains.

ACES: Enabling a New Kind of Benchmark

In addition to the clear impact of ACES on reproducibility, robustness, and accessibility of health datasets in ML for healthcare, we also feel that ACES is an essential tool for a “new kind of benchmark” in the field—and, in so being, is a portent of what needs to come should ML for healthcare progress to a more productive, communal, and impactful stage. In particular, we argue that for ML for healthcare to progress in the manner desired by the community, and most likely to be maximally positively impactful for all patients, we need to develop methodologies to test, share, and develop ML solutions across diverse datasets in a meaningful and reproducible manner, even without said datasets being publicly available to general researchers. This capability is critical because, without it, we will never be able to offer new inductive insights to clinical scientists or informaticians about which methods are most likely to work best on their novel, private data. In other words, if we cannot test our model training recipes across the diverse sets of clinical care settings, populations, and conceptual dataset schemas that exist in the real world, we similarly cannot expect those training recipes to generalize to said set of myriad downstream deployment areas. Tools like ACES, which make it as easy as possible for users to share the conceptual definitions of their tasks and prediction areas across datasets—in such a way that their colleagues can use them even over independent, private datasets—can help transform the kinds of benchmarking studies that we perform in ML for healthcare towards those that permit generalizable assessment of ML training recipes across datasets, clinical areas, and more.

6 Conclusion

In this work, we present the Automatic Cohort Extraction System for Event-Stream Datasets (ACES). ACES is a system designed to intuitively define cohorts and downstream tasks of interest in ML and to reliably and automatically extract those cohorts from arbitrary datasets that are in an event-stream format. This system enables significantly greater shareability of task definitions and reproducibility of ML training and evaluation recipes, and it is as easy to use as installing a package via pip and running a simple command-line tool. We believe that ACES will be an integral tool in the development of new kinds of benchmarks in ML for healthcare, which can be explored across both public and private datasets alike, and which define and characterize the populations and tasks of interest in a manner that cleanly separates dataset-specific components from shareable dataset-agnostic components. To learn more about ACES and use it today in your work, please visit our GitHub repository at https://github.com/justin13601/aces and the online ACES documentation at https://eventstreamaces.readthedocs.io/en/latest/.

Acknowledgments and Disclosure of Funding

MBAM gratefully acknowledges support from a Berkowitz Postdoctoral Fellowship at Harvard Medical School. JX greatly appreciates support from supervisors David W. Eyre (Big Data Institute, University of Oxford) and Curtis P. Langlotz (Center for Artificial Intelligence in Medicine & Imaging, Stanford University). We also acknowledge contributions from Tom Pollard (Massachusetts Institute of Technology).

References

  • [1] Bert Arnrich, Edward Choi, Jason Alan Fries, Matthew B.A. McDermott, Jungwoo Oh, Tom Pollard, Nigam Shah, Ethan Steinberg, Michael Wornow, and Robin van de Water. Medical event data standard (MEDS): Facilitating machine learning for health. In ICLR 2024 Workshop on Learning from Time Series For Health, 2024.
  • [2] Duane Bender and Kamran Sartipi. HL7 FHIR: An agile and RESTful approach to healthcare information exchange. In Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems, pages 326–331. IEEE, 2013.
  • [3] Hrayr Harutyunyan, Hrant Khachatrian, David C. Kale, Greg Ver Steeg, and Aram Galstyan. Multitask learning and benchmarking with clinical time series data. Scientific Data, 6(1):96, June 2019. Publisher: Nature Publishing Group.
  • [4] Alistair E. W. Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J. Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, Li-wei H. Lehman, Leo A. Celi, and Roger G. Mark. MIMIC-IV, a freely accessible electronic health record dataset. Scientific Data, 10(1):1, January 2023. Publisher: Nature Publishing Group.
  • [5] Alistair E. W. Johnson, Tom J. Pollard, and Roger G. Mark. Reproducibility in critical care: a mortality prediction case study. In Proceedings of the 2nd Machine Learning for Healthcare Conference, pages 361–376. PMLR, November 2017. ISSN: 2640-3498.
  • [6] Alistair E. W. Johnson, Tom J. Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. MIMIC-III, a freely accessible critical care database. Scientific Data, 3(1):1–9, 2016.
  • [7] Rohan Kodialam, Rebecca Boiarsky, Justin Lim, Aditya Sai, Neil Dixit, and David Sontag. Deep contextual clinical prediction with reverse distillation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 249–258, 2021.
  • [8] Matthew McDermott, Bret Nestor, Evan Kim, Wancong Zhang, Anna Goldenberg, Peter Szolovits, and Marzyeh Ghassemi. A comprehensive EHR timeseries pre-training benchmark. In Proceedings of the Conference on Health, Inference, and Learning, pages 257–278, 2021.
  • [9] Matthew B. A. McDermott, Shirly Wang, Nikki Marinsek, Rajesh Ranganath, Luca Foschini, and Marzyeh Ghassemi. Reproducibility in machine learning for health research: Still a ways to go. Science Translational Medicine, 13(586):eabb1655, March 2021.
  • [10] Matthew B.A. McDermott, Bret Nestor, Peniel N Argaw, and Isaac S. Kohane. Event stream GPT: A data pre-processing and modeling library for generative, pre-trained transformers over continuous-time sequences of complex events. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023.
  • [11] Shawn N Murphy, Griffin Weber, Michael Mendis, Vivian Gainer, Henry C Chueh, Susanne Churchill, and Isaac Kohane. Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). Journal of the American Medical Informatics Association, 17(2):124–130, 2010.
  • [12] OHDSI. OMOP Common Data Model. https://www.ohdsi.org/data-standardization/. Accessed: 2024.
  • [13] OHDSI, Sigfried Gold, Chris Knoll, Anthony Sena, Frank DeFalco, Anton Abushkevich, Pavel Grafkin, Vlad Belousov, Alex Saltykov, Vitaly Koulakov, Sergey Suvorov, Anastasiia Klochkova, Maria Pozhidaeva, Jungmi Han, Anton Gackovka, Anton Stepanof, Mark Velez, Vlad Belousov, Chen Regen, Ekaterina Krivets, Semyon Titarenko, chgl, Joris Borgdorff, Valeri Antonov, Kwang Soo Jeong, Gowtham Rao, Alex Odysseus, Ajit Londhe, Alex Cumarav, Kai Kewley, Marc Suchard, Wonjun Hong, Tom White, Konstantin Yaroshovets, Shaun Turner, Taha Abdul-Basser, Richard D. Boyce, Tiago Novo, and Rowan Parry. Atlas, June 2024.
  • [14] PCORnet. PCORnet: Common Data Model (CDM) specification, version 6.1, 2023.
  • [15] Tom J. Pollard, Alistair E. W. Johnson, Jesse D. Raffa, Leo A. Celi, Roger G. Mark, and Omar Badawi. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Scientific Data, 5(1):1–13, 2018.
  • [16] Muriel Ramirez-Santana. Limitations and Biases in Cohort Studies. In Cohort Studies in Health Sciences. IntechOpen, February 2018.
  • [17] Olawale Salaudeen and Moritz Hardt. ImageNot: A contrast with ImageNet preserves model rankings. arXiv preprint arXiv:2404.02112, 2024.
  • [18] Evgeny S. Saveliev and Mihaela van der Schaar. TemporAI: Facilitating machine learning innovation in time domain tasks for medicine. arXiv preprint arXiv:2301.12260, 2023.
  • [19] Ali Shirali, Rediet Abebe, and Moritz Hardt. A theory of dynamic benchmarks. In The Eleventh International Conference on Learning Representations, 2023.
  • [20] Robin van de Water, Hendrik Schmidt, Paul Elbers, Patrick Thoral, Bert Arnrich, and Patrick Rockenschaub. Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML, March 2024. arXiv:2306.05109 [cs].
  • [21] Omry Yadan. Hydra - a framework for elegantly configuring complex applications. GitHub, 2019.
  • [22] Chaoqi Yang, Zhenbang Wu, Patrick Jiang, Zhen Lin, Junyi Gao, Benjamin P. Danek, and Jimeng Sun. PyHealth: A deep learning toolkit for healthcare applications. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23, pages 5788–5789, New York, NY, USA, 2023. Association for Computing Machinery.
  • [23] Guanhua Zhang and Moritz Hardt. Inherent trade-offs between diversity and stability in multi-task benchmark. arXiv preprint arXiv:2405.01719, 2024.

Appendix

The full online ACES documentation is available at https://eventstreamaces.readthedocs.io/en/latest/. We have also included a compiled PDF version of this documentation as an ancillary file.

To answer specific questions about ACES, please see the index below (links to the external online documentation are provided; the corresponding chapters and page numbers of the ancillary PDF documentation are given in parentheses):

How do you use ACES?

  1. What is a task and how do you specify one?
    Sample task descriptions and specifications are provided on the Task Examples page of the online documentation. Please refer to https://eventstreamaces.readthedocs.io/en/latest/notebooks/examples.html (Chapter 3, Pg 21-26).

    1.1. What are predicates and how do you specify them?
      For an overview of predicates and how they form the foundation of ACES, please refer to the Predicates DataFrame page at https://eventstreamaces.readthedocs.io/en/latest/notebooks/predicates.html (Chapter 4, Pg 27-30).

    1.2. What are windows and how do you specify them?
      A window in ACES is defined at https://eventstreamaces.readthedocs.io/en/latest/algorithm.html#window. Additionally, for details on how to define a window, please refer to https://eventstreamaces.readthedocs.io/en/latest/readme.html#windows (Chapter 1.4.3, Pg 9-10).

  2. How do you extract a task from a dataset?
    For general ACES usage instructions, please refer to https://eventstreamaces.readthedocs.io/en/latest/usage.html#quick-start (Chapter 2.1, Pg 13-16). Brief end-to-end instructions are also available at https://eventstreamaces.readthedocs.io/en/latest/readme.html#instructions-for-use (Chapter 1.3, Pg 4-6).

    2.1. Detailed Usage Instructions for ACES CLI
      For detailed instructions on using ACES CLI, please refer to the online Usage Guide at https://eventstreamaces.readthedocs.io/en/latest/usage.html#detailed-instructions (Chapter 2.2, Pg 16-17).

    2.2. Tutorial for the ACES Python API
      For a step-by-step tutorial on using the ACES Python API, please refer to the online Code Example Notebook at https://eventstreamaces.readthedocs.io/en/latest/notebooks/tutorial.html (Chapter 5, Pg 31-41).

  3. ACES with and vs. Other Tools
    For an overview of how ACES could be used with other existing complementary tools for reproducible machine learning, please refer to the online section at https://eventstreamaces.readthedocs.io/en/latest/readme.html#complementary-tools (Chapter 1.5.2, Pg 10).
    For an overview of ACES and other existing alternative tools for semi- or fully-automated cohort extraction, please refer to the online section at https://eventstreamaces.readthedocs.io/en/latest/readme.html#alternative-tools (Chapter 1.5.3, Pg 10).

How does ACES work?

  1. What is the formal configuration language specification for ACES?
    For technical details on the ACES configuration language, please refer to the Configuration Language Specification page at https://eventstreamaces.readthedocs.io/en/latest/configuration.html (Chapter 6.1, Pg 43-46).

  2. Glossary of ACES Terminology
    For a glossary of terminology used throughout ACES, please refer to the Algorithm Terminology page at https://eventstreamaces.readthedocs.io/en/latest/technical.html#algorithm-terminology (Chapter 6.3, Pg 49-50).

  3. What is the ACES extraction algorithm?
    For technical details on the ACES algorithm, please refer to the Algorithm Design page at https://eventstreamaces.readthedocs.io/en/latest/technical.html#algorithm-design (Chapter 6.4, Pg 50-53).

  4. Full ACES Module API Documentation
    For the complete ACES module documentation, including tests that ensure algorithm correctness, please refer to the Module API page at https://eventstreamaces.readthedocs.io/en/latest/api/modules.html (Chapter 8, Pg 57-114).

How well does ACES work?

  1. Computational Profile
    For an overview of the computational profile of ACES, please refer to the Computational Profile page at https://eventstreamaces.readthedocs.io/en/latest/profiling.html (Chapter 7, Pg 55-57).

  2. Further Examples
    For additional example configuration files and criteria covering a range of machine learning for healthcare tasks, please refer to https://github.com/mmcdermott/PIE_MD/tree/main/tasks/criteria.