License: CC BY 4.0
arXiv:2401.00077v1 [q-bio.NC] 29 Dec 2023

A Maturity Model for Operations in Neuroscience Research

Erik C. Johnson¹,ᵃ, Thinh T. Nguyen², Benjamin K. Dichter³, Frank Zappulla⁴, Montgomery Kosma²,
Kabilar Gunalan²,⁵, Yaroslav O. Halchenko⁶, Shay Q. Neufeld⁷, Michael Schirner⁸⁻¹², Petra Ritter⁸⁻¹²,
Maryann E. Martone¹³, Brock Wester¹, Franco Pestilli¹⁴, Dimitri Yatsenko²,ᵇ
¹ Research and Exploratory Development Department,
Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA
² DataJoint, Houston, TX, USA
³ CatalystNeuro, Benicia, CA, USA
⁴ Digital R&D Creation Center, Pfizer Inc., USA
⁵ McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
⁶ Center for Open Neuroscience, Department of Psychological and Brain Sciences,
Dartmouth College, New Hampshire, USA
⁷ Inscopix, a Bruker company, Mountain View, CA, USA
⁸ Berlin Institute of Health (BIH) at Charité – Universitätsmedizin Berlin, Berlin, Germany
⁹ Department of Neurology with Experimental Neurology, Charité,
Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin
and Humboldt Universität zu Berlin, Berlin, Germany
¹⁰ Bernstein Focus State Dependencies of Learning and
Bernstein Center for Computational Neuroscience, Berlin, Germany
¹¹ Einstein Center for Neuroscience Berlin, Berlin, Germany
¹² Einstein Center Digital Future, Berlin, Germany
¹³ Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
¹⁴ Department of Psychology, University of Texas Austin, Austin, TX, USA
ᵃ [email protected]; ᵇ [email protected]
Abstract

Scientists are adopting new approaches to scale up their activities and goals. Progress in neurotechnologies, artificial intelligence, automation, and tools for collaboration promises new bursts of discoveries. However, compared with other disciplines and with industry, neuroscience laboratories have been slow to adopt key technologies to support collaboration, reproducibility, and automation. Drawing on progress in other fields, we define a roadmap for implementing automated research workflows for diverse research teams. We propose establishing a five-level capability maturity model for operations in neuroscience research. Achieving higher levels of operational maturity requires new technology-enabled methodologies, which we describe as “SciOps”. The maturity model provides guidelines for evaluating and upgrading operations in multidisciplinary neuroscience teams.

Keywords: DevOps · DataOps · MLOps · SciOps · Capability Maturity Model · neuroscience · operations research · open science · closed-loop experiments · digital twin · FAIR · reproducible research · automated workflows · artificial intelligence

1 Scaling operations in neuroscience research

Neuroscience, driven by technological advancement, stands on the cusp of groundbreaking discoveries. Current techniques enable researchers to record and modulate the activity of intact neural circuits during complex behaviors and to align these recordings with detailed molecular and anatomical maps. Advanced statistical tools reveal computational patterns in these vast datasets. Neuroscientists are increasingly forming large teams to develop and apply these new techniques at scales beyond the individual laboratory. However, larger projects often go through chaotic phases as diverse teams strive to integrate disparate technical and organizational methods. This raises a central question: how can advances in neurotechnologies be translated into substantial scientific breakthroughs?

To keep pace with neurotechnologies, research teams require a revolution in their scientific operations. As the complexity of coordinating data and computation grows, it is crucial to boost collaboration, streamline processes, reduce errors, promote data integration, and prioritize reproducibility. A new catalyst for discovery is the integration of artificial intelligence into research activities, a prospect that demands a formalized approach to scientific workflows [1, 2]. Through enhanced operations, teams can integrate and navigate vast data, blending human ingenuity with emerging artificial intelligence. As the challenges increase, a clear roadmap becomes vital to steer research teams from their current practices towards the capacity for more ambitious projects.

2 Capability Maturity Model for Scientific Operations

To address the question of research teams’ readiness to tackle complex challenges with robust methods, we introduce the Capability Maturity Model for Scientific Operations (Fig. 1). Capability Maturity Models (CMMs) have a well-established history in engineering and related fields, providing structured frameworks for assessing and enhancing processes within organizations [3]. These frameworks are commonly employed by corporations and governments to evaluate contractors and strategically plan for the future. Enhancing operational maturity equips teams to embark on ambitious initiatives aimed at solving complex problems.

Our inspiration for this model derives from the CMMI (Capability Maturity Model Integration) family of models, initially developed by the Software Engineering Institute at Carnegie Mellon University for software engineering and subsequently adapted to various industries [4]. Drawing on our extensive experience in coordinating large-scale research collaborations in neuroscience, we have tailored key concepts from CMMI to the unique challenges and opportunities of current neuroscience projects. While our primary focus remains experimental neuroscience, the core principles we define here can apply broadly to other disciplines.

Our model assigns research teams to five maturity levels based on how they plan and execute essential activities. The assessment considers multiple criteria: team structure, formal processes, training, software development, data management, and computational infrastructure and procedures. The model can also act as a step-by-step guide, helping teams identify the steps needed to expand their capabilities.
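As an illustration only, the level assignment can be encoded as a small data structure. The paper does not prescribe a scoring scheme, so this sketch assumes that each criterion is rated on the same 1 to 5 scale and applies a hypothetical weakest-link rule: a team's overall level is the lowest level it reaches on any criterion, loosely analogous to how staged CMMI appraisals require every process area up to a given level to be satisfied. The `assess` and `next_steps` helpers are invented for this example.

```python
from enum import IntEnum


class Level(IntEnum):
    """The five maturity levels of the Neuro SciOps CMM v1.0."""
    INITIAL = 1
    MANAGED = 2
    DEFINED = 3
    SCALABLE = 4
    OPTIMIZING = 5


# Criteria named in the text; the per-criterion ratings are assumed inputs.
CRITERIA = (
    "team structure",
    "formal processes",
    "training",
    "software development",
    "data management",
    "computational infrastructure",
)


def assess(ratings: dict[str, Level]) -> Level:
    """Hypothetical weakest-link rule: overall level = lowest criterion level."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    return min(ratings[c] for c in CRITERIA)


def next_steps(ratings: dict[str, Level]) -> list[str]:
    """Criteria holding the team back: those rated at the current overall level."""
    current = assess(ratings)
    if current == Level.OPTIMIZING:
        return []
    return [c for c in CRITERIA if ratings[c] == current]


if __name__ == "__main__":
    lab = {c: Level.MANAGED for c in CRITERIA}
    lab["data management"] = Level.DEFINED  # e.g., already sharing data in a community format
    lab["training"] = Level.INITIAL         # e.g., no structured onboarding yet
    print(assess(lab).name)  # INITIAL
    print(next_steps(lab))   # ['training']
```

Under this assumed rule, the lowest-rated criteria are the ones to address first; a real assessment would also need agreed rubrics for rating each criterion.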

Figure 1: The Capability Maturity Model for Scientific Operations in neuroscience research (Neuro SciOps CMM v1.0). We define SciOps as a set of methodologies for implementing advanced capabilities.

Today, most neuroscience teams hover between Levels 1 and 2, though maturity varies across subfields. For instance, scientific operations in human neuroimaging tend to be more mature than in experimental neurophysiology [5]. Funding policies and publisher mandates for open data and reproducible results are nudging teams towards Level 3, which emphasizes community standards. At Level 4, teams prioritize research automation, scalable computing, and efficient team workflows. This approach is most commonly achieved in larger, centralized institutions, but it can be adapted to smaller teams through better tools and platforms. Level 5, which promises a significant leap in discoveries by closing the discovery loop with the help of artificial intelligence, remains a goal that no team has yet achieved. We describe each level in the sections below.

2.1 Level 1: Initial

At the outset of scientific endeavors, such as establishing a new laboratory, embarking on a Ph.D. project, or initiating a novel experiment, teams typically find themselves at Level 1 of the maturity model. This stage is characterized by a high degree of flexibility and customization in experimental and analytical methods. Data volumes at Level 1 are typically modest and do not require high-throughput processing. Each project adopts tailored approaches, with custom software and manual data management on dedicated lab infrastructure. Standardized methods are largely absent at this stage.

Experimentation without rigid methods, often described as “tinkering,” has always been foundational to research [6]. Remarkable achievements, including those recognized with Nobel Prizes, have commonly emerged from Level 1 practices. Nevertheless, in the context of today’s technology-driven and data-intensive projects, there is a growing need for more organized, open, and reproducible methods. Level 1 practices make it difficult for individual labs to adapt their processes for larger collaborations. Operational maturity should align with the complexity and scope of the research being conducted, much like how biosafety levels are established to match research demands.

While maintaining space for intuition-driven and flexible activities is crucial at all maturity levels, successful methods and practices can be standardized and scaled up as research demands increase. Moreover, a team’s maturity level is not necessarily a reflection of its size or its funding level. The ultimate aim of this model is to provide research teams with the necessary tools and guidance to elevate their operational maturity to meet the evolving demands of their research.

2.2 Level 2: Managed

To expand their scope and tackle larger, more complex projects, research teams must progress to Level 2, where the emphasis shifts to developing lab-wide standards. These standards enhance consistency and predictability in internal operations, thereby facilitating more effective teamwork and leading to more credible findings.

Key characteristics of Level 2 teams include:

  1. Repeatable Processes: Laboratories establish uniform methods applicable across various projects. This standardization extends to data management with shared storage, standardized formats, and structured naming conventions, ensuring data integrity. Software practices also evolve, incorporating version control, documentation, testing, and code review processes [7, 8, 5]. A notable practice at this level is the development and maintenance of a stable data pipeline, optimized for efficient, long-term use.

  2. Role Specialization: Level 2 teams clearly define team roles and responsibilities. This specialization allows for an efficient distribution of tasks and maximizes the use of individual expertise. A common practice is rotating trainees through different roles to provide a comprehensive understanding of lab operations.

  3. Quality Control: Level 2 introduces rigorous procedures for continuously monitoring and validating the accuracy and reliability of experimental results. These include instrument calibration, software testing, and signal quality assessment. Quality control criteria are established and periodically updated to ensure the highest standards.

  4. Training Programs: Level 2 teams provide a structured onboarding process for new members as well as ongoing training initiatives. These programs aim to keep the team proficient in lab operations and abreast of the latest developments. Some labs extend their training efforts beyond their teams, hosting seminars and workshops for external groups to disseminate their standardized practices.
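Structured naming conventions are easiest to keep when they are checked automatically rather than by eye. A minimal sketch, assuming a hypothetical lab convention of the form `sub-<subject>_ses-<YYYYMMDD>_<modality>.<ext>` (loosely modeled on BIDS-style key-value file names); the pattern, the allowed modalities and extensions, and the helper names are illustrative, not a community standard.

```python
import re
from datetime import datetime

# Hypothetical lab-wide convention: sub-<alphanumeric>_ses-<YYYYMMDD>_<modality>.<ext>
PATTERN = re.compile(
    r"^sub-(?P<subject>[A-Za-z0-9]+)"
    r"_ses-(?P<date>\d{8})"
    r"_(?P<modality>ephys|imaging|behavior)"
    r"\.(?P<ext>nwb|tif|csv)$"
)


def parse_name(filename: str) -> dict[str, str]:
    """Return the fields encoded in a file name, or raise ValueError if it breaks the convention."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    fields = match.groupdict()
    # Reject impossible dates such as 20231345, which the regex alone would accept.
    datetime.strptime(fields["date"], "%Y%m%d")
    return fields


def find_violations(filenames: list[str]) -> list[str]:
    """List every file name that fails validation, e.g. before data enter shared storage."""
    bad = []
    for name in filenames:
        try:
            parse_name(name)
        except ValueError:
            bad.append(name)
    return bad


if __name__ == "__main__":
    files = [
        "sub-M042_ses-20231201_ephys.nwb",
        "mouse42_final_v2.nwb",
        "sub-M042_ses-20231345_imaging.tif",
    ]
    print(find_violations(files))
    # ['mouse42_final_v2.nwb', 'sub-M042_ses-20231345_imaging.tif']
```

Because the parsed fields come back as a dictionary, the same check can also feed metadata (subject, session date, modality) into a lab database instead of relying on folder structure alone.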

The progression to Level 2 marks a significant step towards operational maturity, characterized by a systematic approach to research, a focus on quality and reliability, and an emphasis on continuous learning and improvement. However, uniting multiple Level 2 labs within a consortium for collaborative projects poses significant challenges, because each lab employs its own, often incompatible, processes.

2.3 Level 3: Defined

Level 3 marks a stage where research teams embrace practices conducive to robust collaborations spanning various laboratories and disciplines. Level 3 labs work effectively within multilab consortia, streamlining interaction and harmonizing data processes. Key features of this level include the adoption of common data standards and interoperable computational frameworks, essential for efficient use and sharing of data and software among teams.

Key characteristics of Level 3 teams are:

  1. Open-Source Ecosystems: Level 3 teams are deeply engaged with resilient open-source software ecosystems. These ecosystems are not just about providing tools that are “free to use” but also involve responsive community governance setting standards for quality, reliability, and reproducibility. They form the foundation for state-of-the-art projects and offer community support and educational resources. Level 3 teams align their work with these community-driven open-source endeavors, promoting consistency, integration, reliability, reproducibility, and sustainability.

    These teams adopt disciplined practices for software management, including principled code management, peer review, validation, and testing. This approach enables the creation of evolving data pipelines and computational workflows with minimal downtime, ensuring continuous research progression.

  2. FAIR Data: Level 3 teams develop, adopt, and promote harmonized initiatives for data standards, fostering interoperability of tools and processes across research groups. This includes adherence to the FAIR principles (Findable, Accessible, Interoperable, and Reusable) for scientific data [9]. Such standards facilitate the exchange and reuse of complex data, supported by robust systems including data sharing platforms that enable reproducibility and re-analysis. Examples in neuroscience include the data standards developed by the BIDS [10] and NWB [11] projects.

    Data exchange and reuse require infrastructure. Robust data sharing platforms and collaborative research environments are set up to facilitate joint research endeavors. Neuroscience data archives such as DANDI [12], BrainLife [13], BossDB [14], and OpenNeuro [15] not only store the data but also facilitate reproducibility and new analyses. Distributed data management systems such as DataLad, which builds on git-annex, support not only data versioning but also unified data access and exchange across multiple work sites and archives [16, 17].

    FAIR principles foster efficient work for both humans and machines. Seamless automation and effective machine learning rely on machine readability enabled by FAIR data standards [18]. On a global scale, the collective output of Level 3+ research contributes to a semantic web of datasets and methods, enabling further aggregation of knowledge.

  3. FAIR Workflows: Level 3 teams elevate the application of FAIR principles beyond data management to encompass the entirety of computational workflows, tracking all data transformations from the raw acquired data through processing and analysis to the figures in a paper [19, 20]. These practices manage the associated code versions, dependencies, environment configurations, and parameters. Data outputs contain their provenance information describing their lineage from the original inputs through all computational transformations. This comprehensive approach ensures that computational analyses are repeatable and shareable, all while minimizing the potential for human error.

    Formal workflows incorporate best practices for testing code logic, including unit, regression, and integration testing, often utilizing benchmark datasets. This rigorous testing framework promotes the reliability and trustworthiness of research results.

    Adoption of formal workflow specifications has been uneven across scientific domains. Geosciences and bioinformatics have been at the forefront, rapidly adopting and benefiting from these advances. Neuroscience, especially outside human neuroimaging, has been slower, perhaps because of its vast array of data modalities and the inherent challenges in achieving standardization [21].
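Provenance capture of the kind described under FAIR Workflows can be sketched in a few lines. The example below is a minimal, hypothetical illustration using only the Python standard library: each processing step records content hashes of its input and output, its parameters, and a code-version string, so that an output can be traced back through every transformation to the raw data. Real workflow systems record much more (environments, containers, dependencies); the `run_step` and `lineage` helpers and the record fields are assumptions made for this sketch.

```python
import hashlib
import json
from typing import Any, Callable


def digest(obj: Any) -> str:
    """Content hash of a JSON-serializable object (stable key order)."""
    payload = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def run_step(name: str, func: Callable[..., Any], data: Any, params: dict,
             code_version: str, log: list[dict]) -> Any:
    """Run one workflow step and append a provenance record describing it."""
    output = func(data, **params)
    log.append({
        "step": name,
        "code_version": code_version,  # in practice, e.g., a git commit hash
        "params": params,
        "input_sha256": digest(data),
        "output_sha256": digest(output),
    })
    return output


def lineage(log: list[dict], output_hash: str) -> list[str]:
    """Walk the records backwards from an output hash to list the steps that produced it."""
    by_output = {rec["output_sha256"]: rec for rec in log}
    steps, seen = [], set()
    while output_hash in by_output and output_hash not in seen:
        seen.add(output_hash)  # guard against steps whose output equals their input
        rec = by_output[output_hash]
        steps.append(rec["step"])
        output_hash = rec["input_sha256"]
    return list(reversed(steps))


if __name__ == "__main__":
    def detrend(x, offset):
        return [v - offset for v in x]

    def threshold(x, level):
        return [v for v in x if v > level]

    log: list[dict] = []
    raw = [1.0, 5.0, 2.0, 8.0]
    clean = run_step("detrend", detrend, raw, {"offset": 1.0}, "abc1234", log)
    events = run_step("threshold", threshold, clean, {"level": 3.0}, "abc1234", log)
    print(lineage(log, digest(events)))  # ['detrend', 'threshold']
```

Keying records by content hash rather than by file name means a result can be matched to its provenance even after it has been copied or renamed.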

Level 3 teams, exemplified by the International Brain Lab and participants in the NIH U19 program, demonstrate how open science practices significantly enhance operational capacity [22]. They view open science not as a burden but as an opportunity to become more efficient, credible, and accessible in their research endeavors. This level represents a significant step towards a more integrated and collaborative scientific community, where shared knowledge and resources propel research to new heights.

2.4 Level 4: Scalable — Introducing SciOps

Levels 4 and 5 transform research operations by adopting technologies for automation, scalability, and efficient collaboration. We collectively refer to these methodologies as SciOps, aligning with the principles of other successful “Ops” disciplines in the technology industry: DevOps, DataOps, and MLOps.

In the software industry, DevOps cultivates a seamless, semi-automated pipeline for software development and operations [23, 24, 25]. DevOps relies on a set of enabling technologies: containerization, version control, infrastructure as code (IaC), continuous integration / continuous deployment (CI/CD), logging and monitoring. Repetitive tasks are automated, transitions are minimized, computational resources are meticulously orchestrated, and processes are continuously monitored and rendered observable. The core driver of productivity in DevOps is the integration of collaborating teams’ activities into a unified automated workflow. Rather than delivering their outputs separately, these teams contribute to a joint automated pipeline, guided by well-defined processes. Elimination of hand-offs between teams reduces errors and shortens development cycles.

Similarly, the field of data analytics has scaled up operations by extending DevOps into the methodologies of DataOps [26, 27, 28] and MLOps [29]. These strategies streamline teamwork in data gathering, processing, and analysis and accelerate the development and deployment of machine learning models. They encourage teamwork among data specialists through clear communication, integration, and automation, resulting in more accurate and timely insights.

The emergence of SciOps was surveyed in a 2022 National Academies consensus study on Automated Research Workflows [2], which spanned various data-intensive research domains with one notable exception: neuroscience. These workflows represent the convergence of computation, lab automation, and artificial intelligence and span the entire research cycle: experiment design, observations, simulations, data collection and analysis, and learning from results to inform further experiments and simulations.

Level 4 leverages technology to dramatically scale and streamline coordinated activities of multidisciplinary teams collaborating on complex projects. Its key capabilities include:

  1. Experiment Automation: Level 4 teams automate experimental processes from experiment execution through data collection and analysis. Instruments for experiment control, stimulus delivery, sensing, and data acquisition are integrated into the data pipeline and workflow management.

  2. DataOps: Level 4 teams implement standard practices for optimizing data management, processing, and analysis while integrating advanced computational infrastructure. The primary goals are efficiency, scalability, quality control, and seamless collaboration.

    At this level, each project is supported by a meticulously defined shared data pipeline. This data pipeline serves as a formalized process for aggregating primary data while ensuring consistency and integrity, acting as the authoritative source for all downstream activities. As teams progress to higher maturity levels, the data pipeline extends to encompass all critical phases of the data lifecycle.

    These automated data pipelines exhibit adaptability and transparency, allowing for the continuous evolution of data analysis methods through collaborative endeavors. Workflow logic undergoes ongoing validation through automated testing to minimize the risk of human errors. Software and system design are structured to efficiently handle large data volumes, guaranteeing fast data access and movement. Achieving elastic scalability often involves leveraging cloud and high-performance computing infrastructures, in conjunction with established best practices for software development and operation.

  3. Collaboration Environments: Level 4 teams establish streamlined collaboration environments that enable diverse, distributed teams to access data exploration capabilities. This includes the use of web-based environments for exploratory analysis and knowledge exchange, user-friendly interfaces for data import/export, and the use of community data archives and software repositories [30].

  4. Teamflow: Level 4 teams establish efficient project management and communication strategies to support large-scale, multidisciplinary teams. Project management frameworks are adopted to lower the barriers to participation. They prioritize project continuity and fair credit tracking for individual contributions. These teams foster open, transparent, and efficient communication while ensuring data consistency and integrity. Leadership and mentorship activities promote flexibility and adaptability. Integration of rapid software development practices, efficient data management techniques, and collaborative tools enables teams to work effectively and cohesively within a complex and scalable research environment.
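The idea of a shared data pipeline acting as the authoritative source for all downstream activities can be illustrated with a toy dependency graph. In this hypothetical sketch, each derived stage declares its upstream stages, and a `populate` pass computes, in dependency order, only the entries that are still missing whenever new primary data arrive. It is loosely inspired by how relational pipeline frameworks auto-populate computed tables, but the `Pipeline` class and its methods are invented for illustration and use only the standard library (`graphlib` requires Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable


class Pipeline:
    """Toy data pipeline: primary data plus derived stages keyed by session ID."""

    def __init__(self) -> None:
        self.primary: dict[str, list[float]] = {}
        self.stages: dict[str, tuple[tuple[str, ...], Callable]] = {}
        self.results: dict[str, dict[str, object]] = {}

    def add_stage(self, name: str, upstream: tuple[str, ...], func: Callable) -> None:
        """Declare a derived stage; `upstream` names "primary" or earlier stages."""
        self.stages[name] = (upstream, func)
        self.results[name] = {}

    def ingest(self, session: str, samples: list[float]) -> None:
        """Register newly acquired primary data for one session."""
        self.primary[session] = samples

    def _get(self, stage: str, session: str):
        return self.primary[session] if stage == "primary" else self.results[stage][session]

    def populate(self) -> int:
        """Compute every missing (stage, session) entry in dependency order; return the count."""
        graph = {name: set(up) - {"primary"} for name, (up, _) in self.stages.items()}
        computed = 0
        for name in TopologicalSorter(graph).static_order():
            upstream, func = self.stages[name]
            for session in self.primary:
                if session not in self.results[name]:
                    args = [self._get(u, session) for u in upstream]
                    self.results[name][session] = func(*args)
                    computed += 1
        return computed


if __name__ == "__main__":
    pipe = Pipeline()
    pipe.add_stage("filtered", ("primary",), lambda x: [v for v in x if v >= 0])
    pipe.add_stage("mean_rate", ("filtered",), lambda x: sum(x) / len(x))
    pipe.ingest("s1", [1.0, -2.0, 3.0])
    print(pipe.populate())            # 2
    pipe.ingest("s2", [4.0, 6.0])
    print(pipe.populate())            # 2 (only the new session is processed)
    print(pipe.results["mean_rate"])  # {'s1': 2.0, 's2': 5.0}
```

Computing only missing entries keeps the pipeline incremental: as sessions accumulate, each `populate` pass does work proportional to the new data rather than reprocessing everything.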

Only a select few neuroscience research teams approach Level 4 maturity, because the requisite innovations are still in development. Among those that have made progress in this direction are well-funded institutes and multi-institution consortia, including the Allen Institute for Neural Dynamics, e11 Bio, major BRAIN Initiative projects such as BICCN [31] and MICrONS [32], and the International Brain Laboratory [33]. The Virtual Brain Project, another example, enables Level 4 operations through open-source, containerized cloud services for multiscale brain simulation and magnetic resonance image processing, supporting scalable and reproducible research [34]. Open cloud platforms that support automated data processing and management, such as brainlife.io [13], can provide open and secure environments for researchers to collaborate and implement reproducible data analysis. The Virtual Research Environment [35] and EBRAINS [36], with its federated Health Data Cloud [37] and peer-to-peer networks [38], also serve as collaborative research environments that protect data privacy while building human digital twins for medical research. These entities are at the forefront of neuroscience data scale and of collaboration towards shared research goals and Level 4 operations. A key challenge for the field is to create an ecosystem of tools, standards, and platforms that enables diverse research teams of all sizes and funding levels to adopt these scalable approaches.

2.5 Level 5: Optimizing

The ultimate aim of SciOps is the seamless integration of artificial intelligence (AI) with human cognition, creating a dynamic partnership that continually refines experiment design and enhances knowledge synthesis through concurrent computational modeling. Level 5 is distinguished by the strategic application of advanced technologies and AI to establish a “closed discovery loop.” This entails the formalization and automation of the process of using insights derived from data analysis to actively guide subsequent decisions, ultimately shaping the direction of scientific studies.

Key components establishing AI-accelerated closed-loop studies include:

  1. Adaptable experiments: Experiment designs are highly configurable, allowing rapid adaptation to new scientific questions. Parameterized controls include sensory and optogenetic stimulation, recording protocols, and cognitive tasks.

  2. Machine learning in the loop: Closed-loop experiments create a continuous automated feedback loop where machine learning algorithms optimize experiment controls to maximize knowledge gain about brain structure and function. This may involve a “digital twin” paradigm, where a simulation of the brain is continuously refined to match recorded data, facilitating concurrent in silico experiments. These closed loops can span multiple temporal scales: from real-time control of experiments to long-term adaptations exploring the hypothesis space, effectively guiding the scientific study.

    Machine learning models powering closed-loop discovery will undergo rapid development, requiring the adoption of industry-proven MLOps practices whereby new models are tested and deployed continuously.

    Integrated AI systems observe trends in data, identify new phenomena, and propose hypotheses and new experiments. Future generative AI systems will consider various information sources, including current literature, existing open datasets, and ongoing discourse between researchers, embedding this knowledge into the data pipeline and providing decision support for further inquiry [1].

  3. Human in the loop: While closed-loop experiments automate many processes, human input and insight remain integral. Level 5 workflows feature a symbiotic relationship between AI tools and human ingenuity. This hybrid approach enables rapid hypothesis generation, adaptation of experimental procedures, effective quality control, and the rapid analysis of large-scale data collections. To support human participation, the data pipeline and experiment controls are made observable and explainable.
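The “digital twin” idea of a model continuously refined to match recorded data can be reduced to a toy example. The sketch below is a deliberately simple stand-in, not a brain simulation: the hypothetical `LinearTwin` is a linear stimulus-response model whose gain and baseline are refit by ordinary least squares after every simulated recording, and which can then be queried for an in silico prediction. The simulated neuron, its parameters, and the noise level are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class LinearTwin:
    """Hypothetical twin: predicted response = gain * stimulus + baseline."""
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0
    gain: float = 0.0
    baseline: float = 0.0

    def update(self, stimulus: float, response: float) -> None:
        """Add one recorded trial and refit by least squares from running sums."""
        self.n += 1
        self.sum_x += stimulus
        self.sum_y += response
        self.sum_xx += stimulus * stimulus
        self.sum_xy += stimulus * response
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if denom > 1e-12:  # a slope needs at least two distinct stimuli
            self.gain = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
            self.baseline = (self.sum_y - self.gain * self.sum_x) / self.n

    def predict(self, stimulus: float) -> float:
        """In silico experiment: query the twin instead of the animal."""
        return self.gain * stimulus + self.baseline


def record(stimulus: float, rng: random.Random) -> float:
    """Simulated recording from a hypothetical neuron (gain 2.0, baseline 0.5, Gaussian noise)."""
    return 2.0 * stimulus + 0.5 + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    rng = random.Random(0)
    twin = LinearTwin()
    for _ in range(200):
        s = rng.uniform(0.0, 1.0)
        twin.update(s, record(s, rng))
    print(round(twin.gain, 2), round(twin.baseline, 2))  # should be close to 2.0 and 0.5
    print(twin.predict(1.5))  # prediction for a stimulus never presented
```

Refitting from running sums keeps each update constant-time, which matters when the twin must track data arriving during an experiment; a realistic twin would replace the linear model with a mechanistic or learned simulation.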

While no research team in neuroscience (or perhaps in any other discipline) has fully realized Level 5 operations, existing closed-loop experiments showcase elements of this vision through the application of machine learning to optimize experimental conditions based on functional recordings [39]. These projects involve the analysis of extensive neurophysiology data in large-scale collaborations to train deep neural networks for stimulus generation in a closed-loop manner.
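The closed-loop principle itself can be illustrated without deep networks. In the toy sketch below, a swapped-in technique (the UCB1 multi-armed bandit, not the method of the cited study) chooses which of a few candidate stimuli to present next, based on the responses recorded so far, so that trials concentrate on the stimulus that drives a simulated neuron most strongly. The stimulus set, the simulated mean responses, and the noise level are hypothetical.

```python
import math
import random

# Hypothetical mean responses of a simulated neuron to four candidate stimuli.
# The controller never reads these directly; it only sees noisy "recordings".
TRUE_MEANS = (0.1, 0.3, 0.9, 0.2)


def record_response(stimulus: int, rng: random.Random) -> float:
    """Simulated trial: mean response plus Gaussian recording noise."""
    return TRUE_MEANS[stimulus] + rng.gauss(0.0, 0.05)


def closed_loop(n_trials: int, seed: int = 0) -> tuple[list[int], list[float]]:
    """UCB1 loop: choose a stimulus, record the response, update estimates, repeat."""
    rng = random.Random(seed)
    n_stimuli = len(TRUE_MEANS)
    counts = [0] * n_stimuli   # trials per stimulus
    means = [0.0] * n_stimuli  # running mean response per stimulus
    for t in range(1, n_trials + 1):
        if t <= n_stimuli:
            choice = t - 1  # present each stimulus once to initialize the estimates
        else:
            # Upper confidence bound: estimated mean plus an exploration bonus
            # that shrinks as a stimulus is presented more often.
            choice = max(
                range(n_stimuli),
                key=lambda s: means[s] + math.sqrt(2 * math.log(t) / counts[s]),
            )
        response = record_response(choice, rng)
        counts[choice] += 1
        means[choice] += (response - means[choice]) / counts[choice]
    return counts, means


if __name__ == "__main__":
    counts, means = closed_loop(n_trials=300)
    print(counts)  # most trials should go to stimulus 2, the strongest driver
    print([round(m, 2) for m in means])
```

The exploration bonus is what keeps the loop from locking onto an early, noisy estimate; in a real experiment the choice set, the response measure, and the stopping rule would all be part of the experimental design.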

Level 5 of the Neuroscience Operations Maturity Model represents the forefront of neuroscience research, where AI takes center stage, driving innovation and enabling researchers to tackle complex questions with unprecedented efficiency and precision. It will signify a transformative leap in the field’s capabilities and potential for groundbreaking discoveries.

3 What We Need to Advance Modern Scientific Operations in Neuroscience

The evolution of scientific practices, coupled with rapid technological advancements, demands a strategic shift in how we approach research, especially in the context of experimental neuroscience. The transformational power of SciOps methodologies has the potential to reshape the way we think about scientific endeavors. However, to tap into this potential, we must focus on specific areas of development and innovation:

3.1 Action 1: Adopt this Capability Maturity Model

Community Governance: We invite the community to embrace, enhance, and collaboratively govern this Capability Maturity Model. This version, designated as Neuro SciOps CMM v1.0, serves as a starting point, and we are committed to establishing a roadmap and guidelines for contributions, working in coordination with the International Neuroinformatics Coordinating Facility (INCF) [40].

A centralized community resource will be established through the INCF and other platforms to provide access to the maturity model and related resources. This resource will serve as a valuable reference point for individuals and organizations interested in advancing scientific operations. To facilitate community engagement, we envision the creation of a dedicated SciOps resource and a working group to administer and support the framework and its collaborative development.

Assessment and Roadmaps: The model will support assessment and certification processes, enabling organizations to prepare their processes for various projects and programs. It will also serve as a valuable tool for charting organizational improvements and technological roadmaps.

By embracing this Capability Maturity Model and actively participating in its development and application, the scientific community can collectively advance the field of scientific operations, foster innovation, and drive transformative progress in research methodologies and practices.

Socialize expectations across the neuroscience community: For this model to be adopted, expectations about how neuroscience must be performed to meet 21st-century goals will have to be socialized across the community through integration with training programs, town hall meetings, and workshops at major scientific gatherings.

3.2 Action 2: Establish SciOps Methodologies

Current standards and policies in neuroscience data are focused on standardization and public data sharing, marking a progression towards Level 3 maturity within our model. Achieving Levels 4 and 5 will require new tools, “SciOps methodologies”, which adapt DevOps principles to the context of neuroscience experiments. Many practices can be transferred from collaborative scientific workflow management systems that have gained prominence in bioinformatics [21, 41, 42, 43] and from industry DevOps and DataOps open-source frameworks and commercial platforms [44].

Experiment automation with DevOps, DataOps, and MLOps: Workflow management technologies should seamlessly integrate with neuroinformatics tools and methods, setting the stage for more scalable operations as defined in Levels 4 and 5 of our model. The evolution of neuroinformatics tools should prioritize continuous integration and deployment of research software, whether commercial or community-driven, encompassing experiment control, data acquisition, and analysis. Data should be made available in formats and on infrastructure that allow for scalable storage, processing, and sharing. With the constant flux and scaling of data, formats and infrastructure evolve rapidly, making it critical to establish lasting organizational principles to ensure project continuity. Furthermore, experimental workflows should integrate formal frameworks for embedding artificial intelligence into the discovery loop.
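Continuous integration of research software typically means that an automated test suite runs on every change, including regression tests against benchmark data. A minimal sketch, assuming a hypothetical spike-detection function and a tiny benchmark trace whose expected spike onsets are fixed in advance; the function, threshold, and benchmark values are invented for illustration, and in practice the benchmark would be a versioned dataset. The tests are written as plain `test_` functions, which a runner such as pytest discovers automatically.

```python
def detect_spikes(trace: list[float], threshold: float) -> list[int]:
    """Hypothetical analysis step: indices where the trace crosses the threshold upward."""
    return [
        i for i in range(1, len(trace))
        if trace[i - 1] < threshold <= trace[i]
    ]


# Hypothetical benchmark: a short trace with known spike-onset indices.
BENCHMARK_TRACE = [0.0, 0.2, 1.5, 0.3, 0.1, 2.0, 2.1, 0.0, 1.2]
EXPECTED_SPIKES = [2, 5, 8]


def test_benchmark_unchanged() -> None:
    # Fails if a code change silently alters results on the reference data.
    assert detect_spikes(BENCHMARK_TRACE, threshold=1.0) == EXPECTED_SPIKES


def test_no_spikes_below_threshold() -> None:
    assert detect_spikes([0.1, 0.5, 0.9], threshold=1.0) == []


if __name__ == "__main__":
    # A CI job would normally invoke a test runner (e.g., pytest) on every commit instead.
    test_benchmark_unchanged()
    test_no_spikes_below_threshold()
    print("regression tests passed")
```

Pinning outputs on a benchmark does not prove the analysis is correct, but it makes every change in results visible and deliberate, which is the property continuous deployment of research software depends on.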

The creation and widespread adoption of innovative tools tailored specifically for SciOps have the potential to simplify intricate processes, streamline tasks, and significantly enhance research efficiency. By embracing this approach, we can usher in a transformative era of scientific operations characterized by heightened methodological prowess and overall efficacy.

3.3 Action 3: Focus on cyberinfrastructure and digital platforms

Digital platforms have the potential to elevate research teams to Level 3+ maturity without requiring individual projects to invest heavily in engineering expertise and custom solutions [45]. However, in academia, digital platforms often suffer from limited usability, creating the perception that shared infrastructure impedes daily work, which is a barrier to sustained adoption. Consequently, some teams choose to build their own solutions rather than adopt the centralized platforms endorsed by their communities. Unfortunately, these custom-built infrastructures frequently fall short in service quality, performance, user support, and long-term sustainability.

A new generation of academic neuroinformatics platforms, including BrainLife [13], DABI [46], OpenNeuro [15], SPARC [47], and DANDI [11], is positioned to replicate the success of well-established platforms in bioinformatics and biomedical research, such as the Galaxy platform [48]. To do so, these platforms must evolve beyond data sharing (Level 3) to the automation and scalability of workflows (Level 4+).

On the other hand, commercial platforms, driven by competitive pressures, are better positioned to deliver usability, robust customer support, and service continuity. In the field of neuroscience, emerging commercial platforms such as DataJoint Works, Inscopix IDEAS, and CodeOcean provide support for integrated research workflows.

To ensure the success of these platforms, it is essential that they integrate seamlessly with FAIR data and FAIR workflows, emphasizing transparency, accessibility, and interoperability of data and computational services. The widespread adoption of this Capability Maturity Model can assist commercial technology providers in aligning their business models with the overarching objective of expanding research capabilities and enhancing the user experience. Given the diverse nature of the field, it is unlikely that a single platform or toolset will dominate. Consequently, diverse groups with common goals and roadmaps can form alliances around shared standards and open-source frameworks, promoting interoperability, transparency, and reproducibility across their respective platforms.

Realizing this vision requires a comprehensive strategy that blends academic projects, commercial technology initiatives, and consortium activities, all aligned with funding policies. Such a strategy would guide the formulation of research projects, the marketing efforts of commercial technology providers, and the policies of funding agencies, ultimately fostering a unified and sustainable ecosystem.

Acknowledgements

We would like to thank the many community members who contributed to the discussions and reviews of this manuscript and raised key community issues, including David Feng, Andreas S. Tolias, Marisel Villafañe-Delgado, Lindsey Kitchell, Daniel Xenes, and others. DY, BW, TN, MK, KG, and ECJ were supported in part by the NIH (award R44 NS129492). YOH was supported in part by the NIH (awards 1 R24 MH117295 and 2 P41 EB019936-06A1 R). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or other supporting institutions.

References

  • 1 Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).
  • 2 NASEM. Automated Research Workflows for Accelerated Discovery: Closing the Knowledge Discovery Loop (National Academies of Sciences, Engineering, and Medicine, 2022). URL https://nap.nationalacademies.org/catalog/26532/automated-research-workflows-for-accelerated-discovery-closing-the-knowledge-discovery.
  • 3 Paulk, M. C. A history of the capability maturity model for software. ASQ Software Quality Professional 12, 5–19 (2009).
  • 4 Chrissis, M. B., Konrad, M. & Shrum, S. CMMI for development: guidelines for process integration and product improvement (Pearson Education, 2011).
  • 5 Bush, K. A., Calvert, M. L. & Kilts, C. D. Lessons learned: A neuroimaging research center’s transition to open and reproducible science. Frontiers in Big Data 82 (2022).
  • 6 Feyerabend, P. Against method: Outline of an anarchistic theory of knowledge (Verso Books, 2020).
  • 7 Artaza, H. et al. Top 10 metrics for life science software good practices. F1000Research 5 (2016).
  • 8 Eglen, S. J. et al. Toward standard practices for sharing computer code and programs in neuroscience. Nature Neuroscience 20, 770–773 (2017).
  • 9 Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data 3, 1–9 (2016).
  • 10 Gorgolewski, K. J. et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data 3, 1–9 (2016).
  • 11 Rübel, O. et al. The Neurodata Without Borders ecosystem for neurophysiological data science. eLife 11, e78362 (2022).
  • 12 Subash, P. et al. A comparison of neuroelectrophysiology databases. Scientific Data 10, 719 (2023).
  • 13 Hayashi, S. et al. brainlife.io: A decentralized and open source cloud platform to support neuroscience research. arXiv (2023).
  • 14 Hider Jr, R. et al. The brain observatory storage service and database (BossDB): a cloud-native approach for petascale neuroscience discovery. Frontiers in Neuroinformatics 16, 828787 (2022).
  • 15 Markiewicz, C. J. et al. The OpenNeuro resource for sharing of neuroscience data. eLife 10, e71774 (2021).
  • 16 Halchenko, Y. et al. DataLad: distributed system for joint management of code, data, and their relationship. Journal of Open Source Software 6 (2021).
  • 17 Kalantari, A. et al. How to establish and maintain a multimodal animal research dataset using DataLad. Scientific Data 10, 357 (2023).
  • 18 Huerta, E. et al. FAIR for AI: An interdisciplinary and international community building perspective. Scientific Data 10, 487 (2023).
  • 19 Goble, C. et al. FAIR computational workflows. Data Intelligence 2, 108–121 (2020).
  • 20 Deelman, E. et al. The future of scientific workflows. The International Journal of High Performance Computing Applications 32, 159–175 (2018).
  • 21 Wratten, L., Wilm, A. & Göke, J. Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers. Nature Methods 18, 1161–1168 (2021).
  • 22 Bonacchi, N. et al. A modular architecture for organizing, processing and sharing neurophysiology data. Nature Methods 20, 403–407 (2023).
  • 23 Ebert, C., Gallardo, G., Hernantes, J. & Serrano, N. DevOps. IEEE Software 33, 94–100 (2016).
  • 24 Leite, L., Rocha, C., Kon, F., Milojicic, D. & Meirelles, P. A survey of DevOps concepts and challenges. ACM Computing Surveys (CSUR) 52, 1–35 (2019).
  • 25 Teixeira, D. et al. A maturity model for DevOps. International Journal of Agile Systems and Management 13, 464–511 (2020).
  • 26 Gartner. Gartner Hype Cycle for Data Management Positions Three Technologies in the Innovation Trigger Phase in 2018. https://www.gartner.com/en/newsroom/press-releases/2018-09-11-gartner-hype-cycle-for-data-management (2018). [Online; accessed 22-Dec-2023].
  • 27 Rodriguez, M., de Araújo, L. J. P. & Mazzara, M. Good practices for the adoption of DataOps in the software industry. In Journal of Physics: Conference Series, vol. 1694, 012032 (IOP Publishing, 2020).
  • 28 Atwal, H. Practical DataOps: Delivering agile data science at scale (Springer, 2019).
  • 29 Mäkinen, S., Skogström, H., Laaksonen, E. & Mikkonen, T. Who needs MLOps: What data scientists seek to accomplish and how can MLOps help? In 2021 IEEE/ACM 1st Workshop on AI Engineering-Software Engineering for AI (WAIN), 109–112 (IEEE, 2021).
  • 30 Milewicz, R. et al. DevOps pragmatic practices and potential perils in scientific software development. In International Congress on Information and Communication Technology, 629–647 (Springer, 2023).
  • 31 BRAIN Initiative Cell Census Network (BICCN). A multimodal cell census and atlas of the mammalian primary motor cortex. Nature 598, 86–102 (2021).
  • 32 MICrONS Consortium et al. Functional connectomics spanning multiple areas of mouse visual cortex. bioRxiv 2021–07 (2021).
  • 33 Abbott, L. F. et al. An international laboratory for systems and computational neuroscience. Neuron 96, 1213–1218 (2017).
  • 34 Schirner, M. et al. Brain simulation as a cloud service: The virtual brain on EBRAINS. NeuroImage 251, 118973 (2022).
  • 35 VRE. Virtual Research Environment. https://vre.charite.de/vre/ (2023). [Online; accessed 22-Dec-2023].
  • 36 EBRAINS. EBRAINS. https://www.ebrains.eu/ (2023). [Online; accessed 22-Dec-2023].
  • 37 HealthDataCloud. HealthDataCloud. https://www.healthdatacloud.eu/ (2023). [Online; accessed 22-Dec-2023].
  • 38 eBRAIN Health. eBRAIN-Health. https://ebrain-health.eu/home.html (2023). [Online; accessed 22-Dec-2023].
  • 39 Walker, E. Y. et al. Inception loops discover what excites neurons most using deep predictive models. Nature Neuroscience 22, 2060–2065 (2019).
  • 40 Abrams, M. B. et al. A standards organization for open and FAIR neuroscience: the International Neuroinformatics Coordinating Facility. Neuroinformatics 20, 25–36 (2022).
  • 41 Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316–319 (2017).
  • 42 Köster, J. & Rahmann, S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).
  • 43 Vivian, J. et al. Toil enables reproducible, open source, big biomedical data analyses. Nature Biotechnology 35, 314–316 (2017).
  • 44 Bhat, M. et al. Magic quadrant for devops platforms. https://www.gartner.com/doc/reprints?id=1-2DW4I0FF&ct=230601&st=sb (2023). [Online; accessed 22-Dec-2023].
  • 45 Sandström, M. et al. Recommendations for repositories and scientific gateways from a neuroscience perspective. Scientific Data 9, 212 (2022).
  • 46 Duncan, D. et al. Data archive for the BRAIN Initiative (DABI). Scientific Data 10, 83 (2023).
  • 47 Bandrowski, A. et al. SPARC data structure: Rationale and design of a FAIR standard for biomedical research data. bioRxiv 2021–02 (2021).
  • 48 Afgan, E. et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Research 50 (2022).