
\authormark

Welborn et al.

\corresp

[\ast]Corresponding authors.

Streaming Large-Scale Microscopy Data to a Supercomputing Facility

Samuel S. Welborn\ORCID0000-0002-7697-6347    Chris Harris\ORCID0000-0002-1113-3728    Stephanie M. Ribet\ORCID0000-0002-7117-066X    Georgios Varnavides\ORCID0000-0001-8338-3323    Colin Ophus\ORCID0000-0003-2348-8558    Bjoern Enders\ORCID0000-0002-6009-6281    Peter Ercius\ORCID0000-0002-6762-9976 \orgdivNERSC, \orgnameLawrence Berkeley National Laboratory, \orgaddress\streetBerkeley, \postcode94720, \stateCA, \countryCountry \orgdivNCEM, The Molecular Foundry, \orgnameLawrence Berkeley National Laboratory, \orgaddress\streetBerkeley, \postcode94720, \stateCA, \countryCountry
(2024; 2024)
Abstract

Data management is a critical component of modern experimental workflows. As data generation rates increase, transferring data from acquisition servers to processing servers via conventional file-based methods is becoming increasingly impractical. The 4D Camera at the National Center for Electron Microscopy (NCEM) generates data at a nominal rate of 480 Gbit $\mathrm{s}^{-1}$ (87,000 frames $\mathrm{s}^{-1}$), producing a 700 GB dataset in fifteen seconds. To address the challenges associated with storing and processing such quantities of data, we developed a streaming workflow that utilizes a high-speed network to connect the 4D Camera’s data acquisition (DAQ) system to supercomputing nodes at the National Energy Research Scientific Computing Center (NERSC), bypassing intermediate file storage entirely. In this work, we demonstrate the effectiveness of our streaming pipeline in a production setting through an hour-long experiment that generated over 10 TB of raw data, yielding high-quality datasets suitable for advanced analyses. Additionally, we compare the efficacy of this streaming workflow against the conventional file-transfer workflow by conducting a post-mortem analysis on historical data from experiments performed by real users. Our findings show that the streaming workflow significantly improves data turnaround time, enables real-time decision-making, and minimizes the potential for human error by eliminating manual user interactions.

keywords:
streaming, 4D-STEM, high-performance computing, real-time processing

\secsize Introduction

In the era of big data, the scientific community faces significant challenges in data management (Rao, 2020; Spurgeon et al., 2021). This is especially evident at experimental user and core facilities, where advancements in instrumentation, such as faster detectors and increased light source brightness, have led to an exponential increase in data generation rates. The traditional methods of data storage and movement (e.g., personal flash drives) are becoming increasingly untenable.

In 2019, a new detector called the 4D Camera was installed on the TEAM 0.5 microscope at the National Center for Electron Microscopy (NCEM) facility of The Molecular Foundry at Lawrence Berkeley National Laboratory (LBNL) (Ercius et al., 2023, 2020). This detector produces data at a rate of 480 Gbit $\mathrm{s}^{-1}$ (equivalent to 87,000 frames $\mathrm{s}^{-1}$), yielding datasets of up to 700 GB for a fifteen-second acquisition. Other microscopy facilities are installing similar high frame rate detectors with the ability to routinely generate $>100$ GB datasets (Chatterjee et al., 2021; Zambon et al., 2023). While these technological advancements provide new avenues for scientific exploration, they also pose significant challenges in data management, analysis, and acquisition. New opportunities for development include on-the-fly processing for quick feedback on an experimental approach and implementation of complex experimental pipelines, such as focal series or tomography (Pelz et al., 2022, 2021b) that leverage the capabilities of these advanced detectors. Given that microscope time is a limited and valuable resource, rapid data analysis that provides feedback on the quality of large data sets during a microscope session is crucial for improving throughput.

To mitigate these challenges, a collaborative effort involving high performance computing (HPC) experts at the National Energy Research Scientific Computing Center (NERSC), electron microscopy experts at NCEM, and software development experts at Kitware, Inc. led to the utilization of NERSC for data reduction and the development of a web frontend called Distiller to facilitate data management (Harris and Genova, 2023). HPC systems are typically accessed through command line interfaces, which are often unfamiliar to microscopists. Distiller, on the other hand, allows users to transfer and process data at NERSC through simple web-based interactions. This effort, which was part of a broader initiative at LBNL called The Superfacility Project, greatly improved the workflow for the 4D Camera (Harris and Genova, 2023; Enders et al., 2020; Welborn et al., 2024a).

Despite its utility, data analysis at NERSC was constrained by file-based input/output (I/O) steps that created bottlenecks at two stages: (1) writing data from random-access memory (RAM) to local disk storage at NCEM and (2) file transfer from NCEM to NERSC before computation. We note that file-based data movement is the common workflow across most detector systems. The dependence on file-based I/O operations slows down data processing, constrains the scope of feasible experiments, and relies on file systems (e.g., the NERSC global file system) possibly shared by multiple users. In our recent work, we showed that, by circumventing these file-based operations through streaming data from detector buffer memory directly to NERSC compute node memory, we improved throughput by five- to fourteen-fold (Welborn et al., 2024a). In the present work, we showcase the advantages of a streaming workflow for microscopy experiments using the 4D Camera as a case study.

This manuscript is organized as follows. In the background section, we provide an overview of 4D scanning transmission electron microscopy (4D-STEM) and discuss difficulties in managing the substantial datasets generated by the 4D Camera. Then, we briefly outline the components of the streaming pipeline. Next, we describe enhancements to Distiller that obviate the need for an in-depth understanding of HPC. Finally, we demonstrate the practical benefits of streaming through a comparative analysis of real user experiments employing both workflows.

\secsize Background

\subsecsize Transmission Electron Microscopy and 4D-STEM

Transmission electron microscopy (TEM) provides insights into the atomic and molecular structure of materials, making it a cornerstone characterization technique across scientific disciplines from materials science to biology. Scanning TEM (STEM) operates in a mode where an electron probe is focused onto the sample and rastered over a two-dimensional set of probe positions. Post-specimen detectors register electron events in diffraction space that can be mapped to specific probe positions. The versatility of STEM extends its utility beyond conventional imaging, facilitating advanced analytical methods such as spectroscopy, electron tomography, ptychography, and holography (Yasin et al., 2016, 2018; Stevens et al., 2018; Ophus, 2019; Miao et al., 2016; Ercius et al., 2015; Varnavides et al., 2023; Ribet et al., 2024; Ben-Moshe et al., 2021; Ophus, 2023).

Recent advancements in detector technology have ushered in a new era for STEM. Specifically, the introduction of direct electron detectors (DEDs) has dramatically accelerated data acquisition rates and opened new experimental possibilities (Levin, 2021; Ercius et al., 2023). DEDs can acquire data with a temporal resolution ranging from milliseconds to microseconds, enabling a technique generally called 4D-STEM because two-dimensional (2D) diffraction patterns are acquired at a series of 2D probe positions (Ophus, 2019). The resulting 4D dataset contains a wealth of both structural and compositional information about the sample. Analysis of the diffraction patterns can reveal the sample’s overall crystal orientation, strain, and material phase, enabling a detailed mapping of these properties to provide a comprehensive characterization of the material (Ophus, 2019). One of the applications of 4D-STEM is phase contrast imaging: while detectors record only the intensity of the exit wave after interaction with the sample, it is possible to reconstruct the phase, leading to dose-efficient characterization of weakly scattering signals. Phase retrieval STEM methods, such as differential phase contrast (DPC) (Dekkers and De Lang, 1974; Waddell and Chapman, 1979; Shibata et al., 2012; Cao et al., 2018), which measures the change in the center of mass of diffraction patterns, and advanced algorithms such as ptychography, offer enhanced contrast and resolution (Nellist et al., 1995; Varnavides et al., 2023; Enders and Thibault, 2016).

The size of 4D-STEM datasets introduces significant challenges in data management. An illustrative case is the 4D Camera, which can accumulate 2D diffraction patterns at a rate of 87,000 Hz (nominally 200 TB $\mathrm{hr}^{-1}$), highlighting the need for informed data treatment beyond current capabilities at most electron microscopy laboratories. Solving the challenges that come with storing and processing large datasets in a timely manner necessitates an examination of the pathway data takes within a data acquisition (DAQ) system and processing workflow.

\subsecsize High Data Rate Acquisition and its Challenges

The DAQ system for the 4D Camera at NCEM, developed in-house at LBNL, integrates both software and hardware elements to achieve such high data rates (Fig. 1). The 4D Camera sensor (Fig. 1a, bottom) is partitioned into four sectors, each of which is connected to a dedicated receiving server via twelve 10 Gbit $\mathrm{s}^{-1}$ connections through field-programmable gate arrays (FPGAs). As the electron beam rasters across the sample (Fig. 1a, top), a $576 \times 576$ pixel frame is acquired at each scan position. Each $144 \times 576$ pixel sector is processed by an FPGA and transmitted to its corresponding data receiving server (Fig. 1a-b). Upon completion of a scan, the data are written as binary data files (Fig. 1c) to flash storage. For a more comprehensive description of this DAQ system, the reader is referred to Ercius et al. (2023) and Welborn et al. (2024a).
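As a consistency check on the numbers above (assuming every link runs at its full nominal rate, which is an idealization), the aggregate link capacity matches the quoted detector rate, and the four sectors tile exactly one full frame:

\[
4~\mathrm{sectors} \times 12~\mathrm{links} \times 10~\mathrm{Gbit\,s^{-1}} = 480~\mathrm{Gbit\,s^{-1}} = 60~\mathrm{GB\,s^{-1}},
\qquad
4 \times (144 \times 576) = 576 \times 576 = 331{,}776~\mathrm{pixels}.
\]

Sustained for one hour, $60~\mathrm{GB\,s^{-1}} \times 3600~\mathrm{s} \approx 216~\mathrm{TB}$, which is in line with the nominal figure of roughly 200 TB $\mathrm{hr}^{-1}$ quoted earlier.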

Figure 1: Schematic illustrating both the DAQ system and the initial mitigation strategies for managing large-scale 4D-STEM datasets generated at NCEM. A user begins an experiment using the TEAM 0.5 microscope software for the four-sector 4D Camera (a). The camera is connected to data receiving servers through FPGAs (b). Each server ingests all data into RAM and subsequently writes it to an eight TB flash storage system (c), which takes around 150 s for a 700 GB dataset. The data is either processed locally at NCEM on a single server with ten CPU cores (d), or transferred to NERSC’s filesystems (e) and processed with more robust compute resources (f). Data processing is illustrated in panel (g), showing the assembly of disconnected sectors into coherent frames and subsequent electron counting of these frames. This processed data is saved in a single HDF5 file. The Distiller web application (h) enables the user (i) to initiate file transfers to NERSC’s file systems, perform electron counting, and launch analysis notebooks in NERSC’s Jupyter environment.

With a data rate of 480 Gbit $\mathrm{s}^{-1}$, a single fifteen-second acquisition using the 4D Camera generates approximately 700 GB of data (Ercius et al., 2023). The large data volume manifests three distinct but interrelated challenges: (1) limited local disk storage capacity, where the available eight TB of flash storage can only accommodate eleven full scans; (2) the computational burden associated with processing large datasets, which overwhelms local dedicated resources; and (3) the time-intensive nature of writing large files to disk, which blocks the system from further data acquisition. Collectively, these challenges substantially reduce user productivity and waste precious beam time. It is important to note that challenges in data management and computational limitations extend beyond NCEM to other Experimental and Observational Science (EOS) facilities, and these problems will intensify in the future (Rao, 2020; Spurgeon et al., 2021). A notable unscalable example is the Event Horizon Telescope data transfer protocol, which involved physically transporting hard disk drives from the telescope to a processing facility to produce the now-famous black hole image (Doeleman et al., 2023).

\subsecsize Initial Mitigation Strategies

The 4D Camera is designed to acquire CBED frames containing a small number of electrons, leading to a sparse data set. Thus, we can simultaneously mitigate the first challenge (storage capacity) and remove detector noise from our data through compression. The software package stempy (Avery et al., 2023) efficiently transforms the raw data into a more manageable sparse format by finding and keeping only the locations of single electron hits, a process called “electron counting” in this manuscript (Pelz et al., 2021a; Ercius et al., 2023; Battaglia et al., 2009). This transformation (represented graphically in Fig. 1g) results in an order of magnitude data size reduction (alleviating some storage pressure) and the reduction of detector noise. The raw detector data is typically deleted after it has been electron counted.
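To make the idea of keeping only electron hit locations concrete, the following is a minimal, pure-Python sketch. It is not stempy's implementation: the function name `count_electrons`, the single fixed threshold, and the 3x3 local-maximum test are illustrative assumptions, and the small frame is made-up example data rather than detector output.

```python
from typing import List, Tuple

Frame = List[List[int]]


def count_electrons(frame: Frame, threshold: int) -> List[Tuple[int, int]]:
    """Return (row, col) positions of pixels that exceed `threshold` and are
    not smaller than any of their (up to eight) neighbours.

    Simplified illustration only; the threshold value and the 3x3
    neighbourhood are assumptions, not parameters taken from stempy.
    """
    n_rows = len(frame)
    n_cols = len(frame[0]) if n_rows else 0
    events = []
    for r in range(n_rows):
        for c in range(n_cols):
            value = frame[r][c]
            if value <= threshold:
                continue  # treated as background or noise
            is_peak = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if (dr or dc) and inside and frame[rr][cc] > value:
                        is_peak = False
            if is_peak:
                events.append((r, c))
    return events


# Made-up 5x5 frame with two bright hits on a low-level background.
example_frame = [
    [1, 2, 1, 0, 0],
    [2, 40, 9, 1, 0],
    [1, 8, 2, 0, 1],
    [0, 1, 0, 35, 12],
    [0, 0, 1, 3, 2],
]
example_events = count_electrons(example_frame, threshold=20)
```

Storing only the event coordinates instead of every pixel value is what makes the counted data sparse; in this toy frame, 25 pixel values collapse to two coordinate pairs.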

While stempy significantly reduces storage requirements, it introduces the second challenge: computational demands for quickly processing large datasets. The processing time for this operation on local resources, represented in Fig. 1d, is considerable. At NCEM, the computational resources are limited to ten CPU cores, which makes the electron counting of a 700 GB dataset a time-consuming task (10-12 minutes). During this time, the detector cannot be used because the same computational resources are shared for both data acquisition and processing. In contrast, each CPU node on NERSC’s newest supercomputer, Perlmutter, is equipped with 128 CPU cores (Fig. 1f), and multiple compute nodes can be allocated to parallelize the electron counting process. Upgrades to NERSC’s computational infrastructure (which have occurred since the 4D Camera was installed) translate into immediate improvements in both the NCEM processing pipeline and for the broader NERSC user community, thereby optimizing resource utilization. Absent this integration, any dedicated compute nodes installed at NCEM require local maintenance and remain underutilized, particularly in periods between experiments. Moreover, by integrating NCEM’s workflow with NERSC’s infrastructure, NCEM users gain access to NERSC’s rich computing and data ecosystem, which is particularly advantageous for processing their data during and after the experiment. This integration not only streamlines NCEM’s operations but also provides a blueprint for the efficient deployment of compute resources beyond a single detector or EOS facility (Enders et al., 2020; Bard et al., 2022).

Recognizing the advantages of centralized compute/storage resources for managing large datasets, the Distiller (Fig. 1h) application was developed to facilitate user interactions with the detector and NERSC. During data acquisition, Distiller presents status and metadata using a user-friendly web-based frontend, allowing users (Fig. 1i) to initiate data transfers to NERSC (Fig. 1e). Then, the data is electron counted using stempy on Perlmutter (Fig. 1f-g) (Harris and Genova, 2023; Enders et al., 2020). After counting, the end result is a single sparse HDF5 file ready for further analysis. NERSC can then provide access restrictions based on user credentials, compute for further analysis, and file transfer to other sites. We provide a screencast of this workflow in Supplemental Video 1 (Welborn, 2024). It is also important to recognize that by collaborating with software development experts at Kitware and HPC specialists at NERSC, we avoided the technical debt often associated with ad hoc scripts developed by microscopists, who do not typically have the bandwidth to develop seamlessly-integrated tools like Distiller.

Despite these advancements, writing/reading large files to/from disk remains an unresolved bottleneck, leading to the third challenge that impedes the efficient transfer of high-volume data.

\subsecsize I/O Bottlenecks in Data Transmission

Four critical I/O operations slow down the transmission of data from NCEM to NERSC:

  1. Writing the data to a local drive at NCEM.

  2. Reading the data from the local drive and transferring it to NERSC over a fiber network.

  3. Writing the data to NERSC’s file systems.

  4. Reading the data into NERSC compute node memory for electron counting.

These file I/O bottlenecks present a dual challenge: they slow down data transfer and analysis and also restrict the types of experiments that can be conducted. For instance, they preclude the possibility of running automated experiments over extended periods (Pattison et al., 2023), because human intervention is required to manage data transfer and counting once the local eight TB file system is full.

\secsize Streaming Data from NCEM to NERSC

To overcome the I/O bottlenecks outlined above, we have developed a streaming service that facilitates the transmission of microscope data from NCEM servers to NERSC compute nodes without using file storage. The foundation of our solution is a socket-based network that facilitates RAM-to-RAM data transfer for real-time processing (Fig. 2). Sockets serve as integral components in networked systems, facilitating the exchange of data packets between interconnected devices; by using sockets, we are taking advantage of the progress made in commercial internet infrastructure to improve scientific computing. Our architecture utilizes Zero Message Queue (ZeroMQ), a network socket library, to establish communication between the key elements of our pipeline: the data receiving servers at NCEM, a centralized aggregator server at NCEM, and the compute nodes at NERSC. It is important to note that this section’s content serves as a high-level synopsis of our approach. For a more comprehensive overview of the methods and system architecture, the reader is directed to our recent technical work (Welborn et al., 2024a).

\subsecsize Intercepting File Write at NCEM

The data receiving servers at NCEM (Fig. 2a) handle detector data retrieval, data formatting, and disk storage of raw data files (see background section). Traditionally, each server accumulates data for one sector of the detector in memory during a scan and writes it to disk as files (Fig. 2c) after acquisition is complete. We replaced this disk write operation with our ZeroMQ streaming operation represented by the outlet socket attached to Fig. 2a. These sockets transmit the data from the server’s RAM to a central aggregator server.
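The sketch below illustrates the idea of sending an in-memory buffer over a socket instead of writing it to disk. It deliberately uses Python's standard-library `socket.socketpair` as a stand-in for ZeroMQ, so it is not the pipeline's actual code: ZeroMQ handles message framing itself, whereas here a length prefix is added by hand. The helper names and the 1 KiB payload are hypothetical.

```python
import socket
import struct


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one message, prefixed with its length as an 8-byte big-endian integer."""
    sock.sendall(struct.pack("!Q", len(payload)) + payload)


def recv_exact(sock: socket.socket, n_bytes: int) -> bytes:
    """Read exactly n_bytes from the socket, looping over partial reads."""
    buffer = bytearray()
    while len(buffer) < n_bytes:
        chunk = sock.recv(n_bytes - len(buffer))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buffer.extend(chunk)
    return bytes(buffer)


def recv_message(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!Q", recv_exact(sock, 8))
    return recv_exact(sock, length)


# Two connected in-process sockets stand in for a receiving server and the aggregator.
receiver_side, aggregator_side = socket.socketpair()
sector_bytes = bytes(range(256)) * 4  # hypothetical 1 KiB stand-in for sector data held in RAM
send_message(receiver_side, sector_bytes)  # replaces the "write to disk" step
received = recv_message(aggregator_side)
receiver_side.close()
aggregator_side.close()
```

The payload here is small enough to fit in the socket buffer, so sending and receiving can happen in one thread; a real stream would run the two ends in separate processes on separate machines.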

Refer to caption
Figure 2: Schematic comparison of the data streaming pipeline (blue pathway, a-b-d) with the file transfer pipeline (red pathway, a-c-d). Starting from the data receivers (a), the streaming approach employs ZeroMQ sockets to bypass raw file disk storage at NCEM, enabling direct RAM-to-RAM transfer. Sockets are created on the data receivers, a centralized aggregator server at NCEM, and NERSC compute nodes to facilitate this transmission. Conversely, the file transfer approach requires several intermediate file storage operations to move the data from NCEM to NERSC. In both pathways, the thick vertical line indicates the network border between NCEM and NERSC. Using stempy, the data is electron counted and saved in a single HDF5 file for further processing (d).

\subsecsize Routing the Data to NERSC

The aggregator server routes data to NERSC for frame reassembly and processing. Its sockets are graphically represented by the inlet socket attached to Fig. 2b. The routing strategy on the central aggregator uses sector metadata (the frame number) to forward data (outlet socket attached to the aggregator in Fig. 2b) to its corresponding node at NERSC (inlet socket attached to NERSC nodes in Fig. 2b). This data routing ensures equitable distribution of frames across the NERSC compute nodes, maintaining a consistent computational load across them. Further, it guarantees that all sectors of a given frame are routed to the same NERSC compute node: the sectors of each frame are initially dispersed among the receiving servers (see Fig. 1b) and must be assembled on the same NERSC node before processing (see Fig. 1g).
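One simple rule that satisfies both requirements above (balanced load, and all sectors of a frame on one node) is to pick the node from the frame number alone. The sketch below assumes a modulo rule and assumes that sectors are stacked along the first axis when a frame is reassembled; neither detail is stated in the text, and the function names are hypothetical.

```python
from collections import defaultdict

N_SECTORS = 4  # the detector is read out as four sectors


def node_for_frame(frame_number: int, n_nodes: int) -> int:
    """Assumed routing rule: the frame number alone selects the node."""
    return frame_number % n_nodes


def route(sector_messages, n_nodes: int):
    """Group (frame_number, sector_index, rows) messages by destination node.

    Returns {node: {frame_number: {sector_index: rows}}}.
    """
    inbox = defaultdict(lambda: defaultdict(dict))
    for frame_number, sector_index, rows in sector_messages:
        node = node_for_frame(frame_number, n_nodes)
        inbox[node][frame_number][sector_index] = rows
    return inbox


def assemble(sectors: dict) -> list:
    """Stack the four sectors of one frame in sector order (assumed layout)."""
    if len(sectors) != N_SECTORS:
        raise ValueError("frame is incomplete; cannot assemble")
    frame = []
    for sector_index in range(N_SECTORS):
        frame.extend(sectors[sector_index])
    return frame


# Six frames arrive sector by sector, as they would from four separate servers.
# Each "sector" is a single toy row [frame_number, sector_index].
messages = [(f, s, [[f, s]]) for s in range(N_SECTORS) for f in range(6)]
inbox = route(messages, n_nodes=3)
frame_4 = assemble(inbox[1][4])
```

Because the node is a function of the frame number only, the order in which sectors arrive does not matter, and consecutive frames are spread evenly over the nodes.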

\subsecsize Live Electron Counting at NERSC

At NERSC, the data is ingested into the compute nodes’ RAM (Fig. 2b) from the upstream centralized aggregator. Full frames are automatically processed using the electron counting algorithm in the stempy package (Avery et al., 2023). After all frames have been received, the sparse, electron counted data is saved in a single HDF5 file (Fig. 2d). The entire system is now ready for another acquisition.

\secsize Workflow from the User’s Perspective

Many users, particularly those without experience in HPC, may find the prospect of initiating a streaming job on a supercomputing cluster to be daunting. To address this, we extended the functionalities of Distiller (Harris and Genova, 2023). Prior to this work, Distiller served as a web portal primarily for cataloging data sets, tracking metadata, and initiating processing jobs at NERSC. Our enhancements allow users to initiate a streaming compute job through the Distiller web interface. Supplemental Video 2 demonstrates starting a session using the Distiller interface and subsequently collecting several acquisitions (Welborn, 2024).

With a single mouse click in Distiller, the necessary connections between NCEM and NERSC are automatically established. This enables users to focus on their experiments while the data is seamlessly streamed to NERSC. As a result, datasets are rapidly available for further analysis, eliminating user distraction and delays associated with manually starting a separate job for each dataset.

The user monitors the progress of their streaming session and initiates data analysis notebooks using NERSC’s Jupyter ecosystem directly from the Distiller web interface (see Supplemental Video 3) (Welborn, 2024), which is enabled by NERSC’s Superfacility API (Thomas et al., 2017; Thomas and Cholia, 2021; Henderson et al., 2020; Enders et al., 2020; Parkinson et al., 2020). Integration of data acquisition, transfer, and analysis into a unified workflow enhances user productivity and enables more complex, data-intensive experiments. The streaming capability has been used on the TEAM 0.5 microscope for about eight months, providing streamlined data transfer and analysis for real user experiments. The code for Distiller is publicly accessible and can be found in Harris and Genova (2023).

\secsize Microscope Stability Experiment and Workflow Comparison

\subsecsize Stability Experiment

In order to explore the capabilities enabled by the streaming approach, we conducted a real experiment that mimics a typical high-throughput microscopy workflow — the collection of data at regular time intervals for an extended period, hereafter referred to as a multi-scan experiment. The goals were three-fold: first, to generate a large volume of data that would challenge the streaming system’s capabilities; second, to quantify the microscope’s stability over time; and third, to show that the system can produce many high quality 4D-STEM datasets amenable to advanced analytical techniques, such as ptychography.

The experiment was performed on the aberration-corrected TEAM 0.5 outfitted with the all piezo-electric TEAM Stage. This stage offers exceptional stability, with a nominal drift rate of 2 pm $\mathrm{s}^{-1}$, and allows for tilting up to $\pm 180^{\circ}$ within the 2.5 mm pole piece gap (Ercius et al., 2012). The microscope was operated at an accelerating voltage of 300 kV, a convergence angle of 17.1 mrad, a sample tilt of $0^{\circ}$, a probe current of 20 pA, and a probe step size of 0.36 Å. A standard sample made of gold nanoparticles with approximate diameter of 5-10 nm was prepared by chemical vapor deposition (CVD) of gold onto an ultra thin carbon substrate. Using an automated data collection script, we acquired 60 4D-STEM datasets at 55-second intervals (total duration of 55 minutes), each with dimensions of $512 \times 512 \times 576 \times 576$. Each dataset consists of 173 GB of raw data, culminating in a total data volume of 10.4 TB streamed to NERSC. Each dataset was successfully acquired, transmitted to NERSC, reduced by electron counting, and stored for further analysis. The total data volume was reduced from 10.4 TB down to a more manageable size of 125 GB through counting.
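The quoted data volumes can be checked with a short calculation. The only assumption introduced here is a raw pixel depth of 2 bytes, which is not stated in this section but reproduces the per-dataset size to within rounding:

\[
512 \times 512 \times 576 \times 576 \approx 8.70 \times 10^{10}~\mathrm{pixels},
\qquad
8.70 \times 10^{10} \times 2~\mathrm{B} \approx 1.74 \times 10^{11}~\mathrm{B} \approx 174~\mathrm{GB},
\]
\[
60 \times 173~\mathrm{GB} \approx 10.4~\mathrm{TB},
\qquad
\frac{10.4~\mathrm{TB}}{125~\mathrm{GB}} \approx 83.
\]

Under that assumption, electron counting reduced the stored volume of this experiment by a factor of roughly 80.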

Figure 3: Reconstructions of the same gold nanoparticle using (a) parallax and (b) ptychography over the course of the nearly hour-long experiment. (i), (ii), and (iii) display reconstructions from the experiment’s start, middle, and end.
Figure 4: Fitted parameters from reconstructions in Fig. 3. (a) Defocus (C1) drift of the TEAM 0.5 during the experiment, 0.3 pm $\mathrm{s}^{-1}$. (b) Probe astigmatism (X and Y) drift throughout the experiment, both drifting at about 0.2 pm $\mathrm{s}^{-1}$. (a) and (b) were both fit using the data in Fig. 3a. (c) Lateral drift of the sample, fit with cross-correlation using the first ptychography reconstruction as the basis $\left(x=0,\,y=0\right)$.

The large number of high-quality 4D-STEM scans acquired during this hour-long experiment provides an opportunity to measure changes in the microscope using advanced techniques such as parallax or tilt-corrected bright field and ptychography (Varnavides et al., 2023; Yu et al., 2024). Here, we used the ParallaxReconstruction and SingleslicePtychographicReconstruction classes in py4DSTEM version 0.14.3 (Savitzky et al., 2021; Varnavides et al., 2023) to perform the reconstructions on each dataset in the time series (see our accompanying data repository in Welborn et al. (2024b) for the reconstruction settings along with an example Jupyter notebook for one of the acquisitions). Representative (a) parallax and (b) ptychography reconstructions at the beginning (i), middle (ii), and end (iii) of the series are shown in Fig. 3. These reconstructions indicate that atomic resolution is maintained throughout the experiment, owing in part to the exceptional stability of the TEAM 0.5 microscope and stage; we made no adjustments to the microscope during the experiment.

We quantify the microscope’s stability by inspecting the estimated microscope parameters from the parallax and ptychography results. In a parallax reconstruction, virtual images from different positions in the bright field disk are aligned through cross-correlation. The image shifts are imparted on these virtual images based on the gradient of the aberration surface of the incoming beam and the rotation between real and reciprocal space in the microscope setup. By fitting the aberration profile of these shifts and rotations, we can estimate changes in low order aberrations during the course of the experiment. Fig. 4a shows the estimated change in defocus over the full hour of data acquisition, amounting to a drift rate of 0.5 pm $\mathrm{s}^{-1}$, which is due to either stage or lens drift. The defocus drift value is not typically measured due to the projection nature of the STEM.

We also expect other aberrations to change during the course of the experiment due to lens drift (Schramm et al., 2012), and we can estimate the probe astigmatism (A1) in X and Y for all 60 datasets (Fig. 4b) based on the A1X and A1Y determined from the probe estimate over time. Both astigmatism directions have a drift rate of approximately 0.2 pm $\mathrm{s}^{-1}$. Iterative electron ptychography can be used to solve for the object as well as the probe from a 4D-STEM dataset and produces a high-resolution and high signal-to-noise reconstruction (Varnavides et al., 2023). We further quantified the lateral drift of the sample by employing cross-correlation techniques on the high-resolution ptychographic reconstructions. The sample exhibited a movement of approximately 1.3 nm in the positive Y direction and around 0.3 nm in the positive X direction (Fig. 4c). These shifts are well within the published stability limits of the microscope stage, which allows for a maximum drift of 6.6 nm over the 55-minute experiment duration, as calculated from the stage’s drift rate of 2 pm $\mathrm{s}^{-1}$.
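The drift budget can be written out explicitly. The symbols $d_{\max}$ and $d_{\mathrm{obs}}$ are introduced here for illustration only, and the second line assumes that the net displacement at the end of the series is representative of the drift over the whole experiment:

\[
d_{\max} = 2~\mathrm{pm\,s^{-1}} \times 55~\mathrm{min} \times 60~\mathrm{s\,min^{-1}} = 6600~\mathrm{pm} = 6.6~\mathrm{nm},
\]
\[
d_{\mathrm{obs}} = \sqrt{(1.3~\mathrm{nm})^{2} + (0.3~\mathrm{nm})^{2}} \approx 1.33~\mathrm{nm},
\qquad
\frac{d_{\mathrm{obs}}}{3300~\mathrm{s}} \approx 0.40~\mathrm{pm\,s^{-1}}.
\]

Under that assumption, the observed net lateral drift corresponds to about one fifth of the stage’s nominal drift rate.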

\subsecsize Workflow Comparison

In our recent work (Welborn et al., 2024a), we established that streaming a dataset from NCEM to NERSC is five- to fourteen-fold faster than the conventional file transfer workflow in terms of throughput of raw detector data without counting electron events (i.e., with the electron beam off). Here, we compare these two workflows through an analysis of historical data from four real user experiments with electron events, as illustrated by the timelines in Fig. 5. File transfer experiments exhibit extended durations due to concurrent dataset transmissions and manual user interactions with Distiller, which introduce delays. Conversely, streaming maintains consistent and reliable transfer times, making data immediately accessible at NERSC post-acquisition.

Figure 5: Timeline diagrams of the streaming workflow compared to the file transfer workflow for two different types of experiments: (a) multi-scan, where data is automatically acquired at regular intervals similar to the stability experiment; and (b) 4D-STEM tomography, where data is collected at semi-regular intervals, but adjustments must be made to the microscope between acquisitions. The left side of each horizontal bar represents the acquisition start time, and the right side indicates the time when the electron-counted data is available at NERSC. (c) and (d) qualitatively indicate the serial steps taken within each of these bars for streaming and file transfer, respectively. The arrows in (a) represent the twenty-fourth acquisition in both workflows.

To construct these plots, we determined the last-modified timestamps for two key files created on the NERSC file system for each acquisition. The ‘start time’ is marked by the timestamp of the simultaneously acquired HAADF-STEM image file uploaded to NERSC immediately at the end of each acquisition, and the ‘end time’ by the timestamp of the electron-counted data file. This pair forms one of the horizontal bars in Fig. 5. It is important to note that these timestamps are synchronized using the same clock, avoiding timing discrepancies across different devices in the distributed workflow environment. For clarity, we do not detail the intermediate steps between the start and finish in Fig. 5a-b, instead displaying representative timeline snippets in Fig. 5c-d. For the file transfer workflow (Fig. 5d), there are five serial steps: writing the data to the local drive at NCEM (light blue), waiting for user input to initiate network transfer (grey), the network transfer to NERSC file system (yellow), followed by electron counting (pink), and finally saving to HDF5 (green). Conversely, there are only two steps in the streaming workflow: streaming (light blue), followed by saving to HDF5 (green).
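The timestamp comparison described above can be sketched in a few lines. This is an illustration rather than the analysis script used for Fig. 5: the function name, the file names, and the hand-set timestamps are all hypothetical, and the two empty files only stand in for the HAADF-STEM image and the electron-counted data file.

```python
import os
import tempfile


def availability_delay(haadf_path: str, counted_path: str) -> float:
    """Seconds between the last-modified time of the HAADF-STEM image file
    (taken as the start) and that of the electron-counted file (the end)."""
    start = os.path.getmtime(haadf_path)
    end = os.path.getmtime(counted_path)
    return end - start


# Demonstration with two empty stand-in files whose timestamps are set by hand.
with tempfile.TemporaryDirectory() as tmp_dir:
    haadf_file = os.path.join(tmp_dir, "scan_0001_haadf.dm4")     # hypothetical name
    counted_file = os.path.join(tmp_dir, "scan_0001_counted.h5")  # hypothetical name
    for path in (haadf_file, counted_file):
        open(path, "wb").close()
    os.utime(haadf_file, (1_700_000_000, 1_700_000_000))    # (atime, mtime) in seconds
    os.utime(counted_file, (1_700_000_030, 1_700_000_030))  # 30 s later, chosen arbitrarily
    delay_seconds = availability_delay(haadf_file, counted_file)
```

Because both timestamps come from the same file system, the difference does not depend on clock agreement between the microscope and the compute nodes.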

In Fig. 5a, we compare the file transfer and streaming workflows for multi-scan experiments similar to the stability experiment described above. Each acquisition amounted to 173 GB of raw data, with data dimensions of $512 \times 512 \times 576 \times 576$. During the experiment using the file transfer workflow, the user allowed a batch of acquisitions to accumulate on the NCEM file system and then initiated many NERSC transfer jobs using Distiller in reverse-acquisition order. This results in a pyramid-shaped timeline for each batch, since the most recent acquisition was transferred to NERSC first. In Fig. 5a, two batches are shown: the first starting with the dark blue bar and ending with the light blue bar, and the second starting with the dark orange bar and ending with the light orange bar. Notably, the second batch exhibits gaps indicating missing acquisitions. These omissions could either be deliberate, perhaps due to adjustments in the microscope setup causing concerns with these acquisitions, or unintentional due to transfer failures. In either case, this underscores the disadvantages of having a human in the loop for repetitive file transfer tasks, as real-time decision-making distracts from the ongoing experiment.
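As a quick consistency check, the raw data size follows directly from the data dimensions. The sketch below assumes two bytes per detector pixel (16-bit storage) and decimal gigabytes. Both are assumptions made for illustration, chosen because they reproduce the quoted size, and the helper name `raw_size_gb` is hypothetical.

```python
from math import prod


def raw_size_gb(dims, bytes_per_pixel=2):
    """Raw size in decimal gigabytes (1 GB = 1e9 bytes).

    bytes_per_pixel=2 is an assumption (16-bit storage), chosen
    because it reproduces the sizes quoted in the text.
    """
    return prod(dims) * bytes_per_pixel / 1e9


# 512 x 512 scan positions, 576 x 576 detector pixels per frame
print(round(raw_size_gb((512, 512, 576, 576)), 1))    # 173.9

# Quadrupling the number of scan positions quadruples the size
print(round(raw_size_gb((1024, 1024, 576, 576)), 1))  # 695.8
```

Under these assumptions a $512 \times 512$ scan comes to roughly 174 GB, in line with the 173 GB stated above, and a $1024 \times 1024$ scan comes to roughly 700 GB, in line with the dataset size quoted in the abstract.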

The delay between acquisition and data availability is significantly longer in the file transfer workflow than in the streaming workflow. For instance, in the first batch, the user waited 20 minutes for the initial acquisition to become available (dark blue bar in Fig. 5a). Even the last acquisition in the first batch, one of the shortest timeline bars in the series, required about 270 seconds to become accessible, almost an order of magnitude longer than the consistent 30-second time to processed data observed in the streaming workflow. Our previous work showed that the average file transfer duration for similarly sized datasets is approximately 139 seconds (see the Results section of Welborn et al., 2024a). However, that analysis did not account for additional overhead found in real experiments, such as simultaneous data transfer and acquisition, Perlmutter queue times, and user interactions needed to initiate transfers in Distiller. Together, these delays roughly doubled the waiting period for the microscope user. Conversely, the streaming acquisitions, represented in teal, were available approximately 30 seconds after each acquisition, as no human interaction is required between acquisitions and simultaneous data transfer and acquisition do not occur. The arrows in Fig. 5a point to the last (twenty-fourth) acquisition in both series, including the omitted file transfer acquisitions, indicating that the streaming workflow enables data collection at a faster rate.
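The ratios implied by these figures can be worked out explicitly. The snippet below only restates the approximate durations given in the text; the constant names are illustrative.

```python
# Approximate durations quoted in the text, in seconds
STREAMING_S = 30                 # time to processed data, streaming
FILE_TRANSFER_BEST_S = 270       # last acquisition of the first batch
FILE_TRANSFER_FIRST_S = 20 * 60  # first acquisition of the first batch
PRIOR_AVERAGE_S = 139            # average from Welborn et al., 2024a


def slowdown(duration_s, reference_s):
    """How many times longer duration_s is than reference_s."""
    return duration_s / reference_s


print(slowdown(FILE_TRANSFER_BEST_S, STREAMING_S))   # 9.0
print(slowdown(FILE_TRANSFER_FIRST_S, STREAMING_S))  # 40.0
print(round(slowdown(FILE_TRANSFER_BEST_S, PRIOR_AVERAGE_S), 2))  # 1.94
```

A factor of 9 corresponds to the "almost an order of magnitude" comparison, and a factor of about 1.9 relative to the earlier 139-second average corresponds to the approximate doubling of the waiting period.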

In Fig. 5b, we compare the workflows for a 4D-STEM tomography experiment, where a user spends time between acquisitions to align the sample and microscope at a set of rotation angles. Each acquisition amounted to 695 GB of raw data, with data dimensions of $1024 \times 1024 \times 576 \times 576$. Here, the user was able to tilt, center, and focus the object in the field of view faster than the file transfer pipeline was able to produce processed data. The user thus had to wait for the NERSC process to complete before acquiring a new scan. Further, the user was required to initiate file transfers (disrupting their focus on the experiment) and monitor the file transfer process during the experiment to avoid overtaxing the system. Conversely, the streaming workflow (initiated with one interaction at the start of the experiment) produced finalized data before the next scan was initiated, indicating that processing time was shorter than microscope operation time.

In both cases, it is clear that the streaming workflow allows more data to be acquired in a shorter amount of time and with better consistency. Streaming also removes several extra steps from the experimental workflow, most notably the need for users to initiate processing jobs.

\secsize Conclusions

In this work, we demonstrate the advantages of a streaming-based data transfer workflow over traditional file-based workflows, which often suffer from performance bottlenecks due to disk I/O operations. By bypassing local and remote disk I/O and transferring data directly over the network to an HPC center, our pipeline enables on-the-fly processing on more capable remote hardware. This streaming pipeline seamlessly connects a high frame rate direct electron detector (the 4D Camera) to an HPC center (NERSC).

The pipeline’s capabilities were demonstrated through an hour-long experiment in which sixty 4D-STEM datasets totalling over 10 TB of raw data were acquired, streamed, and electron-counted in real time at NERSC, resulting in a compressed data size of 125 GB. This experiment not only tested the streaming workflow’s ability to handle large volumes of data but also evaluated the entire system’s capacity to produce large numbers of high-quality datasets suitable for advanced analyses such as parallax and ptychography.

A key benefit of our streaming approach, beyond the already-established increase in raw throughput (Welborn et al., 2024a), is the significant reduction in human interaction required. Our comparative analysis of historical data from real user experiments reveals that automating the data transfer process increases throughput, minimizes the potential for human error, and eliminates the overhead associated with manual interactions.

Furthermore, the user-focused design of our solution abstracts away the complexities of HPC, allowing researchers to focus on scientific inquiry rather than the intricacies of computation. This streaming system is integrated into the Distiller web frontend, simplifying the workflow. The system is in daily use on the TEAM 0.5 microscope at NCEM.

This work represents an important step forward in the integration of HPC resources with EOS facilities, addressing critical challenges in data management and computational efficiency. It serves as a model for similar integrations in other data-intensive scientific domains, having implications that extend beyond the immediate context of one electron microscopy detector. Future work will focus on expanding its applicability to other experimental setups and analytical techniques.

\secsize Competing interests

No competing interest is declared.

\secsize Acknowledgments

Work at the Molecular Foundry was supported by the Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231 using NERSC awards BES-ERCAP0024753 and BES-ERCAP0024754. We would like to thank Gatan, Inc. as well as P Denes, A Minor, J Ciston, J Joseph, Vamsi Vytla, and I Johnson who contributed to the development of the 4D Camera. We would also like to thank A Saha and A Bhalla-Levine for compiling experimental information for use in Figure 5.

References

  • Avery et al. (2023) P. Avery, C. Harris, P. Ercius, A. Genova, M. D. Hanwell, and Z. Zhao. Openchemistry/stempy: stempy 3.3.3, Apr. 2023. URL https://doi.org/10.5281/zenodo.7806318.
  • Bard et al. (2022) D. Bard, C. Snavely, L. Gerhardt, J. Lee, R. Totzke, K. Antypas, W. Arndt, J. Blaschke, S. Byna, R. Cheema, S. Cholia, M. Day, B. Enders, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, T. Lehman, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, L. Stephey, R. Thomas, and G. Torok. Lbnl superfacility project report, 2022. URL https://doi.org/10.2172/1875256. Research Org.: Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States).
  • Battaglia et al. (2009) M. Battaglia, D. Contarato, P. Denes, and P. Giubilato. Cluster imaging with a direct detection CMOS pixel sensor in transmission electron microscopy. Nuclear instruments & methods in physics research. Section A, Accelerators, spectrometers, detectors and associated equipment, 608(2):363–365, Sept. 2009. 10.1016/j.nima.2009.07.017.
  • Ben-Moshe et al. (2021) A. Ben-Moshe, A. Da Silva, A. Müller, A. Abu-Odeh, P. Harrison, J. Waelder, F. Niroui, C. Ophus, A. M. Minor, M. Asta, et al. The chain of chirality transfer in tellurium nanocrystals. Science, 372(6543):729–733, 2021.
  • Cao et al. (2018) M. C. Cao, Y. Han, Z. Chen, Y. Jiang, K. X. Nguyen, E. Turgut, G. D. Fuchs, and D. A. Muller. Theory and practice of electron diffraction from single atoms and extended objects using an empad. Microscopy, 67(suppl_1):i150–i161, 2018.
  • Chatterjee et al. (2021) D. Chatterjee, J. Wei, B. Bammes, B. Levin, R. Bilhorn, P. Voyles, et al. An ultrafast direct electron camera for 4d stem. Microscopy and Microanalysis, 27(S1):1004–1006, 2021.
  • Dekkers and De Lang (1974) N. Dekkers and H. De Lang. Differential phase contrast in a stem. Optik, 41(4):452–456, 1974.
  • Doeleman et al. (2023) S. S. Doeleman, J. Barrett, L. Blackburn, K. L. Bouman, A. E. Broderick, R. Chaves, V. L. Fish, G. Fitzpatrick, M. Freeman, A. Fuentes, J. L. Gómez, K. Haworth, J. Houston, S. Issaoun, M. D. Johnson, M. Kettenis, L. Loinard, N. Nagar, G. Narayanan, A. Oppenheimer, D. C. M. Palumbo, N. Patel, D. W. Pesce, A. W. Raymond, F. Roelofs, R. Srinivasan, P. Tiede, J. Weintroub, and M. Wielgus. Reference array and design consideration for the next-generation event horizon telescope. Galaxies, 11(5), 2023. 10.3390/galaxies11050107. URL https://www.mdpi.com/2075-4434/11/5/107.
  • Enders and Thibault (2016) B. Enders and P. Thibault. A computational framework for ptychographic reconstructions. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 472(2196):20160640, 2016. 10.1098/rspa.2016.0640. URL https://royalsocietypublishing.org/doi/abs/10.1098/rspa.2016.0640.
  • Enders et al. (2020) B. Enders, D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, S. Byna, R. Cheema, S. Cholia, et al. Cross-facility science with the superfacility project at lbnl. In 2020 IEEE/ACM 2nd Annual Workshop on Extreme-scale Experiment-in-the-Loop Computing (XLOOP), pages 1–7. IEEE, 2020.
  • Ercius et al. (2012) P. Ercius, M. Boese, T. Duden, and U. Dahmen. Operation of TEAM I in a User Environment at NCEM. Microscopy and Microanalysis, 18(4):676–683, 07 2012. 10.1017/S1431927612001225. URL https://doi.org/10.1017/S1431927612001225.
  • Ercius et al. (2015) P. Ercius, O. Alaidi, M. J. Rames, and G. Ren. Electron tomography: a three-dimensional analytic tool for hard and soft materials research. Advanced materials, 27(38):5638–5663, 2015.
  • Ercius et al. (2020) P. Ercius, I. Johnson, H. Brown, P. Pelz, S.-L. Hsu, B. Draney, E. Fong, A. Goldschmidt, J. Joseph, J. Lee, et al. The 4d camera - an 87 khz frame-rate detector for counted 4d-stem experiments. Microscopy and Microanalysis, 26(S2):1896–1897, 2020. 10.1017/S1431927620019753.
  • Ercius et al. (2023) P. Ercius, I. J. Johnson, P. Pelz, B. H. Savitzky, L. Hughes, H. G. Brown, S. E. Zeltmann, S.-L. Hsu, C. C. S. Pedroso, B. E. Cohen, R. Ramesh, D. Paul, J. M. Joseph, T. Stezelberger, C. Czarnik, M. Lent, E. Fong, J. Ciston, M. C. Scott, C. Ophus, A. M. Minor, and P. Denes. The 4d camera: an 87 khz direct electron detector for scanning/transmission electron microscopy, 2023.
  • Harris and Genova (2023) C. Harris and A. Genova. Distiller. https://github.com/OpenChemistry/distiller, 2023.
  • Henderson et al. (2020) M. L. Henderson, W. Krinsman, S. Cholia, R. Thomas, and T. Slaton. Accelerating experimental science using jupyter and nersc hpc. In Tools and Techniques for High Performance Computing: Selected Workshops, HUST, SE-HER and WIHPC, Held in Conjunction with SC 2019, Denver, CO, USA, November 17–18, 2019, Revised Selected Papers 6, pages 145–163. Springer, 2020.
  • Levin (2021) B. D. A. Levin. Direct detectors and their applications in electron microscopy for materials science. Journal of Physics: Materials, 4(4):042005, 7 2021. 10.1088/2515-7639/ac0ff9. URL https://dx.doi.org/10.1088/2515-7639/ac0ff9.
  • Miao et al. (2016) J. Miao, P. Ercius, and S. J. Billinge. Atomic electron tomography: 3d structures without crystals. Science, 353(6306):aaf2157, 2016.
  • Nellist et al. (1995) P. Nellist, B. McCallum, and J. M. Rodenburg. Resolution beyond the ‘information limit’ in transmission electron microscopy. Nature, 374(6523):630–632, 1995.
  • Ophus (2019) C. Ophus. Four-Dimensional Scanning Transmission Electron Microscopy (4D-STEM): From Scanning Nanodiffraction to Ptychography and Beyond. Microscopy and Microanalysis, 25(3):563–582, 06 2019. 10.1017/S1431927619000497. URL https://doi.org/10.1017/S1431927619000497.
  • Ophus (2023) C. Ophus. Quantitative scanning transmission electron microscopy for materials science: Imaging, diffraction, spectroscopy, and tomography. Annual Review of Materials Research, 53:105–141, 2023.
  • Parkinson et al. (2020) D. Y. Parkinson, H. Krishnan, D. Ushizima, M. Henderson, and S. Cholia. Interactive parallel workflows for synchrotron tomography. In 2020 IEEE/ACM 2nd Annual Workshop on Extreme-scale Experiment-in-the-Loop Computing (XLOOP), pages 29–34. IEEE, 2020.
  • Pattison et al. (2023) A. J. Pattison, C. C. S. Pedroso, B. E. Cohen, J. C. Ondry, A. P. Alivisatos, W. Theis, and P. Ercius. Advanced techniques in automated high-resolution scanning transmission electron microscopy. Nanotechnology, 35(1):015710, oct 2023. 10.1088/1361-6528/acf938. URL https://dx.doi.org/10.1088/1361-6528/acf938.
  • Pelz et al. (2021a) P. Pelz, P. Ercius, C. Ophus, I. Johnson, and M. Scott. Real-time interactive ptychography from electron event representation data. Microscopy and Microanalysis, 27(S1):188–189, 08 2021a. 10.1017/S1431927621001288. URL https://doi.org/10.1017/S1431927621001288.
  • Pelz et al. (2021b) P. M. Pelz, H. G. Brown, S. Stonemeyer, S. D. Findlay, A. Zettl, P. Ercius, Y. Zhang, J. Ciston, M. C. Scott, and C. Ophus. Phase-contrast imaging of multiply-scattering extended objects at atomic resolution by reconstruction of the scattering matrix. Phys. Rev. Res., 3:023159, 2021b. 10.1103/PhysRevResearch.3.023159. URL https://link.aps.org/doi/10.1103/PhysRevResearch.3.023159.
  • Pelz et al. (2022) P. M. Pelz, S. Griffin, S. Stonemeyer, D. Popple, H. Devyldere, P. Ercius, A. Zettl, M. C. Scott, and C. Ophus. Solving complex nanostructures with ptychographic atomic electron tomography, 2022. URL https://arxiv.org/abs/2206.08958.
  • Rao (2020) R. Rao. Synchrotrons face a data deluge. Phys. Today, 2020(2):0925a, Sept. 2020.
  • Ribet et al. (2024) S. M. Ribet, G. Varnavides, C. C. S. Pedroso, B. E. Cohen, P. Ercius, M. C. Scott, and C. Ophus. Uncovering the three-dimensional structure of upconverting core–shell nanoparticles with multislice electron ptychography. Applied Physics Letters, 124(24):240601, 06 2024. 10.1063/5.0206814. URL https://doi.org/10.1063/5.0206814.
  • Savitzky et al. (2021) B. H. Savitzky, S. E. Zeltmann, L. A. Hughes, H. G. Brown, S. Zhao, P. M. Pelz, T. C. Pekin, E. S. Barnard, J. Donohue, L. Rangel DaCosta, E. Kennedy, Y. Xie, M. T. Janish, M. M. Schneider, P. Herring, C. Gopal, A. Anapolsky, R. Dhall, K. C. Bustillo, P. Ercius, M. C. Scott, J. Ciston, A. M. Minor, and C. Ophus. py4dstem: A software package for four-dimensional scanning transmission electron microscopy data analysis. Microscopy and Microanalysis, 27(4):712–743, 2021.
  • Schramm et al. (2012) S. M. Schramm, S. J. van der Molen, and R. M. Tromp. Intrinsic instability of aberration-corrected electron microscopes. Phys. Rev. Lett., 109:163901, Oct 2012. 10.1103/PhysRevLett.109.163901. URL https://link.aps.org/doi/10.1103/PhysRevLett.109.163901.
  • Shibata et al. (2012) N. Shibata, S. D. Findlay, Y. Kohno, H. Sawada, Y. Kondo, and Y. Ikuhara. Differential phase-contrast microscopy at atomic resolution. Nature Physics, 8(8):611–615, 2012.
  • Spurgeon et al. (2021) S. R. Spurgeon, C. Ophus, L. Jones, A. Petford-Long, S. V. Kalinin, M. J. Olszta, R. E. Dunin-Borkowski, N. Salmon, K. Hattar, W.-C. D. Yang, R. Sharma, Y. Du, A. Chiaramonti, H. Zheng, E. C. Buck, L. Kovarik, R. L. Penn, D. Li, X. Zhang, M. Murayama, and M. L. Taheri. Towards data-driven next-generation transmission electron microscopy. Nature materials, 20(3):274–279, Mar. 2021. 10.1038/s41563-020-00833-z.
  • Stevens et al. (2018) A. Stevens, H. Yang, W. Hao, L. Jones, C. Ophus, P. D. Nellist, and N. D. Browning. Subsampled stem-ptychography. Applied Physics Letters, 113(3), 2018.
  • Thomas and Cholia (2021) R. Thomas and S. Cholia. Interactive supercomputing with jupyter. Computing in Science & Engineering, 23(2):93–98, 2021.
  • Thomas et al. (2017) R. Thomas, S. Canon, S. Cholia, L. Gerhardt, and E. Racah. Toward interactive supercomputing at nersc with jupyter. In Cray User Group (CUG) Conference Proceedings. Cray User Group (CUG), 2017.
  • Varnavides et al. (2023) G. Varnavides, S. M. Ribet, S. E. Zeltmann, Y. Yu, B. H. Savitzky, V. P. Dravid, M. C. Scott, and C. Ophus. Iterative phase retrieval algorithms for scanning transmission electron microscopy. arXiv preprint arXiv:2309.05250, 2023.
  • Waddell and Chapman (1979) E. Waddell and J. Chapman. Linear imaging of strong phase objects using asymmetrical detectors in stem. Optik, 54:83–96, 1979.
  • Welborn (2024) S. Welborn. SI videos, streaming, May 2024. URL https://doi.org/10.5281/zenodo.10092028.
  • Welborn et al. (2024a) S. S. Welborn, B. Enders, C. Harris, P. Ercius, and D. J. Bard. Accelerating time-to-science by streaming detector data directly into perlmutter compute nodes. arXiv preprint arXiv:2403.14352, 2024a.
  • Welborn et al. (2024b) S. S. Welborn, S. M. Ribet, and P. Ercius. This manuscript’s code repository. https://github.com/swelborn/welborn-microscopy-streaming-paper-with-code, 2024b.
  • Yasin et al. (2016) F. S. Yasin, T. R. Harvey, J. J. Chess, J. S. Pierce, and B. J. McMorran. Development of stem-holography. Microscopy and Microanalysis, 22(S3):506–507, 2016. 10.1017/S143192761600338X.
  • Yasin et al. (2018) F. S. Yasin, T. R. Harvey, J. J. Chess, J. S. Pierce, C. Ophus, P. Ercius, and B. J. McMorran. Probing light atoms at subnanometer resolution: Realization of scanning transmission electron microscope holography. Nano letters, 18(11):7118–7123, 2018.
  • Yu et al. (2024) Y. Yu, K. A. Spoth, M. Colletta, K. X. Nguyen, S. E. Zeltmann, X. S. Zhang, M. Paraan, M. Kopylov, C. Dubbeldam, D. Serwas, et al. Dose-efficient cryo-electron microscopy for thick samples using tilt-corrected scanning transmission electron microscopy, demonstrated on cells and single particles. bioRxiv, pages 2024–04, 2024.
  • Zambon et al. (2023) P. Zambon, J. Vávra, G. Montemurro, S. Bottinelli, A. Dudina, R. Schnyder, C. Hörmann, M. Meffert, C. Schulze-Briese, D. Stroppa, et al. High-frame rate and high-count rate hybrid pixel detector for 4d stem applications. Frontiers in Physics, 11:1308321, 2023.