Deep Temporal Sequence Classification and Mathematical Modeling for Cell Tracking in Dense 3D Microscopy Videos of Bacterial Biofilms

Tan** Taher Toma, Yibo Wang, Andreas Gahlmann, Scott T. Acton

Tan** Taher Toma is with the Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904 USA. Yibo Wang is with the Department of Chemistry, University of Virginia, Charlottesville, VA 22904 USA. Andreas Gahlmann is with the Department of Chemistry, University of Virginia, Charlottesville, VA 22904 USA and the Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903 USA. Scott T. Acton is with the Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904 USA.

Abstract

Automatic cell tracking in dense environments is plagued by inaccurate correspondences and misidentification of parent-offspring relationships. In this paper, we introduce a novel cell tracking algorithm named DenseTrack, which integrates deep learning with mathematical model-based strategies to effectively establish correspondences between consecutive frames and detect cell division events in crowded scenarios. We formulate the cell tracking problem as a deep learning-based temporal sequence classification task followed by solving a constrained one-to-one matching optimization problem exploiting the classifier’s confidence scores. Additionally, we present an eigendecomposition-based cell division detection strategy that leverages knowledge of cellular geometry. The performance of the proposed approach has been evaluated by tracking densely packed cells in 3D time-lapse image sequences of bacterial biofilm development. The experimental results on simulated as well as experimental fluorescence image sequences suggest that the proposed tracking method achieves superior performance in terms of both qualitative and quantitative evaluation measures compared to recent state-of-the-art cell tracking approaches.

Index Terms:
Cell tracking, deep learning, temporal sequence classification, eigendecomposition, bacterial biofilms.

I Introduction

Cell tracking in time-lapse microscopy image sequences is a challenging multi-object tracking task that is essential for research focusing on the behaviors of individual cells in a population. Because large numbers of cells need to be tracked to draw statistically significant conclusions, accurate and robust automated tracking approaches are required. Automated tracking involves identifying and linking instances of the same biological cell, and possibly its offspring, in consecutive frames of an image sequence. Accurate reconstructions of cell trajectories enable researchers to extract biologically and biophysically relevant parameters, such as cell growth and division rate, cell adhesion and dispersal frequencies, death rate, as well as changes in cellular motion patterns. All these observables can provide quantitative insights into how population behaviors emerge from the underlying behaviors of individual cells [1, 2]. The cell tracking problem often becomes challenging to solve in the presence of high cell density, fast motion, and frequent division events.

There are two main categories of automatic cell tracking methods: tracking-by-contour evolution and tracking-by-detection. The contour evolution-based methods involve finding the object contour in the current frame given an initial contour from the previous frame [3, 4, 5, 6]. Contour evolution-based approaches solve the segmentation and tracking tasks simultaneously by iteratively minimizing a PDE-based energy functional. In contrast, the tracking-by-detection approach separates the segmentation and tracking tasks by first segmenting the individual instances in all the frames and then establishing the temporal associations between the segmented cells of consecutive frames [7, 8, 9]. While tracking-by-contour evolution is effective in certain scenarios, such as when morphological changes of cells are imaged at high magnification, it suffers in situations with low frame rates, high cell density, high motility, and frequent cell divisions. This is due to the underlying assumption of unambiguous spatiotemporal overlap between the corresponding cell regions [1, 2]. Tracking-by-detection methods are more effective in such scenarios, and their reduced computational complexity has further led to their widespread adoption for tracking large numbers of cells over longer time periods [10, 11]. In this paper, we focus on tracking-by-detection and present an algorithm to effectively track crowded cells over time in 4D (3D space plus time) data.

Over the years, numerous tracking-by-detection approaches have been proposed. The simplest methods use basic nearest-neighbor techniques to match cells between frames based on features such as intensity distribution, morphology, and size [12, 13]. More complex features, such as features of the cell’s neighborhood [14] or features derived from a graph structure [15], have also been exploited. However, nearest-neighbor methods that rely on a distance or similarity function are not effective for establishing correspondence in dense cell tracking scenarios, as they often lead to incorrect associations due to sub-optimal user-defined distance or similarity measures [16, 17]. There also exist graph-based tracking approaches where cells are represented as nodes in a graph, and association hypotheses are represented as edges linking the nodes [18, 19, 20, 21]. Such structures allow the tracking problem to be formulated as a graph-matching problem. However, the underlying problem formulation of graph-based tracking methods typically entails solving an optimization problem with numerous regularization terms, which poses challenges in tuning many hyperparameters.

Furthermore, probabilistic approaches for correspondence finding have also been proposed. These include joint probabilistic data association (JPDA) [22, 23] and multiple hypothesis-based tracking (MHT) [24, 25, 26]. The classical Kalman filter or its probabilistic variants have also been used to predict the position of the cells in the next frame [27, 28]. While these traditional methods have demonstrated effectiveness in many applications, they often rely on simplistic assumptions about cell behavior. For example, they may depend on a specific cell motion model or the selection of a particular probability distribution to represent the likelihood of object appearance and disappearance within the field of view. These assumptions do not necessarily hold true in all scenarios [29]. Most importantly, these classical approaches are fully based on fixed models and hence cannot leverage the advantages of learning representative information from a training dataset.

The utilization of deep learning techniques in cell tracking has typically been limited by the unavailability of ground truth annotations for time-lapse image sequences, in particular for 3D images. Nevertheless, several deep learning-based methods have been developed for cell tracking. One such approach models cell tracking as an edge classification problem in a directed graph using a graph neural network [30]. While estimating the entire set of cell trajectories at once with a graph neural network seems efficient, it can lead to numerous incorrect associations, particularly in dense or long image sequences, as many edges need to be classified simultaneously. Another cell tracking approach employs two separate U-Nets for cell likelihood detection and motion estimation [31]. Although the motion estimation strategy can be useful for tracking high-motility cells, simply relying on likelihood detection may not be as effective as segmenting all the cells prior to tracking in the case of dense neighborhoods. Other recent approaches that rely on extensive training data and high computational resources include a deep reinforcement learning method [32] and a pipeline of Siamese networks [33], both of which generally depend on large training datasets for optimal performance [34, 35]. Furthermore, the authors of [36] presented a single convolutional neural network for simultaneous cell segmentation and tracking by predicting cellular embeddings and clustering bandwidths. Its effectiveness, though, has only been demonstrated in the context of cell tracking within 2D image sequences.

A common drawback of all these above-mentioned deep learning-based tracking approaches is that they do not explicitly enforce one-to-one matching between successive frames, the lack of which can lead to erroneous one-to-many associations. Additionally, these methods do not incorporate temporal history to predict associations in the next frame, which can be necessary when a cell is poorly imaged or segmented in some frames but better detected in neighboring frames. Furthermore, some of these approaches have been developed only for 2D cell tracking.

In our proposed approach, we address these limitations of existing classical and deep learning-based methods and present an effective solution that combines a robust deep learning strategy with mathematical modeling for 3D cell tracking in dense environments. Additionally, we overcome the challenge of lacking ground truth annotations for cell tracking by generating training annotations automatically. We achieve this by simulating synthetic biofilm image sequences using a dedicated simulation framework [37], which are then used to train the deep learning network in our tracking system.

The key contributions of our method are outlined below:

  • For frame-by-frame instance matching, our approach estimates association scores for potential matches in the next frame by conducting a deep learning-based temporal sequence classification task. This enables us to learn the association task through a data-driven approach rather than relying on a fixed classical model. Additionally, such an association score estimation network is trainable with limited training data compared to existing deep learning-based cell tracking methods, as the underlying problem being addressed is a straightforward binary classification task of predicting correct versus incorrect associations given a sequence of spatiotemporal features of instances.

  • We leverage near-temporal history in our spatiotemporal feature representation to estimate the association scores, instead of relying solely on features from the current and the next frame.

  • We enforce one-to-one matching between successive frames by solving an optimization problem that utilizes association scores estimated by the classifier. This additional matching step, unlike existing deep learning-based tracking methods, reduces matching errors in high cell density environments, such as those often seen in bacterial biofilms.

  • For detecting cell division events in bacterial biofilms, we present a novel strategy based on the eigendecomposition of unmatched instances in the next frame. This approach effectively identifies the offspring and their parent instance, even in scenarios where there is minimal overlap between the parent and daughter instances in the next frame due to cell motion.

Figure 1: Overview of the proposed tracking approach DenseTrack. In (a) and (b), we depict our frame-by-frame matching technique, which entails calculating deep learning-based association scores and integrating them into one-to-one matching optimization. (c) illustrates the detection of a cell division event by identifying the neighboring instance with the minimum projection along the $2^{nd}$ and $3^{rd}$ principal components of the unmatched instance in frame $t+1$.

II Problem Statement

Let us consider an image sequence, denoted by $\boldsymbol{S}=\{\boldsymbol{F}^{t}\}_{t=1}^{T}$, which comprises $T$ frames. Let $L$ be the number of biological cells present in this sequence. The cell tracking problem can be stated as follows: (1) determine the trajectory of each biological cell and (2) identify the parent of each biological cell in cases where cell existence is due to cell division. For each biological cell, we need to calculate a set of information represented by $\boldsymbol{\mathcal{T}}_{l}=\{t_{init}^{l},t_{fin}^{l},\boldsymbol{C}^{l},P(l)\}$. Here $t_{init}^{l}$ and $t_{fin}^{l}$ refer to the first and last time points in which the $l^{th}$ cell appears in the sequence, respectively. $\boldsymbol{C}^{l}$ represents the set of coordinates of the $l^{th}$ cell from the first frame $t_{init}^{l}$ to the last frame $t_{fin}^{l}$. Finally, $P(l)$ is a function that identifies the parent cell of the $l^{th}$ cell, where $P(l)=l'$ if $l'$ is the parent of cell $l$, and $P(l)=0$ if the cell appearance is not due to cell division. The objective of cell tracking is to obtain the set $\{\boldsymbol{\mathcal{T}}_{1},\ldots,\boldsymbol{\mathcal{T}}_{L}\}$.

III Proposed Solution

To solve the problem, our method involves initially matching cell instances across consecutive frames, followed by the detection of division events and the establishment of complete trajectories. An overview of the proposed approach, called DenseTrack, is illustrated in Fig. 1.

III-A Frame-by-Frame Association

Let $\boldsymbol{F}^{t}=\{\boldsymbol{f}_{i}^{t}\,|\,i=1,2,\ldots,m\}$ and $\boldsymbol{F}^{t+1}=\{\boldsymbol{f}_{j}^{t+1}\,|\,j=1,2,\ldots,n\}$ denote two consecutive frames with $m$ and $n$ cell instances, respectively, where each instance is represented by a feature vector $\boldsymbol{f}$. For each instance in frame $t$, there exist several matching candidates in frame $t+1$, represented by the set $\boldsymbol{M}_{i}=\{(\boldsymbol{f}_{i}^{t},\boldsymbol{f}_{j_{k_{i}}}^{t+1})\,|\,k_{i}=1,2,\ldots,N_{i}\}$. These candidates can be selected from the neighborhood of the projected location of $\boldsymbol{f}_{i}^{t}$ in frame $t+1$. Our objective is to estimate the likelihood of each of these candidates being a correct association. For any candidate $k_{i}$ association, we create a spatiotemporal feature vector, $\boldsymbol{f}_{i,(j_{k_{i}})}^{tem}=[\boldsymbol{f}_{i}^{t-r},\ldots,\boldsymbol{f}_{i}^{t},\boldsymbol{f}_{j_{k_{i}}}^{t+1}]$. This vector is formed by concatenating the feature vector at time $t$ with the feature vectors from the preceding $r$ time frames and the feature vector of the candidate at time $t+1$. Representative features to characterize $\boldsymbol{f}_{i}$ at a time point include 3D spatial coordinates and bounding box measures of the instance.
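
As a minimal sketch of how such a vector might be assembled, the snippet below stacks the last $r+1$ feature vectors of an instance with the feature vector of one candidate. The function name `build_sequence`, the one-row-per-time-step layout, and the six-value feature (3D centroid plus bounding-box extents) are illustrative assumptions rather than details given in the text; instances with fewer than $r+1$ past frames are simply rejected here.

```python
import numpy as np

def build_sequence(history, candidate, r):
    """Stack f_i^{t-r}, ..., f_i^{t} and one candidate f_j^{t+1}.

    history   : feature vectors of instance i, oldest first, ending at frame t
    candidate : feature vector of one candidate instance in frame t+1
    r         : number of preceding frames used in addition to frame t
    Returns an array of shape (r + 2, d), one row per time step.
    """
    if len(history) < r + 1:
        raise ValueError("need at least r + 1 past feature vectors")
    past = np.asarray(history[-(r + 1):], dtype=float)
    cand = np.asarray(candidate, dtype=float)[None, :]
    if past.shape[1] != cand.shape[1]:
        raise ValueError("feature dimensions differ")
    return np.vstack([past, cand])

# Hypothetical features: centroid (x, y, z) and bounding-box extents (dx, dy, dz).
history = [
    [10.0, 5.0, 2.0, 4.0, 1.0, 1.0],   # frame t-2
    [10.5, 5.1, 2.0, 4.1, 1.0, 1.0],   # frame t-1
    [11.0, 5.2, 2.1, 4.2, 1.0, 1.0],   # frame t
]
candidate = [11.4, 5.3, 2.1, 4.3, 1.0, 1.0]  # frame t+1
seq = build_sequence(history, candidate, r=2)
print(seq.shape)  # (4, 6)
```

One such array would be built for every candidate in $\boldsymbol{M}_{i}$, so that all candidate sequences can be scored together as a batch.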

By leveraging the near-temporal history within our spatiotemporal feature vector, we propose computing the probability that any candidate association $k_{i}$ is correct, denoted as $P[y=1\,|\,\boldsymbol{f}_{i,(j_{k_{i}})}^{tem}]$, through the execution of a temporal sequence classification task using deep learning. We have chosen InceptionTime [38], a widely adopted time series convolutional neural network model based on the Inception architecture, for this classification task. By incorporating Inception modules along with residual connections, the InceptionTime architecture is designed to address overfitting and vanishing gradient concerns. In Section V-A, we have demonstrated that the InceptionTime architecture outperforms other state-of-the-art time series classifiers in this classification task as part of our cell tracking framework. We have pretrained the classification network to distinguish between correct and incorrect associations ($y=1$ or $0$). During tracking execution, the network's confidence score is utilized as the association score for the $k_{i}$ candidate association, denoted as $a(\boldsymbol{f}_{i}^{t},\boldsymbol{f}_{j_{k_{i}}}^{t+1})=P[y=1\,|\,\boldsymbol{f}_{i,(j_{k_{i}})}^{tem};\boldsymbol{\Theta}]$. Here, $\boldsymbol{\Theta}$ represents the learned parameters of the network. Overall, with $N_{i}$ potential associations for the $i^{th}$ instance, there are a total of $N=\sum_{i=1}^{m}N_{i}$ possible associations between frames $t$ and $t+1$, which together form the set $\boldsymbol{M}=\cup_{i=1}^{m}\boldsymbol{M}_{i}$. The network estimates association scores for all these $N$ associations in one shot.

Next, we establish a one-to-one correspondence between frames $t$ and $t+1$ by solving a constrained optimization problem, utilizing the calculated association scores. The objective is to choose the associations from the $N$ potential associations that maximize the sum of the association scores. Mathematically, the optimal matching approach involves searching for a solution represented by a binary vector $\boldsymbol{x}_{0}\in\{0,1\}^{N}$ that maximizes the objective function presented in equation (1),

$$\boldsymbol{x}_{0}=\operatorname*{arg\,max}_{\boldsymbol{x}\in\{0,1\}^{N}}\ \sum_{k=1}^{N}\left(\boldsymbol{x}(k)\,a(\boldsymbol{f}_{i_{k}}^{t},\boldsymbol{f}_{j_{k}}^{t+1})\right) \tag{1}$$

The matching constraint that ensures bi-directional one-to-one correspondence for the optimization in (1) can be expressed as follows,

$$\boldsymbol{Y}\boldsymbol{x}\leq\boldsymbol{b} \tag{2}$$

where $\boldsymbol{Y}$ represents a $(m+n)\times N$ dimensional system matrix and $\boldsymbol{b}$ represents a $(m+n)$ dimensional vector of ones. The system matrix $\boldsymbol{Y}$ is designed as follows,

$$\boldsymbol{Y}(q,k)=\begin{cases}1,&\text{if } q=i_{k}\text{ or } j_{k}\\ 0,&\text{otherwise}\end{cases};\quad q=1,2,\ldots,(m+n) \tag{3}$$

The entries of the $k^{th}$ column of $\boldsymbol{Y}$ indicate which cell instances in frames $t$ and $t+1$ correspond to the $k^{th}$ possible match $(\boldsymbol{f}_{i_{k}}^{t},\boldsymbol{f}_{j_{k}}^{t+1})$ in $\boldsymbol{M}$, where $k=1,2,\ldots,N$. The solution to the optimization problem in equation (1) is obtained following the proposed one-to-one matching algorithm presented in Algorithm 1. The algorithm iterates for each instance in frame $t$ to identify its matching candidate in $t+1$ with the highest association score. In cases where two instances from $t$ are matched with a single instance in $t+1$, the instance with the higher association score is considered correct, and the other instance is assigned to its candidate with the next highest association score. The algorithm continues until all the matched instances in $t+1$ are unique cell IDs. The computational complexity of the proposed matching algorithm is $\mathcal{O}(m\log n)$.
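
The matching procedure described above can be sketched as follows. This is an illustrative reading of the prose, not a reproduction of Algorithm 1: the function names, the dictionary-based candidate format, and the 0-based row layout of $\boldsymbol{Y}$ (row $i$ for frame $t$, row $m+j$ for frame $t+1$) are assumptions made here. The final lines check that the selected vector $\boldsymbol{x}$ satisfies constraint (2).

```python
import numpy as np

def one_to_one_match(candidates):
    """Greedy one-to-one matching with conflict resolution.

    candidates : dict mapping instance i in frame t to a list of
                 (j, score) pairs for its candidates in frame t+1
    Returns a dict i -> j in which every j is used at most once.
    """
    ranked = {i: sorted(c, key=lambda p: p[1], reverse=True)
              for i, c in candidates.items()}
    next_rank = {i: 0 for i in ranked}   # next candidate each i will try
    owner = {}                           # j -> (i, score) currently holding j
    pending = list(ranked)
    while pending:
        i = pending.pop()
        while next_rank[i] < len(ranked[i]):
            j, score = ranked[i][next_rank[i]]
            next_rank[i] += 1
            if j not in owner:
                owner[j] = (i, score)
                break
            if score > owner[j][1]:
                pending.append(owner[j][0])  # loser retries its next-best candidate
                owner[j] = (i, score)
                break
        # an instance that exhausts its candidates stays unmatched
    return {i: j for j, (i, _) in owner.items()}

def constraint_matrix(pairs, m, n):
    """System matrix Y of (3) with 0-based rows: i for frame t, m + j for frame t+1."""
    Y = np.zeros((m + n, len(pairs)), dtype=int)
    for k, (i, j) in enumerate(pairs):
        Y[i, k] = 1
        Y[m + j, k] = 1
    return Y

# Toy example: three instances in frame t, two in frame t+1 (scores are made up).
candidates = {0: [(0, 0.9), (1, 0.4)], 1: [(0, 0.8), (1, 0.7)], 2: [(1, 0.6)]}
match = one_to_one_match(candidates)

pairs = [(i, j) for i, c in candidates.items() for j, _ in c]   # the set M
x = np.array([1 if match.get(i) == j else 0 for i, j in pairs])
Y = constraint_matrix(pairs, m=3, n=2)
print(match, bool(np.all(Y @ x <= 1)))
```

In this toy case the greedy result coincides with the maximizer of (1); in general a greedy pass of this kind always satisfies (2) but is not guaranteed to reach the global optimum.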

After performing one-to-one matching between any two consecutive frames $t$ and $t+1$, the matched instances ($\boldsymbol{x}(k)=1$) in frame $t+1$ are assigned the same identification numbers or cell IDs as their corresponding instances in frame $t$. The unmatched instances ($\boldsymbol{x}(k)=0$) in frame $t+1$ are labeled with new cell IDs.

III-B Cell Division Detection

To identify division events, we examine the unmatched instances identified throughout the video sequence since these instances may result from a cell division event or indicate the appearance of a new cell in the field of view. To determine if an unmatched instance is a potential daughter cell, we propose a novel eigendecomposition-based technique. This approach can accurately account for the rod-shaped geometry of the bacterial cell, as well as the fact that division results in the parent cell splitting into daughter cells along its major axis, as illustrated in Fig. 1c.

Let the coordinates of an unmatched instance be denoted by $X\in\mathcal{R}^{p\times 3}$ with $p$ representing the number of 3D points. The covariance matrix can be expressed as $A=X^{T}X$, and we perform the singular-value decomposition, $[U,S,V]=\operatorname{svd}(A)$. The eigenvector matrix, $V=[v_{1},v_{2},v_{3}]$ with $v_{i}\in\mathcal{R}^{3}$, contains three principal components, each of dimension 3. We then consider a neighborhood around $X$ with a neighborhood size twice the length of $X$ and project each neighboring cell $Y_{i}\in\mathcal{R}^{q\times 3}$ onto the $2^{nd}$ and $3^{rd}$ principal components of $X$. The resulting projection matrix is expressed as $PM_{i}=Y_{i}V_{2,3}$ and a single projection value is computed as $PV_{i}=\operatorname{norm}(\operatorname{mean}(PM_{i}))$. Finally, the neighboring cell $Y_{i}$ with the minimum projection value, $\operatorname*{arg\,min}\{PV_{i}\}$, is considered as the other candidate daughter cell of $X$, denoted by $X'$.
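
The projection test can be sketched in NumPy as below. The sketch assumes that all coordinates are expressed relative to the centroid of $X$ (so that $X^{T}X$ acts as a covariance matrix and $PV_{i}$ measures how far a neighbor's centroid lies from the major axis of $X$), and that the neighbors have already been selected; the function names and the toy voxel blocks are illustrative.

```python
import numpy as np

def principal_axes(X):
    """Columns v1, v2, v3 of V, ordered by decreasing singular value."""
    Xc = X - X.mean(axis=0)          # center on the centroid of X (assumption)
    A = Xc.T @ Xc                    # 3 x 3 scatter (covariance up to scale)
    _, _, Vt = np.linalg.svd(A)
    return Vt.T

def sibling_candidate(X, neighbors):
    """Index of the neighbor with the smallest projection value PV_i."""
    center = X.mean(axis=0)
    V23 = principal_axes(X)[:, 1:3]  # 2nd and 3rd principal components
    pv = [float(np.linalg.norm(((Y - center) @ V23).mean(axis=0)))
          for Y in neighbors]
    return int(np.argmin(pv)), pv

# Toy rod-shaped instance: a 10 x 3 x 3 block of voxels, long axis along x.
rod = np.array([[x, y, z] for x in range(10) for y in range(3) for z in range(3)],
               dtype=float)
beside = rod + np.array([0.0, 4.0, 0.0])    # lies next to the rod
in_line = rod + np.array([10.0, 0.0, 0.0])  # continues the rod along its major axis

idx, pv = sibling_candidate(rod, [beside, in_line])
print(idx)  # 1: the in-line neighbor has the smallest projection value
```

The neighbor aligned with the major axis has almost no component along the 2nd and 3rd principal directions, which is why it is selected as the candidate sibling.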

Now, to further ensure that instances $X$ and $X'$ in any frame $t+1$ result from the cell division of a parent cell in frame $t$, we compare the volume of the other candidate daughter cell in the current frame, $vol(X'_{t+1})$, against the volume of its matched instance in the preceding time frame, $vol(X'_{t})$. Since cell division leads to the parent cell dividing into two daughter cells, each with approximately half the volume of the parent cell, we examine whether the ratio $\frac{vol(X'_{t+1})}{vol(X'_{t})} \approx 50\%$. If the condition is satisfied, it suggests that $X$ and $X'$ are the daughters of a parent cell from the previous frame.
In such cases, we assign a distinct new cell ID to the other daughter cell $X'$ to differentiate it from its parent in the previous frame.
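
The volume check and the relabeling step can be illustrated with a short sketch. The text only states that the daughter-to-parent volume ratio should be approximately 50%, so the tolerance `tol` below is an assumed parameter, and both function names are hypothetical.

```python
def is_division(daughter_volume, parent_volume, target=0.5, tol=0.15):
    """Return True if the candidate daughter has roughly half the parent volume.

    daughter_volume: volume of the candidate daughter in frame t+1.
    parent_volume:   volume of its matched instance in frame t.
    tol is an assumption; the source gives no numerical tolerance.
    """
    if parent_volume <= 0:
        return False
    ratio = daughter_volume / parent_volume
    return abs(ratio - target) <= tol

def assign_daughter_id(used_ids):
    """Return a new cell ID that is not used by any existing instance."""
    return max(used_ids, default=0) + 1
```

For example, a candidate of 48 voxels matched to a 100-voxel instance passes the check, whereas a candidate of 95 voxels is treated as the same, undivided cell.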

III-C Generate Complete Trajectories

Following frame-by-frame association and cell division detection, the complete trajectories of the labeled cell instances can be computed. This process begins by identifying the unique instance IDs in the relabeled sequence. For each unique instance ID $l$, the sequence is traversed to determine the initial and final time frames at which the instance appears, denoted by $t_{init}^{l}$ and $t_{fin}^{l}$, respectively. Additionally, the coordinates of the $l^{th}$ instance at each time frame between $t_{init}^{l}$ and $t_{fin}^{l}$ are extracted and stored in a set of coordinates represented by $\boldsymbol{C}^{l}$. Furthermore, incidents of cell division ($P(l)=l'$ or $P(l)=0$) are recorded based on the findings from Section III-B.
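
A minimal sketch of this traversal is given below. It assumes that the relabeled sequence is available as a list of 3D integer label arrays with 0 as background, that each instance is summarized by its centroid, and that parent relations are supplied as a dictionary in which a missing entry corresponds to $P(l)=0$. These data structures and names are illustrative assumptions.

```python
import numpy as np

def build_trajectories(label_frames, parent_of=None):
    """Collect t_init, t_fin, per-frame centroids, and parent ID for every label.

    label_frames: list of 3D integer arrays, one per time frame (0 = background).
    parent_of:    assumed mapping from a daughter ID to its parent ID.
    """
    parent_of = parent_of or {}
    tracks = {}
    for t, labels in enumerate(label_frames):
        for l in np.unique(labels):
            if l == 0:
                continue
            # Centroid in array-index order (depends on how the volume is stored).
            centroid = np.argwhere(labels == l).mean(axis=0)
            track = tracks.setdefault(int(l), {
                "t_init": t,                         # first frame with this ID
                "t_fin": t,
                "coords": [],
                "parent": parent_of.get(int(l), 0),  # 0 means no recorded parent
            })
            track["t_fin"] = t                       # last frame seen so far
            track["coords"].append(tuple(float(c) for c in centroid))
    return tracks
```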

Algorithm 1 One-to-One Matching between Frames $t$ and $t+1$
1:  Input: Cell IDs for all candidate associations, $\boldsymbol{C}_{N\times 2}$; association scores, $\boldsymbol{a}_{N\times 1}$
2:  Output: Association prediction $\boldsymbol{x}_{N\times 1} \in \{0,1\}$
3:  $k_i \leftarrow$ no. of nearest neighbors in $t+1$ for $i^{th}$ instance in $t$ (set to 4)
4:  $\boldsymbol{c}_0 \leftarrow$ unique cell IDs from frame $t$
5:  $\boldsymbol{c}_1 \leftarrow$ unique cell IDs from frame $t+1$
6:  $\boldsymbol{x} \leftarrow zeros_{N\times 1}$ {initialize}
7:  $conflict \leftarrow 1$ {initialize}
8:  $\boldsymbol{D}_0[\boldsymbol{c}_0[i]] \leftarrow -1 \quad \forall\, i = \{0,1,..,(len(\boldsymbol{c}_0)-1)\}$ {initialize a dictionary for unique IDs of $t$}
9:  $\boldsymbol{D}_1[\boldsymbol{c}_1[j]] \leftarrow -1 \quad \forall\, j = \{0,1,..,(len(\boldsymbol{c}_1)-1)\}$ {initialize a dictionary for unique IDs of $t+1$}
10:  while $conflict > 0$ do
11:     $conflict \leftarrow 0$
12:     for $i = 0$ to $(len(\boldsymbol{c}_0)-1)$ do
13:        if $\boldsymbol{D}_0[\boldsymbol{c}_0[i]] = -1$ then
14:           $max\_loc \leftarrow \operatorname*{arg\,max}\, \boldsymbol{a}_{k_i \times 1}$ {select association with max score among $k_i$ candidate scores}
15:           if $\boldsymbol{D}_1[\boldsymbol{C}[max\_loc,1]] = -1$ then
16:              $\boldsymbol{D}_1[\boldsymbol{C}[max\_loc,1]] \leftarrow max\_loc$ {update with new association location}
17:              $\boldsymbol{D}_0[\boldsymbol{C}[max\_loc,0]] \leftarrow max\_loc$ {update with new association location}
18:           else
19:              $conflict \leftarrow 1$
20:              if $\boldsymbol{a}[max\_loc] > \boldsymbol{a}[\boldsymbol{D}_1[\boldsymbol{C}[max\_loc,1]]]$ then
21:                 $\boldsymbol{a}[\boldsymbol{D}_1[\boldsymbol{C}[max\_loc,1]]] \leftarrow 0$ {indicates no association}
22:                 $\boldsymbol{D}_0[\boldsymbol{C}[\boldsymbol{D}_1[\boldsymbol{C}[max\_loc,1]],0]] \leftarrow -1$ {indicates no association}
23:                 $\boldsymbol{D}_0[\boldsymbol{C}[max\_loc,0]] \leftarrow max\_loc$ {update with new association location}
24:                 $\boldsymbol{D}_1[\boldsymbol{C}[max\_loc,1]] \leftarrow max\_loc$ {update with new association location}
25:              else
26:                 $\boldsymbol{a}[max\_loc] \leftarrow 0$ {indicates no association}
27:              end if
28:           end if
29:        end if
30:     end for
31:  end while
32:  $\boldsymbol{x}[\boldsymbol{D}_0[\boldsymbol{c}_0[i]]] \leftarrow 1 \quad \forall\, i = \{0,1,..,(len(\boldsymbol{c}_0)-1)\} \,\land\, \boldsymbol{D}_0[\boldsymbol{c}_0[i]] \neq -1$ {obtain final association prediction}
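
Algorithm 1 can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' code. Two details are assumptions added here: candidate rows are grouped by their frame-$t$ ID instead of relying on a fixed $k_i$, and an instance whose remaining candidate scores are all zero is left unmatched so that the loop is guaranteed to terminate.

```python
import numpy as np

def one_to_one_matching(C, a):
    """Greedy conflict resolution between frames t and t+1.

    C: (N, 2) integer array; row n holds (ID in frame t, ID in frame t+1).
    a: (N,) association scores from the classifier.
    Returns x, an (N,) 0/1 array marking the selected associations.
    """
    C = np.asarray(C)
    a = np.asarray(a, dtype=float).copy()       # scores are zeroed on a copy
    rows_of = {}                                # frame-t ID -> its candidate rows
    for n, cid in enumerate(C[:, 0]):
        rows_of.setdefault(int(cid), []).append(n)
    D0 = {cid: -1 for cid in rows_of}                    # frame-t ID -> chosen row
    D1 = {int(cid): -1 for cid in np.unique(C[:, 1])}    # frame-(t+1) ID -> chosen row

    conflict = True
    while conflict:
        conflict = False
        for cid, rows in rows_of.items():
            if D0[cid] != -1:
                continue                        # already matched
            best = max(rows, key=lambda n: a[n])
            if a[best] <= 0:
                continue                        # assumption: no usable candidate left
            target = int(C[best, 1])
            holder = D1[target]
            if holder == -1:
                D0[cid] = D1[target] = best     # target is free: accept
            else:
                conflict = True
                if a[best] > a[holder]:
                    a[holder] = 0.0             # drop the weaker association
                    D0[int(C[holder, 0])] = -1  # its frame-t cell becomes unmatched
                    D0[cid] = D1[target] = best
                else:
                    a[best] = 0.0               # current candidate loses
    x = np.zeros(len(a), dtype=int)
    for row in D0.values():
        if row != -1:
            x[row] = 1
    return x
```

Every detected conflict sets one positive score to zero, so under the stated assumption the number of passes is bounded by the number of candidate associations.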

IV Experimental Framework

In this section, we describe the dataset, provide the implementation details, discuss the evaluation metrics, and summarize the competing approaches.

IV-A Dataset

We evaluated the proposed cell tracking method on both synthetic and real 3D microscopy image sequences of bacterial biofilms. The synthetic biofilm sequences were generated using a simulation framework [37] that models biofilm formation following biophysical rules and represents bacterial cells with realistic curvilinear morphology. In these synthetic sequences, starting with one or multiple seed cells, the imaged biofilm continues to form as the cells grow and divide over a period of time. We simulated multiple synthetic sequences with varying numbers of initial clusters, where the seed cells are placed at random spatial locations and orientations. These sequences were generated at a frame interval of 10 seconds. Each synthetic video has a dimension of $450 \times 450 \times 150 \times 40$ voxels in $x$-$y$-$z$-$t$. The challenge here lies in linking cell instances within a highly dense environment and detecting frequent division events.

For cell tracking in real biofilm sequences, we acquired lattice light-sheet microscopy [39] videos of two bacterial species, Escherichia coli and Shewanella oneidensis. The resolution of each frame in the video is approximately 230 nm in $x$ and $y$ and 370 nm in $z$, assuming green fluorescent protein (GFP) excitation and emission. The S. oneidensis video was captured at a frame interval of 30 seconds over a total period of 15 minutes, while the E. coli sequence was captured at a frame interval of 5 minutes over a period of 50 minutes. S. oneidensis cells exhibit motility in dense environments, which makes tracking individual cells over time challenging. Conversely, the E. coli video has a lower frame rate and features frequent division events, where cells divide with changes in orientation and spatial displacement into the next frame. This presents significant challenges in the frame-by-frame association and division event detection.

IV-B Implementation Details

The proposed tracking method has one module that requires training: the temporal sequence classification network. The other modules are entirely solved in the online test stage. We trained the network using synthetic biofilm sequences. From a training sequence, we randomly sampled trajectories, represented by a trajectory feature vector $\boldsymbol{f}_{i,(j_{k_i})}^{tem} = [\boldsymbol{f}_i^{t-r}, \boldsymbol{f}_i^{t-1}, \boldsymbol{f}_i^{t}, \boldsymbol{f}_{j_{k_i}}^{t+1}]$, between any frame pairs $t$ and $t+1$ with corresponding association labels of correct or incorrect associations ($y=1$ or $0$). Each $\boldsymbol{f}_i^{t}$ is represented by a 9-dimensional feature vector including 3D spatial coordinates and bounding box measures.
With a choice of $r=2$, we compute a 36-dimensional feature vector $\boldsymbol{f}_{i,(j_{k_i})}^{tem}$ for each candidate association. Additionally, we set the number of potential associations $N_i = 4$, with $k_i = 1, 2, \ldots, N_i$. We then train the network to minimize a binary cross-entropy loss, $\mathcal{L} = -\sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + (1-y^{(i)}) \log(1-\hat{y}^{(i)}) \right]$, where $\hat{y}^{(i)}$ is the predicted association probability for the $i^{th}$ trajectory. We implemented our association network exploiting the InceptionTime architecture in the timeseriesAI (tsai) framework [40]. The performance of the InceptionTime network on this association task has been compared against other state-of-the-art time series networks in Section V-A.
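
The construction of the 36-dimensional input and the training objective can be illustrated with a small NumPy sketch. It uses the standard binary cross-entropy summed over the sampled trajectories. The function names and the clipping constant `eps` are assumptions made for the illustration; the sketch does not reproduce the tsai training pipeline.

```python
import numpy as np

def temporal_feature(f_t_minus_r, f_t_minus_1, f_t, f_candidate):
    """Concatenate four 9-dimensional per-frame features into one 36-dimensional vector.

    The first three inputs describe instance i at frames t-r, t-1, and t;
    the last one describes the candidate instance in frame t+1.
    """
    parts = [np.asarray(p, dtype=float) for p in (f_t_minus_r, f_t_minus_1, f_t, f_candidate)]
    if any(p.shape != (9,) for p in parts):
        raise ValueError("each per-frame feature must have 9 entries")
    return np.concatenate(parts)

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Summed binary cross-entropy; eps (an assumption) guards against log(0)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1.0 - eps)
    return float(-np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat)))
```

With $N_i = 4$ candidates per instance, four such vectors are scored for each cell in frame $t$, and the resulting probabilities serve as the association scores used in the one-to-one matching step.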

We performed tracking experiments on six synthetic sequences and two real biofilm sequences. For synthetic sequences, the experiments were performed in a leave-one-out fashion; that is, the temporal sequence classification network was pretrained on five sequences, while the tracking algorithm was evaluated on the remaining sequence. For the real image sequences of two different biofilm species, the tracking algorithm was executed using an association network pretrained on the synthetic sequences. Since the proposed method is a tracking-by-detection approach, segmentation was performed on each 3D frame of the video prior to tracking, using the biofilm segmentation method named DeepSeeded, as detailed in our previous work [41].

IV-C Evaluation Measures

We evaluated the tracking performance using two established cell tracking performance measures. Both measures are full-reference; that is, they compare the tracks estimated by the tracking algorithm against the reference tracks. The first measure is tracking accuracy ($TRA$), which is widely adopted by the Cell Tracking Challenge. This metric, based on representing tracks as an acyclic oriented graph [42], calculates the cost associated with transforming a computed graph into the reference one. The cost, referred to as $AOGM$ (Acyclic Oriented Graph Metric), is computed as $AOGM = w_{ED}ED + w_{EA}EA + w_{EC}EC$. Here, $ED$ represents the cost of deleting edges (resulting from redundant links), $EA$ represents the cost of adding edges (resulting from missing links), and $EC$ represents the cost of altering edge semantics (resulting from incorrect division detection). The weights $w$ associated with these cost terms are typically set to 1. In essence, $TRA$ expresses this cost relative to the cost of creating the reference graph from scratch, denoted as $AOGM_0$. Mathematically, the $TRA$ measure is expressed as:

$$TRA = 1 - \frac{\min(AOGM, AOGM_0)}{AOGM_0}$$
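
As a worked illustration of the two formulas, the following sketch computes $AOGM$ from edge error counts and then $TRA$. The function names are hypothetical and the counts in the usage note are made-up numbers, not results from the paper.

```python
def aogm(ed, ea, ec, w_ed=1.0, w_ea=1.0, w_ec=1.0):
    """Weighted sum of the three edge-level error terms (weights default to 1)."""
    return w_ed * ed + w_ea * ea + w_ec * ec

def tra(aogm_value, aogm_0):
    """TRA = 1 - min(AOGM, AOGM_0) / AOGM_0, which lies in [0, 1]."""
    if aogm_0 <= 0:
        raise ValueError("AOGM_0 must be positive")
    return 1.0 - min(aogm_value, aogm_0) / aogm_0
```

For instance, with 3, 5, and 2 errors in the three terms and $AOGM_0 = 100$, the cost is 10 and $TRA$ is 0.9. The $\min$ operator keeps the score at 0 when correcting the computed graph would cost more than building the reference graph from scratch.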

Additionally, we separately evaluated the cell division detection accuracy in datasets with frequent division events using an F1 score named Division-F1 [43], defined as follows:

$$\textit{Division-F1} = \frac{2 \times precision \times recall}{precision + recall}$$

Here, $precision = \frac{TP}{TP+FP}$ and $recall = \frac{TP}{TP+FN}$, where $TP$ represents the track splitting events detected within time distance $t$ ($t = \pm 1$) of ground truth ($GT$) events, $FP$ denotes the difference between total detected events and $TP$ events, and $FN$ indicates the difference between total $GT$ events and $TP$ events. Both of these quantitative metrics are computed using a publicly available repository [44].
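
The counting behind Division-F1 can be sketched as follows. The event representation is an assumption made for illustration: each division is a pair of a cell key and a frame index, and a detected event counts as a true positive when an unmatched ground truth event with the same key lies within one frame. The repository cited above may match events differently.

```python
def division_f1(detected, ground_truth, tol=1):
    """Division-F1 with a temporal tolerance of +/- tol frames.

    detected, ground_truth: lists of (cell_key, frame) tuples (assumed format).
    Each ground truth event can be matched at most once.
    """
    unmatched_gt = list(ground_truth)
    tp = 0
    for cell, frame in detected:
        for gt in unmatched_gt:
            if gt[0] == cell and abs(gt[1] - frame) <= tol:
                unmatched_gt.remove(gt)   # safe: we leave the inner loop right away
                tp += 1
                break
    fp = len(detected) - tp               # detected events without a GT match
    fn = len(ground_truth) - tp           # GT events that were never detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```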

IV-D Competing Approaches

The proposed cell tracking method DenseTrack has been evaluated against four competing approaches. We selected three recent methods that have demonstrated state-of-the-art performance in Cell Tracking Challenge datasets and have publicly available implementations. One of these methods is called Ultrack, which utilizes ultrametric contours, a hierarchical representation of the image boundaries, for linking detected instances between adjacent frames through a multiple hypotheses-based technique [45]. Another approach, referred to as the GraphOpt approach, is a graph-based cell tracking method where segmented objects are assigned to tracks by solving a model-based graph optimization problem [46]. Additionally, we considered a recent deep learning-based cell tracking approach named GNN, which constructs cell trajectories using a graph neural network [30]. Finally, we compared the proposed method against a biofilm-specific tracking approach [47] known as the NearestNbr tracking method, which performs frame-by-frame association using Euclidean distance of the extracted features.

Figure 2: Qualitative visualization of cell tracking by DenseTrack in a synthetic biofilm sequence with 50 frames captured at a 10-second frame interval. We demonstrate tracking of three particular cells at several frames in the sequence. Each 3D frame is displayed as a maximum intensity projection along the $z$ axis. Panels: (a) $t = 10$ sec, (b) $t = 20$ sec, (c) $t = 30$ sec, (d) $t = 40$ sec, (e) $t = 190$ sec, (f) $t = 200$ sec, (g) $t = 210$ sec, (h) $t = 220$ sec, (i) $t = 420$ sec, (j) $t = 440$ sec, (k) $t = 460$ sec, (l) $t = 500$ sec.

Figure 3: Evidence of effective cell division detection over time by the DenseTrack method through (a) a space-time plot and (b) a volume-over-time plot, demonstrated for the ‘blue’ cell in the synthetic sequence in Fig. 2.

TABLE I: Quantitative tracking evaluation on six synthetic biofilm videos

| Methods | TRA | Division-F1 |
| --- | --- | --- |
| DenseTrack | 0.942 ± 0.018 | 0.911 ± 0.022 |
| Ultrack [45] | 0.919 ± 0.021 | 0.864 ± 0.024 |
| GraphOpt [46] | 0.915 ± 0.019 | 0.886 ± 0.022 |
| NearestNbr [47] | 0.840 ± 0.022 | 0.648 ± 0.025 |
| GNN [30] | 0.818 ± 0.026 | 0.637 ± 0.033 |

Figure 4: Qualitative observation of bacterial cell tracking using the DenseTrack method in a real S. oneidensis biofilm sequence, consisting of 30 frames captured at 30-second intervals. We display the predicted matched instances for a group of randomly selected cells over various time points in the video, each represented by a distinct color. Panels: (a) $t = 0.5$ min, (b) $t = 2.5$ min, (c) $t = 5$ min, (d) $t = 7.5$ min, (e) $t = 10$ min, (f) $t = 12.5$ min, (g) $t = 13.5$ min, (h) $t = 15$ min.

Figure 5: Visualizing thirty predicted trajectories of the S. oneidensis sequence obtained from (a) the DenseTrack method and (b) the Ultrack method, in comparison to the corresponding manually labeled ground truth trajectories. The spatial dimension is $244 \times 262 \times 87$ voxels in $x$-$y$-$z$. The trajectories depicted in Fig. 5a exhibit greater alignment with the ground truth.

Figure 6: Quantitative tracking evaluation on S. oneidensis video in Fig. 4.

Figure 7: Visualization of cell tracking results obtained using the DenseTrack method in an E. coli biofilm video consisting of 10 frames captured at five-minute intervals. The figure displays the tracked instances for randomly selected cells across different time points in the video, each in a unique color. Panels: (a) $t = 5$ min, (b) $t = 20$ min, (c) $t = 25$ min, (d) $t = 35$ min, (e) $t = 45$ min, (f) $t = 50$ min.

Figure 8: Comparing ten predicted trajectories of the E. coli sequence from (a) the DenseTrack method and (b) the Ultrack method to the corresponding manually labeled ground truth trajectories. The spatial dimension is $458 \times 512 \times 101$ voxels in $x$-$y$-$z$. The DenseTrack method shows more overlap with the ground truth.

TABLE II: Quantitative tracking evaluation on an E. coli biofilm video

| Methods | TRA | Division-F1 |
| --- | --- | --- |
| DenseTrack | 0.904 | 0.877 |
| Ultrack [45] | 0.823 | 0.652 |
| GraphOpt [46] | 0.764 | 0.410 |
| NearestNbr [47] | 0.512 | 0.391 |
| GNN [30] | 0.477 | 0.297 |

TABLE III: Binary classification accuracy on temporal sequence classification using various classifiers, and using the classifier's confidence scores in a one-to-one matching ($OTOM$) optimization

| Methods | Classifier | Classifier+OTOM |
| --- | --- | --- |
| InceptionTime [38] | 0.964 ± 0.007 | 0.998 ± 0.003 |
| TST [48] | 0.886 ± 0.032 | 0.940 ± 0.013 |
| LSTM-FCN [49] | 0.914 ± 0.024 | 0.958 ± 0.006 |
| GRU-FCN [50] | 0.919 ± 0.018 | 0.952 ± 0.007 |
| Res-CNN [51] | 0.804 ± 0.031 | 0.909 ± 0.009 |

Figure 9: Evidence of exploiting near-temporal history ($r=2$) in tracking performance, on (a) a synthetic biofilm video and (b) a real biofilm video of S. oneidensis.

V Results and Discussion

In this section, we present both qualitative and quantitative tracking results obtained from synthetic and real biofilm image sequences. In Fig. 2, we display the tracking results of a synthetic biofilm sequence obtained from the proposed DenseTrack algorithm. The algorithm performs the tracking task for all cell instances. However, for clarity of visual observation in a dense environment, we demonstrate the predicted matched instances of three particular cells over the length of the image sequence, displayed in red, blue, and green. The figure shows that the DenseTrack method can successfully associate the same instance of a cell over consecutive time points, even in such a crowded neighborhood. Furthermore, the cell division events are also accurately detected by the proposed method, which is essential for an effective tracking outcome in such a dataset involving frequent division events.

In Fig. 3, we provide additional support for the effectiveness of the proposed method in cell division detection using a space-time plot and a volume-over-time plot. We show these two plots for the ‘blue’ cell of the sequence displayed in Fig. 2. The space-time plot depicts the $x$ and $y$ coordinates of the centroid of the ‘blue’ cell and its matched instances over time. The green circle at the bottom represents the cell’s location in the first frame. Pairs of circles in the same color indicate that the tracking algorithm detects two daughter cells at that location and time. The line growing out of a circle signifies the instance’s growth until it divides again. Additionally, the space-time plot shows that the bacterial cell’s division follows a geometric progression, such as 2, 4, 8, 16, and so forth. Moreover, we generate the volume-over-time plot shown in Fig. 3b by considering the volume of only one daughter cell at each division event along the sequence. The sawtooth pattern of the plot indicates that the cell divisions are correctly detected by the tracking method, as the volume increases when the cell grows and decreases as it splits into daughter cells.

In Table I, we report the quantitative tracking performance for six synthetic biofilm image sequences in our dataset. These videos contain an average of 1400 ground truth division events. The comparison of tracking methods is based on the overall tracking accuracy ($TRA$) and the division-specific accuracy metric (Division-F1). The results indicate that the proposed DenseTrack method outperforms other methods in both performance measures. Additionally, Ultrack and GraphOpt exhibit reasonable performance in tracking bacterial cells within a dense biofilm environment. However, the nearest neighbor-based technique NearestNbr, employing a simplistic Euclidean distance-based frame-by-frame matching, and the graph neural network-based approach GNN, predicting one directed graph for the entire sequence, exhibit lower tracking accuracy in both measures.

Next, we present the visualization of cell instances tracked by the DenseTrack method for a real biofilm sequence of the S. oneidensis bacterial species in Fig. 4, which was captured at a 30-second frame interval. The matched instances of the same cell over time are displayed in the same color. The figure demonstrates that the proposed method accurately tracks most cell instances. Furthermore, we visualize the predicted trajectories compared to the corresponding ground truth trajectories in Fig. 5. Thirty manually generated ground truth trajectories are plotted in $x$-$y$-$z$ on top of the trajectories estimated by the tracking algorithm. We compare such trajectory plots from the proposed DenseTrack method and the best competing method, Ultrack, as shown in Fig. 5a and 5b. The predicted trajectories from DenseTrack show a higher degree of overlap with the ground truth, suggesting superior accuracy compared to the Ultrack method.

In Fig. 6, we further present a comparative analysis of quantitative tracking performance based on the aforementioned thirty ground truth trajectories. Since the S. oneidensis sequence exhibits very few cell division events (only three ground truth division events in the thirty trajectories), we have opted not to separately present the Division-F1 measure and instead focus on reporting the overall tracking score $TRA$. The figure reveals that, similar to the results obtained from synthetic videos, the proposed DenseTrack approach excels in tracking motile S. oneidensis bacterial cells. While the Ultrack method also demonstrates reasonable performance with approximately 90% tracking accuracy, the GraphOpt, NearestNbr and GNN methods encounter challenges tracking instances within this real biofilm sequence, leading to lower $TRA$ scores.

We then showcase the qualitative tracking results of our proposed approach on an E. coli image sequence in Fig. 7, captured at a larger frame interval of 5 minutes. In this figure, we observe that even in a lower frame-rate video with very frequent division events, the proposed method performs reasonably well in tracking the bacterial cells and their offspring. Furthermore, we offer a qualitative comparison of spatial trajectory plots between the proposed method and the Ultrack method with respect to ten manually generated ground truth trajectories in Fig. 8. The proposed method deviates more from the ground truth for this E. coli sequence (Fig. 8a) than for the S. oneidensis sequence (Fig. 5a), but these deviations are still fewer than those obtained from the Ultrack method (Fig. 8b).

In Table II, we present the $TRA$ and Division-F1 scores of the comparative methods based on the ten manually generated trajectories depicted in Fig. 8. These trajectories cover a total of 54 ground truth division events. The table illustrates that our proposed DenseTrack approach achieves superior tracking performance even in this lower frame rate video (5-minute interval). Additionally, it is observed that the performances of the four competing methods deteriorate further, resulting in lower Division-F1 scores compared to their performance on synthetic image sequences in Table I. This decrease in performance in division prediction for the E. coli sequence may be attributed to the presence of complex division events accompanied by orientation changes and spatial displacement in the next frame.

V-A Ablation Study

To comprehend the distinct contributions of various components in our proposed method, we conduct ablation studies and report the results in this section. In Table III, we present quantitative support for our selection of the InceptionTime classifier in the temporal sequence classification task for frame-by-frame association. The first column presents the classification accuracy of different classifiers in distinguishing correct from wrong associations in temporal sequences. We determine classification accuracy based on whether the confidence score for the ‘correct’ class exceeds that of the ‘wrong’ class for a given association. Among the classifiers, InceptionTime demonstrates superior performance. However, classification in the first column may contain errors stemming from incorrect mappings between frames, such as one-to-multiple associations. The second column reveals that integrating the classifier’s confidence scores into one-to-one matching optimization, as in our DenseTrack framework, enhances classification performance across all classifiers. Nevertheless, InceptionTime still achieves the best results. These classification scores are averaged over 30 frame pairs from two synthetic biofilm videos, each with 15 frames randomly selected.
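The difference between the two columns of Table III can be illustrated with a toy confidence matrix. The sketch below is not the constrained optimization used in DenseTrack: it uses an exhaustive search over permutations as a simple stand-in for any one-to-one assignment solver, and the function names and confidence values are hypothetical. Row $i$ holds the classifier's ‘correct’-class confidence for matching cell $i$ in the current frame to each candidate in the next frame.

```python
from itertools import permutations


def independent_argmax(conf):
    """Pick, for each current-frame cell, the candidate with the highest confidence."""
    return [max(range(len(row)), key=row.__getitem__) for row in conf]


def one_to_one(conf):
    """Exhaustively find the assignment with maximal total confidence (square matrix)."""
    n = len(conf)
    best = max(
        permutations(range(n)),
        key=lambda p: sum(conf[i][p[i]] for i in range(n)),
    )
    return list(best)


# Hypothetical confidences: cells 0 and 1 both prefer candidate 0.
conf = [
    [0.90, 0.80, 0.10],
    [0.85, 0.30, 0.20],
    [0.10, 0.20, 0.70],
]
print(independent_argmax(conf))  # [0, 0, 2]
print(one_to_one(conf))          # [1, 0, 2]
```

Independent decisions assign candidate 0 to two different cells, whereas the one-to-one search moves cell 0 to its second-best candidate and resolves the conflict. Exhaustive search is only practical for very small matrices; a dedicated assignment solver would be needed at realistic cell counts.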

In Fig. 9, we highlight the importance of leveraging near-temporal history in the temporal sequence classification task as part of our proposed tracking approach, rather than relying solely on cellular attributes from the present frame and the next frame. The significance is measured in terms of the overall tracking accuracy measure $TRA$. In Section III, we mentioned the use of a spatiotemporal feature vector, $\boldsymbol{f}_{i,(j_{k_i})}^{tem} = [\boldsymbol{f}_{i}^{t-r}, \ldots, \boldsymbol{f}_{i}^{t}, \boldsymbol{f}_{j_{k_i}}^{t+1}]$, formed by concatenating the feature vector at time $t$ with the feature vectors from the preceding $r$ time frames and the feature vector at time $t+1$. The figure illustrates the effect of using $r=2$, as in our proposed method, versus the effect of using $r=0$. Fig. 9a shows this comparison for a synthetic biofilm video, and Fig. 9b shows it for an S. oneidensis video. The figures indicate that utilizing near-temporal history ($r=2$) improves tracking accuracy for both the synthetic sequence and the real biofilm sequence, with a more pronounced improvement observed in the real biofilm example.
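To make the role of $r$ concrete, the following minimal sketch assembles the sequence $[\boldsymbol{f}_{i}^{t-r}, \ldots, \boldsymbol{f}_{i}^{t}, \boldsymbol{f}_{j_{k_i}}^{t+1}]$ as a list of per-frame feature vectors. The function name `temporal_sequence`, the dictionary layout, the two-element toy features, and the padding rule for tracks shorter than $r$ frames are all assumptions made for illustration; the handling of short tracks in DenseTrack is not specified in this section.

```python
def temporal_sequence(track_features, candidate_feature, t, r=2):
    """Stack features of cell i at frames t-r..t with a candidate's features at t+1.

    track_features: dict mapping frame index -> feature list for cell i
                    (assumed contiguous from its earliest frame up to t).
    candidate_feature: feature list of candidate j at frame t+1.
    Frames before the track starts are padded by repeating the earliest
    available feature vector (a hypothetical choice).
    """
    earliest = min(track_features)
    history = [track_features[max(k, earliest)] for k in range(t - r, t + 1)]
    return history + [candidate_feature]


# Toy track with two features per frame (illustrative values only).
track = {3: [1.0, 0.20], 4: [1.1, 0.25], 5: [1.2, 0.30]}
candidate = [1.25, 0.32]

seq_r2 = temporal_sequence(track, candidate, t=5, r=2)
seq_r0 = temporal_sequence(track, candidate, t=5, r=0)
print(len(seq_r2), len(seq_r0))  # 4 2
```

With $r=2$ the classifier sees four time steps per candidate association, while $r=0$ reduces the input to the present frame and the next frame only, which is the baseline compared in Fig. 9.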

VI Conclusion

This paper introduced a novel cell tracking approach to effectively track cell instances and their offspring in dense 3D time-lapse microscopy image sequences. We formulated the cell tracking problem as a frame-by-frame matching task exploiting a deep temporal sequence classifier’s confidence scores in a one-to-one optimization framework. Utilizing a data-driven deep-learning-based classifier as opposed to a fixed distance or similarity-based measure resulted in better association scores for the potential matches between frame pairs. Additionally, the one-to-one matching optimization formulation with proper constraints presented in this work improved the association of cell instances within a crowded environment. To detect cell division events with high accuracy, we also proposed an eigendecomposition-based strategy that can identify division events even when daughter instances change orientation and are spatially displaced while dividing from the parent instance. We demonstrated the effectiveness of the proposed method in tracking bacterial cells from 3D lattice light-sheet image sequences of biofilms. In these experiments, the proposed method achieved better results than the state-of-the-art cell tracking approaches it was compared against.

Acknowledgments

This work is supported in part by the U.S. National Institute of General Medical Sciences under NIH Grant No. 1R01GM139002. For this study, no ethical approval was required. The authors have no conflicts of interest.

References

  • [1] V. Ulman, M. Maška, K. E. Magnusson, O. Ronneberger, C. Haubold, N. Harder, P. Matula, P. Matula, D. Svoboda, M. Radojevic et al., “An objective comparison of cell-tracking algorithms,” Nature methods, vol. 14, no. 12, pp. 1141–1152, 2017.
  • [2] M. Maška, V. Ulman, D. Svoboda, P. Matula, P. Matula, C. Ederra, A. Urbiola, T. España, S. Venkatesan, D. M. Balak et al., “A benchmark for comparison of cell tracking algorithms,” Bioinformatics, vol. 30, no. 11, pp. 1609–1617, 2014.
  • [3] C. Zimmer, E. Labruyere, V. Meas-Yedid, N. Guillén, and J.-C. Olivo-Marin, “Segmentation and tracking of migrating cells in videomicroscopy with parametric active contours: A tool for cell-based drug testing,” IEEE transactions on medical imaging, vol. 21, no. 10, pp. 1212–1221, 2002.
  • [4] N. Ray, S. T. Acton, and K. Ley, “Tracking leukocytes in vivo with shape and size constrained active contours,” IEEE transactions on medical imaging, vol. 21, no. 10, pp. 1222–1235, 2002.
  • [5] K. Li, E. D. Miller, M. Chen, T. Kanade, L. E. Weiss, and P. G. Campbell, “Cell population tracking and lineage construction with spatiotemporal context,” Medical image analysis, vol. 12, no. 5, pp. 546–566, 2008.
  • [6] O. Dzyubachyk, W. A. Van Cappellen, J. Essers, W. J. Niessen, and E. Meijering, “Advanced level-set-based cell tracking in time-lapse fluorescence microscopy,” IEEE transactions on medical imaging, vol. 29, no. 3, pp. 852–867, 2010.
  • [7] N. N. Kachouie, P. Fieguth, J. Ramunas, E. Jervis et al., “Probabilistic model-based cell tracking,” International Journal of Biomedical Imaging, vol. 2006, 2006.
  • [8] D. H. Rapoport, T. Becker, A. Madany Mamlouk, S. Schicktanz, and C. Kruse, “A novel validation algorithm allows for automated cell tracking and the extraction of biologically meaningful parameters,” PloS one, vol. 6, no. 11, p. e27315, 2011.
  • [9] M. Rempfler, V. Stierle, K. Ditzel, S. Kumar, P. Paulitschke, B. Andres, and B. H. Menze, “Tracing cell lineages in videos of lens-free microscopy,” Medical image analysis, vol. 48, pp. 147–161, 2018.
  • [10] K. E. Magnusson, J. Jaldén, P. M. Gilbert, and H. M. Blau, “Global linking of cell tracks using the viterbi algorithm,” IEEE transactions on medical imaging, vol. 34, no. 4, pp. 911–929, 2014.
  • [11] R. Bise, Z. Yin, and T. Kanade, “Reliable cell tracking by global data association,” in 2011 IEEE international symposium on biomedical imaging: From nano to macro.   IEEE, 2011, pp. 1004–1010.
  • [12] M. A. A. Dewan, M. O. Ahmad, and M. Swamy, “Tracking biological cells in time-lapse microscopy: An adaptive technique combining motion and topological features,” IEEE Transactions on Biomedical Engineering, vol. 58, no. 6, pp. 1637–1647, 2011.
  • [13] F. Boukari and S. Makrogiannis, “Automated cell tracking using motion prediction-based matching and event handling,” IEEE/ACM transactions on computational biology and bioinformatics, vol. 17, no. 3, pp. 959–971, 2018.
  • [14] F. Li, X. Zhou, J. Ma, and S. T. Wong, “Multiple nuclei tracking using integer programming for quantitative cancer cell cycle analysis,” IEEE transactions on medical imaging, vol. 29, no. 1, pp. 96–105, 2009.
  • [15] A. Narayanaswamy, A. Merouane, A. Peixoto, E. Ladi, P. Herzmark, U. Von Andrian, E. Robey, and B. Roysam, “Multi-temporal globally-optimal dense 3-D cell segmentation and tracking from multi-photon time-lapse movies of live tissue microenvironments,” in Spatio-temporal Image Analysis for Longitudinal and Time-Series Image Data: Second International Workshop, STIA 2012, Held in Conjunction with MICCAI 2012, Nice, France, October 1, 2012. Proceedings 2.   Springer, 2012, pp. 147–162.
  • [16] N. Chenouard, I. Smal, F. De Chaumont, M. Maška, I. F. Sbalzarini, Y. Gong, J. Cardinale, C. Carthel, S. Coraluppi, M. Winter et al., “Objective comparison of particle tracking methods,” Nature methods, vol. 11, no. 3, pp. 281–289, 2014.
  • [17] K. Jaqaman, D. Loerke, M. Mettlen, H. Kuwata, S. Grinstein, S. L. Schmid, and G. Danuser, “Robust single-particle tracking in live-cell time-lapse sequences,” Nature methods, vol. 5, no. 8, pp. 695–702, 2008.
  • [18] D. Padfield, J. Rittscher, and B. Roysam, “Coupled minimum-cost flow cell tracking for high-throughput quantitative analysis,” Medical image analysis, vol. 15, no. 4, pp. 650–668, 2011.
  • [19] B. X. Kausler, M. Schiegg, B. Andres, M. Lindner, U. Koethe, H. Leitte, J. Wittbrodt, L. Hufnagel, and F. A. Hamprecht, “A discrete chain graph model for 3d+ t cell tracking with high misdetection robustness,” in Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part III 12.   Springer, 2012, pp. 144–157.
  • [20] M. Liu, Y. Liu, W. Qian, and Y. Wang, “Deepseed local graph matching for densely packed cells tracking,” IEEE/ACM transactions on computational biology and bioinformatics, vol. 18, no. 3, pp. 1060–1069, 2019.
  • [21] M. Schiegg, P. Hanslovsky, C. Haubold, U. Koethe, L. Hufnagel, and F. A. Hamprecht, “Graphical model for joint segmentation and tracking of multiple dividing cells,” Bioinformatics, vol. 31, no. 6, pp. 948–956, 2015.
  • [22] W. J. Godinez and K. Rohr, “Tracking multiple particles in fluorescence time-lapse microscopy images via probabilistic data association,” IEEE transactions on medical imaging, vol. 34, no. 2, pp. 415–432, 2014.
  • [23] S. H. Rezatofighi, S. Gould, R. Hartley, K. Mele, and W. E. Hughes, “Application of the IMM-JPDA filter to multiple target tracking in total internal reflection fluorescence microscopy images,” in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2012: 15th International Conference, Nice, France, October 1-5, 2012, Proceedings, Part I 15.   Springer, 2012, pp. 357–364.
  • [24] N. Chenouard, I. Bloch, and J.-C. Olivo-Marin, “Multiple hypothesis tracking for cluttered biological image sequences,” IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 11, pp. 2736–3750, 2013.
  • [25] S. Coraluppi and C. Carthel, “Multi-stage multiple-hypothesis tracking.” J. Adv. Inf. Fusion, vol. 6, no. 1, pp. 57–67, 2011.
  • [26] L. Liang, H. Shen, P. De Camilli, and J. S. Duncan, “A novel multiple hypothesis based particle tracking method for clathrin mediated endocytosis analysis using fluorescence microscopy,” IEEE transactions on image processing, vol. 23, no. 4, pp. 1844–1857, 2014.
  • [27] M. Liu, Y. He, Y. Wei, and P. Xiang, “Plant cell tracking using kalman filter based local graph matching,” Image and Vision Computing, vol. 60, pp. 154–161, 2017.
  • [28] L.-L. S. Ong, M. H. Ang, and H. H. Asada, “Tracking of cell population from time lapse and end point confocal microscopy images with multiple hypothesis kalman smoothing filters,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops.   IEEE, 2010, pp. 71–78.
  • [29] R. Spilger, A. Imle, J.-Y. Lee, B. Mueller, O. T. Fackler, R. Bartenschlager, and K. Rohr, “A recurrent neural network for particle tracking in microscopy images using future information, track hypotheses, and multiple detections,” IEEE Transactions on Image Processing, vol. 29, pp. 3681–3694, 2020.
  • [30] T. Ben-Haim and T. R. Raviv, “Graph neural network for cell tracking in microscopy videos,” in Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXI.   Springer, 2022, pp. 610–626.
  • [31] J. Hayashida and R. Bise, “Cell tracking with deep learning for cell detection and motion estimation in low-frame-rate,” in Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part I 22.   Springer, 2019, pp. 397–405.
  • [32] J. Wang, X. Su, L. Zhao, and J. Zhang, “Deep reinforcement learning for data association in cell tracking,” Frontiers in Bioengineering and Biotechnology, vol. 8, p. 298, 2020.
  • [33] A. Panteli, D. K. Gupta, N. Bruijn, and E. Gavves, “Siamese tracking of cell behaviour patterns,” in Medical Imaging with Deep Learning.   PMLR, 2020, pp. 570–587.
  • [34] G. Cruciata, L. L. Presti, and M. La Cascia, “On the use of deep reinforcement learning for visual tracking: A survey,” IEEE Access, vol. 9, pp. 120 880–120 900, 2021.
  • [35] M. Ondrašovič and P. Tarábek, “Siamese visual object tracking: A survey,” IEEE Access, vol. 9, pp. 110 149–110 172, 2021.
  • [36] K. Löffler and R. Mikut, “Embedtrack—simultaneous cell segmentation and tracking through learning offsets and clustering bandwidths,” IEEE Access, vol. 10, pp. 77 147–77 157, 2022.
  • [37] T. T. Toma, Y. Wu, J. Wang, A. Srivastava, A. Gahlmann, and S. T. Acton, “Realistic-shape bacterial biofilm simulator for deep learning-based 3D single-cell segmentation,” in 2022 IEEE 19th international symposium on biomedical imaging (ISBI).   IEEE, 2022, pp. 1–5.
  • [38] H. Ismail Fawaz, B. Lucas, G. Forestier, C. Pelletier, D. F. Schmidt, J. Weber, G. I. Webb, L. Idoumghar, P.-A. Muller, and F. Petitjean, “InceptionTime: Finding AlexNet for time series classification,” Data Mining and Knowledge Discovery, vol. 34, no. 6, pp. 1936–1962, 2020.
  • [39] M. Zhang, J. Zhang, J. Wang, A. M. Achimovich, A. A. Aziz, J. Corbitt, S. T. Acton, and A. Gahlmann, “3D imaging of single cells in bacterial biofilms using lattice light-sheet microscopy,” Biophysical Journal, vol. 116, no. 3, p. 25a, 2019.
  • [40] I. Oguiza, “tsai - a state-of-the-art deep learning library for time series and sequential data,” Github, 2023. [Online]. Available: https://github.com/timeseriesAI/tsai
  • [41] T. T. Toma, Y. Wang, A. Gahlmann, and S. T. Acton, “Deepseeded: Volumetric segmentation of dense cell populations with a cascade of deep neural networks in bacterial biofilm applications,” Expert Systems with Applications, vol. 238, p. 122094, 2024.
  • [42] P. Matula, M. Maška, D. V. Sorokin, P. Matula, C. Ortiz-de Solórzano, and M. Kozubek, “Cell tracking accuracy measurement based on comparison of acyclic oriented graphs,” PloS one, vol. 10, no. 12, p. e0144959, 2015.
  • [43] K. Ulicna, G. Vallardi, G. Charras, and A. R. Lowe, “Automated deep lineage tree analysis using a bayesian single cell tracking approach,” Frontiers in Computer Science, vol. 3, p. 734559, 2021.
  • [44] Janelia-Trackathon-2023, “Utilities for computing common accuracy metrics on cell tracking challenge solutions with ground truth,” https://github.com/Janelia-Trackathon-2023/traccuracy, 2023.
  • [45] J. Bragantini, M. Lange, and L. Royer, “Large-scale multi-hypotheses cell tracking using ultrametric contours maps,” arXiv preprint arXiv:2308.04526, 2023.
  • [46] K. Löffler, T. Scherr, and R. Mikut, “A graph-based cell tracking algorithm with few manually tunable parameters and automated segmentation error correction,” PloS one, vol. 16, no. 9, p. e0249257, 2021.
  • [47] J. Zhang, Y. Wang, E. D. Donarski, T. T. Toma, M. T. Miles, S. T. Acton, and A. Gahlmann, “BCM3D 2.0: accurate segmentation of single bacterial cells in dense biofilms using computationally generated intermediate image representations,” npj Biofilms and Microbiomes, vol. 8, no. 1, p. 99, 2022.
  • [48] G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eickhoff, “A Transformer-based framework for multivariate time series representation learning,” in Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, 2021, pp. 2114–2124.
  • [49] F. Karim, S. Majumdar, H. Darabi, and S. Chen, “LSTM fully convolutional networks for time series classification,” IEEE access, vol. 6, pp. 1662–1669, 2017.
  • [50] N. Elsayed, A. S. Maida, and M. Bayoumi, “Deep gated recurrent and convolutional network hybrid model for univariate time series classification,” arXiv preprint arXiv:1812.07683, 2018.
  • [51] X. Zou, Z. Wang, Q. Li, and W. Sheng, “Integration of residual network and convolutional neural network along with various activation functions and global pooling for time series classification,” Neurocomputing, vol. 367, pp. 39–45, 2019.