
License: CC BY 4.0
arXiv:2312.09871v1 [cs.LG] 15 Dec 2023

ChemTime: Rapid and Early Classification for Multivariate Time Series Classification of Chemical Sensors

Alexander M. Moore, Worcester Polytechnic Institute, Worcester MA, USA
Randy C. Paffenroth, Worcester Polytechnic Institute, Worcester MA, USA
Ken T. Ngo, Chemical/Biological Innovative Material and Ensemble Development Team, U.S. Army DEVCOM Soldier Center, Natick MA, USA
Joshua R. Uzarski, Chemical/Biological Innovative Material and Ensemble Development Team, U.S. Army DEVCOM Soldier Center, Natick MA, USA

Abstract
Abstract

Multivariate time series data are ubiquitous in the application of machine learning to problems in the physical sciences. Chemiresistive sensor arrays are highly promising for chemical detection tasks relevant to industrial, safety, and military applications. Sensor arrays are inherently multivariate time series data collection tools that demand rapid and accurate classification of arbitrary chemical analytes. Previous research has benchmarked data-agnostic multivariate time series classifiers across diverse supervised multivariate time series tasks in order to find general-purpose classification algorithms. To our knowledge, there has yet to be an effort to survey machine learning and time series classification approaches for chemiresistive hardware sensor arrays in the detection of chemical analytes. In addition to benchmarking existing multivariate time series classifiers, we incorporate findings from a model survey to propose ChemTime, a novel approach to sensor array classification for chemical sensing. We design experiments addressing the unique challenges of hardware sensor array classification, including the ability of classifiers to classify rapidly and the minimization of inference time while maintaining performance on deployed lightweight hardware sensing devices. We find that ChemTime is uniquely positioned for the chemical sensing task, combining rapid and early classification of time series with beneficial inference properties and high accuracy.

Index Terms:
Chemical sensors, machine learning, multivariate time series, representation learning

I Introduction

Chemiresistive sensor arrays produce characteristic resistance signals in response to chemical analytes. Multivariate time series classifiers may be trained to detect the presence of chemical analytes from training examples containing these characteristic responses. Chemical analyte detection is vital to industrial and military applications and poses unique challenges for multivariate time series classifiers. To approach optimal chemical discrimination for a novel hardware sensor array, we propose a variety of supervised learning experiments, with additional modifications that account for the challenges unique to the chemical sensing space.

Our findings on the optimal discrimination of chemical analytes lead us to propose a new multivariate time series classifier designed explicitly for chemical sensing. ChemTime uses inductive biases in the data and known chemical structures to better encode time series signals into a meaningful, chemistry-informed latent space (Section I-C). We demonstrate that these changes yield a performant model that is significantly faster and more lightweight than comparable models while maintaining a high degree of accuracy (Section II-B).

To mitigate the effects of model inductive bias, as well as the inductive biases of particular sets of sensors, we conduct an empirical survey of diverse models from the established and modern multivariate time series classification literature, and of diverse sets of chemiresistive sensor coatings. We benchmark each model on a broad set of chemiresistive sensor arrays and investigate patterns in successful models across multiple hardware designs. In addition, we supply experiments and results for a variety of specific tasks relevant to chemical sensing with chemiresistive sensor arrays, including limit of detection studies, analysis of the rapid classification abilities of models, and analysis of the time to train and infer for our suite of models.

I-A Contributions

Our contributions to the application of machine learning to the applied sciences and chemical sensor array classification include the following:

  1. A benchmark of modern competitive multivariate time series classifiers from the literature on eleven real-world chemiresistive sensor data sets for the discrimination of a particular chemical analyte (Section II-B).

  2. A novel approach to the classification of multivariate time series for chemiresistive sensors, based on an adaptation of transfer learning to molecular representation, which improves the efficiency frontier for chemical sensing (Section I-C).

  3. Analysis of tasks that depend on the rapid classification of chemical analytes, including an efficiency frontier for inference time vs. accuracy across the span of benchmarked models (Sections II-C and II-D).

I-B Chemical Sensing

Chemical sensors convert physical and chemical properties of analytes into measurable signals. Examples of chemical sensors include breathalyzers, carbon monoxide sensors, and electrochemical gas sensors [1]. The detection of particular chemical analytes is highly relevant in civilian safety, manufacturing, and military applications [2, 3, 4]. Chemiresistive sensor devices respond to chemical analytes by reporting changes in resistance through a coated resistor. Chemical analytes interact with sensors at the molecular level by binding to the sensor coating, a process called adsorption. Analyte adsorption to the sensor coating changes the resistance through the sensing element as a function of the binding affinity between the analyte and the surface. This binding affinity depends on the molecular properties of the analyte and the chemical properties of the polymer coating, which together yield the characteristic response curve for the analyte.

Non-chemiresistive chemical sensors include “infrared and Raman spectroscopy, ion mobility spectrometry, surface acoustic wave sensors,” and other on-site testing technologies requiring complex instrumentation, high cost, and expert operating personnel [2]. These expenses and complications limit the utility of a deployed tool and increase the cost and time required to gather experimental data for hardware development and machine learning training. More portable sensors may respond only to select analytes and may perform poorly in the presence of obscurant chemicals [5, 2, 6]. Contemporary sensor arrays, including those used to collect the experimental data discussed here, address the challenge of rapidly discriminating multiple target analytes with low-cost, low-power, miniaturized sensors: an array of semi-selective chemical sensors that respond to many analytes simultaneously enables analyte detection despite obscurants [2].

We propose analyses of machine and deep learning classifiers trained and tested on real-world chemical sensor resistance data from 8-sensor chemiresistive arrays with chemically diverse coatings to maximize analyte discriminability as in [7, 2, 3].

The unique coatings on the chemiresistive sensors lead to characteristic resistance responses for each analyte, which facilitate discrimination of the gas exposures. Figure 1 shows an example of the contrast in sensor responses to 17.5% Analyte A and 17.5% Analyte B vapors given the same set of eight sensors with bespoke chemiresistive coatings. From the 8-channel characteristic signal, machine learning models learn decision patterns that generalize to unseen testing samples.

Figure 1: (a), (b) Two five-second single-analyte exposures to different analytes at the same concentration. Discrepancies in sensor resistance are explained by adsorption interactions between analytes and sensor coatings.

I-C ChemTime

ChemVise [4] introduced a molecular-semantic latent space, styled on joint natural image and natural language spaces such as DeViSE [8] and DALL-E [9], for improved classification of out-of-distribution chemical analytes. Though ChemVise demonstrated that a chemistry-informed latent space built with a pretrained molecular target embedding model improved classification outcomes over baselines, it did not leverage the inherent time series structure of the data.

ChemTime modifies the ChemVise approach by replacing the tabular-data embedding model with an iterative time series embedder that takes a moving-target approach to signal embedding. At each time step, the iterative encoder uses a recurrent neural network backbone to encode the resistance signals of the input, and a linear projection layer maps the resulting state vector to a point in the molecular embedding space. The model loss is the sum of losses over the sequence, $\mathbb{L} = \sum_t \mathbb{L}_t$, where $\mathbb{L}_t$ may be an embedding distance such as the cosine distance, the Euclidean distance (MSE loss), or a bespoke representation loss as in DeViSE [8].
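The summed loss can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes predicted and target embedding sequences are stored as `(T, D)` arrays, the function names are hypothetical, and the DeViSE ranking loss is omitted.

```python
import numpy as np


def euclidean_step_losses(y_hat_seq: np.ndarray, y_rep_seq: np.ndarray) -> np.ndarray:
    """Per-time-step mean squared error between two (T, D) embedding sequences."""
    return np.mean((y_hat_seq - y_rep_seq) ** 2, axis=1)


def cosine_step_losses(y_hat_seq: np.ndarray, y_rep_seq: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-time-step cosine distance 1 - cos(y_hat_t, y_t)."""
    dots = np.sum(y_hat_seq * y_rep_seq, axis=1)
    norms = np.linalg.norm(y_hat_seq, axis=1) * np.linalg.norm(y_rep_seq, axis=1)
    return 1.0 - dots / np.maximum(norms, eps)  # eps guards against zero-length vectors


def sequence_loss(y_hat_seq, y_rep_seq, step_loss=euclidean_step_losses) -> float:
    """Total loss L = sum over t of L_t."""
    return float(np.sum(step_loss(y_hat_seq, y_rep_seq)))


# Three time steps in a 2-D embedding space; only the second prediction is off target.
target = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
pred = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(sequence_loss(pred, target))                      # 1.0
print(sequence_loss(pred, target, cosine_step_losses))  # 1.0
```

Summing rather than averaging over time steps means every intermediate embedding along the trajectory is pulled toward its target, not only the final one.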

ChemTime has multiple benefits over its predecessor architecture. Using sequences of representations in a meaningful latent space yields improved classification outcomes, richer inference at test time, and earlier classification compared to the fixed-window approach. The results sections demonstrate the benefits of a lightweight approach for rapid classification, as well as the inference and analysis that become possible when sequences of representations are used for meta-classification in the chemistry-informed latent space.

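# Require:
#   chemception_enc: Chemception with final linear layer removed; maps an analyte name to a 1-D embedding tensor
#   rnn: one-layer recurrent network, e.g. torch.nn.GRU(n_sensors, hidden_size, batch_first=True)
#   lm: linear projection from RNN state to target representation dimension, e.g. torch.nn.Linear
#   chem_start: time index where analyte exposure begins
#   criterion: Euclidean distance (e.g. torch.nn.MSELoss(reduction="sum")), cosine distance, or DeViSE distance
#   optimizer: torch optimizer over the parameters of rnn and lm
#   train_loader: yields (signal, label) batches; signal has shape (batch, time, n_sensors)
#   x_train, y_train, x_test, y_test: signal tensors (n, time, n_sensors) and binary labels
import torch
from sklearn.metrics import f1_score
from sklearn.svm import SVC

none_rep = chemception_enc("Nitrogen")  # 'None' target: inert gas before analyte flux begins

for signal, label in train_loader:  # load a batch of labelled sensor samples
    seq_len = signal.shape[1]
    # Encode training labels to analyte space; before analyte flux begins, the target is the inert gas
    targets = []
    for name in label:
        rep = chemception_enc(name)
        targets.append(torch.stack([rep if t >= chem_start else none_rep for t in range(seq_len)]))
    y_rep_seq = torch.stack(targets)  # (batch, time, emb_dim)

    # Iterate over the sequence with the RNN: one hidden state per time step
    x_seq, _ = rnn(signal)  # (batch, time, hidden_size)

    # Project the RNN state sequence to the representation space
    y_hat_seq = lm(x_seq)  # (batch, time, emb_dim)

    # Loss to the target embeddings, accumulated over every time step
    loss = criterion(y_hat_seq, y_rep_seq)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Downstream classifier on embedded signals: the final projection is the boosted input
with torch.no_grad():
    emb_xtrain = lm(rnn(x_train)[0])[:, -1, :].numpy()
    emb_xtest = lm(rnn(x_test)[0])[:, -1, :].numpy()
opt_model = SVC().fit(emb_xtrain, y_train)

# Use the classifier to predict from the final representation of the testing signals
emb_ytest_hats = opt_model.predict(emb_xtest)
test_score = f1_score(y_test, emb_ytest_hats)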

I-D Implementation Details

Label Sequence Generation

ChemTime requires rephrasing a chemical exposure label as a sequence of targets in a chemistry representation space. Data are provided with labels corresponding to the concentration of each analyte in the exposure. For example, an exposure of 17% Analyte B is labelled [0,17,0,0]. We build a sequence of targets by first taking the representation of Analyte B to be its activations in a molecular representation model. For ChemTime we use Chemception, though any molecular representation could be used [10]. This representation is then unrolled into a sequence: a ‘None’ representation (the activations of the inert gas Nitrogen under the same molecular representation model) is used for each time step before the flux of analyte vapor begins, and the analyte representation is used at all subsequent time steps.
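A minimal sketch of this unrolling, assuming the analyte embeddings have already been computed and stored in a dictionary (a hypothetical stand-in for Chemception activations), that label entries follow the order Analytes A to D, and that each exposure contains a single analyte:

```python
import numpy as np

ANALYTES = ["Analyte A", "Analyte B", "Analyte C", "Analyte D"]  # assumed label order


def label_to_target_sequence(label, embeddings, seq_len, chem_start):
    """Unroll a concentration label such as [0, 17, 0, 0] into a (seq_len, D) target sequence.

    Steps before chem_start (no analyte flux yet) target the Nitrogen ('None') embedding;
    every later step targets the embedding of the analyte with non-zero concentration.
    """
    label = np.asarray(label, dtype=float)
    if not np.any(label > 0):
        raise ValueError("expected a single-analyte exposure label")
    analyte_vec = embeddings[ANALYTES[int(np.argmax(label))]]
    none_vec = embeddings["Nitrogen"]
    steps = np.arange(seq_len)[:, None]  # column of time indices, broadcast over D
    return np.where(steps < chem_start, none_vec, analyte_vec)


# Toy 3-D embeddings standing in for molecular-model activations (hypothetical values)
embeddings = {
    "Nitrogen": np.array([0.0, 0.0, 0.0]),
    "Analyte A": np.array([1.0, 0.0, 0.0]),
    "Analyte B": np.array([0.0, 1.0, 0.0]),
    "Analyte C": np.array([0.0, 0.0, 1.0]),
    "Analyte D": np.array([1.0, 1.0, 0.0]),
}
seq = label_to_target_sequence([0, 17, 0, 0], embeddings, seq_len=5, chem_start=2)
print(seq)  # two Nitrogen rows, then three Analyte B rows
```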

RNN and Linear Projection

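ChemTime uses a two-stage projection to map a resistance signal to the chemistry-informed latent space: a recurrent neural network iterates over the resistance signal, and a linear projection layer maps each RNN state to a point in the molecular embedding space.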
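The two stages can be sketched as a forward pass in NumPy. This assumes a plain tanh (Elman) recurrent cell and illustrative layer sizes; the paper does not specify the recurrent cell or dimensions here, and training is omitted.

```python
import numpy as np


def rnn_project(signal, W_xh, W_hh, b_h, W_proj, b_proj):
    """Run a one-layer tanh RNN over a (T, n_sensors) signal and project every
    hidden state into the embedding space, returning a (T, emb_dim) trajectory."""
    h = np.zeros(W_hh.shape[0])
    outputs = []
    for x_t in signal:  # one resistance reading of all sensors per step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        outputs.append(W_proj @ h + b_proj)  # linear map into the chemistry latent space
    return np.stack(outputs)


rng = np.random.default_rng(0)
T, n_sensors, hidden, emb_dim = 20, 8, 16, 4  # illustrative sizes
params = dict(
    W_xh=rng.normal(scale=0.1, size=(hidden, n_sensors)),
    W_hh=rng.normal(scale=0.1, size=(hidden, hidden)),
    b_h=np.zeros(hidden),
    W_proj=rng.normal(scale=0.1, size=(emb_dim, hidden)),
    b_proj=np.zeros(emb_dim),
)
signal = rng.normal(size=(T, n_sensors))
trajectory = rnn_project(signal, **params)
print(trajectory.shape)  # (20, 4)
```

Because each projected point depends only on readings up to its time step, the trajectory computed from a shorter prefix matches the beginning of the full trajectory, which is what makes early classification possible.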

Boosting

After the ChemTime sequence embedding model is trained, a tabular machine learning model may be fit on the final representation of each sequence to classify the samples. Classification can be done with a naive nearest-target approach or with a learned decision boundary. The ChemTime results discussed here use a simple SVC with a binary decision discriminating Analyte A samples from samples containing Analytes B, C, and D.
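The naive nearest-target alternative can be sketched as below, assuming Euclidean distance between final embeddings and the analyte target embeddings; the function name and toy vectors are hypothetical. A learned boundary would instead fit a classifier such as scikit-learn's `SVC` on the same final embeddings.

```python
import numpy as np


def nearest_target(final_embeddings, target_embeddings, target_names):
    """Label each of N final embeddings (N, D) with the closest of M targets (M, D)."""
    diffs = final_embeddings[:, None, :] - target_embeddings[None, :, :]  # (N, M, D)
    dists = np.linalg.norm(diffs, axis=2)                                 # (N, M)
    return [target_names[j] for j in np.argmin(dists, axis=1)]


names = ["Analyte A", "Analyte B"]
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
finals = np.array([[0.9, 0.2], [0.1, 0.8]])
preds = nearest_target(finals, targets, names)
is_analyte_a = [p == "Analyte A" for p in preds]  # binary task: Analyte A vs. the rest
print(preds, is_analyte_a)  # ['Analyte A', 'Analyte B'] [True, False]
```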

I-E Time Series Classification

Time series data are sequences of features with some inherent ordering, such as stock closing prices for each day of a year. Time series classification is highly relevant across the physical sciences, manufacturing industries, technology, and more [11, 12, 13, 14, 15, 16], with applications in sensing [3], video analysis [13], computer network traffic analysis [11], biomedical monitoring [12], manufacturing [14], and airline industry efficiency [15]. The ordering of the features affects the discriminability of samples in the same manner that neighboring pixels in a natural image determine meaning to a human viewer [17]. This ordering need not be through time: in the physical sciences, features may be ordered by frequency, magnitude, or any other ordered axis.

A univariate time series observation $s$ is a sequence of ordered pairs $(timestamp, value)$ [18]. Multivariate time series (MTS) extend time series data to contain multiple features at each time step of a signal, so that a time series is a list of vectors over $d$ dimensions and $n$ observations. The data set $\mathbb{X}$ is given by observations $\langle X_1, \dots, X_n \rangle$. The $t$-th time index of the $i$-th sample in dimension $k$ is the scalar $x_{i,t,k}$ [19]. Rapid classification of multivariate time series calls upon subsequences of $\vec{x}$. The length-$l$ prefix subsequence of sample $X_i$ is $X_i[:, 1{:}l]$, the first $l$ time observations of sample $X_i$ for all features. Subsequences used as discriminative sub-elements of signals, compared through distance metrics, are called shapelets in the literature [20].
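The notation maps directly onto array indexing. The sketch below assumes the data set is stored as a NumPy array of shape `(n, d, T)` (samples, sensor dimensions, time steps) and uses 0-based indices, so the prefix $X_i[:, 1{:}l]$ becomes `X[i, :, :l]`; the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 4, 8, 100  # samples, sensors, time steps (illustrative sizes)
X = rng.normal(size=(n, d, T))


def scalar(X, i, t, k):
    """x_{i,t,k}: dimension k of sample i at time index t."""
    return X[i, k, t]


def prefix(X, i, l):
    """Length-l prefix of sample i: the first l time observations of every feature."""
    return X[i, :, :l]


def prefix_batch(X, l):
    """Length-l prefixes of all samples, e.g. to score a classifier on shortened exposures."""
    return X[:, :, :l]


print(prefix(X, 0, 10).shape)     # (8, 10)
print(prefix_batch(X, 10).shape)  # (4, 8, 10)
```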

II Study of MTSC Classifiers

We investigate a broad array of approaches to multivariate chemiresistive time series classification, including at least one representative of every popular algorithm type, to assemble an up-to-date and competitive pool of models [19, 21]. We propose a variety of experiments in which the inductive biases and classification capabilities of each model are compared over a large swath of real-world chemical sensing data sets, with additional analyses that compare model performance on challenges unique to the chemical sensing domain. Section II-B compares the supervised classification performance of the classifiers across a variety of chemical sensing data sets in order to estimate typical model performance while factoring out data biases. Sections II-C and II-D emphasize the importance of rapid classification at test time for chemical analyte detection, which is vital for a variety of health and safety devices. Relative model performance is weighed against time to train and infer in Section II-D. Additional results in Section II-E investigate confidence and discriminability of out-of-distribution analytes as a benefit of ChemTime.

II-A Included Models

Model | Type | Application | Source
FullyConvolutionalNetwork | Convolutional deep learning | Multivariate | [22]
ConvolutionalNeuralNetwork | Convolutional deep learning | Multivariate | [22]
ROCKET | Random kernel transform | Multivariate | [23]
Catch22 | Feature extraction transform | Multivariate | [24]
RandomInterval | Concatenates random intervals | Multivariate | [25]
CanonicalIntervalForest | Catch22 on random intervals | Multivariate | [26]
HIVECOTEV2 | Hierarchical transform ensemble | Multivariate | [27]
ShapeletTransform | Rotation Forest on shapelet transform | Multivariate | [20]
WEASELMUSE | Bag-of-patterns for intervals | Multivariate | [28]
BOSSEnsemble | Bag of SFA (Symbolic Fourier Approximation) symbols | Univariate | [29]
MatrixProfile | Unified motifs and shapelets | Univariate | [30]
ContinuousIntervalTree | Information-based decision tree | Univariate | [31]
RotationForest | Forest on random PCA transforms | Univariate | [32]
MultiLayerPerceptron | Fully connected deep learning | Non-temporal | [33]
SVC | Support vector classifier | Non-temporal | [34]
KNeighbors | K-nearest neighbors classifier | Non-temporal | [35]
GaussianProcess | Gaussian process classification | Non-temporal | [36]
DecisionTree | Information-splitting tree | Non-temporal | [37]
RandomForest | Forest of decision trees | Non-temporal | [38]
AdaBoost | Boosting ensemble | Non-temporal | [39]
GaussianNB | Gaussian naive Bayes classifier | Non-temporal | [40]
QuadraticDiscriminantAnalysis | Quadratic classifier by Bayes’ rule | Non-temporal | [41]
TABLE I: Competitive classifiers for chemiresistive sensor array classification, a brief description, and their origin.

A diverse cast of models representing multiple approaches to univariate and multivariate time series classification is drawn for our chemical sensing survey. Each is briefly described in Table I. Univariate time series models are adapted to the multivariate time series classification paradigm with each of the two algorithms detailed in Section A-A. Non-temporal machine learning models are adapted by tabularizing the multivariate time series data with the column concatenation described in Section A-A. We defer to [19] for algorithm details of contemporary classifiers, and to their respective publications for exact implementations.

II-B Classification Study

In order to provide optimal model classes to chemiresistive sensor researchers for multiple analyte discrimination, we must evaluate the effectiveness of multivariate time series classifiers for general chemiresistive sensor hardware. We estimate classifier performance for a general chemiresistive sensor array by studying eleven different sensor configurations, each with unique surface chemistries. Model performance on these eleven distinct data sets demonstrates the efficacy of each classifier family for the broader domain of chemiresistive sensor array classification.

We study the supervised performance of classifier families with the following process. Each classifier is trained four times on each of eleven real-world chemiresistive sensor array data sets. Each of these four training runs uses a 75% split of the training data, with a 25% fold removed from the training corpus for that split. Each split model is then validated on a holdout set of testing data for that experiment, which includes neither the training split samples nor the withheld split samples. This is performed for ChemTime, for each model described in Table I, and for each of the 11 chemical sensing data sets described in Section I-B.
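One way to generate the four 75% training subsets is sketched below. It assumes the withheld 25% folds are disjoint, as in 4-fold cross-validation, which the text does not state explicitly; the function name and seed are hypothetical.

```python
import numpy as np


def four_split_indices(n_train, n_splits=4, seed=0):
    """Yield sorted training indices for each split: one ~25% fold is withheld per split
    and the model is trained on the remaining ~75% of the training data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_train), n_splits)
    for held_out in range(n_splits):
        keep = np.concatenate([f for j, f in enumerate(folds) if j != held_out])
        yield np.sort(keep)


splits = list(four_split_indices(100))
print([len(s) for s in splits])  # [75, 75, 75, 75]
```

Each model fit on one of these subsets is then scored on the separate holdout test set, so eleven data sets with four splits each give the 44 scores per model summarized in Figure 2.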

As in the earlier empirical survey The Great Multivariate Time Series Classification Bake Off [19], each model uses a parameterization selected by K-fold hyperparameter tuning on a domain-diverse set of classification tasks, as provided in the sktime repository [25]. An empirical study of multivariate time series classifiers must balance computation, time to train and infer, and ultimate model performance. For this study, each model is trained and validated under its default hyperparameters, without additional tuning.

Figure 2: Average model performances across 44 splits of 11 chemiresistive sensor array data sets.
Figure 3: Classification performance frontier versus time of inference. Longer inference times may lead to improved classification outcomes at the cost of a delayed classification decision.
Figure 4: Wall-clock time in seconds to test each model on a testing set of 32 holdout trials.

Figure 2 visualizes the mean performance of models across all eleven experimental data sets, with four splits of each training data set. Some models fail to converge for many data sets and have been removed from subsequent analyses (Matrix Profile with Column Ensembles, BOSS Ensemble with Column Ensembles, and Quadratic Discriminant Analysis). Given the structure of chemiresistive data, we had previously hypothesized that shapelets or time warping KNN would be the most successful models. Though these are effective, random transformations with linear classifiers (ROCKET) and complex feature ensembling (HIVECOTE) outperform a variety of shapelet-based approaches on average.

Figure 5: Critical difference plot demonstrating cliques of the top 10 models by rank distribution. Clique bars indicate statistical uncertainty of the difference in average rank of clique members, based on performance across sensor data sets.

Figure 5 shows the critical difference between the average ranks of the top 10 models over 44 splits of the 11 sensor array training sets. Each horizontal band links a clique of models whose difference in average rank across all splits is not statistically significant. The top-performing clique contains models from each of the primary families of time series classifiers: random kernels, feature ensembles, shapelets, decision trees, and deep learning [16].
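Average ranks of the kind plotted in Figure 5 can be computed as follows. The critical difference uses the Nemenyi post-hoc formula $CD = q_\alpha \sqrt{k(k+1)/(6N)}$ for $k$ models and $N$ splits; this is an assumption, since the text does not name the test, and $q_\alpha$ must be looked up for the chosen significance level. Two models fall in the same clique when their average ranks differ by less than the CD.

```python
import numpy as np


def average_ranks(scores):
    """scores: (N splits, k models), higher is better. Returns each model's mean rank
    over splits (1 = best); tied models share the average of the ranks they span."""
    scores = np.asarray(scores, dtype=float)
    # better[s, i] counts models scoring strictly higher than model i on split s
    better = (scores[:, None, :] > scores[:, :, None]).sum(axis=2)
    ties = (scores[:, None, :] == scores[:, :, None]).sum(axis=2)  # includes model i itself
    ranks = 1 + better + (ties - 1) / 2
    return ranks.mean(axis=0)


def nemenyi_cd(k, n_splits, q_alpha):
    """Critical difference for k models over n_splits; q_alpha is the tabulated
    studentized range value divided by sqrt(2) at the chosen significance level."""
    return q_alpha * np.sqrt(k * (k + 1) / (6.0 * n_splits))


scores = [[0.90, 0.80, 0.70],   # split 1: three models
          [0.85, 0.85, 0.60]]   # split 2: first two tie
print(average_ranks(scores))  # mean ranks 1.25, 1.75, 3.0
```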

II-C Rapid Classification

Figure 6: F1 scores of the top 10 models trained on incrementally decreasing exposure times. Models that fall below an F1 score of 0.8 on the testing set for an exposure time are eliminated.
Figure 7: Survival plot with each model's inference time added to the length of exposure. When the processing time of the model is counted as part of the length of exposure, the improvement of ChemTime is remarkable. Furthermore, ChemTime benefits from real-time signal processing, which many other models lack.

Research in the rapid detection of harmful chemical analytes often incentivizes rapid classification for improved safety devices. Here we propose a contest in the rapid classification of chemical Analyte A from increasingly foreshortened signals. The contest begins by comparing classifier testing set scores for 5-second resistance curve exposures. In each round, models with an average F1 score below 0.8 are eliminated, and the length of the exposure window is reduced by 0.25 seconds. Rounds are repeated until no models remain. Figure 6 visualizes the results: the survival of models that retain high classification accuracy through increasingly challenging rounds of chemical sensing with shorter exposure windows.
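The elimination procedure can be sketched as a loop. The `f1_at` callback is hypothetical: it stands for training and testing a model on the first `window` seconds of each exposure and returning its mean F1 score. The toy scorer below is invented for illustration.

```python
def survival_contest(models, f1_at, start=5.0, step=0.25, threshold=0.8):
    """Shrink the exposure window until every model is eliminated.
    Returns {model: shortest window (s) at which it still scored >= threshold}."""
    alive = list(models)
    survived = {}
    window = start
    while alive and window > 0:
        alive = [m for m in alive if f1_at(m, window) >= threshold]
        for m in alive:
            survived[m] = window
        window = round(window - step, 10)  # guard against float drift
    return survived


REQUIRED = {"fast": 1.0, "slow": 3.5}  # hypothetical shortest windows each model tolerates


def toy_f1(model, window):
    return 0.9 if window >= REQUIRED[model] else 0.5


print(survival_contest(["fast", "slow"], toy_f1))  # {'fast': 1.0, 'slow': 3.5}
```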

An early classifier $\mathbf{C}$ is serial if $\mathbf{C}(s[1, l_0]) = \mathbf{C}(s[1, l_0 + i])$ for any $i > 0$; that is, $\mathbf{C}$ does not benefit from longer prefixes of the data and will not change its estimate with further steps [18]. Reading Figure 6 in reverse, we find the best classifiers to be serial given exposure times of around 3.25 to 3.5 seconds: they do not benefit from further exposure to the signal. Classifiers that remain serial down to very brief exposure windows include ROCKET, Catch22, HIVECOTEV2, and RandomInterval. These represent a surprisingly diverse mix of shapelet-based approaches, model ensembles, and transformation classifiers, and they suggest that shapelet-based approaches, though inductively sound for the data, may struggle with variance in sensor responses and with generalization.
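Given a classifier's predictions for successively longer prefixes of one signal, the point at which it becomes serial can be found by scanning backward from the full-length prediction. This is a sketch assuming predictions are available for every prefix length $l = 1, \dots, L$; with a finite signal, the condition can only be checked for $l_0 + i \le L$. The function name is hypothetical.

```python
def earliest_serial_prefix(predictions):
    """predictions[l - 1] is C(s[1, l]), the label predicted from the length-l prefix.
    Returns the smallest l0 such that C(s[1, l0]) == C(s[1, l0 + i]) for every
    observed i > 0, i.e. the prefix length after which the label never changes."""
    l0 = len(predictions)
    while l0 > 1 and predictions[l0 - 2] == predictions[-1]:
        l0 -= 1
    return l0


preds = ["B", "A", "B", "A", "A", "A"]
print(earliest_serial_prefix(preds))  # 4
```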

II-D Training and Inference Time

To account for discrepancies in training and validation time, we quantify the trade-off between training time and performance, in order to account for the benefits of hyperparameter tuning for the full empirical study in Figure 11.

Figure 4 compares the time to test each model. Outlying models include MatrixProfile-Concat, HIVECOTEV2, and FullyConvolutionalNetwork. The MatrixProfile-Concat time demonstrates one shortcoming of the column concatenation technique for univariate classifier adaptation: compute increases significantly with dimensionality compared to MatrixProfile-Ensemble. In addition, FullyConvolutionalNetwork would train significantly faster on a graphics processing unit (GPU), but such specialty hardware may not be available for a deployed real-world tool with limited processing and power budgets.

Inference time also matters for deployment. When we deploy a hardware chemical sensing device, the available compute may be very limited. The time to process a sample must be factored into the rapid classification scores, because processing the testing sample delays reporting the classification result, wasting time in a safety situation.
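As a worked illustration of that adjustment, the sketch below adds inference time to the shortest surviving exposure window. The model names and timings are invented for illustration and are not results from this study; it also assumes inference begins only after the exposure window closes.

```python
def time_to_decision(exposure_s, inference_s):
    """Seconds from the start of exposure until a label is reported, assuming
    inference starts only after the exposure window closes."""
    return exposure_s + inference_s


def rank_by_decision_time(results):
    """results: {model: (shortest surviving exposure in s, inference time in s)}.
    Returns model names ordered from fastest to slowest decision."""
    return sorted(results, key=lambda m: time_to_decision(*results[m]))


# Invented numbers: a slow-to-infer model can lose despite surviving a shorter window.
results = {"model_a": (3.25, 2.0), "model_b": (3.5, 0.01)}
print(rank_by_decision_time(results))  # ['model_b', 'model_a']
```

A streaming model that updates its state as each reading arrives would add only its per-step latency, which is the real-time processing advantage Figure 7 attributes to ChemTime.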

To weigh the time to train and test against accurate predictions, we visualize the spread of model testing time and testing performance in Figure 3. There are extreme outliers in the time to train and test (see the full plot in Figure 11, Appendix A), with significant competition from models that have slightly lower accuracy but are orders of magnitude faster to train.

Figure 3 investigates the trade-off between inference time and testing set performance. As indicated in Section II-C, the rapid detection of chemical analytes is vital to multiple applications. By examining the distribution of scores against model inference times, we identify a frontier in the trade-off between classification time and performance. Some models lie along this frontier, while others are inferior on both metrics. Depending on the urgency of the classification, any model along the frontier may be selected to give the combination of speed and accuracy appropriate for the sensing task. In real-world chemical sensing edge devices, rapid classification may be vital.
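The frontier is the set of Pareto-optimal models. A minimal sketch, assuming each model is summarized by a mean inference time and a mean test score (the names and values below are hypothetical):

```python
def efficiency_frontier(models):
    """models: {name: (inference_seconds, test_score)}. Returns the names no other model
    dominates (at least as fast and at least as accurate, strictly better in one),
    sorted from fastest to slowest."""
    frontier = []
    for name, (t, s) in models.items():
        dominated = any(
            t2 <= t and s2 >= s and (t2 < t or s2 > s)
            for other, (t2, s2) in models.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier, key=lambda name: models[name][0])


toy = {"fast": (0.01, 0.80), "mid": (0.5, 0.90), "slow": (10.0, 0.95), "bad": (5.0, 0.85)}
print(efficiency_frontier(toy))  # ['fast', 'mid', 'slow']
```

Here `bad` is excluded because `mid` is both faster and more accurate, which is the sense in which a model is "inferior in both metrics."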

II-E Semantic Inference

ChemTime has the unique benefit of exploiting domain knowledge of molecular semantic properties. Translating from chemical resistance space to a meaningful semantic space opens additional inferential possibilities, including early and rapid classification as well as out-of-distribution classification. We therefore investigate possible advantages of the ChemTime approach for chemical sensing.

Figure 8: (a), (b) Iterating through time updates the positions of sample embeddings at every time step. Animated visual available online.

Figure 8 demonstrates how ChemTime iterates over a signal to update the molecular representation in the chemistry latent space. A simple learned projection maps the internal state of the RNN to the chemistry latent space and updates at each resistance time step. The relative distances between sample labels in the chemistry-informed latent space support further inference on the sequence of representations at test time.

Figure 9 demonstrates how a sequence of representations at inference time yields additional benefits over a typical black-box classifier. Early prediction and confidence inference are inherent to the design of ChemTime, in which the boosting classifier yields a distance from its decision boundary. While this distance is inherent to simple classifiers such as Support Vector Machines, pairing a learned representation with a secondary classifier provides this inference along with high performance, as evidenced by the results in Section II-B.
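One possible early-classification rule based on these distances is sketched below. It is an assumption rather than the paper's procedure: a sample is labeled at the first time step whose signed margin reaches a threshold, and the threshold is the smallest candidate that meets a target accuracy on validation trajectories. In scikit-learn, `SVC.decision_function` returns signed scores of this kind for a binary problem; the trajectories below are hypothetical.

```python
import numpy as np


def early_decision(margins, threshold):
    """margins: (T,) signed decision-boundary scores for one sample's embedding trajectory.
    Returns (step, label) for the first step with |margin| >= threshold, falling back
    to the final step; label 1 means the positive side of the boundary."""
    margins = np.asarray(margins, dtype=float)
    hits = np.flatnonzero(np.abs(margins) >= threshold)
    step = int(hits[0]) if hits.size else len(margins) - 1
    return step, int(margins[step] > 0)


def choose_threshold(val_margins, val_labels, candidates, min_accuracy=1.0):
    """Smallest threshold (earliest decisions) whose early labels reach min_accuracy
    on the validation trajectories; otherwise the largest candidate."""
    for thr in sorted(candidates):
        correct = [early_decision(m, thr)[1] == y for m, y in zip(val_margins, val_labels)]
        if np.mean(correct) >= min_accuracy:
            return thr
    return max(candidates)


m1 = [0.1, 0.6, 1.2, 1.5]     # positive-class validation trajectory
m2 = [-0.2, 0.3, -0.9, -1.4]  # negative-class trajectory that wavers early
thr = choose_threshold([m1, m2], [1, 0], candidates=[0.25, 0.5, 1.0])
print(thr, early_decision(m2, thr))  # 0.5 (2, 0)
```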

Figure 9: Distances through time from boosting classifier decision boundaries give inference over samples. A validation set may determine the optimal early classification window for testing set samples based on embedding trajectory.

II-F Discussion

The real-world utility of machine learning for the physical sciences extends past chemiresistive sensor arrays to countless applications. Addressing the inductive biases of a variety of models is particularly motivating when we also have access to a variety of experimental protocols and sensors, which themselves make assumptions about the nature of discrimination. By replicating the experimental design of Ruiz et al. [19], we find that, as a trend, models that perform well on the diverse UEA archive [42] of multivariate time series classification data sets also perform well on our chemical sensing task. The empirical benchmark results do not reveal a clear pattern of successful classifier algorithms that differentiates the multivariate sensor array task from a generic multivariate time series classification task. Specialized hardware for the detection of particular chemical analytes in specific industry or safety applications may alter this pattern and highlight the effectiveness of alternative methods.

We find that a ChemTime model with biases designed for the task improves the efficiency of the inference-accuracy trade-off identified in Figure 11. The inductive biases of the ChemTime architecture and loss, combined with the inference of a boosting classifier, yield a performant and extremely rapid classifier with unique advantages over the field in adaptability and in flexibility to additional techniques for inference and analysis of representation sequences.

Furthermore, we find an efficient frontier balancing the time it takes for models to perform inference against the supervised learning performance of those classifiers. We determine a similar pattern exists in the training and hyperparameter optimization times for these same classifiers. Finally, we discuss extensions of classifiers to the rapid analyte classification task, and outline how many of the contemporary approaches to time series classification are inappropriate for rapid classification or incompatible with early classification, a relevant subdomain of sensor classification.

III Acknowledgements

This manuscript has been authored with funding provided by the Defense Threat Reduction Agency (DTRA). The publisher acknowledges that the US Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes.

Approved for public release. Distribution unlimited.

Appendix A Appendix

This appendix provides additional results, including training times for hyperparameter sweeps and a comparison of univariate-to-multivariate conversion approaches.

Figure 10: Average sensor array performance given by the mean of all model performances over four training splits. Outlying models removed.

In addition to investigating the average performance of models across a variety of domain data sets, we are in a unique position to develop an effective chemical sensing tool at the hardware level. Beyond optimizing model selection across data sets, we seek an effective data set, that is, a sensor array that leads to successful analyte detection models at testing time. One approach to finding an optimal sensor array is to compare the average performance of diverse models on each data set (Figure 10). The corresponding testing accuracy of models trained on a particular set gives a sense of the predictability and discriminability of the sensors, while “factoring out” the inductive biases of each model by considering a diverse set of classifiers.

Figure 11: Full inference-time frontier.
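The inference-time frontier can be sketched as a Pareto set, assuming "efficient" means that no other model is both at least as fast at inference and at least as accurate. This is a minimal sketch under that assumption; the `ModelResult` record and its field names are hypothetical, and any values passed to it here are illustrative rather than results from our experiments.

```python
from typing import List, NamedTuple


class ModelResult(NamedTuple):
    name: str
    infer_seconds: float  # inference time (lower is better)
    accuracy: float       # test accuracy (higher is better)


def efficient_frontier(results: List[ModelResult]) -> List[ModelResult]:
    """Return the models not dominated in (inference time, accuracy).

    A model is dominated if another model is at least as fast and at least
    as accurate, and strictly better in one of the two. Exact duplicates
    (same time and accuracy) are collapsed to the first one encountered.
    """
    # Sort by inference time ascending; among equal times, most accurate first
    ordered = sorted(results, key=lambda r: (r.infer_seconds, -r.accuracy))
    frontier, best_acc = [], float("-inf")
    for r in ordered:
        # Every earlier model is at least as fast, so r survives only if it
        # is strictly more accurate than all of them
        if r.accuracy > best_acc:
            frontier.append(r)
            best_acc = r.accuracy
    return frontier
```

Sorting once by time and sweeping with a running maximum keeps the computation at O(m log m) for m models, and the returned list is ordered from fastest to most accurate.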

A-A Univariate Adaptation Algorithms

Figure 12: Two approaches to multivariate adaptation of univariate time series classifiers. Points above the line mark dataset and model pairs for which the column concatenation approach outperforms the column ensembling approach.

Univariate time series classification is a well-researched area, and univariate classifiers may be adapted to multivariate time series classification tasks with one of two alterations to the training algorithm of an arbitrary univariate classifier. These adaptations are the following:

  1. Column Concatenation: given a training data array $\mathbb{D}$ of $n$ samples with $k$ dimensions of length $t$, i.e. of shape $(n,k,t)$, and a classifier $C$, reshape the data by concatenating dimension $d_i$ to the last element of dimension $d_{i-1}$, yielding an array of shape $(n,k \cdot t)$:

    $$(n,k,t) \rightarrow (n_{1,1},\dots,n_{1,t},\dots,n_{2,t},\dots,n_{k,1},\dots,n_{k,t})$$

    Then train classifier $C$ on the resulting array $\mathbb{D}^{*}$ of shape $(n,k \cdot t)$.

  2. Column Ensembling: given a training data array $\mathbb{D}$ of $n$ samples with $k$ dimensions of length $t$, i.e. of shape $(n,k,t)$, and a classifier $C$, define $k$ training subarrays $(n,d_1,t),\dots,(n,d_k,t)$, one for each dimension $d_i$, $i = 1,\dots,k$.

    Train a classifier $C_i$ on each resulting array $\mathbb{D}_i$ of shape $(n,d_i,t)$, and use ensemble voting across the $k$ classifiers to classify each sample [43].
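Both adaptations can be sketched in a few lines, assuming NumPy arrays of shape $(n,k,t)$. In this minimal sketch, `NearestCentroid` is a hypothetical stand-in for an arbitrary univariate classifier $C$ (it is not one of the benchmarked classifiers), and majority voting, with ties broken by the first label `Counter` encounters, is one simple choice of ensemble vote.

```python
from collections import Counter

import numpy as np


class NearestCentroid:
    """Hypothetical stand-in for an arbitrary univariate classifier C."""

    def fit(self, X, y):
        # X has shape (n, length); store the mean series of each class
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Euclidean distance from every sample to every class centroid
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dists.argmin(axis=1)]


def column_concatenate(D):
    """(n, k, t) -> (n, k*t): channel d_i follows the last element of d_{i-1}."""
    n, k, t = D.shape
    # Row-major reshape lays out channel 0's t values, then channel 1's, ...
    return D.reshape(n, k * t)


class ColumnEnsemble:
    """Train one classifier C_i per channel d_i and combine by majority vote."""

    def __init__(self, make_classifier):
        self.make_classifier = make_classifier

    def fit(self, D, y):
        # D[:, i, :] is the univariate subarray of shape (n, t) for channel d_i
        self.members_ = [self.make_classifier().fit(D[:, i, :], y)
                         for i in range(D.shape[1])]
        return self

    def predict(self, D):
        # votes has shape (k, n): one prediction per channel per sample
        votes = np.stack([m.predict(D[:, i, :]) for i, m in enumerate(self.members_)])
        return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```

For column concatenation, `NearestCentroid().fit(column_concatenate(D), y)` trains a single classifier on $\mathbb{D}^{*}$; for column ensembling, `ColumnEnsemble(NearestCentroid).fit(D, y)` trains the $k$ per-channel classifiers.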

These approaches have the advantage of relying on the rich literature of univariate time series classification, but each has a substantial downside. First, column concatenation (1) substantially increases the dimension of the feature space, which is particularly disadvantageous for decomposition classifiers such as matrix profile classifiers and amplifies the curse of dimensionality. In addition, because the concatenated series is complete only once every channel has been fully observed, concatenation removes the option to classify samples early [18, 44] and is incompatible with state-based approaches such as RNNs, which would read the entire trajectory of $d_1$ before any reading of $d_2$. Second, column ensembling (2) trains multiple classifiers, each of which may fail to accurately classify the sample from a single channel of information. We assume this is the case for our chemical sensing data, in which discriminability arises from the diversity of sensor coating affinities: individual sensors do not contain enough information for a classifier to make a reasonable prediction.
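The loss of early classification under column concatenation can be seen by tracking which concatenated entries are available after only the first $\tau$ time steps. This is a minimal sketch, assuming the channel-by-channel ordering described for column concatenation; the helper `observed_mask` and the sizes $k=3$, $t=5$, $\tau=2$ are hypothetical.

```python
import numpy as np


def observed_mask(k, t, tau):
    """Boolean mask over the concatenated (k*t,) series marking the entries
    available once only the first tau time steps have been recorded."""
    mask = np.zeros((k, t), dtype=bool)
    mask[:, :tau] = True        # every channel observed up to time tau
    return mask.reshape(k * t)  # same channel-major order as concatenation


available = np.flatnonzero(observed_mask(3, 5, 2))
# indices 0, 1, 5, 6, 10, 11: scattered, not a contiguous prefix
print(available)
```

Because the available readings are scattered across the concatenated series rather than forming a prefix, a classifier trained on full concatenated series has no natural partial input to act on before the whole sample is recorded.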

References

  • [1] M. Rana, P. Chowdhury et al., “Modern applications of quantum dots: Environmentally hazardous metal ion sensing and medical imaging,” in Handbook of Nanomaterials for Sensing Applications.   Elsevier, 2021, pp. 465–503.
  • [2] M. S. Wiederoder, E. C. Nallon, M. Weiss, S. K. McGraw, V. P. Schnee, C. J. Bright, M. P. Polcha, R. Paffenroth, and J. R. Uzarski, “Graphene nanoplatelet-polymer chemiresistive sensor arrays for the detection and discrimination of chemical warfare agent simulants,” ACS Sensors, vol. 2, no. 11, pp. 1669–1678, 2017.
  • [3] M. Weiss, M. S. Wiederoder, R. C. Paffenroth, E. C. Nallon, C. J. Bright, V. P. Schnee, S. McGraw, M. Polcha, and J. R. Uzarski, “Applications of the Kalman filter to chemical sensors for downstream machine learning,” IEEE Sensors Journal, vol. 18, no. 13, pp. 5455–5463, 2018.
  • [4] A. M. Moore, R. C. Paffenroth, K. T. Ngo, and J. R. Uzarski, “ChemVise: Maximizing out-of-distribution chemical detection with the novel application of zero-shot learning,” arXiv preprint arXiv:2302.04917, 2023.
  • [5] E. J. Pacsial-Ong and Z. P. Aguilar, “Chemical warfare agent detection: a review of current trends and future perspective,” Frontiers in Bioscience-Scholar, vol. 5, no. 2, pp. 516–543, 2013.
  • [6] A. M. Moore, R. C. Paffenroth, K. T. Ngo, and J. R. Uzarski, “ACGANs improve chemical sensors for challenging distributions,” in International Conference on Machine Learning and Applications (ICMLA).   IEEE, 2022, doi: 10.1109/ICMLA55696.2022.00047.
  • [7] E. C. Nallon, V. P. Schnee, C. J. Bright, M. P. Polcha, and Q. Li, “Discrimination enhancement with transient feature analysis of a graphene chemical sensor,” Analytical Chemistry, vol. 88, no. 2, pp. 1401–1406, 2016.
  • [8] A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, M. A. Ranzato, and T. Mikolov, “DeViSE: A deep visual-semantic embedding model,” in Advances in Neural Information Processing Systems, C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, Eds., vol. 26.   Curran Associates, Inc., 2013. [Online]. Available: https://proceedings.neurips.cc/paper/2013/file/7cce53cf90577442771720a370c3c723-Paper.pdf
  • [9] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, “Zero-shot text-to-image generation,” 2021. [Online]. Available: https://arxiv.org/abs/2102.12092
  • [10] G. B. Goh, C. Siegel, A. Vishnu, N. O. Hodas, and N. Baker, “Chemception: A deep neural network with minimal chemistry knowledge matches the performance of expert-developed QSAR/QSPR models,” 2017. [Online]. Available: https://arxiv.org/abs/1706.06689
  • [11] R. Madan and P. S. Mangipudi, “Predicting computer network traffic: A time series forecasting approach using DWT, ARIMA and RNN,” in 2018 Eleventh International Conference on Contemporary Computing (IC3), 2018, pp. 1–5.
  • [12] J. Lin, E. Keogh, A. Fu, and H. Van Herle, “Approximations to magic: Finding unusual medical time series,” in 18th IEEE Symposium on Computer-Based Medical Systems (CBMS’05).   IEEE, 2005, pp. 329–334.
  • [13] P. Jönsson and L. Eklundh, “Timesat—a program for analyzing time-series of satellite sensor data,” Computers & geosciences, vol. 30, no. 8, pp. 833–845, 2004.
  • [14] L. Li, Q. Chang, G. Xiao, and S. Ambani, “Throughput bottleneck prediction of manufacturing systems using time series analysis,” Journal of Manufacturing Science and Engineering, vol. 133, no. 2, 2011.
  • [15] I. M. Semenick Alam and R. C. Sickles, “Time series analysis of deregulatory dynamics and technical efficiency: the case of the US airline industry,” International Economic Review, vol. 41, no. 1, pp. 203–218, 2000.
  • [16] A. Gupta, H. P. Gupta, B. Biswas, and T. Dutta, “Approaches and applications of early classification of time series: A review,” IEEE Transactions on Artificial Intelligence, vol. 1, no. 1, pp. 47–61, 2020.
  • [17] A. Bagnall, J. Lines, A. Bostrom, J. Large, and E. Keogh, “The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances,” Data Mining and Knowledge Discovery, vol. 31, no. 3, pp. 606–660, 2017.
  • [18] Z. Xing, J. Pei, and P. S. Yu, “Early classification on time series,” Knowledge and Information Systems, vol. 31, no. 1, pp. 105–127, 2012.
  • [19] A. P. Ruiz, M. Flynn, J. Large, M. Middlehurst, and A. Bagnall, “The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances,” Data Mining and Knowledge Discovery, vol. 35, no. 2, pp. 401–449, 2021.
  • [20] A. Bostrom and A. Bagnall, “Binary shapelet transform for multiclass time series classification,” in Transactions on Large-Scale Data-and Knowledge-Centered Systems XXXII.   Springer, 2017, pp. 24–46.
  • [21] G. Ottervanger, M. Baratchi, and H. H. Hoos, “Multietsc: automated machine learning for early time series classification,” Data Mining and Knowledge Discovery, vol. 35, no. 6, pp. 2602–2654, 2021.
  • [22] B. Zhao, H. Lu, S. Chen, J. Liu, and D. Wu, “Convolutional neural networks for time series classification,” Journal of Systems Engineering and Electronics, vol. 28, no. 1, pp. 162–169, 2017.
  • [23] A. Dempster, F. Petitjean, and G. I. Webb, “ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels,” Data Mining and Knowledge Discovery, vol. 34, no. 5, pp. 1454–1495, 2020.
  • [24] C. H. Lubba, S. S. Sethi, P. Knaute, S. R. Schultz, B. D. Fulcher, and N. S. Jones, “catch22: Canonical time-series characteristics,” Data Mining and Knowledge Discovery, vol. 33, no. 6, pp. 1821–1852, 2019.
  • [25] M. Löning, A. Bagnall, S. Ganesh, V. Kazakov, J. Lines, and F. J. Király, “sktime: A unified interface for machine learning with time series,” arXiv preprint arXiv:1909.07872, 2019.
  • [26] M. Middlehurst, J. Large, and A. Bagnall, “The canonical interval forest (CIF) classifier for time series classification,” in 2020 IEEE International Conference on Big Data (Big Data).   IEEE, 2020, pp. 188–195.
  • [27] M. Middlehurst, J. Large, M. Flynn, J. Lines, A. Bostrom, and A. Bagnall, “HIVE-COTE 2.0: a new meta ensemble for time series classification,” Machine Learning, vol. 110, no. 11, pp. 3211–3243, 2021.
  • [28] P. Schäfer and U. Leser, “Multivariate time series classification with WEASEL+MUSE,” arXiv preprint arXiv:1711.11343, 2017.
  • [29] P. Schäfer, “The BOSS is concerned with time series classification in the presence of noise,” Data Mining and Knowledge Discovery, vol. 29, no. 6, pp. 1505–1530, 2015.
  • [30] C.-C. M. Yeh, Y. Zhu, L. Ulanova, N. Begum, Y. Ding, H. A. Dau, Z. Zimmerman, D. F. Silva, A. Mueen, and E. Keogh, “Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile,” Data Mining and Knowledge Discovery, vol. 32, no. 1, pp. 83–123, 2018.
  • [31] H. Deng, G. Runger, E. Tuv, and M. Vladimir, “A time series forest for classification and feature extraction,” Information Sciences, vol. 239, pp. 142–153, 2013.
  • [32] J. J. Rodriguez, L. I. Kuncheva, and C. J. Alonso, “Rotation forest: A new classifier ensemble method,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1619–1630, 2006.
  • [33] Z. Wang, W. Yan, and T. Oates, “Time series classification from scratch with deep neural networks: A strong baseline,” in 2017 International Joint Conference on Neural Networks (IJCNN).   IEEE, 2017, pp. 1578–1585.
  • [34] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
  • [35] E. Fix and J. L. Hodges, “Discriminatory analysis. nonparametric discrimination: Consistency properties,” International Statistical Review/Revue Internationale de Statistique, vol. 57, no. 3, pp. 238–247, 1989.
  • [36] M. Seeger, “Gaussian processes for machine learning,” International Journal of Neural Systems, vol. 14, no. 02, pp. 69–106, 2004.
  • [37] J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
  • [38] T. K. Ho, “Random decision forests,” in Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 1.   IEEE, 1995, pp. 278–282.
  • [39] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, 1997.
  • [40] T. F. Chan, G. H. Golub, and R. J. LeVeque, “Updating formulae and a pairwise algorithm for computing sample variances,” in COMPSTAT 1982 5th Symposium held at Toulouse 1982.   Springer, 1982, pp. 30–41.
  • [41] C. R. Rao, “Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation,” in Mathematical Proceedings of the Cambridge Philosophical Society, vol. 44.   Cambridge University Press, 1948, pp. 50–57.
  • [42] A. Bagnall, H. A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, P. Southam, and E. Keogh, “The UEA multivariate time series classification archive, 2018,” 2018. [Online]. Available: https://arxiv.org/abs/1811.00075
  • [43] R. Polikar, “Ensemble based systems in decision making,” IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21–45, 2006.
  • [44] T. Hartvigsen, C. Sen, X. Kong, and E. Rundensteiner, “Adaptive-halting policy network for early classification,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 101–110.