Testing Topological Data Analysis for Condition Monitoring of Wind Turbines

Simone Casolo (1), Alexander Johannes Stasik (2), Zhenyou Zhang (3), and Signe Riemer-Sørensen (4)
(1) Cognite AS, Oslo, Norway
(2, 4) Sintef Digital, Oslo, Norway
(3) ANEO AS, Trondheim, Norway
Abstract

We present an investigation of how topological data analysis (TDA) can be applied to condition-based monitoring (CBM) of wind turbines for energy generation.
TDA is a branch of data analysis focusing on extracting meaningful information from complex datasets by analyzing their structure in state space and computing their underlying topological features. By representing data in a high-dimensional state space, TDA enables the identification of patterns, anomalies, and trends in the data that may not be apparent through traditional signal processing methods.
For this study, wind turbine data was acquired from a wind park in Norway via standard vibration sensors at different locations of the turbine’s gearbox. Both the vibration acceleration data and its frequency spectra were recorded at infrequent intervals, each recording lasting a few seconds at a high sampling rate. Failure events were labelled as either gear-tooth or ball-bearing failures. The data processing and analysis are based on a pipeline where the time series data is first split into intervals and then transformed into multi-dimensional point clouds via a time-delay embedding. The shape of the point cloud is analyzed with topological methods such as persistent homology to derive key health indicators from Betti numbers, information entropy and signal persistence. Such indicators are tested for CBM and diagnosis (fault detection) to identify faults in wind turbines and classify them accordingly. Topological indicators are shown to be an interesting alternative for failure identification and diagnosis of operational failures in wind turbines.


1 Introduction

The global demand for renewable energy sources has seen a significant rise in recent decades, with wind energy emerging as a prominent contributor to sustainable power generation [RevWind1]. Wind turbines, pivotal in harnessing wind energy, operate under diverse environmental conditions and mechanical stresses, making their maintenance and monitoring crucial for optimal performance and longevity. Condition-based monitoring (CBM) has emerged as a proactive approach to monitor the health of wind turbines, aiming to detect faults and predict potential failures before they escalate, thus minimizing downtime and maintenance costs [RevWind2].
Traditional CBM methods often rely on spectral signal processing techniques to analyze sensor data for anomaly detection and fault diagnosis. These techniques typically apply tools such as Fourier or wavelet analysis to the frequency signatures of time series accumulated from sensors installed on wind turbines. Where possible, machine learning techniques are then used to identify early signatures of failure in the data and alert engineers as soon as the equipment’s health starts deteriorating. However, frequency-based methods often require accumulating signals for a significant time before they can be processed successfully, which makes them better suited to analyzing failures after they occur. Online fault detection is much more challenging and, together with the inherent complexity and non-linearity of wind turbine data, poses challenges for conventional analytical approaches.
To address these challenges, alternative data analysis techniques have gained attention for their ability to extract meaningful insights from complex datasets. Among those, topological data analysis (TDA) has recently risen as a possible alternative. TDA is a branch of data analysis that focuses on revealing the underlying structure of datasets by analyzing their shape, particularly their topology in high-dimensional state spaces. By representing data as multidimensional point clouds and leveraging mathematical tools from algebraic topology, TDA enables the identification of intricate patterns, anomalies, and trends that may not be discernible through traditional signal processing methods alone (see Figure 1).

Figure 1: Overview of how the gearbox vibration data are processed by means of topological data analysis.

In this study, we explore how TDA techniques can be employed to analyze vibration data collected from wind turbines at a wind park. Vibration sensors placed strategically in different locations of the turbine’s gearbox provide high-frequency data capturing both vibration acceleration and frequency spectra. By employing a systematic data pipeline, including time-series segmentation and time-delay embedding, we transform the raw sensor data into a multidimensional point cloud and then process it via topological analysis.
The primary objective of this research is to evaluate how topological indicators derived from TDA, such as Betti numbers, information entropy, and signal persistence, can be used in place of, or as a complement to, more traditional spectral analysis as key health indicators for CBM and fault diagnosis in wind turbines.

2 Data description

For this analysis, we use vibration data collected from two wind turbine gearboxes from a wind park located in Norway. The data sets are proprietary, owned by the wind park operator ANEO (www.aneo.no), and this work is the first publicly available analysis of the data. The data was collected using accelerometers located at various positions in the gearbox. For the analysis, we focused on sensors that were physically closest to the known failure positions and most correlated with the time of failure of the gearbox. The considered sensors are located at the gearbox high-speed stage front (GbxHssFr), at the gearbox intermediate stage (GbxIss), at the gearbox planetary stage (Gbx1Ps), and at the non-drive end of the generator (GnNDe). The left panel in Figure 2 shows a 0.05 s example of vibrations recorded from GbxHssFr. The vibration (acceleration) data were sampled at 25.6 kHz for 10 s at infrequent intervals over approximately a year, until the failures happened and the equipment was stopped for maintenance. The two cases comprise 23 and 21 such 10 s recordings, respectively. In the first case, data were acquired from 2022-10-28 to 2023-10-11 and ended with a ball bearing failure (BBF) at the non-drive end of the generator. In the second case, data were recorded from 2022-05-24 to 2023-06-21 and ended with a gear tooth failure (GTF) at the planetary stage section of the gearbox.

3 Methods

In this section, we delineate the methodologies employed for analyzing complex data structures, focusing particularly on spectral analysis and topological data analysis (TDA). Spectral analysis, rooted in the principles of linear algebra and signal processing, extracts valuable insights from data by decomposing it into its constituent frequencies. Conversely, topological data analysis, drawing from the field of algebraic topology, examines the shape and connectivity of data through the lens of persistent homology, providing a holistic understanding of its underlying structure.
Both spectral analysis and TDA offer distinct yet complementary approaches to understanding complex datasets. While spectral analysis emphasizes frequency-based decomposition, TDA highlights the intrinsic topological features of the data. By comparing and contrasting these methodologies, we aim to elucidate their respective strengths, limitations, and applicability in various analytical contexts. This comparative analysis serves as a foundation for our subsequent exploration and interpretation of results, contributing to a comprehensive understanding of the dataset under investigation.

3.1 Spectral analysis

Spectral analysis, a fundamental technique in signal processing and data analysis, provides a powerful framework for decomposing complex data. Rooted in the principles of Fourier series, spectral analysis offers invaluable insights into the underlying structure and dynamics of various data types across diverse domains, including engineering, physics, biology, and finance.
At its core, spectral analysis aims to characterize the frequency content of a signal or dataset. By representing data in the frequency domain, analysts can identify dominant patterns, periodicities, and trends that may not be readily apparent in the time or spatial domain. This decomposition facilitates the extraction of meaningful information, enabling researchers to discern underlying patterns, detect anomalies, and make informed predictions.
Spectral analysis is a common tool for condition monitoring in wind turbines [wind2, wind3]. Vibration data are typically collected from sensors placed at moving elements in turbine generators and gearboxes, which are subject to wear and mechanical failure. Data are analyzed to identify anomalies and expose drift and changes in the data that can be associated with a degradation of the system health and, in turn, lead to its mechanical failure [wind1, RevWind1, RevWind2].
One of the key advantages of spectral analysis lies in its ability to unveil hidden relationships and structures within data. Through techniques such as Fourier transform, wavelet analysis, and singular value decomposition (SVD), analysts can disentangle complex signals into simpler components, each representing a distinct frequency or mode of variation. This spectral decomposition forms the basis for a wide range of applications, including signal filtering, noise reduction, feature extraction, and system identification.
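As a minimal illustration of frequency-domain decomposition, the sketch below recovers the dominant frequency of a synthetic tone with a direct discrete Fourier transform. It is not the FFT pipeline used in this work: the record length, the 1400 Hz test tone and the helper names are assumptions chosen for illustration, and only the 25.6 kHz sampling rate is taken from the data description.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return |X_k| for k = 0..N/2 using a direct O(N^2) discrete Fourier transform."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def dominant_frequency(signal, fs):
    """Frequency in Hz of the largest spectral peak, ignoring the DC bin."""
    mags = dft_magnitudes(signal)
    k_peak = max(range(1, len(mags)), key=lambda k: mags[k])
    return k_peak * fs / len(signal)


fs = 25600.0  # sampling rate from the data description (Hz)
n = 256       # short synthetic record (assumption; keeps the naive DFT fast)
f0 = 1400.0   # synthetic test tone in Hz (assumption)
signal = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]
peak_hz = dominant_frequency(signal, fs)
```

With these values the frequency resolution is fs / n = 100 Hz, so the tone falls exactly on a bin. In practice an FFT routine would replace the quadratic loop.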

3.2 Topological data analysis

Topological data analysis allows the interpretation of the spatial arrangement of data. This approach has been developed in the last decade and successfully applied to the analysis of data in several fields of engineering, fluid mechanics [CasoloTDA], physics and biology [TDA_Review]. Here we will present a brief introduction to the topic: for a full exposition of this approach, we recommend the excellent articles from Perea and Harer [PereaHarer], Chazal et al. [chazal] and Smith et al. [SMITH2021107202].
A common assumption in data analysis is that there exists a suitable parameter space in which the data form a manifold. In this case, it is fair to assume that the shape of such a manifold contains information about the data, and TDA is one of the tools that can be used to interpret it. A univariate time series of a scalar signal is not immediately suitable for analysis with TDA. The signal is therefore embedded with a time-delay approach into a higher-dimensional space via a procedure known as Takens embedding [TakensTheo]. This method embeds a time signal into a vector without loss of information, by defining two parameters: the time delay $\tau$ and the embedding dimension $d$. Then, the time series $x(t)$ is sampled at $d$ points, each separated by a time $\tau$. The embedded $d$-dimensional vector is then built as:

$$\mathbf{x}(t) = \{x(t), x(t-\tau), \dots, x(t-(d-1)\tau)\} \qquad (1)$$

As the time series evolves in time, it can be sampled repeatedly to build a series of vectors, which are accumulated to form a point cloud in $d$ dimensions. This cloud samples the manifold on which the data lie.
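The time-delay construction can be sketched in a few lines. The function below is a dependency-free illustration, not the Giotto-TDA implementation used in this work. It assumes a delay expressed in samples and a vector with exactly $d$ entries, so the largest lag is $(d-1)\tau$. The synthetic sine and the parameter values are assumptions.

```python
import math


def takens_embedding(series, delay, dimension):
    """Return delay vectors (x[t], x[t - delay], ..., x[t - (dimension - 1) * delay])."""
    span = (dimension - 1) * delay  # samples needed before the first full vector
    return [
        tuple(series[t - j * delay] for j in range(dimension))
        for t in range(span, len(series))
    ]


samples_per_period = 40  # assumption: synthetic sine with 40 samples per period
series = [math.sin(2 * math.pi * i / samples_per_period) for i in range(200)]

# A delay of a quarter period turns the sine into (sin, -cos) pairs,
# so the embedded cloud lies on the unit circle.
cloud = takens_embedding(series, delay=samples_per_period // 4, dimension=2)
```

Each element of `cloud` is one point of the $d$-dimensional point cloud. Here the points trace a closed loop, the simplest shape that persistent homology is expected to detect.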
Once the data are represented in the $d$-dimensional space of the embedding, they can be analyzed using algorithms developed in algebraic topology. To approximate the manifold, points of the cloud lying within a given radius of each other are connected to form a network, or cell complex. This is done by building Vietoris-Rips complexes: simplicial (cell) complexes representing the connectivity between data points in a dataset. To encode the complexity of the point cloud, we then compute a nested series of complexes obtained by progressively increasing the value of the radius, a process known as filtration. The construction of each complex involves considering all possible subsets of data points and connecting those that are within the specified distance threshold. Overall, the point cloud generated from the time series is a sampling of the shape of the data, and the filtration process generates several simplicial complexes which are the computational descriptions of that shape. As the filtration parameter increases, the Vietoris-Rips complex captures increasingly complex topological features, ranging from individual points to higher-dimensional structures such as loops and voids. Typically, these features are unique to the data manifold [ATTALI2013448] and are the topological structures we consider when analyzing the data.
The presence of loops, voids, etc. is encoded in the concept of homology. Persistent homology analyzes data sets by considering the evolution of topological features across different scales. It quantifies the persistence of these features as they emerge, merge, or disappear, providing a robust framework for capturing and characterizing the essential topological structure of complex datasets. Each structure then has a birth and a death value at a given radius of the filtration process, which can be recorded in a diagram known as a persistence diagram, unique for the analyzed shape. Each point in the diagram corresponds to a topological feature of a given dimension (connected components in dimension 0, loops in dimension 1, voids in dimension 2, etc.), with its coordinates indicating the scales at which the feature is born and dies (see Figure 2 for an example of a persistence diagram). The persistence of a feature is measured as the difference between its death ($d$) and birth ($b$) scales. Naturally, persistence diagrams are non-empty only above the diagonal, as a feature can die only after its birth, and the more ‘persistent’ a feature is, the further it lies from the diagonal.
By analyzing persistence diagrams, it is possible to identify persistent features that are robust across multiple scales and distinguish them from transient noise or artefacts in the data. Topological indicators in each homology dimension $H_k$ can be extracted from persistence diagrams and used to analyze data.
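To make birth and death values concrete, the sketch below computes the dimension-0 part of a persistence diagram for a small point cloud. It only covers connected components, which can be obtained by merging clusters in order of increasing edge length. Loops and voids require a full boundary-matrix reduction, which is left to a dedicated library. The scale convention (an edge enters the filtration at the pairwise distance) and the toy points are assumptions.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """(birth, death) pairs of connected components in a Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies at the length of the edge
    that merges it into another component. The last surviving component never
    dies and is omitted from the output.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components
            parent[root_i] = root_j
            pairs.append((0.0, length))
    return pairs


# Two tight clusters far apart: three short-lived components and one long-lived one.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
diagram = h0_persistence(points)
```

The pair with the large death value reflects the separation between the two clusters, while the short-lived pairs correspond to points merging within a cluster.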
While TDA can be applied to uncover the shape of the data manifold for a signal of an arbitrarily long time, it can also be applied to a sequence of short time windows, sliding forward in time and partially overlapping [PereaHarer]. This sliding windows approach can be used to uncover the local structure of data and their evolution, and it has been used successfully to study the dynamics of mechanical systems.
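The windowing step can be sketched as follows. The 25.6 kHz sampling rate and the 5 ms window length are the values used elsewhere in this work. The 50 % overlap and the placeholder record are assumptions made only for illustration.

```python
def sliding_windows(series, window, stride):
    """Split a series into windows of `window` samples, advancing by `stride` samples."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [series[s:s + window] for s in range(0, len(series) - window + 1, stride)]


fs = 25600                  # sampling rate in Hz
window = round(0.005 * fs)  # 5 ms window -> 128 samples
stride = window // 2        # 50 % overlap (assumption)
record = list(range(fs))    # placeholder for 1 s of samples
windows = sliding_windows(record, window, stride)
```

Each window would then be embedded and analyzed separately, and the resulting indicators averaged or tracked over time.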

Figure 2: Left to right: Raw time-series signal, embedded point cloud and persistence diagram for GbxHssFr sensor at normal operation state. Note the toroidal point cloud, resulting from the embedding of the periodic time series. The loop structure is revealed in the persistence diagram as a point (yellow) far from the diagonal, where points created by signal noise tend to accumulate.

3.3 Topology of vibration signals

Topological methods are expected to work particularly well for analyzing periodic time signals and their changes. Mathematically, it can be shown that periodic signals which can be approximated with a trigonometric function of a given frequency can be embedded into a point cloud of elliptical shape, hence in a loop that should be detected as a high-persistence feature in dimension 1 ($H_1$) [PereaHarer]. When the signal is instead composed of a combination of several frequencies, these give rise to more complex manifolds such as tori and higher-dimensional structures [perea2016persistent].
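As a worked illustration of the elliptical shape, consider the simplest case of a pure sinusoid embedded in two dimensions. The amplitude $A$, angular frequency $\omega$ and the choice $d = 2$ are assumptions made for this sketch.

```latex
% Two-dimensional delay embedding of x(t) = A \sin(\omega t)
u = x(t) = A \sin(\omega t), \qquad
v = x(t - \tau) = A \sin(\omega t)\cos(\omega\tau) - A \cos(\omega t)\sin(\omega\tau)

% Eliminating t gives a conic in the (u, v) plane:
u^2 - 2 u v \cos(\omega\tau) + v^2 = A^2 \sin^2(\omega\tau)
```

For $\sin(\omega\tau) \neq 0$ this is an ellipse, and for $\omega\tau = \pi/2$ it reduces to the circle $u^2 + v^2 = A^2$. When $\omega\tau$ is a multiple of $\pi$ the ellipse collapses onto a line segment and the loop disappears, which is one reason the choice of delay matters.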
In the case of oscillating systems, according to the Arnol’d-Liouville theorem in dynamical system theory, systems of $n$ harmonic oscillators give rise to trajectories on an $n$-dimensional torus. This phenomenon emerges due to the conservation of action variables, which characterize the system’s motion in phase space. In a system of harmonic oscillators, each oscillator contributes a set of action-angle variables, representing the oscillation’s amplitude and phase in each dimension. These variables remain constant over time, preserving the system’s dynamics. As a consequence, trajectories in phase space form closed loops, tracing out toroidal surfaces [Arnold]. This behaviour stems from the periodicity of harmonic motion, enabling the system’s state to return to its initial configuration after completing a cycle. The toroidal topology of these trajectories reflects the periodicity and conservation of action variables, illustrating a fundamental principle of dynamical systems theory.
When a vibrating mechanical system such as the gearbox of a wind turbine oscillates, it is reasonable to expect, accounting for deviation and noise, a behaviour similar to that of a harmonic oscillator, hence a trajectory in phase space spanning a manifold similar to a torus. In this case, it would be reasonable to expect some homology signatures that should be visible from the persistence diagrams, making persistent homology a good candidate method for characterizing the dynamics of vibrations at the gearbox and, hopefully, spotting the appearance and evolution of abnormal behaviour from sensors’ time series.

3.4 Analysis strategy

In this work, we have chunks of high-frequency data sparsely collected, each a few weeks or months apart. Every chunk of data is sampled at 25.6 kHz for a period of 10 s, allowing for spectral, spectral-temporal or topological data analysis. We assume that any changes happen on time scales of days or weeks, and hence the data is stationary over each of those 10 s segments. Therefore, the main strategy of our analysis focuses on finding trends between time segments as we get closer to the failure time.
The key challenge in this work is the lack of ground truth, as we do not know the onset of the damage that eventually led to the failure of the gearbox. Therefore, we use the early stages of data as a baseline, assuming that the damage developed later. In other words, we are looking for systematic deviations from the early state, which is assumed to be healthy. Topological data analysis was performed with the Giotto-TDA code suite [giottoph]. Time series from vibration sensors were embedded using Takens embedding with the optimal time delay and embedding dimension chosen by the built-in standard heuristics based on mutual information [MutualInfo, FalseNN]. Persistence diagrams $D$ were then compiled from the Vietoris-Rips complexes obtained from the filtration and used to compute the following topological indicators:
The maximum persistence, defined as the infinity norm for each homology dimension:

$$\mathcal{P}^{\mathrm{H}_k}_{\infty}(D_{\mathrm{H}_k}) = \max_{\{b,d\}\in D} |d-b| \qquad (2)$$

This is a useful shape indicator as noise gives rise to points in $D$ with a short lifetime, while relevant features of the point cloud (e.g. loops) are expected to have high persistence.
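A minimal sketch of Eq. (2), assuming a persistence diagram for one homology dimension is stored as a list of (birth, death) pairs. The toy diagram is hypothetical.

```python
def max_persistence(diagram):
    """Largest lifetime |d - b| among the (birth, death) pairs of one homology dimension."""
    return max((abs(d - b) for b, d in diagram), default=0.0)


# Hypothetical H1 diagram: two noise-like points near the diagonal and one clear loop.
h1_diagram = [(0.10, 0.15), (0.20, 0.22), (0.30, 1.50)]
loop_persistence = max_persistence(h1_diagram)
```

Only the long-lived pair determines the result, which is why this indicator is insensitive to the number of short-lived noise points.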
The normalized persistence entropy $\overline{\mathrm{E}}_{\mathrm{H}_k}(D)$ is another measure of complexity [PerEntropy, ATIENZA2020107509], expressed as a measure of the distribution of points along the diagram based on Shannon’s entropy formula:

$$\overline{\mathrm{E}}_{\mathrm{H}_k}(D) = -\frac{1}{\log_2 \mathcal{S}(D)} \sum_{\{b,d\}\in D_{\mathrm{H}_k}} \frac{|d-b|}{\mathcal{S}(D)} \log_2\left(\frac{|d-b|}{\mathcal{S}(D)}\right)$$

where the amplitude $\mathcal{S}(D_{\mathrm{H}_k})$ for a given dimension is defined as:

$$\mathcal{S}(D_{\mathrm{H}_k}) = \sum_{\{b,d\}\in D} |d-b| \qquad (3)$$
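The entropy and amplitude definitions can be sketched directly from the two formulas above. The list-of-pairs representation, the function names and the handling of the degenerate cases (where the normalising logarithm vanishes or is undefined) are assumptions of this sketch.

```python
import math


def amplitude(diagram):
    """Eq. (3): sum of the lifetimes |d - b| over the (birth, death) pairs."""
    return sum(abs(d - b) for b, d in diagram)


def normalized_persistence_entropy(diagram):
    """Shannon entropy of the lifetime distribution, divided by log2 of the amplitude."""
    total = amplitude(diagram)
    if total <= 0.0 or total == 1.0:
        raise ValueError("normalisation by log2(S) is undefined for S <= 0 or S == 1")
    shannon = 0.0
    for b, d in diagram:
        p = abs(d - b) / total
        if p > 0.0:  # zero-lifetime points contribute nothing
            shannon -= p * math.log2(p)
    return shannon / math.log2(total)


# Hypothetical diagram with lifetimes 1, 1 and 2, so S = 4 and p = (1/4, 1/4, 1/2).
toy_diagram = [(0.0, 1.0), (0.0, 1.0), (0.0, 2.0)]
entropy = normalized_persistence_entropy(toy_diagram)
```

For this toy diagram the Shannon entropy is 1.5 bits and $\log_2 \mathcal{S} = 2$, giving a normalized value of 0.75.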

Betti curves are another informative topological indicator, which measures the number of $k$-dimensional topological features, i.e. the Betti number $\beta_k$ [Hatcher2002], at each value of the filtration parameter. In practice, these “count” the number of $k$-dimensional holes of a space: $\beta_0$ represents connected components, $\beta_1$ loops, $\beta_2$ voids, etc. As an example, the set of Betti numbers $\{\beta_0, \beta_1, \beta_2\}$ is $\{1,1,0\}$ for a circle in the plane, $\{1,0,0\}$ for a filled disk, $\{1,0,1\}$ for a hollow sphere, $\{1,0,0\}$ for a filled ball, $\{1,2,1\}$ for a torus, etc.
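A Betti curve can be read off a persistence diagram by counting, at each filtration value, the features that have been born and have not yet died. The sketch below assumes the half-open convention $b \le r < d$ and uses a hypothetical diagram and grid of filtration values.

```python
def betti_curve(diagram, radii):
    """Number of features alive (born at or before r, not yet dead) at each value r."""
    return [sum(1 for b, d in diagram if b <= r < d) for r in radii]


# Hypothetical H1 diagram and filtration values.
h1_diagram = [(0.2, 0.5), (0.3, 1.4), (0.4, 0.45)]
radii = [0.0, 0.25, 0.35, 0.42, 0.6, 1.5]
curve = betti_curve(h1_diagram, radii)
```

The curve rises while loops appear and falls as they are filled in, so its shape summarizes the whole diagram in a single function of the filtration parameter.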
Other indicators are the $f$-family of indicators defined here, as proposed by Adcock et al. [Carlsson_f] and used in TDA for anomaly detection in rotating equipment for manufacturing [chatter2, chatter3], as they combine the highest persistence with amplitude information:

$$\begin{array}{c}
f_1 = \sum_i b_i \cdot (d_i - b_i) \\
\\
f_2 = \sum_i (d_{max} - d_i) \cdot (d_i - b_i) \\
\\
f_3 = \sum_i b_i^2 \cdot (d_i - b_i)^4 \\
\\
f_4 = \sum_i (d_{max} - d_i)^2 \cdot (d_i - b_i)^4
\end{array} \qquad (4)$$
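The four indicators can be sketched as follows. This sketch assumes the product form used by Adcock et al., in which each term multiplies a position factor by a power of the lifetime, and it takes $d_{max}$ to be the largest death value in the diagram. Both points, the function name and the toy diagram are assumptions.

```python
def f_indicators(diagram):
    """Return (f1, f2, f3, f4) for a list of (birth, death) pairs."""
    if not diagram:
        return 0.0, 0.0, 0.0, 0.0
    d_max = max(d for _, d in diagram)  # assumption: largest death in this diagram
    f1 = sum(b * (d - b) for b, d in diagram)
    f2 = sum((d_max - d) * (d - b) for b, d in diagram)
    f3 = sum(b ** 2 * (d - b) ** 4 for b, d in diagram)
    f4 = sum((d_max - d) ** 2 * (d - b) ** 4 for b, d in diagram)
    return f1, f2, f3, f4


# Hypothetical diagram with lifetimes 1, 2 and 5.
toy_diagram = [(1.0, 2.0), (2.0, 4.0), (0.0, 5.0)]
f1, f2, f3, f4 = f_indicators(toy_diagram)
```

The fourth-power terms in $f_3$ and $f_4$ weight long-lived features much more heavily than short-lived ones, which makes these two indicators sensitive to the most persistent loop.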

4 Data Analysis

No data cleaning or pre-processing was applied to the signal, hereafter referred to as ‘raw data’, prior to the analysis described in Section 3.

4.1 Bearing Failure

Figure 3: Fourier transform (normalised to counts) of the signal recorded on 2022-10-28 for GnNDe-BBF and GbxHssFr-BBF. The vertical lines indicate the frequency intervals for which the most dominating peaks are investigated for GbxHssFr.
Figure 4: Peak height (left axis, circles) and width (right axis, crosses) for three frequency signatures (most dominant peak in the frequency ranges [1000, 1800], [1800, 2300], [2300, 3000] Hz) for GbxHssFr in the bearing failure case.
Figure 5: Topological indicators computed for the signal GbxHssFr in the bearing failure case. The most significant anomaly, dated 2023-10-08, is highlighted.

The bearing failure was reported at the non-drive end of the generator, corresponding to the location of the sensor labelled “GnNDe”, and the signal was recorded sporadically between October 2022 and the failure on 2023-10-11. Each time series records acceleration data for the sensor, and the corresponding frequency spectrum is computed from the raw signal through a Fast Fourier Transform (FFT). Figure 3 shows the spectrum of the signal recorded at GnNDe (blue) at the earliest available timestamp, 2022-10-28. We assume this to correspond to a state of “normal operations”.
Topological analysis shows that the point cloud corresponding to GnNDe does not describe a torus, but rather a semi-uniform ball, indicating non-periodic or very noisy behaviour. As a consequence, the $H_0$ persistence can only be interpreted as a measure of how clustered or diffuse the data are in the parameter space, while $H_1$ and higher-dimensional homology signals are expected to be low and not significant. Indeed, the only noticeable trend in the topological indicators is a decrease in $H_0$ persistence and an increase in entropy, typically a consequence of a progressively less structured and noisier signal. On closer inspection, other sensor signals seem more suitable for analysis. In particular, the intermediate and high-speed stage sensors (GbxIss and GbxHss, respectively) show a more periodic and regular behaviour. Indeed, the high-speed front (GbxHssFr) sensor shows a clear oscillating signal and a frequency spectrum dominated by a peak at around 1400 Hz and its multiples (orange spectrum in Figure 3). The embedded signal clearly shows a toroidal shape, a “filled” torus consisting of one main loop induced by the main frequency component, with the direction orthogonal to the loop blown up by the noise. The corresponding persistence diagram then shows a high-persistence point for $H_0$ and one for $H_1$, the latter corresponding to the loop and proportional to its size.
The analysis of the evolution of the GbxHssFr signal is not trivial. Figure 4 shows the time development of the most dominant peak in each of the three frequency bands marked in Figure 3. We found that the frequencies do not shift significantly until the time of the failure (not shown). The corresponding peak heights and widths show a larger spread, especially at the lowest frequency. We also measure the evolution by computing the mutual distance between the vectors containing the Fourier coefficients of each time series. This distance is largest between the signals at the early timestamps (i.e. normal operations) and the signals on a few specific days close to the failure, in particular 2023-10-08 and 2023-10-10, three days and one day before the failure, especially for the components between 0 and 1800 Hz.
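The mutual distance between spectra can be sketched as a pairwise distance matrix. The Euclidean metric, the use of magnitude vectors of equal length and the toy spectra are assumptions of this sketch, since the metric is not specified in detail here.

```python
import math


def spectral_distance(spec_a, spec_b):
    """Euclidean distance between two spectra sampled on the same frequency grid."""
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must share the same frequency grid")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec_a, spec_b)))


def distance_matrix(spectra):
    """Pairwise distances between all recordings, in chronological order."""
    return [[spectral_distance(a, b) for b in spectra] for a in spectra]


# Hypothetical magnitude spectra: two similar early recordings and one changed recording.
spectra = [[1.0, 4.0, 0.5], [1.0, 4.0, 0.5], [4.0, 0.0, 0.5]]
distances = distance_matrix(spectra)
```

A recording that departs from the early baseline shows up as a row with large entries against the first recordings.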
We observe a similar behaviour in the skewness and kurtosis of the raw signal, which show a slowly decreasing trend, with a very high spike in the latter at the timestamp 2023-10-08, 3 days before the point of failure, which was not evident from the spectra alone. Monitoring kurtosis for the early detection of bearing failures is well known in the literature, and it is likely to be a good indicator in this case as well [kurt3, kurt2, kurt1].
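The sensitivity of kurtosis to impulsive events can be illustrated on synthetic data. The sketch below assumes the Pearson definition of kurtosis (fourth central moment over squared variance, so a Gaussian signal gives 3). The sine signal and the injected spike are hypothetical.

```python
import math


def _central_moment(values, order):
    """Central moment of the given order about the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Pearson kurtosis: a Gaussian signal gives 3, a pure sine gives 1.5."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2


clean = [math.sin(2 * math.pi * i / 40) for i in range(400)]  # ten full periods
spiky = list(clean)
spiky[200] += 20.0  # single impulsive sample, mimicking an impact

kurt_clean = kurtosis(clean)
kurt_spiky = kurtosis(spiky)
```

A single large sample is enough to raise the kurtosis by orders of magnitude, while the spectrum of the same record would change only slightly.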

Figure 6: Topological indicators obtained by averaging the results of several sliding windows of 5 ms, computed for each of the chunks for GbxHssFr in the bearing failure case. The most significant anomaly is dated 2023-10-08.
Figure 7: Left to right: Raw time-series signal, embedded point cloud and persistence diagram for GbxHssFr sensor recorded at 2023-10-08. Comparing the point cloud and the persistence diagram with Figure 2, the loop structure of the point cloud has disappeared, together with the high persistence $H_1$ point in the diagram.

The development of the TDA indicators over time is shown in Figure 5 for GbxHssFr. Most indicators show a sharp change around 2023-10-08, particularly those that include the maximum persistence in dimension 1, e.g. $\mathcal{P}^{\mathrm{H}_1}_{\infty}$, $f_2(H_1)$ and $f_4(H_1)$. When applying the sliding windows approach of TDA and focusing on the short-term dynamics of the signal, the topological indicators are computed for short time windows (5 ms) across one signal and then averaged (Figure 6). This deep dive allows us to expose the dynamics of the signal, how the topology of the point cloud changes on short timescales and, in turn, whether the signal frequencies are finely modulated. The sliding window analysis agrees with the Fourier analysis and the kurtosis signal, where a sharp change is visible on 2023-10-08. The change in the TDA results can be ascribed to a change in the average frequency of the signal, leading to a shrinkage of the toroidal point cloud to the point of almost closing the ‘hole’ of the torus (see Figure 7). This leads to a temporary, abrupt decrease in the persistence of the $H_1$ feature and an increase in its entropy (entropy scales inversely with the smoothness of the manifold).
There is also an apparent amplitude modulation of the raw signal which is hard to capture with TDA, but has been linked before with bearing failures in wind turbines [AM_CBM].
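The window-and-average step described above can be sketched in a few lines. This is only a minimal illustration, not the pipeline used in the study: the sampling rate, the non-overlapping stride and the function names are assumptions, and plain kurtosis stands in for a topological indicator, which in practice would come from a persistent-homology library.

```python
import numpy as np


def sliding_window_average(signal, fs, indicator, window_s=5e-3, stride_s=None):
    """Average a scalar indicator over short windows of one recording.

    signal    : 1-D array of vibration samples
    fs        : sampling rate in Hz (assumed; not stated in the text)
    indicator : callable mapping a 1-D window to a float
    window_s  : window length in seconds (5 ms, as in the text)
    stride_s  : step between windows; None means non-overlapping windows
    """
    signal = np.asarray(signal, dtype=float)
    n_win = int(round(window_s * fs))
    if n_win < 2 or n_win > signal.size:
        raise ValueError("window must contain between 2 and len(signal) samples")
    step = n_win if stride_s is None else max(1, int(round(stride_s * fs)))
    starts = range(0, signal.size - n_win + 1, step)
    values = np.array([indicator(signal[s:s + n_win]) for s in starts])
    return values.mean(), values


def kurtosis(x):
    """Plain (non-excess) kurtosis, used here only as a stand-in indicator."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2


# Synthetic example: 1 s of a 1350 Hz tone sampled at an assumed 25.6 kHz.
fs = 25_600.0
t = np.arange(int(fs)) / fs
tone = np.sin(2 * np.pi * 1350.0 * t)
mean_value, per_window = sliding_window_average(tone, fs, kurtosis)
```

Passing the indicator as a callable keeps the windowing logic independent of the quantity being averaged, so the same helper could wrap a persistence-based indicator once a point cloud and its diagram are computed per window.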

4.2 Gear-tooth failure

A gear-tooth damage event was reported on a different wind turbine in the same wind park in July 2023. The signal recorded by the sensor located closest to the failure, Gbx1Ps, has a frequency spectrum fairly similar to that of the high-speed sensor, GbxHssFr: it is dominated by a few isolated frequency contributions. The only significant feature we could identify in the data is a drift in the peak width, similar to the case of the bearing fault, starting around May 2023 (see Figure 8).

Figure 8: Selection of health indicators for the gear-tooth failure from sensor Gbx1Ps. FFT distance is the geometric distance between the average of the first three spectra in the dataset and each individual spectrum in the range [1800, 3000] Hz. Gbx1Ps.ECU2 is the indicator from standard ISO 10816-3. Peak 2 width and height are the characteristics of the dominating peak in the range [1800, 2300] Hz. Kurtosis is the kurtosis of the raw vibrations. All quantities have been normalised to their maximum value in the time interval.

Interestingly, when integrating the spectrum in the frequency range recommended by standard ISO 10816-3 (hereafter denoted Gbx1Ps.ECU2, where the signal is demodulated between 500 Hz and 2 kHz, with the RMS broadband value between 1 Hz and 150 Hz), it becomes more evident that a sudden jump of about 50% in the signal occurs between April and May 2023, as shown in Figure 8.
Following the same process as for the bearing fault, we focus on the high-speed gear sensor GbxHssFr, which shows a more regular oscillation pattern (see Figure 9). We apply both Fourier analysis and TDA to uncover any possible failure signature in the data. Analogously to the bearing fault case, skewness and kurtosis show a drop, associated with an increase in the signal's median, starting from around May 2023.
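The "FFT distance" indicator of Figure 8 can be illustrated with a short sketch. The caption describes it as the geometric distance between the average of the first three spectra and each individual spectrum in the range [1800, 3000] Hz; the code below assumes that "geometric" means Euclidean, that the spectra are stored as rows of an array ordered in time, and that the result is normalised to its maximum as in the figure. The function name and array layout are hypothetical.

```python
import numpy as np


def fft_distance(spectra, freqs, f_lo=1800.0, f_hi=3000.0, n_ref=3):
    """Distance of each spectrum from a reference spectrum within a band.

    spectra : array (n_timestamps, n_bins) of amplitude spectra, ordered in time
    freqs   : array (n_bins,) with the frequency of each bin in Hz
    n_ref   : number of leading spectra averaged to form the reference
    """
    spectra = np.asarray(spectra, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    band = (freqs >= f_lo) & (freqs <= f_hi)            # keep only the chosen band
    reference = spectra[:n_ref, band].mean(axis=0)      # average of the first spectra
    # Euclidean distance is an assumption about what "geometric distance" means.
    dist = np.linalg.norm(spectra[:, band] - reference, axis=1)
    peak = dist.max()
    return dist / peak if peak > 0 else dist            # normalise to the maximum


# Synthetic example: five spectra, the last one with an extra peak at 2500 Hz.
freqs = np.linspace(0.0, 5000.0, 501)
base = np.exp(-((freqs - 2000.0) / 50.0) ** 2)
spectra = np.tile(base, (5, 1))
spectra[4] += np.exp(-((freqs - 2500.0) / 50.0) ** 2)
distances = fft_distance(spectra, freqs)
```

In this toy case only the last spectrum departs from the reference, so it alone receives a non-zero distance.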

Figure 9: Left to right: Raw time-series signal, embedded point cloud and persistence diagram for GbxHssFr sensor in the gear-tooth failure case. Note the toroidal point cloud, resulting from the embedding of the periodic time series.
Figure 10: Topological indicators obtained by averaging the results of several sliding windows of $5\,\mathrm{ms}$, computed for each GbxHssFr signal in the gear-tooth failure case.

The topology of the data is again that of a “filled” torus (Figure 9), which is homotopy equivalent to a circle in 2 dimensions. This means that it should be possible to reduce the dimensionality of the Takens embedding to 2 without loss of topological information. We therefore focused on this reduced model for our analysis. The sliding-window processing of the GbxHssFr signal reveals a change in most of the topological indicators ($\mathcal{P}^{\textrm{H}_{0,1}}_{\infty}$, $\overline{\textrm{E}}_{\textrm{H}_{0,1}}$, $\mathcal{S}_{H_{0,1}}$, etc.) at the same timestamp in April, and again more sharply only 2 days before the failure in July 2023, as visible in Figure 10. Close to the failure there is an increase in the persistence and a decrease in the entropy, signalling a change in the size of the loop when averaged across the $10\,\mathrm{s}$ of the signal at a given timestamp, but not in its shape, as the Betti number indicator for dimension 1 remains stable.
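As an illustration of the reduced model, a 2-dimensional time-delay embedding can be sketched as follows. The delay and the synthetic signal are assumptions chosen for clarity, since the embedding parameters used in the study are not given in this section; with a delay of a quarter period, a pure sinusoid is mapped onto a circle, the simplest instance of the loop discussed above.

```python
import numpy as np


def delay_embed(signal, dim=2, delay=1):
    """Time-delay (Takens) embedding of a 1-D signal.

    Row i of the result is (x[i], x[i + delay], ..., x[i + (dim - 1) * delay]).
    """
    signal = np.asarray(signal, dtype=float)
    n_points = signal.size - (dim - 1) * delay
    if n_points < 1:
        raise ValueError("signal too short for this dim and delay")
    columns = [signal[k * delay:k * delay + n_points] for k in range(dim)]
    return np.column_stack(columns)


# Synthetic example: a sinusoid with a period of 40 samples and a delay of
# 10 samples (a quarter period), so each point is (sin(theta), cos(theta)).
samples = np.arange(400)
x = np.sin(2 * np.pi * samples / 40)
cloud = delay_embed(x, dim=2, delay=10)
radii = np.linalg.norm(cloud, axis=1)
```

For other delays the same sinusoid traces an ellipse rather than a circle, which still carries a single loop.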
In TDA, periodic functions are embedded as loops whose size is proportional to the size of the sliding window [PereaHarer]; therefore, a change in the size of the torus loop should correspond to changes in the period of the gearbox vibrations, or to some kind of frequency modulation, close to the failure event. By looking at spectrograms of the GbxHssFr signal (Figure 11), it is possible to recover some of the dynamics of the peaks in the spectrum. On the one hand, at timestamps far from the failure, the spectra shift only slightly across the $10\,\mathrm{s}$ of the recorded signal, and the peaks mostly tend to change width on a timescale of a few seconds. On the other hand, close to the failure the two main peaks at $1350\,\mathrm{Hz}$ and $2700\,\mathrm{Hz}$ appear to “jump”, as their relative height tends to oscillate at 3-4 Hz, i.e. about 40 times across the measurement duration, in a jittering fashion. This frequency modulation should also be noticeable in the TDA results, as the size of the loop in the point cloud should change as well. Indeed, this becomes evident when looking at the maximum persistence in dimension 1 ($\mathcal{P}^{\textrm{H}_{1}}_{\infty}$) and, in particular, at the radius of gyration ($R_{\text{gyration}}$) of the point cloud, defined as:

$$R_{\text{gyration}}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathbf{r}_{i}-\mathbf{r}_{\text{CM}}\right)^{2}} \tag{5}$$

where $N$ is the total number of points in the point cloud, $\mathbf{r}_{i}$ is the position vector of the $i$-th point, and $\mathbf{r}_{\text{CM}}$ is the position vector of the centre of mass of the point cloud.
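Equation (5) translates directly into code. The sketch below assumes that all points carry equal weight, so that the centre of mass is the plain mean of the coordinates, consistent with the $1/N$ factor in the definition; the function name is illustrative.

```python
import numpy as np


def radius_of_gyration(points):
    """Radius of gyration of a point cloud, following Equation (5).

    points : array (N, d) with one embedded point per row
    """
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]                 # treat a 1-D input as N points in 1-D
    centre = points.mean(axis=0)                 # r_CM, assuming equal weights
    sq_dist = np.sum((points - centre) ** 2, axis=1)
    return float(np.sqrt(sq_dist.mean()))


# Synthetic example: points evenly spaced on circles of radius 1 and 2.
theta = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
r_unit = radius_of_gyration(circle)
r_double = radius_of_gyration(2.0 * circle)
```

For points spread evenly on a circle the radius of gyration equals the circle radius, so the quantity acts as a simple scalar proxy for the size of a loop in the point cloud.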

Figure 11: Spectrograms of the first and the second-to-last data points before the failure.

The gyration radii for the data farthest from and closest in time to the gear-tooth failure are shown in Figure 12. Close to the failure, the frequency modulation manifests as a rapid oscillation in the gyration radius. This rapid modulation could indeed be a signature of imminent equipment failure. Interestingly, this kind of modulation is common in other engineering disciplines, such as metal turning and machining, where it is a signature of “chattering”, a pathological resonance in the turning process [chatter0]. Unsurprisingly, TDA has been successfully applied to chatter detection, and it was shown to be useful in the early detection and the machine-learning identification of such anomalies in several industrial settings [chatter1, chatter2, chatter3, chatter4].

Figure 12: Radius of gyration from GbxHssFr vibration data recorded at the first data point (blue) and the last data point (red) before the failure event.

5 Conclusion

In this study, we have explored the application of topological data analysis (TDA) in conjunction with spectral analysis for condition-based monitoring (CBM) of wind turbines. Our investigation focused on analyzing vibration data aiming to detect and diagnose potential faults in gearbox components.
Through TDA, we transformed raw vibration data into multidimensional point clouds and leveraged topological indicators such as Betti numbers, persistence diagrams, and entropy to characterize the underlying structure of the data. We compared TDA with traditional spectral analysis methods and observed that TDA offers complementary insights, particularly in identifying complex patterns and anomalies that may not be apparent through conventional signal processing techniques alone.
Our analysis revealed promising results in using TDA for fault detection and diagnosis. In the case of bearing failure, we observed significant changes in topological indicators, particularly in persistence and entropy, preceding the failure event. Similarly, for gear-tooth failure, TDA highlighted distinct changes in the structure of the point cloud, indicating the onset of damage. Furthermore, by integrating spectral analysis with TDA, we were able to uncover additional dynamics in the data, such as frequency modulation, which could serve as early indicators of equipment deterioration. These findings suggest the potential of TDA as a valuable tool for CBM in wind turbines, offering a complementary approach to monitoring and diagnosing faults and supporting proactive maintenance strategies in renewable energy generation. While TDA is only slightly more computationally demanding than the more traditional spectral analysis methods, it offers additional visual support by providing a manifold representing the data. Changes in the manifold of the data in phase space correspond to changes in the vibration dynamics of the system, as is well known from dynamical systems theory. Changes in the system's health may therefore be more easily inferred by analyzing the shape of the data in addition to its spectral features.
Future research could explore the integration of TDA with machine learning techniques for more robust fault detection algorithms. Additionally, incorporating real-time monitoring capabilities could enhance the practical applicability of TDA in industrial settings.

Acknowledgment

This publication has been funded by SFI NorwAI (Centre for Research-based Innovation, 309834). The authors gratefully acknowledge the financial support from the Research Council of Norway and the partners of SFI NorwAI, in particular Aneo, who shared their data.

Nomenclature


TDA Topological Data Analysis
CBM Condition Based Monitoring
Gbx Gearbox
SVD Singular value decomposition
BBF Ball bearing failure
GTF Gear Tooth Failure
RMS Root-Mean-Square