Testing Topological Data Analysis for Condition Monitoring of Wind Turbines
Abstract
We present an investigation of how topological data analysis (TDA) can be applied
to condition-based monitoring (CBM) of wind turbines for energy generation.
TDA is a branch of data analysis focusing on extracting meaningful information from complex datasets by analyzing their structure in state space and computing their underlying topological features. By representing data in a high-dimensional state space, TDA enables the identification of patterns, anomalies, and trends in the data that may not be apparent through traditional signal processing methods.
For this study, wind turbine data were acquired from a wind park in Norway via standard vibration sensors at different locations of the turbine’s gearbox. Both the vibration acceleration data and their frequency spectra were recorded at infrequent intervals, for a few seconds at a time and at high frequency, and failure events were labelled as either gear-tooth or ball-bearing failures. The data processing and analysis are based on a pipeline where the time series data are first split into intervals and then transformed into multi-dimensional point clouds via a time-delay embedding. The shape of the point cloud is analyzed with topological methods such as persistent homology to generate topology-based key health indicators built on Betti numbers, information entropy and signal persistence. Such indicators are tested for CBM and diagnosis (fault detection) to identify faults in wind turbines and classify them accordingly. Topological indicators are shown to be an interesting alternative for failure identification and diagnosis of operational failures in wind turbines.
Simone Casolo
1 Introduction
The global demand for renewable energy sources has seen a significant rise in recent decades, with wind energy emerging as a prominent contributor to sustainable power generation [RevWind1].
Wind turbines, pivotal in harnessing wind energy, operate under
diverse environmental conditions and mechanical stresses, making their maintenance and monitoring crucial for optimal performance and longevity. Condition-based monitoring (CBM) has emerged as a proactive approach to monitor the health of wind turbines, aiming to detect faults and predict potential failures before they escalate, thus minimizing downtime and maintenance costs [RevWind2].
Traditional CBM methods often rely on spectral signal processing techniques to analyze sensor data for anomaly detection and fault diagnosis. Signal analysis techniques are commonly used for fault diagnosis and typically apply tools such as Fourier or wavelet analysis to extract frequency signatures from time series accumulated by sensors installed on wind turbines. Where possible, machine learning techniques are then used to identify early signatures of failure in the data and alert engineers as soon as the equipment’s health starts deteriorating. However, frequency-based methods often require accumulating signals
for a significant time before they can be processed successfully, which makes them better suited to analyzing failures after they occur. Online fault detection is much more challenging and, together with the inherent complexity and non-linearity of wind turbine data, poses challenges for conventional analytical approaches.
To address these challenges, alternative data analysis techniques
have gained attention for their ability to extract meaningful insights
from complex datasets. Among those, topological data analysis (TDA) has
recently risen as a possible alternative.
TDA is a branch of data analysis that focuses on revealing the
underlying structure of datasets by analyzing their shape,
particularly their topology in high-dimensional state spaces.
By representing data as multidimensional point clouds
and leveraging mathematical tools from algebraic topology, TDA enables the identification of intricate patterns, anomalies, and trends that may not be discernible through traditional signal processing methods alone (see Fig.1).
In this study, we explore how TDA techniques can be employed to analyze vibration data collected from wind turbines at a wind park. Vibration sensors placed at different locations of the turbine’s gearbox provide high-frequency data capturing both vibration acceleration and frequency spectra. By employing a systematic data pipeline, including time-series segmentation and time-delay embedding, we transform the raw sensor data into a multidimensional point cloud and then process it via topological analysis.
The primary objective of this research is to evaluate how topological indicators derived from TDA, such as Betti numbers, information entropy, and signal persistence, can be used as key health indicators for CBM and fault diagnosis in wind turbines, either on their own or as a complement to more traditional spectral analysis.
2 Data description
For this analysis, we use vibration data collected from two wind turbine gearboxes in a wind park located in Norway. The data sets are proprietary, owned by the wind park operator ANEO (www.aneo.no), and this work is the first publicly available analysis of the data. The data were collected using accelerometers located at various positions in the gearbox. For the analysis, we focused on the sensors that were physically closest to the known failure positions and most correlated with the time of failure of the gearbox. The considered sensors are located at the gearbox high-speed stage front (GbxHssFr), at the gearbox intermediate stage (GbxIss), at the gearbox planetary stage (Gbx1Ps), and at the non-drive end of the generator (GnNDe). The left panel in Figure 2 shows an example of vibrations recorded from GbxHssFr. The vibration (acceleration) data were sampled at 25.6 kHz for 10 s at a time, at infrequent intervals over approximately a year, until the failures happened and the equipment was stopped for maintenance. The two cases have 23 and 21 samples of 10 s length, respectively. In the first case, data were acquired from 2022-10-28 to 2023-10-11 and ended with a ball bearing failure (BBF) at the non-drive end of the generator. In the second case, data were recorded from 2022-05-24 to 2023-06-21 and ended with a gear tooth failure (GTF) at the planetary stage section of the gearbox.
3 Methods
In this section, we delineate the methodologies employed for analyzing complex data
structures, focusing particularly on spectral analysis and topological data analysis
(TDA). Spectral analysis, rooted in the principles of linear algebra and signal
processing, extracts valuable insights from data by decomposing it into its constituent
frequencies. Conversely, topological data analysis, drawing from the field
of algebraic topology, examines the shape and connectivity of data through the lens of
persistent homology, providing a holistic understanding of its underlying structure.
Both spectral analysis and TDA offer distinct yet complementary approaches to
understanding complex datasets. While spectral analysis emphasizes frequency-based
decomposition, TDA highlights the intrinsic topological features of the data. By comparing
and contrasting these methodologies, we aim to elucidate their respective strengths,
limitations, and applicability in various analytical contexts. This comparative analysis
serves as a foundation for our subsequent exploration and interpretation of results,
contributing to a comprehensive understanding of the dataset under investigation.
3.1 Spectral analysis
Spectral analysis, a fundamental technique in signal processing and data analysis,
provides a powerful framework for decomposing complex data. Rooted in the principles of Fourier series, spectral analysis offers invaluable insights into the underlying structure and dynamics of various data types across diverse domains, including engineering, physics, biology, and finance.
At its core, spectral analysis aims to characterize the frequency content of a signal or dataset. By representing data in the frequency domain, analysts can identify dominant patterns, periodicities, and trends that may not be readily apparent in the time or spatial domain. This decomposition facilitates the extraction of meaningful information, enabling researchers to discern underlying patterns, detect anomalies, and make informed predictions.
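To make the frequency decomposition concrete, the following minimal Python/NumPy sketch computes a one-sided amplitude spectrum and locates its dominant peak. The function name `amplitude_spectrum` and the synthetic test signal (a 100 Hz sine with added noise, sampled at the 25.6 kHz rate of Section 2) are illustrative assumptions, not part of the processing pipeline used in this work.

```python
import numpy as np

def amplitude_spectrum(x, fs):
    """Return frequencies (Hz) and the one-sided amplitude spectrum of a real signal x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Scale so that a sine of amplitude A produces a peak of height ~A
    amplitude = 2.0 * np.abs(np.fft.rfft(x)) / n
    amplitude[0] /= 2.0          # the DC bin has no negative-frequency mirror
    if n % 2 == 0:
        amplitude[-1] /= 2.0     # neither has the Nyquist bin (even n only)
    return freqs, amplitude

fs = 25_600                      # sampling rate in Hz, as in Section 2
t = np.arange(fs) / fs           # 1 s of synthetic data (real recordings last 10 s)
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 100.0 * t) + 0.1 * rng.standard_normal(t.size)

freqs, amp = amplitude_spectrum(x, fs)
peak = freqs[np.argmax(amp)]
print(f"dominant frequency: {peak:.1f} Hz, amplitude {amp.max():.2f}")
```

With 1 s of data the frequency resolution is 1 Hz; the 10 s recordings described in Section 2 would give 0.1 Hz resolution.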
Spectral analysis is a common tool for condition monitoring
in wind turbines [wind2, wind3]. Vibration data are typically collected
from sensors placed close to moving elements in turbine generators and
gearboxes, which are subject to wear and mechanical failure. Data are analyzed to
identify anomalies and expose drifts and changes in the data that can be associated
with a degradation of the system health, which can in turn lead to its mechanical failure
[wind1, RevWind1, RevWind2].
One of the key advantages of spectral analysis lies in its ability to unveil hidden relationships and structures within data. Through techniques such as the Fourier transform, wavelet analysis, and singular value decomposition (SVD), analysts can disentangle complex signals into simpler components, each representing a distinct frequency or mode of variation. This spectral decomposition forms the basis for a wide range of applications, including signal filtering, noise reduction, feature extraction, and system identification.
3.2 Topological data analysis
Topological data analysis allows the interpretation of the spatial arrangement
of data. This approach has been developed in the last decade and successfully applied to the analysis of data in several fields of engineering, fluid mechanics [CasoloTDA], physics and biology [TDA_Review]. Here we will present a brief introduction to the topic: for a full exposition of this approach, we recommend the excellent articles from Perea and Harer [PereaHarer], Chazal et al. [chazal] and Smith et al. [SMITH2021107202].
A common assumption in data analysis is that there exists a suitable space of parameters in which the data form a manifold. In this case, it is fair to assume that the shape of such a manifold contains information about the data, and TDA is one of the tools that can be used to interpret this information. A univariate time series of a scalar signal is not immediately suitable for analysis with TDA. The signal is therefore embedded into a high-dimensional space with a time-delay approach, a procedure known as Takens embedding [TakensTheo]. Under generic conditions, this method embeds a time signal into vectors while preserving the topology of the underlying dynamics. It is defined by two parameters: the time delay $\tau$ and the embedding dimension $m$. The time series $x(t)$ is sampled at $m$ points, each separated by a time $\tau$, and the embedded $m$-dimensional vector is then built as:

$$\mathbf{v}(t) = \big[\, x(t),\ x(t+\tau),\ x(t+2\tau),\ \ldots,\ x(t+(m-1)\tau) \,\big] \in \mathbb{R}^m \qquad (1)$$
As the time series evolves in time, it can be sampled repeatedly to build
a series of vectors, which are accumulated to form a point cloud in the
embedding space. This cloud samples the manifold on which the data lie.
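The time-delay embedding described above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the Giotto-TDA routine used in this work; the function name `takens_embedding` and the `stride` parameter (the step between consecutive start samples) are assumptions for the example, while `dimension` and `delay` play the roles of the embedding dimension and time delay, both measured in samples.

```python
import numpy as np

def takens_embedding(x, dimension, delay, stride=1):
    """Time-delay embedding of a 1-D signal into a point cloud.

    Row j is [x[s], x[s + delay], ..., x[s + (dimension - 1) * delay]] with s = j * stride.
    """
    x = np.asarray(x)
    span = (dimension - 1) * delay
    if span >= len(x):
        raise ValueError("signal too short for the requested dimension and delay")
    starts = np.arange(0, len(x) - span, stride)
    # Broadcast start indices against the delay offsets to build an index matrix
    idx = starts[:, None] + delay * np.arange(dimension)[None, :]
    return x[idx]

x = np.arange(10.0)
cloud = takens_embedding(x, dimension=3, delay=2)
print(cloud.shape)
print(cloud[0])
```

Each row of `cloud` is one point of the point cloud; for the toy signal above the first point is `[0, 2, 4]` and there are six points in total.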
Once the data are represented in the embedding space,
they can be analyzed with algorithms developed in algebraic topology.
To reconstruct the manifold, each vector, i.e. each point in the cloud,
is connected to all the points lying within a given radius of it,
forming a network or a cell complex. This is done by building
Vietoris-Rips complexes: simplicial (cell) complexes representing the connectivity between the data points in a dataset.
To encode the complexity of the point
cloud, we then compute a nested series of complexes obtained by
progressively increasing the value of the radius, in a process known as filtration.
At each value of the radius, the construction of the complex considers all possible subsets
of data points and includes as a simplex every subset whose points are pairwise within the
distance threshold. Overall, the point cloud
generated from the time series is a sampling of the shape of the
data, and the filtration process generates several simplicial complexes,
which are the computational descriptions of the shape of the data.
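As a minimal sketch of a filtration, the code below computes the dimension-0 part of Vietoris-Rips persistence with a union-find structure: edges are added in order of increasing length, and each edge that joins two components ends the life of one of them. It assumes Euclidean distances and records filtration values as pairwise distances (some references use half of this value); loops and voids (dimensions 1 and 2) require a dedicated library such as Giotto-TDA. The function name `h0_persistence` is hypothetical.

```python
from itertools import combinations

import numpy as np

def h0_persistence(points):
    """Dimension-0 persistence pairs (birth, death) of the Vietoris-Rips filtration.

    Every point is born at filtration value 0; a component dies at the length of
    the edge that merges it into another one (Kruskal's algorithm with union-find).
    The last surviving component never dies and is reported with death = inf.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Edges of the complete graph, sorted by length: the order in which the
    # Vietoris-Rips filtration adds them
    edges = sorted(
        (np.linalg.norm(points[i] - points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # the edge merges two components: one of them dies
            parent[ri] = rj
            pairs.append((0.0, float(length)))
    pairs.append((0.0, np.inf))  # the component that survives the whole filtration
    return np.array(pairs)

diagram = h0_persistence([[0.0], [1.0], [3.0]])
print(diagram)
```

For the three points on a line, the components die at filtration values 1 and 2, and one component persists indefinitely.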
As the filtration parameter increases, the Vietoris-Rips complex captures
increasingly complex topological features, ranging from individual points
to higher–dimensional structures such as loops and voids. Typically, these
features are unique to the data manifold
[ATTALI2013448] and are the topological structures
we consider when analyzing the data.
The presence of loops, voids, etc. is encoded in the concept of homology.
Persistent homology analyzes data sets by tracking the
evolution of topological features across different scales. It quantifies the
persistence of these features as they emerge, merge, or disappear, providing a
robust framework for capturing and characterizing the essential topological
structure of complex datasets. Each structure has
a birth and a death value of the radius in the filtration process,
which can be recorded in a diagram known as a persistence diagram, characteristic of the
analyzed shape. Each point in the diagram corresponds to a
topological feature in a given dimension (connected components in dimension 0, loops in
dimension 1, voids in dimension 2, etc.),
with its coordinates indicating the scale at which the feature is born and
dies (see Figure 2 for an example of a persistence diagram). The persistence (lifetime) of a feature is measured as the difference between
its death ($d_i$) and birth ($b_i$) scales. Naturally, points in a persistence diagram
lie only above the diagonal, as a feature can only die
after its birth, and the more ’persistent’ a feature is, the farther it
lies from the diagonal line.
By analyzing persistence diagrams, it is possible to
identify persistent features that are robust across multiple scales and
distinguish them from transient noise or artefacts in the data.
Topological indicators in each homology dimension can be extracted from
persistence diagrams and used to analyze data.
While TDA can be applied to uncover the shape of the data manifold for a
signal of arbitrary duration, it can also be applied to a sequence of
short, partially overlapping time windows sliding forward in time
[PereaHarer].
This sliding windows approach can be used to
uncover the local structure of data and their evolution and it has been used
successfully to study the dynamics of mechanical systems.
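Splitting a recording into partially overlapping windows, as in the sliding windows approach, can be sketched as follows. The helper `sliding_windows` is hypothetical, and the window length (0.1 s) and overlap (50%) are example values only, since they are not fixed here.

```python
import numpy as np

def sliding_windows(x, width, step):
    """Split a 1-D signal into windows of `width` samples starting every `step` samples.

    With step < width, consecutive windows overlap by (width - step) samples.
    Trailing samples that do not fill a whole window are dropped.
    """
    x = np.asarray(x)
    if width > len(x):
        raise ValueError("window wider than the signal")
    starts = range(0, len(x) - width + 1, step)
    return np.stack([x[s:s + width] for s in starts])

fs = 25_600
signal = np.zeros(10 * fs)   # placeholder for one 10 s recording
windows = sliding_windows(signal, width=fs // 10, step=fs // 20)  # 0.1 s windows, 50 % overlap
print(windows.shape)
```

Each row of `windows` can then be embedded and analyzed separately, and the resulting topological indicators averaged over the recording.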
3.3 Topology of vibration signals
Topological methods are expected to work particularly well for analyzing
periodic time signals and their changes. Mathematically, it can be shown that
periodic signals which can be approximated by a trigonometric function
of a given frequency are embedded into a point cloud of elliptical
shape, i.e. a loop that should be detected as a high-persistence feature
in homology dimension 1 ($H_1$) [PereaHarer].
When the signal instead combines several frequencies,
these give rise to more complex manifolds such as tori and higher-dimensional
structures [perea2016persistent].
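The elliptical shape can be checked with a short derivation, assuming a pure sine $x(t) = \sin(\omega t)$, embedding dimension 2 and delay $\tau$ (the symbols $u$, $v$, $\theta$, $\varphi$ are our notation for this example):

$$
\begin{aligned}
&u = \sin\theta, \qquad v = \sin(\theta + \varphi), \qquad \theta = \omega t,\ \ \varphi = \omega\tau,\\
&v = u\cos\varphi + \cos\theta\,\sin\varphi \;\Rightarrow\; \cos\theta = \frac{v - u\cos\varphi}{\sin\varphi} \qquad (\sin\varphi \neq 0),\\
&\sin^2\theta + \cos^2\theta = 1 \;\Rightarrow\; u^2 - 2uv\cos\varphi + v^2 = \sin^2\varphi .
\end{aligned}
$$

The quadratic form has eigenvalues $1 \pm \cos\varphi$, both positive when $\sin\varphi \neq 0$, so the embedded points lie on an ellipse. For $\varphi = \pi/2$ (a delay of a quarter period) it becomes the circle $u^2 + v^2 = 1$, while for $\sin\varphi = 0$ it collapses to a segment and the loop disappears.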
In the case of oscillating systems, according to the Arnol’d-Liouville theorem in
dynamical systems theory, a system of $n$ harmonic oscillators gives rise to trajectories
on an $n$-dimensional torus.
This phenomenon emerges due to the conservation of action variables, which
characterize the system’s motion in phase space. In a system of harmonic oscillators,
each oscillator contributes a pair of action-angle variables, representing the
oscillation’s amplitude and phase. The action variables remain constant
over time, while the angle variables advance uniformly, preserving the system’s dynamics. As a consequence, trajectories in phase
space wind around closed loops, tracing out toroidal surfaces [Arnold].
This behaviour stems from the
periodicity of harmonic motion, enabling the system’s state to return to its initial
configuration after completing a cycle. The toroidal topology of these trajectories
reflects the periodicity and conservation of action variables, illustrating a
fundamental principle of dynamical systems theory.
When a vibrating mechanical system such as the gearbox of a wind turbine oscillates,
it is reasonable to expect, accounting for deviation and noise, a behaviour
similar to that of a harmonic oscillator, hence a trajectory in phase space
spanning a manifold similar to a torus. In this case, it would be reasonable to
expect some homology signatures that should be visible from the persistence
diagrams, making persistent homology a good candidate method for characterizing the
dynamics of vibrations at the gearbox and, hopefully, spotting the
appearance and evolution of abnormal behaviour from sensors’ time series.
3.4 Analysis strategy
In this work, we have chunks of high-frequency data sparsely collected, each a few weeks or months apart. Every chunk of data is sampled at 25.6 kHz for a period of 10 s, allowing for spectral, spectral-temporal or topological data analysis. We assume that any changes happen on time scales of days or weeks, and hence that the data are stationary over each of those segments. Therefore, the main strategy of our analysis focuses on finding trends
between time segments as we get closer to the failure time.
The key challenge in this work is the lack of ground truth, as we do not know the onset of the damage that eventually led to the failure of the gearbox. Therefore, we use the early stages of data as a baseline, assuming that the damage developed later. In other words, we are looking for systematic deviations from the early state, which is assumed to be healthy.
Topological data analysis was performed with the Giotto-TDA code suite
[giottoph]. Time series from vibration sensors were embedded using Takens embedding, with the time delay and embedding dimension
chosen by the library’s built-in heuristics based on mutual information and false nearest neighbours, respectively [MutualInfo, FalseNN]. Persistence diagrams were then computed from the Vietoris-Rips complexes obtained from the filtration and used to compute the following topological indicators:
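The maximum persistence, defined as the infinity norm of the lifetimes in each homology dimension $q$:

$$P_q^{\max} = \big\| \{ d_i - b_i \}_{(b_i, d_i) \in D_q} \big\|_\infty = \max_{(b_i, d_i) \in D_q} (d_i - b_i) \qquad (2)$$

where $D_q$ is the persistence diagram in dimension $q$ and $b_i$, $d_i$ are the birth and death values of its points.
This is a useful shape indicator, as noise gives rise
to points in $D_q$ with a short lifetime, while relevant
features of the point cloud
(e.g. loops) are expected to have high persistence.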
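Assuming persistence diagrams are stored as arrays of (birth, death, dimension) rows, which is the layout produced by Giotto-TDA, the maximum persistence of Eq. (2) can be sketched as below. Discarding features that never die (infinite death) is our assumption; the helper name and the toy diagram are illustrative.

```python
import numpy as np

def max_persistence(diagram, dim):
    """Largest finite lifetime (death - birth) among features of homology dimension `dim`.

    `diagram` is an (n, 3) array of (birth, death, dimension) rows.
    Returns 0.0 when the dimension has no finite feature.
    """
    diagram = np.asarray(diagram, dtype=float)
    d = diagram[diagram[:, 2] == dim]
    lifetimes = d[:, 1] - d[:, 0]
    lifetimes = lifetimes[np.isfinite(lifetimes)]
    return float(lifetimes.max()) if lifetimes.size else 0.0

diagram = np.array([
    [0.0, 0.3, 0],
    [0.0, 0.5, 0],
    [0.2, 1.4, 1],   # a long-lived loop
    [0.6, 0.7, 1],   # a short-lived, noise-like loop
])
print(max_persistence(diagram, 0), max_persistence(diagram, 1))
```

The short-lived loop near the diagonal does not affect the indicator, which is dominated by the long-lived loop.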
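The normalized persistence entropy is another measure of complexity [PerEntropy, ATIENZA2020107509],
$E_q$, expressed as a measure
of the distribution of points along the diagram based on Shannon’s
entropy formula, summing over the points $(b_i, d_i)$ of the diagram $D_q$:

$$E_q = -\sum_{(b_i, d_i) \in D_q} \frac{d_i - b_i}{L_q} \log\frac{d_i - b_i}{L_q},$$

where the amplitude $L_q$ for a given dimension $q$ is defined as:

$$L_q = \sum_{(b_i, d_i) \in D_q} (d_i - b_i) \qquad (3)$$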
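A sketch of the persistence entropy, using the same (birth, death, dimension) layout: the lifetimes in one homology dimension are turned into probabilities by dividing them by their sum (the amplitude), and Shannon’s formula is applied. The optional normalization by the logarithm of the number of features is an assumed choice for illustration and may differ from the normalization used in this work; the function name is hypothetical.

```python
import numpy as np

def persistence_entropy(diagram, dim, normalize=False):
    """Shannon entropy of the lifetime distribution in homology dimension `dim`.

    p_i = l_i / L, with l_i = death_i - birth_i and L the sum of all finite lifetimes.
    normalize=True divides by log(n), n = number of features (an assumed convention),
    so that the result lies in [0, 1].
    """
    diagram = np.asarray(diagram, dtype=float)
    d = diagram[diagram[:, 2] == dim]
    lifetimes = d[:, 1] - d[:, 0]
    lifetimes = lifetimes[np.isfinite(lifetimes) & (lifetimes > 0)]
    if lifetimes.size == 0:
        return 0.0
    p = lifetimes / lifetimes.sum()
    entropy = -(p * np.log(p)).sum()
    if normalize and lifetimes.size > 1:
        entropy /= np.log(lifetimes.size)
    return float(entropy)

equal = np.array([[0.0, 1.0, 1]] * 4)          # four equally persistent loops
skewed = np.array([[0.0, 1.0, 1], [0.0, 0.01, 1],
                   [0.0, 0.01, 1], [0.0, 0.01, 1]])  # one dominant loop plus noise
print(persistence_entropy(equal, 1), persistence_entropy(skewed, 1))
```

A diagram dominated by one persistent feature has low entropy, while many features of similar lifetime push the entropy towards its maximum.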
Betti curves are another informative topological indicator: they measure the number of $q$-dimensional topological features, i.e.
the Betti number $\beta_q$ [Hatcher2002],
at each value of the filtration parameter.
In practice, Betti numbers "count" the number of $q$-dimensional
holes of a space:
$\beta_0$ counts connected components, $\beta_1$ circles (loops), $\beta_2$ voids, etc.
As an example, for a circle drawn in the plane the Betti numbers
are
$(\beta_0, \beta_1, \beta_2) = (1, 1, 0)$, for a filled disk $(1, 0, 0)$, for a hollow sphere $(1, 0, 1)$,
for a filled ball $(1, 0, 0)$, and for a torus $(1, 2, 1)$.
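A Betti curve can be evaluated directly from a persistence diagram by counting, at each filtration value, the features that are already born and not yet dead. The sketch below assumes the (birth, death, dimension) layout and the half-open convention birth <= r < death; the helper name and the toy diagram are illustrative.

```python
import numpy as np

def betti_curve(diagram, dim, radii):
    """Betti number of dimension `dim` at each filtration value in `radii`.

    A feature (b, d) is counted at filtration value r when b <= r < d.
    """
    diagram = np.asarray(diagram, dtype=float)
    d = diagram[diagram[:, 2] == dim]
    radii = np.asarray(radii, dtype=float)
    alive = (d[:, 0][None, :] <= radii[:, None]) & (radii[:, None] < d[:, 1][None, :])
    return alive.sum(axis=1)

diagram = np.array([
    [0.0, np.inf, 0],  # one component that never dies
    [0.0, 0.4, 0],
    [0.3, 1.2, 1],     # a loop alive between r = 0.3 and r = 1.2
])
radii = np.array([0.1, 0.5, 1.5])
print(betti_curve(diagram, 0, radii))  # [2 1 1]
print(betti_curve(diagram, 1, radii))  # [0 1 0]
```

Sampling `radii` on a fine grid turns these counts into the Betti curves used as indicators.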
Finally, we use the family of indicators defined here, following the functions
proposed by Adcock et al. [Carlsson_f] and used in TDA for
anomaly detection in rotating equipment for manufacturing
[chatter2, chatter3],
as they combine the highest persistence with amplitude information:
(4) |
4 Data Analysis
No data cleaning or pre-processing was applied to the signals prior to the analysis described in Section 3; hereafter we refer to these signals as ’raw data’.
4.1 Bearing Failure
The bearing failure was reported at the non-drive end of the generator, corresponding to the location of the sensor
labelled "GnNDe", and the signal was recorded sporadically between October 2022 and the failure on October 11 2023.
Each time series records the acceleration measured by the sensor, and the corresponding frequency spectrum is computed from the raw signal with a Fast Fourier Transform (FFT). Figure 3 shows the spectrum of the signal recorded at GnNDe (blue) at the earliest available timestamp, 2022-10-28, which we assume to correspond to a state of "normal operations".
Topological analysis shows that the point cloud
corresponding to GnNDe does not describe a torus, but rather a
semi-uniform ball, indicating non-periodic or very noisy behaviour. As a consequence, the persistence in dimension 0 can only be interpreted
as a measure of how clustered or diffuse the data are in the parameter space,
while dimension-1 and higher-dimensional homology signals are expected to be
low and not significant. Indeed, the only noticeable trend in the
topological indicators is a decrease in persistence and an
increase in entropy, typically the consequence of a progressively
less structured and noisier signal. On closer inspection, other sensor
signals seem more suitable for analysis. In particular, the intermediate and
high-speed stage sensors (GbxIss and GbxHss, respectively) show a more
periodic and regular behaviour. Indeed, the high-speed front (GbxHssFr)
sensor shows a clear oscillating signal and a frequency
spectrum dominated by a peak at around and its multiples (orange spectrum in Figure 4). The embedded signal clearly shows a toroidal shape,
a "filled" torus consisting of one main loop induced by the main frequency
component, with the direction orthogonal to the loop blown up by the noise.
The corresponding persistence diagram then shows a high-persistence
point in dimension 0 and one in dimension 1, the latter corresponding to the loop and proportional
to its size.
The analysis of the evolution of the GbxHssFr signal is not trivial. Figure 3 shows the time development of the most dominant peak in each of the three frequency bands shown in
Figure 4. We found that the frequencies do not shift
significantly until the time of the failure (not shown).
The corresponding peak heights and widths show a larger spread,
especially at the lowest frequency.
We also quantify the evolution by computing the mutual distances between the vectors of Fourier
coefficients of each time series. The distance is most pronounced between the signals at the early timestamps (i.e. normal
operations) and the signals on a few specific days close to the failure,
in particular 2023-10-08 and 2023-10-10, respectively three days and one day before the
failure, especially for the components included from 0 to .
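The mutual distance between spectra can be sketched as a pairwise distance matrix between amplitude spectra. The Euclidean norm, the amplitude normalization and the optional upper cut-off `f_max` are assumptions made for illustration, as the exact metric is not specified here; the function name and test signals are hypothetical.

```python
import numpy as np

def spectral_distance_matrix(signals, fs, f_max=None):
    """Pairwise Euclidean distances between amplitude spectra of equal-length signals.

    signals: array of shape (n_signals, n_samples), all sampled at fs Hz.
    f_max: optional upper frequency (Hz); only bins with f <= f_max are compared.
    """
    signals = np.asarray(signals, dtype=float)
    n_samples = signals.shape[1]
    spectra = np.abs(np.fft.rfft(signals, axis=1)) / n_samples
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    if f_max is not None:
        spectra = spectra[:, freqs <= f_max]
    n = len(spectra)
    dist = np.zeros((n, n))
    for i in range(n):
        # One row at a time keeps memory low for long recordings
        dist[i] = np.linalg.norm(spectra - spectra[i], axis=1)
    return dist

fs = 1000
t = np.arange(fs) / fs
signals = np.stack([
    np.sin(2 * np.pi * 100 * t),   # "baseline" recording
    np.sin(2 * np.pi * 100 * t),   # identical recording
    np.sin(2 * np.pi * 120 * t),   # recording with a shifted dominant frequency
])
dist = spectral_distance_matrix(signals, fs)
print(np.round(dist, 3))
```

Comparing each row of `dist` with the first (earliest, assumed healthy) recording gives a simple deviation-from-baseline curve over time.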
We observe a similar behaviour in the skewness and kurtosis of the raw signal,
which show a slow decreasing trend, with a very high spike in the latter
at the timestamp 2023-10-08, 3 days before the failure, which was
not evident from the spectra alone. Monitoring
kurtosis for the early detection
of bearing failures is well known in the literature, and it is likely
to be a good indicator in this case as well [kurt3, kurt2, kurt1].
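Skewness and kurtosis follow directly from the central moments of the raw signal. The sketch below uses the biased (population) moments, which correspond to `scipy.stats.skew` and `scipy.stats.kurtosis(..., fisher=False)` with their default `bias=True`; the sparse impulses added to the sine are a hypothetical stand-in for bearing impacts, included only to show why kurtosis reacts to impulsive content.

```python
import numpy as np

def skewness_kurtosis(x):
    """Return (skewness, kurtosis) of a signal from its central moments.

    Kurtosis is the plain (Pearson) one: 3 for Gaussian noise, 1.5 for a pure sine.
    """
    x = np.asarray(x, dtype=float)
    centred = x - x.mean()
    m2 = np.mean(centred ** 2)
    m3 = np.mean(centred ** 3)
    m4 = np.mean(centred ** 4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2

fs = 10_000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 50 * t)   # smooth periodic vibration
spiky = x.copy()
spiky[::1000] += 5.0             # hypothetical sparse impacts on top of the vibration
print(skewness_kurtosis(x))
print(skewness_kurtosis(spiky))
```

A few short, large impulses barely change the spectrum but raise the kurtosis markedly, which is why it is a common complement to spectral indicators.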
The development of the TDA indicators over time is shown in Figure 5 for GbxHssFr. Most indicators show a sharp change around 2023-10-08, particularly those that include the maximum persistence in dimension 1. When applying the sliding windows approach of TDA and focusing on the short-term dynamics of the signal, the topological indicators are computed for short time windows across one signal and then averaged (Figure 6). This deep dive exposes the dynamics of the signal: how the topology of the point cloud changes on short timescales and, in turn, whether the signal frequencies are finely modulated. The sliding window analysis agrees with the Fourier analysis and the kurtosis signal, with a sharp change visible on 2023-10-08. The change in the TDA results can be ascribed to a change in the average frequency of the signal, which shrinks the toroidal point cloud to the point of almost closing the ’hole’ of the torus (see Figure 7). This leads to a temporary, abrupt decrease in the persistence of the feature and an increase in its entropy (entropy scales inversely with the smoothness of the manifold). There is also an apparent amplitude modulation of the raw signal, which is hard to capture with TDA but has previously been linked to bearing failures in wind turbines [AM_CBM].
4.2 Gear-tooth failure
A gear tooth damage event was reported on a different wind turbine in the same
wind park in July 2023. The signal recorded for the sensor located closest
to the failure, Gbx1Ps, has a frequency spectrum fairly similar to that of
the high-speed sensor, GbxHssFr: it is dominated by a few isolated frequency contributions.
The only significant feature we could identify in the data
is a drift in the peak width, similar to the case of
the bearing fault, starting around May 2023 (see Figure
8).
Interestingly, when the spectrum is integrated over the frequency range
recommended by the ISO 10816-3 standard (hereafter denoted Gbx1Ps.ECU2, where the signal is
demodulated between 500 Hz and 2 kHz, with the
RMS broadband value computed between 1 and 150 Hz),
a sudden jump of about 50% in the
signal between April and May 2023 becomes more evident, as shown in Figure 8.
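A band-limited RMS value, of the kind used for the broadband indicator above, can be obtained from the spectrum via Parseval’s theorem. This sketch only integrates the one-sided power spectrum between two frequencies; the demodulation step and the exact processing behind Gbx1Ps.ECU2 are not reproduced, and the function name and test signal are hypothetical.

```python
import numpy as np

def band_rms(x, fs, f_lo, f_hi):
    """RMS of the part of x whose frequency content lies in [f_lo, f_hi] Hz (Parseval)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    power = np.abs(spectrum) ** 2 / n ** 2   # per-bin contribution to the mean square
    power[1:] *= 2.0                         # fold in the negative frequencies ...
    if n % 2 == 0:
        power[-1] /= 2.0                     # ... except the Nyquist bin, which has no mirror
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(np.sqrt(power[band].sum()))

fs = 25_600
t = np.arange(fs) / fs
x = 2.0 * np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
print(band_rms(x, fs, 1, 150))     # RMS of the 100 Hz component, 2 / sqrt(2)
print(band_rms(x, fs, 500, 2000))  # RMS of the 1 kHz component, 0.5 / sqrt(2)
```

Summing over the whole band recovers the time-domain RMS of the signal, which is a convenient sanity check.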
Following the same process as for the bearing fault, we focus on the high-speed gear
sensor GbxHssFr, which shows a more regular oscillation pattern (see Figure 9). We apply both the Fourier and TDA analysis to uncover any possible failure signature in the data.
Analogously to the bearing fault case, skewness and kurtosis show a drop, associated with an increase in the signal’s median, starting from around May 2023.
The topology of the data is again that of a "filled" torus (Figure 9), which is homotopy equivalent to a circle. This suggests that the dimensionality of the Takens embedding can be reduced to 2 while preserving the essential topological features of the point cloud (one connected component and one loop). We therefore focused on this reduced model for our analysis.
The sliding windows processing of the GbxHssFr signal reveals a change in most of the topological indicators
at the same timestamp in April, and again, more sharply, only 2 days before
the failure in July 2023, as visible from Figure 10. Close to the failure there is an increase in the persistence and a decrease in the entropy,
signalling a change in the size of the loop when averaged across the 10 s
of the signal at a given timestamp, but not in its shape, as the
Betti number indicator for dimension 1 remains stable.
In TDA, periodic functions are embedded into loops of a size proportional to the size of the sliding window [PereaHarer]; therefore, a change in the size of the torus loop should correspond to changes in the period of the gearbox vibrations, or to some kind of frequency modulation, close to the failure event.
By looking at spectrograms for the GbxHssFr signal (Figure 11)
it is possible to recover some of the dynamics of the peaks in the spectrum. On one hand, at timestamps far from the failure, the spectra shift only slightly across the of the recorded signal, and mostly the peaks tend to change width with a timescale of a few seconds. On the other hand, close to the failure it appears that the two main peaks at and ”jump” as their relative height tends to oscillate on a 3-4 Hz timescale, i.e. about 40 times across the measurement duration in a jittering fashion. This frequency modulation should also
be noticeable in the TDA results, as the size of the loop in the point cloud should
change as well. Indeed, this becomes evident when looking at the
maximum persistence in dimension 1 and, in particular, at the radius
of gyration $R_g$ of the point cloud, defined as:

$$R_g = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\| \mathbf{r}_i - \mathbf{r}_{\mathrm{cm}} \right\|^2} \qquad (5)$$

where $N$ is the total number of points in the point cloud,
$\mathbf{r}_i$ represents the position vector of the $i$-th point, and
$\mathbf{r}_{\mathrm{cm}}$ denotes the position vector of the centre of mass of the point cloud.
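Eq. (5) translates into a few lines of NumPy. As a sanity check under illustrative assumptions (a pure sine with 100 samples per period and a quarter-period delay), the 2-dimensional delay embedding is a unit circle, so the radius of gyration equals the oscillation amplitude; a modulated amplitude or loop size would therefore show up as a varying radius across windows. The function name and parameters are hypothetical.

```python
import numpy as np

def radius_of_gyration(points):
    """R_g = sqrt(mean_i ||r_i - r_cm||^2) for an (N, dim) array of points."""
    points = np.asarray(points, dtype=float)
    centre = points.mean(axis=0)   # centre of mass r_cm
    return float(np.sqrt(np.mean(np.sum((points - centre) ** 2, axis=1))))

period = 100                       # samples per oscillation (hypothetical)
delay = period // 4                # quarter-period delay turns a sine into a circle
samples = np.arange(10 * period + delay)
x = np.sin(2 * np.pi * samples / period)
cloud = np.column_stack([x[:-delay], x[delay:]])   # 2-D delay embedding, 10 full periods

print(radius_of_gyration(cloud))        # unit-amplitude sine
print(radius_of_gyration(2.0 * cloud))  # doubling the amplitude doubles R_g
```

Computing this quantity for each sliding window gives a time series of loop sizes in which a rapid modulation, such as the one discussed below, would appear as an oscillation.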
The gyration radii for the data farthest from and closest in time to the gear tooth failure are shown in Figure 12: close to the failure, the modulation manifests as a rapid oscillation in the gyration radius. This rapid modulation could indeed be a signature of imminent
equipment failure. Interestingly, this kind of modulation is common in
other engineering disciplines, such as metal turning and machining, where it is
a signature of "chattering", a pathological resonance in the turning process
[chatter0]. Unsurprisingly, TDA has been successfully
applied to chatter detection, and it was shown to be useful for the early detection
and machine learning identification of such anomalies in several industrial
settings [chatter1, chatter2, chatter3, chatter4].
5 Conclusion
In this study, we have explored the application of topological
data analysis (TDA) in conjunction with spectral analysis for
condition-based monitoring (CBM) of wind turbines. Our investigation
focused on analyzing vibration data,
aiming to detect and diagnose potential faults in gearbox components.
Through TDA, we transformed raw vibration data into multidimensional point
clouds and leveraged topological indicators such as Betti numbers, persistence
diagrams, and entropy to characterize the underlying structure of the data.
We compared TDA with traditional spectral analysis methods and observed that
TDA offers complementary insights, particularly in identifying complex patterns
and anomalies that may not be apparent through conventional signal processing
techniques alone.
Our analysis revealed promising results in using TDA for fault detection and
diagnosis. In the case of bearing failure, we observed significant changes in
topological indicators, particularly in persistence and entropy, preceding the
failure event. Similarly, for gear-tooth failure, TDA highlighted distinct
changes in the structure of the point cloud, indicating the onset of damage.
Furthermore, by integrating spectral analysis with TDA, we were able to uncover
additional dynamics in the data, such as frequency modulation, which could
serve as early indicators of equipment deterioration.
These findings suggest the potential of TDA as a valuable tool for CBM in
wind turbines, offering a complementary approach to monitoring and diagnosing faults
and to proactive maintenance strategies in renewable energy generation.
While TDA is only slightly more computationally demanding than the more
traditional spectral analysis methods,
it offers additional
visual support by providing a manifold representing the data.
Changes in the manifold of data in phase space
correspond to changes in the vibration
dynamics of the system, as is well known from dynamical system theory
and therefore changes in the system’s health may be more easily inferred by
analyzing the shape of the data in addition to its spectral features.
Future research could explore the integration of TDA with machine learning
techniques for more robust fault detection algorithms.
Additionally, incorporating real-time monitoring capabilities could enhance the
practical applicability of TDA in industrial settings.
Acknowledgment
This publication has been funded by the SFI NorwAI, (Centre for Research-based Innovation, 309834). The authors gratefully acknowledge the financial support from the Research Council of Norway and the partners of the SFI NorwAI, in particular Aneo who shared their data.
Nomenclature
TDA | Topological Data Analysis
CBM | Condition Based Monitoring
Gbx | Gearbox
SVD | Singular Value Decomposition
BBF | Ball Bearing Failure
GTF | Gear Tooth Failure
RMS | Root-Mean-Square
ijphm