Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing

Adrian Höhl$^{1*}$, Ivica Obadic$^{1,2*}$, Miguel-Ángel Fernández-Torres$^{3}$, Hiba Najjar$^{4}$, Dario Oliveira$^{5}$, Zeynep Akata$^{6,7}$, Andreas Dengel$^{4}$ and Xiao Xiang Zhu$^{1,2}$

$^{1}$Data Science in Earth Observation, Technical University of Munich (TUM)

$^{2}$Munich Center for Machine Learning, 80333 Munich, Germany

$^{3}$Image Processing Laboratory (IPL), Universitat de València (UV)

$^{4}$University of Kaiserslautern-Landau and German Research Center for Artificial Intelligence (DFKI)

$^{5}$School of Applied Mathematics, Getulio Vargas Foundation

$^{6}$Institute for Explainable Machine Learning at Helmholtz Munich

$^{7}$Interpretable and Reliable Machine Learning, Technical University of Munich (TUM)

$^{*}$Shared first authorship

Abstract

In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in Remote Sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the explainable AI methods used in Remote Sensing applications, together with their objectives, findings, and challenges, is still missing. In this paper, we address this issue by performing a systematic review to identify the key trends in how explainable AI is used in Remote Sensing and to shed light on novel explainable AI approaches and emerging directions that tackle specific Remote Sensing challenges. We also reveal common patterns of explanation interpretation, discuss the scientific insights extracted in Remote Sensing, and reflect on the approaches used to evaluate explainable AI methods. Our review provides a comprehensive summary of the state of the art in the field. Further, we give a detailed outlook on the challenges and promising research directions, representing a basis for novel methodological development and a useful starting point for new researchers in the field of explainable AI in Remote Sensing.

Index Terms:

earth observation, explainable AI (xAI), explainability, interpretable ML (IML), interpretability, remote sensing

I Introduction

Figure 1: The numbers of ML and xAI publications in EO, obtained using the search query described in Section III-B and the appendix, differ by a factor of $\approx 70$. (*Estimated number of publications for 2023, extrapolated from the first ten months of the year assuming a linear trend.)

Machine Learning (ML) methods have shown outstanding performance in numerous Earth Observation (EO) tasks [1, 2], but most of them are complex, and their decisions are neither interpretable nor explained. In EO applications, understanding how a model functions and visualizing its interpretations for analysis is crucial [3], as it allows practitioners to gain scientific insights, discover biases, assess trustworthiness and fairness for policy decisions, and debug and improve a model. Currently, the European Parliament is adopting an Artificial Intelligence (AI) Act to ensure that the methods developed and used in Europe align with fundamental rights and values such as safety, privacy, transparency, explicability, as well as social and environmental wellbeing [4]. Other governments worldwide are anticipated to implement similar regulations [5]. Many EO applications could potentially violate these values when data and AI are employed for analysis and decision-making. However, explainable AI (xAI) can help align these practices with rights and laws. Hence, xAI emerges as a promising research direction for tackling the above-mentioned scientific and regulatory challenges with observational data [6].
Despite these potential benefits, there is currently a gap between the use of ML methods in EO and the works that aim to reveal the inner workings of these models. This gap is illustrated in Figure 1, where the blue curve shows that the number of ML papers in Remote Sensing (RS) has increased drastically in the past few years. Although the number of xAI works in EO, shown by the green curve, has also increased rapidly, it still lags behind the number of ML papers in RS by a factor of $\approx 70$. This growing body of xAI work in RS motivates us to summarize the existing work in the field and give RS practitioners an overview of recent developments, which might help narrow this gap and make xAI approaches more common in RS.

xAI methods are typically designed for natural images. However, RS imagery has different properties than natural images [7]. First, the images are captured from above. This perspective comes with unique scales, resolutions, and shadows. For instance, an RS image can cover whole landscapes spanning thousands of square kilometers, while a natural image typically covers only a tiny fraction of such an area [8]. Second, RS captures data in parts of the electromagnetic spectrum beyond the usual Red-Green-Blue (RGB) channels, covering a wide range of reflectance data from hyperspectral imagery over Synthetic Aperture Radar (SAR) to Light Detection and Ranging (LiDAR). Third, usual RGB cameras are primarily passive, while RS sensors can be active, which introduces effects such as radar shadows, foreshortening, layover, elevation displacement, and speckle [9]. Besides the different image properties, the tasks differ as well. Computer Vision (CV) primarily addresses dynamic scenarios where objects can be occluded by others. In contrast, the processes observed in RS happen on different spectral and spatiotemporal scales, both long-term and short-term, and the modeled systems are very complex and diverse. Often, these systems, such as weather events, hydrology, hazards, ecosystems, and urban dynamics, are not fully understood and can only be observed partially or indirectly. Although there are numerous reviews of xAI in the literature [10, 11, 12], they typically do not cover work specific to RS, nor do they reveal how existing xAI approaches tackle the above challenges related to RS data. Therefore, a review of xAI tailored to RS is necessary to reveal the key trends, common objectives, challenges, and latest developments.

While current reviews of xAI in EO focus on social, regulatory, and stakeholder perspectives [13, 14] or on specific subtopics in RS [15], or do not provide a broad literature database [16, 15, 17], this paper targets the applications and approaches of xAI in RS and follows a systematic approach to gather a comprehensive literature database. To this end, we conduct a systematic literature search for xAI in RS in three literature databases commonly used in RS. From this literature database, we identify the relevant papers and provide an overview of typical usages, objectives, and new approaches in the field. Additionally, we propose a categorization of the employed xAI methods, which gives a detailed overview and understanding of the xAI taxonomy and techniques. We found that xAI methods for RS are most often used for landcover monitoring, agriculture monitoring, and natural hazard monitoring, with 34, 26, and 26 papers, respectively; together, these account for 86 of all 207 papers (over 40%). Overall, SHapley Additive exPlanations (SHAP) is the most frequently used method, appearing in over 38% of all publications. However, we observe distinct variations of xAI methods across EO tasks. For instance, in landcover monitoring, backpropagation methods (23%), particularly Class Activation Mapping (CAM) methods (18%), are very prominent besides local approximation methods (29%). Conversely, in interpreting agricultural models, perturbation methods (22%) are used as widely as local approximation methods (22%). It should be noted, however, that most explanation outcomes are not evaluated quantitatively and that over 90% of the authors only provide anecdotal evidence. Our work also includes a table with concise information about each paper included in our study. Furthermore, we discuss how the usage of xAI in RS aligns with standard practices in xAI, as well as the evaluation of xAI in RS. As such, this review aims to assist EO practitioners in using xAI.
Finally, the challenges and limitations we identified while conducting this review concern the combination of xAI with related fields, the unique properties of RS data, the interpretability of Deep Learning (DL) models, and the implications of the lack of labels in RS.

In summary, this review provides the following main contributions:

• First, we present an overview and categorization of current xAI methods.
• Second, we summarize the state-of-the-art (SOTA) xAI approaches in RS through the analysis of a comprehensive literature database.
• Third, we identify objectives and practices for evaluating xAI methods in RS.
• Finally, we discuss challenges, limitations, and future directions for xAI in RS.

The remainder of this review is structured as follows. In Section II, we discuss related work and distinguish existing xAI reviews from ours. Section III introduces the methodological approach of our systematic review and our research questions. Then, in Section IV, we clarify the taxonomy of xAI, develop a categorization of the methods, and explain the most common xAI methods. Section V presents the results of this review, showing the current state of xAI in the RS literature, trends, and novel approaches for RS tasks. Finally, the discussion section reflects on these results and how they align with the recommended practices in xAI, and draws connections to current challenges and limitations. We conclude by summarizing the main takeaways of our review and presenting an outlook.

II Related Work

Numerous resources for either the field of xAI [18, 10, 19] or ML for EO applications [1, 20] are available in the literature. In contrast, to the best of our knowledge, there are only two reviews in the overlapping area of these two fields [13, 17]. [13] aims to summarize the existing work on xAI in EO and addresses xAI usage from a regulatory and societal perspective, discussing the requirements and types of xAI needed in EO from the viewpoints of policy, regulation, and politics. However, that study uses only a small literature database, and its author relies on a high-level categorization of the xAI methods. As such, it does not provide a comprehensive overview of current xAI approaches in EO. Further, [17] categorizes the identified works in xAI according to the general challenges in the bio- and geosciences. Its categorization of xAI properties and its emphasis on considering expert knowledge are two highlights of that review, and the challenges it presents are still faced by researchers today. Yet, it does not use a broad literature database to extensively cover the xAI approaches used in EO. There also exist xAI reviews for certain EO tasks or perspectives [14, 15, 16]. Still, they do not comprehensively summarize the work done in the broad research field of RS in EO, because they focus only on specific parts of the literature. The authors of [14] review the stakeholders and goals within human-centered xAI applied in RS; their findings indicate an underrepresentation of non-developer stakeholders in this area. The authors of [15] review DL methods and investigate to what degree these methods can explain human wealth or poverty from satellite imagery. The authors of [16] focus on xAI in conjunction with Deep Neural Networks (DNNs) that incorporate geographic structures and knowledge. They discuss three groups of challenges when applying xAI to geo-referenced data: challenges from xAI, from geospatial AI, and from geosocial applications.
Based on a short use case on land use classification and relying on SHAP explanations, they show that the geometry, topology, scale, and localization are of great importance. Finally, a group at Colorado State University published a survey on their work using xAI for climate and weather forecasting [21].

To overcome these shortcomings, we approach xAI in EO systematically and provide an extensive literature database, resulting in a comprehensive summary of the current literature. This work is characterized by our interest in the application and usage of xAI, with a special emphasis on RS sensors, while other reviews focus on geological features, the natural sciences, or social implications. We not only present an overview of the current challenges in the field but also highlight the state-of-the-art methods for tackling them, which we believe provides valuable insight into the current limitations of the field. Compared to the existing literature, we look at the topic from a technical perspective, without reflecting on the regulatory or ethical implications of integrating xAI into EO. Because their approaches differ from those of xAI, this review excludes works from related and overlapping domains, such as physics-aware ML, uncertainty quantification, and causal inference. Instead, we refer the reader to overviews in these fields: [22], [23], and [24, 25], respectively.

III Research method

In this review, we aim for transparency and reproducibility of our work by following the PRISMA scheme [26]. We apply the appropriate PRISMA requirements for our field to provide a transparent, complete, and trustworthy review. Since literature search engines, such as Google Scholar, do not provide the same search results for all users (the results depend on the geographic location and time) [27], we avoided the use of such search engines and relied on databases where the results can be reproduced. In addition to the PRISMA guidelines, we provide the full search results and the exclusion criteria in the supplementary material to further improve transparency and reproducibility.

III-A Research Objective

The final goal of this survey is to summarize and compare the work conducted in the field of xAI and EO with a focus on RS. Furthermore, we want to show where and how the xAI methods are adopted, why they are used, and which problems arise when applying them. In particular, we will attempt to answer the following research questions:

• RQ1 - Which explainable AI approaches have been used and which methods have been developed in the literature for EO tasks?
• RQ2 - How are xAI explanations analyzed, interpreted, and evaluated?
• RQ3 - What are the objectives and findings of using xAI in RS?
• RQ4 - How do the utilized xAI approaches in RS align with the recommended practices in the field of xAI?
• RQ5 - What are the limitations, challenges, and new developments of xAI in RS?

III-B Search procedure

The search query consists of two major parts: keywords related to xAI and keywords related to EO. Within each part, the keywords are connected via the $\operatorname{OR}$ operator (with a few nested $\operatorname{AND}$ groups), while the two parts are connected via the $\operatorname{AND}$ operator. Because terminology is used interchangeably in both fields, we added further keywords to the generally known terms to retrieve as many relevant papers as possible while trying not to increase the false positive rate excessively. For example, we included specific types of remote sensing sensors and used wildcards to cover the different ways authors might refer to xAI.

[ Earth observation OR remote sensing OR earth science OR
  ((satellite OR aerial OR airborne OR spaceborne OR radar) AND (image OR data))
  OR LiDAR OR SAR OR UAV OR Sentinel OR Landsat OR MODIS OR gaofen OR ceres ]
AND
[ xai OR
  ((interpret* OR explain*) AND (deep learning OR machine learning OR artificial intelligence OR dl OR ml OR ai OR model)) ]
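The nesting of this query can be made explicit programmatically. The sketch below uses two hypothetical helpers (`any_of` and `all_of` are our own names, not part of any database API) to rebuild the generic query string from keyword lists. It deliberately ignores database-specific syntax such as phrase quoting or field restrictions, which differ between Scopus, Springer, and IEEE.

```python
# Hypothetical helpers: rebuild the generic boolean search query from keyword groups.
def any_of(terms):
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*groups):
    """Join groups with AND and wrap the result in parentheses."""
    return "(" + " AND ".join(groups) + ")"


# EO part: generic terms, a nested (platform AND data) group, and sensor/mission names.
EO_TERMS = [
    "Earth observation", "remote sensing", "earth science",
    all_of(any_of(["satellite", "aerial", "airborne", "spaceborne", "radar"]),
           any_of(["image", "data"])),
    "LiDAR", "SAR", "UAV", "Sentinel", "Landsat", "MODIS", "gaofen", "ceres",
]

# xAI part: the acronym, or wildcard terms combined with ML-related terms.
XAI_TERMS = [
    "xai",
    all_of(any_of(["interpret*", "explain*"]),
           any_of(["deep learning", "machine learning", "artificial intelligence",
                   "dl", "ml", "ai", "model"])),
]

# The two parts are joined with AND.
QUERY = any_of(EO_TERMS) + " AND " + any_of(XAI_TERMS)
print(QUERY)
```

Building the string from lists keeps the OR/AND structure in one place, so adding a keyword cannot unbalance the parentheses.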

This general search query was adapted to the different search filters in three databases: Scopus (https://www.scopus.com/), Springer (https://link.springer.com/), and IEEE (https://ieeexplore.ieee.org/). The appendix lists the exact search queries for each database.

The field of xAI started to gain momentum in 2014 [28]. However, we found that search results for xAI papers in RS published before 2017 are rarely relevant, indicating that these methods have only been used in this area in the last few years. Therefore, we searched for journal and conference papers published between 1 January 2017 and 31 October 2023, the date on which the search was last executed in the databases. Hence, our review covers more than six years. We considered papers discussing RS in EO, namely RS sensors mounted on aerial vehicles and satellites. We included all available data from these sensors, as well as high-level and compound products, such as Digital Elevation Models (DEMs) and the ERA5 dataset [29]. The exclusion criteria we applied can be summarized as follows:

• EC1 - Publication unrelated to explainable AI
• EC2 - Publication unrelated to Remote Sensing
• EC3 - Review/survey/short conference paper
• EC4 - Publication published before 2017
• EC5 - Publication not written in English

The included works discuss the interpretability of the method or of the explanation results. Hence, it is not enough to merely use a model that is interpretable by itself, such as a decision tree or linear regression (EC1). We explicitly excluded in-situ measurements and pictures actively taken in the surrounding environment, which are non-RS products, as well as papers outside the scope of RS and xAI (EC2). Furthermore, review and survey papers were excluded (EC3), since they are covered in the related work section. Short conference papers of fewer than 5 pages were also excluded (EC3), because they usually contain preliminary or incomplete results. The search results were filtered in three steps: (i) removing duplicates, conference abstracts, and reports, (ii) screening the abstracts, and (iii) screening the full texts. At least one author read each abstract and paper; whenever there was doubt, the other authors read the corresponding paper as well. In total, our search returned 1075 papers; merging the different sources and removing duplicates left us with 964 papers. In the first, shallow abstract screening, we removed 607 papers because they were unrelated to our review topic. Three papers had to be excluded due to a lack of access rights. After the full-text screening, 147 papers remained. These were supplemented by 60 papers from our own libraries, yielding the 207 papers analyzed in this review. The procedure is summarized in Figure 2. The reasons for removing each paper can be found in the supplementary material.
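The screening counts above can be checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the intermediate values (papers assessed in full text and papers excluded at that stage) are derived under our assumption that the three inaccessible papers were dropped between the abstract and the full-text screening.

```python
# Counts stated in the text.
retrieved = 1075          # records returned by Scopus, Springer, and IEEE
after_merge = 964         # after merging sources and removing duplicates (step i)
excluded_abstract = 607   # removed during abstract screening (step ii)
no_access = 3             # excluded due to lack of access rights
after_full_text = 147     # retained after full-text screening (step iii)
from_libraries = 60       # added from the authors' own libraries

# Derived values (assumption: inaccessible papers removed before full-text screening).
removed_step_i = retrieved - after_merge                          # duplicates, abstracts, reports
assessed_full_text = after_merge - excluded_abstract - no_access  # papers read in full
excluded_full_text = assessed_full_text - after_full_text         # removed in step iii
included = after_full_text + from_libraries                       # final literature database

print(removed_step_i, assessed_full_text, excluded_full_text, included)
```

The final count agrees with the 207 papers reported in the introduction.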

IV Explanation methods in ML

This section provides a general overview of xAI. We first present the taxonomy used to describe common distinctions between explanation methods. Then, we introduce our categorization of xAI methods. Subsequently, we describe in detail a selection of the methods commonly used in RS. Finally, we give an overview of the different metrics proposed in the literature for evaluating these methods and present the main objectives of using xAI.

There exist several terms for explainable AI, such as interpretable Machine Learning (IML) and interpretable AI. These terms often refer to the same concept: the explanation or interpretation of AI models [30]. Hence, we will use them interchangeably in this paper.
In the literature, three common distinctions are used to categorize xAI methods: ante-hoc vs. post-hoc, model-agnostic vs. model-specific, and local vs. global [12, 31, 32, 18]. The ante-hoc vs. post-hoc distinction refers to the stage at which the explanation is generated. An xAI method that provides interpretations within or simultaneously with the training process is called ante-hoc. In contrast, a post-hoc method explains the model after the training phase using a separate algorithm, which may even differ from the algorithm underlying the model. Model-agnostic methods can generate explanations for any model. They do not access the model's internal state or parameters; instead, the explanations are created by analyzing the changes in the model's output when its inputs are modified. Model-specific methods are designed exclusively for specific architectures and typically have access to the model's inner workings. Local methods explain the model's behavior for individual instances. In contrast, global methods explain the model's behavior on the entire dataset. In practice, local explanations can be leveraged to obtain global explanations: aggregated over a set of input instances chosen to represent the dataset, they can provide insights into the general behavior of the inspected model. The aggregation mechanism should be defined carefully, since straightforward aggregation of some xAI methods might lead to erroneous results. Some researchers have further investigated how to find meaningful aggregation rules for local explanations [33, 34]. The methods presented in this review are categorized along the three distinctions introduced above, as shown in Figure 3.
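To illustrate why the aggregation rule matters, the following minimal NumPy sketch aggregates hypothetical local attributions (for example SHAP values, one row per sample and one column per feature) into global feature importances. Averaging the signed attributions lets positive and negative contributions cancel, whereas averaging their absolute values, a common choice for SHAP-based global importance, preserves the magnitude of each feature's influence. The attribution values are made up for illustration.

```python
import numpy as np


def global_importance(local_attr):
    """Aggregate local attributions of shape (n_samples, n_features) per feature.

    Uses the mean absolute attribution, so positive and negative
    contributions of a feature cannot cancel each other out.
    """
    return np.abs(local_attr).mean(axis=0)


# Hypothetical local attributions: 4 samples (rows) x 3 features (columns).
phi = np.array([[ 0.5, -0.1,  0.0],
                [-0.5,  0.2,  0.1],
                [ 0.4, -0.2,  0.0],
                [-0.4,  0.1, -0.1]])

signed_mean = phi.mean(axis=0)       # ~[0, 0, 0]: every feature looks irrelevant
importance = global_importance(phi)  # [0.45, 0.15, 0.05]: feature 0 dominates
```

The absolute mean also discards the direction of each effect, so it is one possible summary rather than a universal rule.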

IV-A xAI methods categorization

In this section, we introduce a categorization of the most important xAI methods in the literature. We build on the categorical foundation of [10] and adjust its categories to capture a larger set of xAI methods. The categories are structured as a tree with two internal layers, which describe a hierarchy of primary and secondary categories, and a leaf layer, which indicates an individual method or a group of specific methods. In the following, we give a high-level overview of the categories of xAI methods, following the structure of Figure 3. Our four primary categories are feature attribution, distillation, intrinsic explanations, and contrastive examples. Feature attribution methods highlight the input features that significantly influence the output. Distillation, in contrast, builds a new interpretable model from the behavior of the complex model. Intrinsic methods focus on making the model itself or its components inherently interpretable. Lastly, contrastive examples show simulated or real examples and enable an explanation by comparing them. Each of these categories splits into two or three subcategories.

For the sake of completeness, it should be noted that feature selection methods are not considered in our categorization and are listed separately in the results. Even though they are related to feature interpretation, they serve different purposes. Feature selection can be defined as a strategy to reduce the dimensionality of the input space in order to improve model performance and reduce computational cost [35]. While feature selection can constitute the first step in an ML pipeline, feature interpretation is usually the last step and typically involves more advanced techniques than only looking at the predictive performance. Therefore, feature selection and xAI can be complementary. On the one hand, feature selection reduces the input space that needs to be interpreted. On the other hand, xAI can provide more qualitative insights for selection, such as uncovering a bias introduced by a feature.

xAI
  Feature attribution
    Backpropagation: Activation Maximization [36]; Gradient [37]; Integrated Gradients [38]; Deconvolution [39]; Class Activation Mapping (CAM) [40] and variants (e.g. Grad-CAM [41]); Layer-wise Relevance Propagation (LRP) [42]
    Perturbation: Occlusion Sensitivity [43]; Partial Dependence Plot (PDP) [44]; Accumulated Local Effects (ALE) [45]
  Distillation
    Local Approximation: Local Interpretable Model-agnostic Explanation (LIME) [46]; SHapley Additive exPlanations (SHAP) [47]
    Model Translation: Rule [48, 49], Tree [50, 51, 52], Graph [53, 54]
  Intrinsic
    Interpretable by Design: Decision Rule and Decision Tree [55]; Linear Regression [55], Generalized Linear Model (GLM) [56], Generalized Additive Model (GAM) [57]; Latent Dirichlet Allocation (LDA) [58]
    Embedding Space: Attention Mechanism [59, 60]; Activation Assessment; Concept Discovery (e.g. TCAV [61], ACE [62])
    Joint Training: Explanation Association [63, 64]; Prototype Learning [65, 66, 67]; Model Association (e.g. text explanations [68])
  Contrastive Examples
    Counterfactuals [69]
    Example-based [70]

Figure 3: Categorization of xAI methods based on [10]

IV-A1 Feature attribution

Feature attribution methods rely on the trained ML model to estimate the importance of the input features. Depending on whether the explanations are generated by inspecting the model internals or by analyzing the changes in the model's output after modifying the input features, the methods in this category are further split into backpropagation and perturbation methods. The output of these methods is commonly a saliency plot that shows the contribution of each input feature to the model prediction. For imagery inputs, this output can be visualized as a heatmap, also called a saliency map, which highlights the regions relevant to the model prediction.

Backpropagation

These methods leverage the inherent structure of DNNs and estimate the relevant features by propagating the output values at the top or intermediate layers of the network back to the input features. Most of these methods compute gradients to this end. The Deconvolution method, also known as Deconvolutional Neural Network (NN) [39], is designed to reverse the convolutional operations of Convolutional Neural Networks (CNNs). Reconstructing the input space from the feature maps of the CNN allows visualizing which information was learned and how the input is transformed across the different network layers. Similarly, Layer-wise Relevance Propagation (LRP) [42] calculates relevance scores for individual input features by propagating the prediction backward layer by layer, using specialized propagation rules. The scores indicate the contribution of each input feature to the output. Various adaptations have been proposed that apply different propagation rules depending on the network design [71, 72, 73]. In contrast, the Gradient or Saliency method [37] uses the partial derivative of the output with respect to the input to create the attribution maps. Rather than computing the gradient once, Integrated Gradients [38] integrates the gradients with respect to the input features along a path, typically a straight line, between a baseline input and the instance to be explained. CAM [40] visually explains CNNs through attribution heatmaps by introducing a global average pooling layer right before the top fully connected layer. Using the weights of the latter layer for a particular class, the heatmap is computed as the weighted sum of the activation maps in the last convolutional layer and then upsampled to the size of the input.
One extension of CAM is Gradient-weighted Class Activation Mapping (Grad-CAM) [41], which replaces these weights with the spatially averaged gradients of the class score with respect to the activation maps of the last convolutional layer, thus removing the requirement of a final global pooling layer.
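To make the Integrated Gradients computation concrete, the following minimal sketch approximates the path integral with a midpoint Riemann sum for a toy logistic-regression model whose input gradient is known in closed form. The model, its weights, the all-zeros baseline, and the helper names are illustrative assumptions; for a DNN, the gradients would come from automatic differentiation.

```python
import numpy as np

def model(x, w, b):
    """Toy differentiable model: logistic regression sigmoid(w . x + b)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def model_grad(x, w, b):
    """Analytic gradient of the model output with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG with a midpoint Riemann sum along the straight-line
    path from the baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps              # midpoints in (0, 1)
    grads = np.stack([model_grad(baseline + a * (x - baseline), w, b)
                      for a in alphas])                     # (steps, n_features)
    return (x - baseline) * grads.mean(axis=0)              # one attribution per feature

w = np.array([2.0, -1.0, 0.0])       # the third feature is ignored by the model
b = 0.1
x = np.array([1.0, 0.5, 3.0])
baseline = np.zeros_like(x)          # all-zeros baseline (a common, but not the only, choice)

attr = integrated_gradients(x, baseline, w, b)
print(attr)
# Completeness: the attributions sum to f(x) - f(baseline), up to discretization error
print(attr.sum(), model(x, w, b) - model(baseline, w, b))
```

The final check illustrates the completeness property of Integrated Gradients: the attributions approximately sum to the difference between the model output at the input and at the baseline, while the unused third feature receives zero attribution.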

Perturbation

The perturbation methods assess feature importance by measuring the sensitivity of model predictions to changes in the input features. These methods differ in how the features are perturbed, e.g., by blurring, averaging, shuffling, or adding noise. For example, the Occlusion method [43] removes features by replacing them with a neutral value. Permutation Feature Importance (PFI) [74] randomly permutes the values of a feature across the instances of the dataset, destroying the original relationship between that feature and the output. The Partial Dependence Plot (PDP) [44] shows the average influence of a single input feature on the prediction: the feature of interest is fixed to a range of values while the predictions are averaged (marginalized) over the observed values of the remaining features. Therefore, it assumes feature independence. A similar approach called Accumulated Local Effects (ALE) [45] can handle correlated features by averaging local changes in the prediction over the conditional distribution.
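A minimal sketch of two of these perturbation methods, PFI and PDP, is shown below. The black-box function, the synthetic data, and the use of mean squared error as the PFI error measure are assumptions for illustration; any trained model with a batch `predict`-style function could take the place of `black_box`.

```python
import numpy as np

def black_box(X):
    """Stand-in for a trained model: uses features 0 and 1, ignores feature 2."""
    return 3.0 * X[:, 0] + np.sin(X[:, 1])

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def permutation_importance(f, X, y, n_repeats=5, seed=0):
    """PFI: increase in error after randomly permuting one feature column."""
    rng = np.random.default_rng(seed)
    base_err = mse(y, f(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # break the link of feature j to y
            errs.append(mse(y, f(X_perm)))
        importances[j] = np.mean(errs) - base_err
    return importances

def partial_dependence(f, X, j, grid):
    """PDP: fix feature j to each grid value and average predictions over the data."""
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v
        values.append(f(X_mod).mean())
    return np.array(values)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = black_box(X)                       # noise-free targets for a clean illustration
imp = permutation_importance(black_box, X, y)
pdp = partial_dependence(black_box, X, 0, np.linspace(-2, 2, 5))
print(imp)   # feature 2 gets exactly zero importance
print(pdp)   # increases by 3 per unit step, the slope of feature 0
```

Because the PDP sets feature 0 to the same value for every instance regardless of the other features, it implicitly assumes feature independence, which is the limitation that ALE addresses.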

IV-A2 Model distillation

Model distillation methods approximate the predictive behavior of a complex model by training a simpler surrogate model that is usually interpretable-by-design. By replicating the predictions of the complex model, the surrogate model offers hypotheses about the relevant features and the correlations learned by the complex model, without providing further insights into its internal decision mechanism. Distillation approaches are categorized into a) local approximation methods, which train the surrogate model in a small neighborhood around an individual example, and b) model translation methods, which replicate the behavior of the complex model over the entire dataset.

Local approximation

Approaches in this category focus on explaining individual predictions of the complex model by inspecting a small neighborhood around the instances to be explained. In contrast to the backpropagation and perturbation methods, which operate on the raw input features, the local approximation approaches transform the input features into a simplified representation space, such as superpixels for imagery inputs. A prominent approach in this category is Local Interpretable Model-agnostic Explanation (LIME) [46]. It creates a new dataset in the neighborhood of the target instance by perturbing its simplified representation. Next, an interpretable surrogate model is trained to approximate the predictions of the complex model on this newly created dataset. Hence, the explanation of the complex model is distilled into the interpretation of the surrogate model. A similar strategy is employed by the SHAP framework [47]. Concretely, [47] introduce Kernel SHAP, which uses LIME's framework under specific constraints to obtain feature importance by approximating Shapley values, a concept from game theory for estimating each player's contribution in cooperative games [75]. While Kernel SHAP is model-agnostic, Shapley values can also be approximated with model-specific approaches, such as Deep SHAP [47] for neural networks and Tree SHAP [76], which enables their fast computation for tree-based models.

Model translation

These methods approximate all decisions of the model on the entire dataset with a simple surrogate model. Typically, the interpretable-by-design methods summarized in the next section serve as surrogate models, such as rule-based [77, 78], tree-based [50, 79], or graph-based [53, 80] models.
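A minimal sketch of model translation, assuming a hypothetical black-box function and using a depth-1 regression tree (a single IF-THEN rule) as the interpretable surrogate; practical approaches use richer rule-, tree-, or graph-based surrogates. The two essential points are that the surrogate is trained on the black box's predictions rather than on ground-truth labels, and that its fidelity (here, the coefficient of determination R²) measures how faithfully it replicates the black box.

```python
import numpy as np

def black_box(X):
    """Hypothetical complex model whose global behavior we want to translate."""
    return np.where(X[:, 1] > 0.3, 2.0, -1.0) + 0.1 * np.tanh(X[:, 0])

def fit_stump(X, y):
    """Depth-1 regression tree (a single IF-THEN rule) fitted by exhaustive search."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:          # keep both sides of the split non-empty
            left = X[:, j] <= t
            y_l, y_r = y[left].mean(), y[~left].mean()
            sse = ((y[left] - y_l) ** 2).sum() + ((y[~left] - y_r) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, y_l, y_r)
    return best[1:]                                # (feature, threshold, left value, right value)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y_bb = black_box(X)                                # surrogate targets are the model's predictions
j, t, y_l, y_r = fit_stump(X, y_bb)
y_sur = np.where(X[:, j] <= t, y_l, y_r)
# Fidelity: share of the black box's output variance reproduced by the surrogate (R^2)
fidelity = 1 - np.sum((y_bb - y_sur) ** 2) / np.sum((y_bb - y_bb.mean()) ** 2)
print(f"IF x{j} <= {t:.2f} THEN {y_l:.2f} ELSE {y_r:.2f}  (fidelity R^2 = {fidelity:.3f})")
```

A high fidelity only indicates that the surrogate mimics the black box on this data; as noted above, it does not reveal the black box's internal decision mechanism.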

IV-A3 Intrinsic

This category comprises ML models that provide an explanation by themselves through their structure, components, parameters, or outputs. These are either directly interpretable or can be made human-interpretable by visualizing them.

Interpretable-by-Design

These methods are interpretable by humans because of their simple design, architecture, and decision process. Decision Rules are hierarchical IF-THEN statements that assess conditions and determine a decision. Fuzzy rules [81] are designed to address uncertainty and imprecision, which are frequently encountered in nature and difficult to represent with classical precise rules, e.g., through partial class membership (fuzzy sets) [82]. Owing to their proximity to natural language, these rules are interpretable by humans. The Decision Tree [83] learns decision rules greedily. Its internal structure is a binary tree, where each internal node is a condition and each leaf is a decision.

Generalized Additive Models (GAMs) [57] are statistical modeling techniques that approximate data using an additive function: $g(E(y))=\beta_{0}+f_{1}(X_{1})+f_{2}(X_{2})+\dots+f_{p}(X_{p})+\epsilon$, where $g$ is a link function, $f_{i}$ are smooth, non-linear functions transforming the features $X_{i}$, $\beta_{0}$ is the intercept coefficient, and $\epsilon$ represents the error term. Generalized Linear Models (GLMs) [56], on the other hand, assume a linear relationship: each $f_{i}(X_{i})$ becomes a linear term $\beta_{i}X_{i}$, so that the link function $g$ relates the expected value of the target to a weighted sum of the features. This allows modeling targets that follow various exponential-family distributions. A simple example of this model type is Linear Regression (LR), which assumes a Gaussian distribution of the target and uses the identity as link function $g$. In this context, the model explanation can be obtained by examining the coefficients.
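As a minimal illustration of reading an explanation from GLM coefficients, the following sketch fits an LR model (Gaussian target, identity link) by ordinary least squares on synthetic, noise-free data. The data-generating coefficients are assumptions chosen so that the recovered coefficients can be checked against them.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_beta = np.array([1.5, -2.0, 0.0])     # assumed data-generating coefficients
y = 0.5 + X @ true_beta                    # Gaussian-linear model, identity link, no noise

# Ordinary least squares with an explicit intercept column
A = np.column_stack([np.ones(len(X)), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coefs = beta_hat[0], beta_hat[1:]

for name, c in zip(["X1", "X2", "X3"], coefs):
    # Sign = direction of the effect; magnitude = change in E(y) per unit of the feature
    print(f"{name}: {c:+.3f}")
```

Comparing coefficient magnitudes across features is only meaningful when the features are on comparable scales (e.g., after standardization), which holds here because all features are drawn from the same distribution.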

A well-known approach for generative probabilistic topic modeling is Latent Dirichlet Allocation (LDA) [58]. A dataset is assumed to be organized as a corpus of collections, where each collection contains discrete units, such as documents comprised of words. Each collection is characterized by a distribution over topics, and each topic by a distribution over units. By analyzing the proportions of the units in the collections, LDA estimates the underlying topics. The decisions of the LDA model and the identified topics can be interpreted by examining the predicted topic proportions for each collection.

Embedding space

These approaches process the activations in the latent space of DNNs to interpret their workings. The Attention Mechanism [59, 60] creates high-level feature representations by using attention weights to model the dependencies between the different elements of the input. Hence, visualizing the attention weights is a common procedure to assess the relevant features for the model decisions. Activation Assessment analyzes the activations in the latent space of an NN using projection techniques. Commonly used are dimensionality reduction approaches [84] (e.g., t-distributed Stochastic Neighbor Embedding (t-SNE) [85] and Uniform Manifold Approximation and Projection (UMAP) [86]) or the analysis of neuron receptive fields in CNNs [87]. Concept-based Explanations summarize the activations in the latent space in terms of interpretable, high-level concepts. In computer vision, the concepts represent visual patterns, which are either user-defined, as in Testing with Concept Activation Vectors (TCAV) [61], or learned in an unsupervised manner [62]. These approaches provide global explanations by quantifying the relevance of each concept per class.
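The following sketch illustrates the two steps of TCAV under simplifying assumptions. A concept activation vector (CAV) is obtained as the unit-normalized weight vector of a logistic-regression classifier that separates activations of concept examples from activations of random examples, and the TCAV score is the fraction of class examples whose class logit increases when the activations move along the CAV. The synthetic activations and the hypothetical head $\tanh(a_0)$ stand in for a real network layer and its backpropagated gradients; the original method additionally repeats the procedure with several random sets for statistical testing.

```python
import numpy as np

def fit_cav(concept_acts, random_acts, lr=0.1, epochs=500):
    """CAV: unit-normalized weights of a logistic-regression classifier that
    separates concept activations (label 1) from random activations (label 0)."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):                    # plain batch gradient descent
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w / np.linalg.norm(w)

def logit_grad(acts):
    """Gradient of a hypothetical class logit tanh(a_0) w.r.t. the activations."""
    g = np.zeros_like(acts)
    g[:, 0] = 1.0 - np.tanh(acts[:, 0]) ** 2
    return g

def tcav_score(class_acts, cav):
    """Fraction of class examples whose logit increases along the CAV direction."""
    directional_derivs = logit_grad(class_acts) @ cav
    return float(np.mean(directional_derivs > 0))

rng = np.random.default_rng(0)
d = 8                                                           # latent dimension
concept_acts = rng.normal(size=(100, d)) + 3.0 * np.eye(d)[0]   # concept shifts unit 0
random_acts = rng.normal(size=(100, d))
cav = fit_cav(concept_acts, random_acts)
score = tcav_score(rng.normal(size=(50, d)), cav)
print(cav.round(2), score)
```

Since the toy logit depends only on unit 0, which is also the direction that encodes the concept, every class example has a positive directional derivative and the TCAV score is maximal.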

Joint training

This category provides ante-hoc explanations by introducing an additional learning task to the model. This task is jointly optimized with the original learning objective and is used as an explanation. The methods in this family typically differ based on the explanation representation and how the additional task is integrated with the original model.

Explanation association methods constrain the inference of a black-box model to rely on human-interpretable concepts. A prominent approach in this category is concept bottleneck models, which represent the concepts as neurons in the latent space of the model. They introduce an additional task of first predicting the interpretable concepts in an intermediate layer of a DL model; the model predictions are then derived from these concepts. During training, a regularization term is added to the loss function that enforces the alignment of the latent space with the interpretable concepts [63]. To estimate concept importance directly, [64] predict the final output by linearly combining the concept activation maps.

Prototype learning approaches aim to identify a set of representative examples (prototypes) from the dataset and enable an interpretable decision mechanism by decomposing the model predictions based on the instance's similarity with the learned prototypes [65]. Thus, visualizing the prototypes enables global model interpretability, while the similarity with an input instance offers local model explanations. One popular approach for prototype learning on image classification tasks is the ProtoPNet architecture introduced by [66]. The prototypes represent image parts and are encoded as convolutional filters in a prototype layer of the proposed network. Their weights are optimized with the supervised learning loss of the network and additional constraints that encourage the prototypes to cluster according to their class and to be separated from the other classes. One extension of this approach is Neural Prototype Trees [67], which organize the prototypes as nodes in a binary decision tree. Each node computes the similarity of the corresponding prototype with an instance, and these similarities are used to route the instance towards the leaves of the tree, which contain the class predictions.
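A minimal sketch of a ProtoPNet-style prototype layer, assuming the similarity function $\log\left((d^{2}+1)/(d^{2}+\varepsilon)\right)$ of the original ProtoPNet paper and random stand-ins for the latent patch embeddings and prototypes. Each prototype is compared with all image patches, the closest patch determines its similarity score, and a final linear layer maps the similarity scores to class logits; the returned index of the closest patch is what allows visualizing where a prototype was found in the image.

```python
import numpy as np

def prototype_layer(patches, prototypes, class_weights, eps=1e-4):
    """ProtoPNet-style scoring for one image (sketch).
    patches:       (P, D) latent embeddings of the P image patches
    prototypes:    (m, D) learned prototype vectors
    class_weights: (C, m) weights of the final fully connected layer
    """
    # Squared L2 distance between every patch and every prototype: shape (P, m)
    d2 = ((patches[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    min_d2 = d2.min(axis=0)                        # closest patch per prototype
    sim = np.log((min_d2 + 1.0) / (min_d2 + eps))  # large when a patch is close
    return class_weights @ sim, sim, d2.argmin(axis=0)

rng = np.random.default_rng(0)
patches = rng.normal(size=(49, 16))                # e.g., a 7x7 latent grid
prototypes = np.stack([patches[3], rng.normal(size=16) + 5.0])
class_weights = np.eye(2)                          # prototype k votes for class k
logits, sim, closest_patch = prototype_layer(patches, prototypes, class_weights)
print(logits, closest_patch)
```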

In contrast to the previous approaches, which encode the explanation within the model that performs inference for the original learning task, model association methods introduce an external model that generates explanations. These approaches are often used to provide textual explanations for CV tasks. For example, [68] derive text explanations for self-driving cars by jointly training a vehicle controller and a textual explanation generator. The vehicle controller is a CNN model that predicts the car's movements using spatial attention maps. The explanation generator, a Long Short-Term Memory (LSTM) neural network, then processes the context vectors and the spatial attention maps from the controller to produce the text explanation.

IV-A4 Contrastive Examples

Methods within this category provide alternative examples for an input instance, and an explanation is obtained by comparing them. Usually, the examples shown are close to the input instance in the input space yet lead to a different outcome.

Counterfactuals

This explanation type aims to discover the smallest change required for an instance to achieve a predefined prediction. Essentially, counterfactuals answer the question "Why does it yield output X rather than output Y?", which is very close to human reasoning [88]. They are closely related to adversarial examples, although their objectives differ significantly. Adversarial examples usually aim to achieve a confident prediction for a minimally perturbed instance whose change should remain imperceptible to humans. Conversely, counterfactuals aim to provide a diverse set of examples that help represent the decision boundary of the model. [69] introduce an optimization problem whose primary objective is to find a counterfactual, denoted as $x^{\prime}$, that achieves the desired output while being as close as possible to the original input $x$. As such, the distance function $d(\cdot,\cdot)$ between the counterfactual and the original input should be minimized. The corresponding loss function is as follows:

$$L(\lambda,x_{i},x^{\prime},y^{\prime})=\lambda\left(f_{w}(x^{\prime})-y^{\prime}\right)^{2}+d(x_{i},x^{\prime}) \tag{1}$$

where $y^{\prime}$ represents the desired output, $f_{w}$ is the model with fixed weights, and $\lambda$ serves as a regularization parameter that balances the proximity of the counterfactual's prediction $f_{w}(x^{\prime})$ to the desired output $y^{\prime}$ against the similarity between the counterfactual and the original input.
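The optimization in Eq. 1 can be sketched with plain gradient descent for a toy logistic-regression model, for which the gradient of $f_{w}$ is available in closed form. As assumptions, $d$ is taken to be the squared Euclidean distance (the original formulation uses a weighted L1 distance) and $\lambda$ is kept fixed instead of being adapted during the search; the model weights and the input are illustrative.

```python
import numpy as np

def f_w(x, w, b):
    """Toy model with fixed weights: logistic regression."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def counterfactual(x, y_target, w, b, lam=10.0, lr=0.05, steps=2000):
    """Minimize lam * (f_w(x') - y')^2 + ||x - x'||^2 over x' by gradient descent."""
    x_cf = x.copy()
    for _ in range(steps):
        p = f_w(x_cf, w, b)
        grad_pred = 2.0 * lam * (p - y_target) * p * (1.0 - p) * w  # prediction term
        grad_dist = 2.0 * (x_cf - x)                                 # proximity term
        x_cf -= lr * (grad_pred + grad_dist)
    return x_cf

w, b = np.array([1.0, -2.0]), 0.0
x = np.array([-1.0, 1.0])              # f_w(x) = sigmoid(-3), roughly 0.05
x_cf = counterfactual(x, y_target=0.9, w=w, b=b)
print(x_cf, f_w(x, w, b), f_w(x_cf, w, b))
```

With $\lambda = 10$, the counterfactual moves only along the weight vector, since the distance term penalizes any other change, and its prediction rises above 0.5 without reaching the target $y^{\prime}=0.9$; a larger $\lambda$ pushes the prediction closer to $y^{\prime}$ at the cost of a larger change to $x$.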

Example-based explanations

Unlike counterfactuals, which can generate artificial instances, example-based explanations usually present existing "historical" training instances that are similar to the input under consideration [84]. The user can then connect, correlate, and reason based on analogies. This explanatory approach aligns with case-based interpretable-by-design models, e.g., k-Nearest-Neighbors (kNN). For example, [70] train a skip-gram model and evaluate it using the nearest neighbors determined by the distances in the embedding space. Furthermore, they illustrate that the learned word representations exhibit linear relationships, allowing analogies to be computed through vector addition.
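A minimal sketch of an example-based explanation: given embeddings of the training instances (toy two-dimensional vectors here, in place of a learned embedding space), the instances closest to the query are returned as explanatory examples. The Euclidean distance and the class labels are assumptions for illustration; cosine similarity is a common alternative for word embeddings.

```python
import numpy as np

def nearest_examples(query_emb, train_embs, k=3):
    """Return indices and distances of the k training instances closest to the query."""
    dists = np.linalg.norm(train_embs - query_emb, axis=1)
    idx = np.argsort(dists)[:k]
    return idx, dists[idx]

train_embs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
train_labels = np.array(["forest", "forest", "water", "urban"])   # hypothetical labels
idx, dists = nearest_examples(np.array([0.9, 0.1]), train_embs, k=2)
print(idx, train_labels[idx], dists)
```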

IV-B Common Explanation methods in RS

In this section, we provide an in-depth explanation of several popular xAI methods in the field of RS.

IV-B1 Class Activation Mapping (CAM) and Grad-CAM

[40] introduce CAM to obtain visual heatmaps for class-discriminative localization in CNNs. This technique aims to identify important regions within an image that influence the model's decision toward each of the $C$ classes. The approach requires a global average pooling layer between the last convolutional layer and the top fully connected layer of the CNN architecture. To calculate the class activation maps, the weights $w_{c,k}$ of the fully connected layer associated with a particular class $c$ are used to estimate the importance of each feature or activation map $A_{k}$ at the input of the global average pooling layer. Finally, the saliency map highlighting the discriminative regions for class $c$ is computed as the weighted sum of these $K$ activation maps:

$$S^{\text{CAM}}_{c}=\sum_{k=1}^{K}w_{c,k}A_{k} \tag{2}$$
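Eq. 2 translates directly into a weighted sum over channels. The sketch below uses random stand-ins for the activation maps and the fully connected weights, and nearest-neighbor upsampling for brevity; implementations typically use bilinear interpolation.

```python
import numpy as np

def cam(A, fc_weights, c):
    """Eq. (2): S_c = sum_k w_{c,k} A_k.
    A:          (K, H, W) activation maps at the input of global average pooling
    fc_weights: (C, K) weights of the fully connected layer
    """
    return np.tensordot(fc_weights[c], A, axes=1)      # (H, W)

def upsample_nearest(S, factor):
    """Nearest-neighbor upsampling to the input resolution."""
    return np.repeat(np.repeat(S, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(0)
A = rng.random((8, 7, 7))                              # K=8 maps of size 7x7
fc_weights = rng.normal(size=(3, 8))                   # C=3 classes
S = cam(A, fc_weights, c=2)
heatmap = upsample_nearest(S, 32)                      # 7x7 -> 224x224
# Averaging the CAM over space recovers the class score (bias omitted),
# because global average pooling and the linear layer commute.
print(S.mean(), fc_weights[2] @ A.mean(axis=(1, 2)))
```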

[41] generalize CAM to a broader range of CNN architectures by introducing Grad-CAM. In contrast to CAM, which uses the weights corresponding to class $c$, Grad-CAM uses the spatially averaged gradient of the logit $y_{c}$ with respect to each of the $K$ feature maps in the last convolutional layer to assess their importance for class $c$. Additionally, a ReLU function is applied to retain only positive contributions. Therefore, Eq. 2 becomes:

$$S^{\text{Grad-CAM}}_{c}=\operatorname{ReLU}\left(\sum_{k=1}^{K}\left(\frac{1}{N}\sum_{i}\sum_{j}\frac{\partial y_{c}}{\partial A_{k,i,j}}\right)A_{k}\right) \tag{3}$$

where $(i,j)$ denote the spatial coordinates of each of the $N$ locations in every activation map $A_{k}$. Because the convolutional layers of the network usually reduce the spatial resolution of the input, the resulting maps need to be upsampled to the original input size. However, this upsampling can lead to imprecise heatmaps, and other backpropagation methods that attribute relevance directly to each pixel, such as LRP or Integrated Gradients (IG), produce more fine-grained heatmaps. Nevertheless, recent revisions of the method, such as LayerCAM [89], aim to produce more detailed heatmaps.
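Eq. 3 only needs the activation maps and the gradients of $y_{c}$ with respect to them. In the sketch below, the gradients are derived analytically for a hypothetical global-average-pooling plus fully connected head, where $\partial y_{c}/\partial A_{k,i,j}=w_{c,k}/N$; a real network would supply them via automatic differentiation. For this architecture, the Grad-CAM map equals the ReLU of the CAM map from Eq. 2 scaled by $1/N$, which illustrates how Grad-CAM generalizes CAM.

```python
import numpy as np

def grad_cam(A, dyc_dA):
    """Eq. (3): ReLU(sum_k alpha_k A_k) with alpha_k the spatial mean of dy_c/dA_k.
    A, dyc_dA: arrays of shape (K, H, W)."""
    alpha = dyc_dA.mean(axis=(1, 2))                  # (K,) channel weights
    return np.maximum(np.tensordot(alpha, A, axes=1), 0.0)

rng = np.random.default_rng(0)
K, H, W = 8, 7, 7
N = H * W
A = rng.random((K, H, W))                             # stand-in activation maps
fc_weights = rng.normal(size=(3, K))
c = 1
# Hypothetical GAP + FC head: y_c = sum_k w_{c,k} * mean(A_k),
# hence dy_c/dA_{k,i,j} = w_{c,k} / N at every location.
dyc_dA = np.broadcast_to((fc_weights[c] / N)[:, None, None], A.shape)
S_grad = grad_cam(A, dyc_dA)
S_cam = np.tensordot(fc_weights[c], A, axes=1)        # Eq. (2)
print(np.allclose(S_grad, np.maximum(S_cam, 0.0) / N))
```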

IV-B2 Occlusion sensitivity

[43] propose a perturbation-based method to visualize the importance of different image regions. Their approach slides a patch over the image and observes the sensitivity of the model's prediction. Different fill values for the patch, patch sizes, and sampling strategies can be considered. The importance assigned to a region is directly proportional to the drop in the model's output after occluding it. Since the method only requires access to the model's outputs, it is model-agnostic and can be applied to any kind of architecture.
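A minimal occlusion-sensitivity sketch for a single-channel image: a square patch filled with a constant value is slid over the image with a given stride, and the drop in the predicted score is assigned to the occluded pixels (averaged where patches overlap). The toy `predict` function, which only looks at the top-left quadrant, and the zero fill value are assumptions for illustration.

```python
import numpy as np

def occlusion_map(image, predict, patch=4, stride=4, fill=0.0):
    """Slide a `fill`-valued patch over a 2-D image and record the score drop."""
    H, W = image.shape
    base = predict(image)
    heat = np.zeros((H, W))
    counts = np.zeros((H, W))
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill
            drop = base - predict(occluded)          # larger drop = more important region
            heat[i:i + patch, j:j + patch] += drop
            counts[i:i + patch, j:j + patch] += 1
    return heat / np.maximum(counts, 1)              # average over overlapping patches

def predict(img):
    """Hypothetical model whose score depends only on the top-left quadrant."""
    return img[:8, :8].mean()

image = np.ones((16, 16))
heat = occlusion_map(image, predict)
print(heat[::4, ::4])
```

Only the patches inside the top-left quadrant reduce the score, so the heatmap is non-zero exactly there; a smaller stride yields a smoother map at the cost of more model evaluations.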

IV-B3 Local Interpretable Model-agnostic Explanation (LIME)

[46] proposed LIME, a model-agnostic method that approximates the behavior of a complex model locally, in the neighborhood of a target instance. Concretely, to explain the prediction of a complex model $f$ for a target instance $x$, LIME performs three main steps: 1) a dataset is created in the neighborhood of $x$ by randomly perturbing it (e.g., adding noise, or hiding or blurring parts of the input), 2) an interpretable-by-design surrogate model $g$ is trained on this dataset, and 3) the internals of $g$ are inspected to provide an explanation. Importantly, LIME generates explanations in a simplified representation space that is interpretable to humans. For example, when the input is an image, the simplified representation can be a binary vector indicating the presence of the superpixels that decompose the image. Therefore, in the first step, LIME creates a dataset by perturbing the simplified representation of $x$. This dataset is labeled with the predictions of the complex model $f$ on the perturbed instances (the perturbed instances are transformed back into the original input representation before being fed into the complex model). In the second step, the surrogate model $g$ is trained on this dataset, weighting the perturbed samples by their similarity to $x$. Finally, the internals of $g$ are inspected to explain the prediction of $f$ for $x$. For instance, if $g$ is a linear regression model, its coefficients can be used to assess feature importance, whereas if $g$ is a decision tree, its rules can serve to explain the predictions of the complex model $f$.
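The three LIME steps can be sketched for an image whose simplified representation is a binary vector over superpixels. As assumptions, the superpixels are the four image quadrants (instead of the output of a segmentation algorithm), switched-off superpixels are set to zero, the similarity weights use an exponential kernel on the fraction of removed superpixels, and $g$ is a weighted linear regression; the toy `predict` function stands in for the complex model $f$.

```python
import numpy as np

def lime_image(x, predict, segments, n_samples=500, kernel_width=0.25, seed=0):
    """Return one linear-surrogate coefficient per superpixel (LIME sketch)."""
    rng = np.random.default_rng(seed)
    M = segments.max() + 1
    # 1) Perturb the simplified representation: random on/off masks over superpixels
    Z = rng.integers(0, 2, size=(n_samples, M))
    Z[0] = 1                                           # keep the original instance
    # Map each mask back to the input space and label it with the complex model f
    preds = np.array([predict(x * z[segments]) for z in Z])
    # Weight samples by their similarity to x (fraction of superpixels removed)
    dist = 1.0 - Z.mean(axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 2) Fit the surrogate g: weighted least squares with an intercept
    A = np.column_stack([np.ones(n_samples), Z])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], preds * sw, rcond=None)
    # 3) Inspect g: its coefficients rate the importance of each superpixel
    return coef[1:]

def predict(img):
    """Hypothetical complex model relying on the top-left and bottom-right quadrants."""
    return 2.0 * img[:4, :4].mean() + img[4:, 4:].mean()

x = np.ones((8, 8))
rows, cols = np.indices(x.shape)
segments = 2 * (rows >= 4) + (cols >= 4)              # quadrant ids 0..3 as superpixels
coefs = lime_image(x, predict, segments)
print(coefs.round(3))
```

Because the toy model is exactly linear in the superpixel indicators, the surrogate recovers its weights; for a real non-linear model, the coefficients only describe the model's behavior in the sampled neighborhood.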

IV-B4 SHapley Additive exPlanations (SHAP)

The Shapley values [75] are an approach from cooperative game theory that can be used to estimate the importance of the input features for the prediction of an ML model $f$ on a local instance as their average marginal contribution across all possible coalitions of features. Concretely, given an instance $x$, the Shapley value $\phi_{i}$ for feature $i$ is computed as follows:

$$\phi_{i}(f,x)=\sum_{S\subseteq F\setminus\{i\}}\frac{|S|!\,\left(|F|-|S|-1\right)!}{|F|!}\left[f_{x}\left(S\cup\{i\}\right)-f_{x}\left(S\right)\right] \tag{4}$$

where $F$ is the set of all input features, $S$ is a coalition of features, and $f_{x}\left(S\right)$ is the prediction marginalized over the features not included in $S$, while the features in $S$ take the values of the instance $x$. The Shapley values are considered to fairly assign the contribution of the input features to the model prediction, as they satisfy the following properties: 1) the feature contributions add up to the difference between the model prediction for the instance $x$ and the average model prediction on the dataset (efficiency), 2) two features receive the same contribution if they contribute equally to all coalitions (symmetry), 3) a zero contribution is assigned to features that do not change the model prediction in any coalition (dummy), and 4) the Shapley value of a feature for an ensemble of ML models can be computed by aggregating the individual Shapley values across the models in the ensemble (additivity) [19].
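For a handful of features, Eq. 4 can be evaluated exactly by enumerating all coalitions. The sketch below assumes that the marginalized prediction $f_{x}(S)$ is estimated by fixing the features in $S$ to the values of $x$ and averaging the model output over a background dataset, which treats the features as independent; the toy model and data are illustrative.

```python
import itertools
import math
import numpy as np

def f_x(predict, x, background, S):
    """Marginalized prediction: features in S are set to x's values, the others
    keep their background values, and the outputs are averaged."""
    Xb = background.copy()
    idx = list(S)
    if idx:
        Xb[:, idx] = x[idx]
    return predict(Xb).mean()

def shapley_values(predict, x, background):
    """Exact Shapley values via Eq. (4); the cost grows as 2^|F|."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for S in itertools.combinations(others, size):
                phi[i] += weight * (f_x(predict, x, background, S + (i,))
                                    - f_x(predict, x, background, S))
    return phi

def predict(X):
    """Toy model: interaction x0*x1, linear term in x2, feature x3 unused."""
    return X[:, 0] * X[:, 1] + 2.0 * X[:, 2]

rng = np.random.default_rng(0)
background = rng.normal(size=(200, 4))
x = np.array([1.0, 2.0, -1.0, 5.0])
phi = shapley_values(predict, x, background)
print(phi)
# Efficiency: contributions sum to f(x) minus the average prediction on the background
print(phi.sum(), predict(x[None, :])[0] - predict(background).mean())
```

The unused feature receives exactly zero (dummy property), and the contributions add up to the gap between the prediction for $x$ and the average background prediction (efficiency).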

The exact computation of the Shapley values becomes expensive for models with more than a few features, as Eq. 4 requires iterating over all $2^{|F|-1}$ coalitions that exclude feature $i$ and computing the marginal contribution for each of them. Therefore, in practice, the Shapley values are estimated with approximation techniques. [47] introduce the SHAP framework as a unified approach for model interpretability based on the family of additive feature attribution methods. This family represents the explanation through the coefficients of a linear model and has a set of desired explanation properties similar to those of the Shapley values. Although other xAI methods, such as LIME, LRP, and Deep Learning Important FeaTures (DeepLIFT) [90], can be represented as additive feature attribution methods, the only additive method that satisfies the desired explanation properties is the one having the Shapley values as its linear coefficients (referred to as SHAP values). This formulation enables the approximation of the Shapley values with a model-agnostic approach based on LIME. Concretely, the authors introduce Kernel SHAP, which estimates the Shapley values by constraining LIME to use a linear surrogate model and a specific similarity function for weighting the perturbed instances. To leverage the internal structure of ML models for a fast approximation of the Shapley values, the authors also propose model-specific approaches, namely Deep SHAP for DNNs and Tree SHAP [76] for tree-based models.

IV-B5 Attention mechanism

Attention mechanisms are neural network components that mimic cognitive attention [91]. Although initially proposed for natural language processing tasks such as machine translation and sentiment analysis [59, 60], they have recently also been leveraged in CV applications [92] and for graph-structured data [93]. These modules assign attention weights to the input features, which determine how a neural network combines them into a high-level feature representation. Thus, visualizing the attention weights is a well-known approach to understanding the relevant features for the model predictions and assessing the interaction of the input features in the context of the learning task [94, 95].
An attention mechanism is usually defined for a query $\bf q$ and matrices of keys and values $\mathbf{K}=[\mathbf{k}_{1},...,\mathbf{k}_{L}]$ and $\mathbf{V}=[\mathbf{v}_{1},...,\mathbf{v}_{L}]$, respectively. It outputs a high-level representation $\bf c$ as a combination of the values, weighted according to the alignment of the keys $\bf K$ with the query $\bf q$. The attention weights are obtained from an alignment function, for which various choices have been proposed in the literature [96]. One of the most widely used mechanisms is the scaled dot-product attention introduced in [60], where the attention weights $\alpha$ are computed as follows:

$$\alpha=\operatorname{softmax}\left(\frac{\mathbf{q}\mathbf{K}^{T}}{\sqrt{d_{k}}}\right) \tag{5}$$

where $d_{k}$ is the embedding dimension of the keys. The high-level representation $\bf c$ is then computed as the linear combination of the value vectors weighted by the attention weights, i.e., $\bf c=\alpha\bf V$.
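A minimal sketch of Eq. 5 and of the weighted combination of the values for a single query. The toy keys, values, and query are assumptions; subtracting the maximum score before exponentiation is a standard numerical-stability trick that leaves the softmax unchanged.

```python
import numpy as np

def scaled_dot_product_attention(q, K, V):
    """Eq. (5) for a single query, followed by c = alpha V.
    q: (d_k,), K: (L, d_k), V: (L, d_v)"""
    d_k = K.shape[-1]
    scores = K @ q / np.sqrt(d_k)              # entries of q K^T / sqrt(d_k), shape (L,)
    scores = scores - scores.max()             # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    c = alpha @ V                              # weighted combination of the values
    return alpha, c

K = np.eye(4)                                  # four toy keys
V = np.arange(8.0).reshape(4, 2)               # their value vectors
q = 5.0 * K[2]                                 # query aligned with key 2
alpha, c = scaled_dot_product_attention(q, K, V)
print(alpha.round(3), c)
```

The returned `alpha` is the quantity typically visualized as an explanation: here, most of the weight falls on the key aligned with the query.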

IV-C Evaluation of xAI Methods

Evaluating explanation quality and its trustworthiness is an essential methodological challenge in xAI, which has received considerable attention in recent years. The existing evaluation strategies can be categorized into (i) functional approaches based on quantitative metrics and (ii) user studies [97].

IV-C1 Functional evaluation metrics

These metrics assess the explanation quality by quantitatively describing to which extent an explanation satisfies a certain set of desired properties. A commonly evaluated explanation property is faithfulness (also called correctness) [98], which assesses how closely an explanation approximates the actual workings of the model. Various functional metrics have been proposed to evaluate explanation faithfulness. For example, metrics based on randomization tests are introduced in [99] to evaluate the sensitivity of explanations to randomization of the model weights and to label permutation. The results reveal that most of the evaluated backpropagation methods do not pass these tests. Another common approach to evaluate explanation faithfulness is based on the perturbation of the input features. For instance, [100] measures the changes in the model output after perturbing the features deemed important by the explanation method. Metrics based on perturbation are also used to evaluate other explanation properties, such as robustness (also referred to as explanation sensitivity), which inspects the impact of small perturbations of the input features on the resulting explanation [101, 102]. Explanations with low sensitivity are preferred, as this indicates that the explanation is robust to minor variations in the input. However, perturbation-based metrics might produce examples whose distribution differs from that of the training instances, which raises the question of whether a drop in model performance should be attributed to the distribution shift or to the perturbation of the important features. To address this issue, [103] propose retraining the model on a modified dataset in which a fraction of the most important features identified by the xAI method is perturbed. Further, [104] present an improved evaluation strategy based on information theory that avoids the need for model retraining.
Additionally, other evaluation approaches consider properties such as localization, which determines whether the explanation identifies the ground-truth region of interest, or complexity, which measures the sparsity of an explanation [101]. For a detailed categorization of the different explanation quality properties and a summary of the approaches used in the literature to functionally evaluate them, the reader is referred to [98].
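A minimal sketch of a perturbation-based faithfulness check in the spirit of the metrics discussed above: features are removed (set to a baseline value) in order of decreasing attribution, and the model output is recorded after each step. A faithful attribution should make the output drop quickly, i.e., yield a small area under this deletion curve. The linear toy model, the zero baseline, and the use of the mean output as the area are assumptions; the sketch is also subject to the distribution-shift caveat noted above.

```python
import numpy as np

def deletion_curve(x, attribution, predict, baseline=0.0, steps=10):
    """Model output while removing features in order of decreasing attribution."""
    order = np.argsort(-attribution)                 # most important first
    x_cur = x.astype(float).copy()
    outputs = [predict(x_cur)]
    for chunk in np.array_split(order, steps):
        x_cur[chunk] = baseline                      # "remove" by setting to the baseline
        outputs.append(predict(x_cur))
    return np.array(outputs)

w = np.arange(1.0, 11.0)                             # toy linear model weights

def predict(v):
    return float(w @ v)

x = np.ones(10)
faithful = w * x                                     # exact contributions for a linear model
curve_faithful = deletion_curve(x, faithful, predict)
curve_reversed = deletion_curve(x, -faithful, predict)  # deliberately misleading ranking
# Smaller area (here: mean output) under the deletion curve = more faithful ranking
print(curve_faithful.mean(), curve_reversed.mean())
```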

IV-C2 User studies

Alternatively, the quality of explanations can be evaluated in experiments with human participants. In [97], such user studies are further categorized into application-grounded and human-grounded evaluation, depending on the evaluation task, the type of participants, and the considered explanation quality criteria.

On the one hand, application-grounded evaluation studies typically involve domain experts who evaluate the explanation in the context of the learning task. For instance, the authors in [105] perform a case study in which physicians assess the predictions of a CNN model that classifies electrocardiogram heartbeats. This study measures the physicians' agreement with the CNN model when they are presented with (i) explanations computed with the LIME method and (ii) nearest-neighbor examples obtained from the latent space of the trained CNN model. The results indicate higher agreement for the nearest-neighbor explanations, which, in contrast to the LIME feature importance visualizations, also enable the physicians to relate model outcomes to clinically relevant concepts.

On the other hand, human-grounded evaluation consists of user studies in which the participants are typically non-domain experts who evaluate more general notions of explanation quality. For example, [106] evaluate whether explanations computed with the LRP method help lay users recruited via an online crowdsourcing platform to understand the decisions of a CNN model for image classification. The study reports a statistically significant result showing that users predict the model decisions with higher accuracy when presented with LRP explanations. A detailed survey on user studies for xAI method evaluation is presented in [107].

IV-D xAI Objectives

In this study, we group the objectives of using xAI in RS according to the following four reasons for using xAI, as defined in [30]: (1) explain to justify, (2) explain to control, (3) explain to discover, and (4) explain to improve. Explain to justify is motivated by the need to explain individual outcomes, e.g., to ensure that ML systems comply with legislation such as granting users the "right to explanation". Beyond individual outcomes, explanations can enable a detailed understanding of the workings of the ML model. This is the focus of the explain to control objective, which is relevant for assessing model trustworthiness and can help to identify potential errors, biases, and flaws of the ML model. These insights can be used to discover scientific knowledge and new insights about the underlying process that is modeled with the ML system, or to further improve the existing model. [108] classify the improvement techniques based on xAI insights into (1) augmenting the input data, (2) augmenting the intermediate features, (3) augmenting the loss function, (4) augmenting the gradient, and (5) augmenting the ML model. For the sake of completeness, we consider adapting existing xAI methods as an additional improvement strategy.

V XAI in RS

Here we summarize the research on xAI in RS and answer the research questions RQ1, RQ2, and RQ3. First, we bring attention to the common practices and highlight new approaches (RQ1). Next, we summarize the methodologies to understand and evaluate the model explanations (RQ2). Further, we outline the common research questions across the different RS tasks that practitioners aim to answer with xAI (RQ3).

V-A RQ1: Usage/Applications of xAI in Remote Sensing

Table I contains the full list of publications included in our study. The papers are grouped by EO task (see the glossary in the Appendix) and ordered following the xAI categorization shown in Figure 3. The table also lists the xAI methods, objectives, and evaluation types.

V-A1 Application of existing xAI methods to RS

Figure 4 shows the number of papers using an xAI method, grouped by the introduced categories, while an additional figure in the Appendix illustrates all combinations of models and methods in the literature. Local approximation methods are the most frequently used, appearing in over 94 publications. Owing to their popularity, they are applied to the most diverse set of EO tasks and models. In most cases, they interpret tree-based models (i.e., Random Forest (RF) and tree ensembles), followed by CNNs and Multilayer Perceptrons (MLPs). Moreover, over 65% of these publications rely solely on local approximation methods without considering other methods. Backpropagation methods follow at a small gap with 72 papers, mostly leveraging CAM variants. Besides the many interpreted CNN architectures, most time series models, such as the LSTM, also fall into this category. In contrast, most papers using transformers are among the 35 publications that leverage embedding space interpretation techniques, which is expected since attention is the centerpiece of the architecture. Many publications in this category also assess the models' feature space. At a large gap behind the SHAP and LIME methods, 56 publications use various perturbation methods. The publications are fairly evenly distributed across these methods, with PFI being the most widely used, followed by occlusion and PDP. Fifteen publications leverage a diverse set of interpretable-by-design models. Although Generalized Linear Models (GLMs), in particular LR, are the most common, newer models like Explainable Boosting Machines (EBMs) are gaining recognition. Overall, only a few papers employ joint training, such as prototypes or explanation associations. Even less popular are model translations, counterfactuals, and example-based explanations.

The Mean Decrease in Impurity (MDI), or Gini importance, is often used in feature selection as a global importance measure and can be easily obtained for tree-based methods [109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125]. This shows that the boundary between xAI and feature selection is blurred: the Gini index is used to choose splits by their purity when growing a tree, but it also allows for the interpretation of the model's decision process. Nevertheless, we refer to these works as feature selection methods.
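To make the MDI computation concrete, the following is a minimal sketch for a single greedy classification tree, assuming integer class labels and an exhaustive threshold search; the helper names (`gini`, `best_split`, `mdi_importance`) are hypothetical. Each split credits its feature with the weighted impurity decrease $\frac{N_t}{N}\,\Delta G$, and the totals are normalized to sum to one. A Random Forest averages this quantity over all of its trees; scikit-learn exposes it as the `feature_importances_` attribute of its tree ensembles.

```python
import numpy as np


def gini(y):
    """Gini impurity of an integer label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split(X, y):
    """Exhaustive search for the (feature, threshold) with the largest impurity decrease."""
    n = len(y)
    parent = gini(y)
    best = (None, None, 0.0)  # (feature index, threshold, impurity decrease)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            n_left = left.sum()
            child = (n_left * gini(y[left]) + (n - n_left) * gini(y[~left])) / n
            if parent - child > best[2]:
                best = (j, t, parent - child)
    return best


def mdi_importance(X, y, max_depth=3):
    """MDI of one greedy tree: sum of (N_t / N) * impurity decrease per split feature."""
    importances = np.zeros(X.shape[1])
    N = len(y)

    def grow(idx, depth):
        if depth == max_depth or len(np.unique(y[idx])) == 1:
            return
        j, t, decrease = best_split(X[idx], y[idx])
        if j is None:
            return
        importances[j] += len(idx) / N * decrease
        grow(idx[X[idx, j] <= t], depth + 1)
        grow(idx[X[idx, j] > t], depth + 1)

    grow(np.arange(N), 0)
    total = importances.sum()
    return importances / total if total > 0 else importances
```

For example, with labels `y = (X[:, 0] > 0)`, the root split on feature 0 is already pure, so all of the importance is assigned to that feature.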

Approaches in common EO tasks

To provide deeper insight, we now discuss in detail the three most representative EO tasks according to the number of papers: landcover mapping, agriculture monitoring, and natural hazard monitoring. The graphs in Figure 5 show, for each of these tasks, how frequently each combination of ML model and xAI method is used. A figure in the Appendix (additional EO task plots) provides a detailed overview of the usage patterns for all the EO tasks considered in this study.

Landcover monitoring
Owing to its well-established datasets, landcover mapping, the most prominent EO task, is often employed for evaluating or developing new xAI methods. Furthermore, interpreting the model outputs for this task requires less expert knowledge than for others, such as atmosphere monitoring or ecosystem interactions. The majority of the 36 landcover mapping papers use CNN architectures, which explains the preference for backpropagation methods, especially from the CAM family. For example, [126] leverage Grad-CAM to identify the reasons for misclassified images, while [127] assess transfer learning and the resilience of interpretations with CAM. The numerous publications proposing novel methods or evaluating existing ones are discussed in Sections V-A2 and V-B2, respectively. Albeit less frequently than with CAM approaches, CNN models are also interpreted with local approximation approaches. For example, SHAP is utilized in [128] to reveal the salient pixels and the global importance of the spectral bands, LIME is used in [129] to improve model performance on misclassified examples, and both approaches are utilized in [130] to validate the model predictions.
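Since CAM-family methods dominate this task, a minimal numpy sketch of the Grad-CAM weighting step may help. It assumes that the feature maps $A^k$ of the chosen convolutional layer and the gradients $\partial y^c / \partial A^k$ have already been obtained from a DL framework; upsampling the resulting map to the input resolution is omitted.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM map from precomputed feature maps and gradients.

    activations: (K, H, W) feature maps A^k of the chosen convolutional layer.
    gradients:   (K, H, W) gradients of the class score y^c w.r.t. A^k.
    Returns an (H, W) map scaled to [0, 1].
    """
    alphas = gradients.mean(axis=(1, 2))  # global-average-pooled gradients per channel
    cam = np.maximum(np.tensordot(alphas, activations, axes=1), 0.0)  # ReLU(sum_k alpha_k A^k)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```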

The second most prominent ML model for landcover monitoring is RF, which is interpreted exclusively with model-agnostic xAI methods. [112] explain the correlations regarding land use for tea cultivation through LIME. The most frequently used approach, SHAP, is leveraged in [131], while PFI is explored in [110] and [132] to rank reflectance data from various satellites and DEM information. The DEM information appears to be valuable, since both [131] and [132] identify slope as the most important feature, while elevation is ranked highest in [110].
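PFI is simple enough to sketch without library support. The following minimal version assumes a fitted model exposed as a `predict` callable and a `score` function where higher is better (both hypothetical placeholders); scikit-learn ships a comparable utility as `sklearn.inspection.permutation_importance`.

```python
import numpy as np


def permutation_feature_importance(predict, X, y, score, n_repeats=5, seed=0):
    """Mean drop in `score` (higher is better) after shuffling each column of X."""
    rng = np.random.default_rng(seed)
    baseline = score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature-target link
            drops.append(baseline - score(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances
```

A feature the model ignores receives an importance of exactly zero, because shuffling it leaves the predictions unchanged.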

Unique to this task, the less common counterfactual and example-based methods are applied to CNNs. [133] utilize a Generative Adversarial Network (GAN) to generate counterfactual time series for the Normalized Difference Vegetation Index (NDVI) of different landcover classes. The generator introduces perturbations to instances and attempts to change the prediction of the pre-trained classifier. Meanwhile, a discriminator ensures the quality of the newly generated instance [133]. Moreover, [134] show examples of the dataset belonging to the same class by computing the similarity in the latent space of the last layer. These examples can be used to identify Out-Of-Distribution (OOD) instances or cases where the classifier lacks generalization [134].
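The example-based retrieval described for [134] can be approximated by a nearest-neighbour search over latent embeddings. The sketch below assumes that the last-layer embeddings are already extracted and uses cosine similarity, which is our choice and not necessarily the measure used in [134]. A low maximum similarity to all training examples can serve as a simple hint that a sample is OOD.

```python
import numpy as np


def nearest_examples(query_emb, train_emb, train_labels, k=3):
    """Return indices, labels, and cosine similarities of the k closest training embeddings."""
    q = query_emb / np.linalg.norm(query_emb)
    T = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    sims = T @ q  # cosine similarity of every training embedding to the query
    idx = np.argsort(-sims)[:k]
    return idx, train_labels[idx], sims[idx]
```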

Agricultural monitoring
The primary tasks in this group are crop yield prediction and crop type classification. Two further tasks are addressed: irrigation scheme classification [135] and lodging detection [136]. The methodologies employed revolve around model-agnostic and backpropagation approaches, namely perturbation, local approximation, and CAM methods. These methods are frequently combined with tree-based models, such as gradient boosting and RF. Gradient boosting models are consistently interpreted with SHAP [137, 136, 138, 139, 140]. For example, [136] combine gradient boosting with SHAP to detect maize lodging from Unmanned Aerial Vehicle (UAV) images. The key features identified are plant height, computed from a DEM and a Digital Surface Model (DSM), and textural features from the Gray-Level Co-Occurrence Matrix.

Many different xAI strategies have been considered for crop yield prediction. [138] assess the feature importance of weather and spectral bands at three different times in the crop phenology, at administrative boundary levels, by summarizing the SHAP values, while [137] examine the relevance of soil features and DEM. Both studies emphasize the importance of specific time steps and show that the spectral bands provide valuable information. [139] investigate soil features, DEM, and spectral reflectance at a finer spatial resolution to predict subfield yields. Interestingly, no spatially consistent limiting factors are identified for an entire field, indicating the potential of countermeasures at the subfield level [139].
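SHAP approximates Shapley values; for a handful of features they can be computed exactly, which makes the additive decomposition summarized in these studies explicit. The sketch below assumes a single baseline vector that replaces absent features (an interventional simplification) and enumerates all coalitions, so it is only practical for small feature counts. Global importances such as those summarized in [138] are then commonly obtained as the mean absolute Shapley value per feature over many samples.

```python
import itertools
import math

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for S in itertools.combinations(others, size):
                z = np.array(baseline, dtype=float)
                idx = list(S)
                if idx:
                    z[idx] = x[idx]
                without_i = f(z)
                z[i] = x[i]
                phi[i] += weight * (f(z) - without_i)
    return phi
```

For $f(z) = 2z_0 + 3z_1 + z_0 z_2$ at $x = (1, 1, 1)$ with a zero baseline, the interaction term is split equally between features 0 and 2, and the values sum to $f(x) - f(\text{baseline})$.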

Other studies combine RF with perturbation methods, specifically PFI [141] and ALE or PDP [142]. The feature importance of soil, weather, and spectral data for yield potential is analyzed in [141], while PDP and ALE are used in [142] to assess the interactions between management, weather, and soil data. Their findings suggest that residue management or rate application decisions can significantly influence crop yield.
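A minimal PDP sketch, assuming a fitted model available as a `predict` callable (a hypothetical placeholder): the chosen feature is set to each grid value for every row, and the predictions are averaged. ALE, which avoids extrapolating into unlikely feature combinations when inputs are correlated, is not shown. scikit-learn provides a comparable routine as `sklearn.inspection.partial_dependence`.

```python
import numpy as np


def partial_dependence_curve(predict, X, feature, grid):
    """Average model output when column `feature` is set to each grid value for all rows."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value
        curve.append(predict(X_mod).mean())
    return np.array(curve)
```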

Time series data is frequently employed for crop mapping, leading to the use of Recurrent Neural Network (RNN) [143, 144, 145, 121] and transformer [146, 147, 148, 121] architectures. For LSTM models predicting crop yield, IG and SHAP are used in [144] and [145] to attribute the predictions to soil moisture, weather, and reflectance data. Both studies reach the same conclusion: high temperatures during the growing season have a negative impact on crop yield. In [121], an LSTM with attention and a transformer are used to differentiate between corn and soybeans. Both models are interpreted not only through their attention mechanisms but also with gradients and t-SNE projections of the activations. Overall, both models agree in their attributions, emphasizing the middle of the year as an important period, when corn starts to silk and soybeans begin to bloom. [149] apply CNNs to time series for predicting crop yield from vegetation indices and weather data. Attributions and a scenario analysis for different weather conditions are provided by a CAM-derived method modified for regression, named Regression Activation Mapping (RAM).
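IG attributes a prediction by integrating gradients along a straight path from a baseline $x'$ to the input $x$: $\mathrm{IG}_i(x) = (x_i - x'_i)\int_0^1 \frac{\partial F(x' + \alpha (x - x'))}{\partial x_i}\, d\alpha$. The sketch below approximates the integral with a midpoint Riemann sum and assumes a gradient function `grad_f` is available; for an LSTM it would come from automatic differentiation, whereas the test uses an analytic toy function.

```python
import numpy as np


def integrated_gradients(grad_f, x, baseline, steps=50):
    """Midpoint Riemann approximation of IG along the straight path baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)
```

Summing the attributions recovers $F(x) - F(x')$ up to discretization error, which is the completeness property of IG.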

Crop classification with prototypes learned through a CNN encoder is proposed by [150]. Their LLP-Co method uses a priori class proportions to match the proportions of instances assigned to the prototypes. A distinctive approach based on a variational adversarial network for crop yield prediction and irrigation scheme classification is introduced by [135]. To learn more meaningful latent representations, the discriminator of the GAN architecture must also classify the latent representations produced by the encoder into the correct classes. Additionally, LRP is used to attribute the inputs. [117] use a model translation method to identify RF subtrees, common rules within the RF model, and their associated error rates. Other works in agriculture monitoring use Gaussian Processes (GPs) [151, 152] and EBMs [153]. Notably, the methods for agricultural monitoring are evaluated solely with anecdotal evidence: no study employs quantitative metrics, and only [145] carry out a small user study.
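For readers unfamiliar with LRP, the following sketch implements the LRP-$\epsilon$ rule for a small stack of dense ReLU layers with a single linear output, assuming weight matrices of shape (inputs, outputs). Relevance is redistributed backwards in proportion to each input's contribution $a_j w_{jk}$ to the pre-activation $z_k$. It is a generic illustration, not the specific configuration used in [135].

```python
import numpy as np


def lrp_epsilon(weights, biases, x, eps=1e-6):
    """LRP-epsilon for dense layers with ReLU hidden units and a linear output layer.

    weights[l] has shape (inputs, outputs); returns one relevance score per input.
    """
    # Forward pass, storing the input of every layer.
    activations = [x]
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = activations[-1] @ W + b
        activations.append(np.maximum(z, 0.0) if l < len(weights) - 1 else z)
    relevance = activations[-1].copy()  # start from the output score(s)
    # Backward pass: R_j = sum_k a_j w_jk / (z_k + eps * sign(z_k)) * R_k
    for W, b, a in zip(reversed(weights), reversed(biases), reversed(activations[:-1])):
        z = a @ W + b
        z = z + eps * np.where(z >= 0, 1.0, -1.0)
        relevance = a * (W @ (relevance / z))
    return relevance
```

With zero biases and a small $\epsilon$, the input relevances approximately sum to the output score (relevance conservation).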

Natural hazard monitoring
The third main task comprises 26 papers on landslide susceptibility, fire and flood monitoring, and geological hazard monitoring. The typical input features for landslide susceptibility assessment are DEM variables, landcover and vegetation (e.g., NDVI) information, and weather variables such as temperature or precipitation. They are often supplemented by human factors, like the distance to roads, or by hydrological properties (e.g., drainage density, soil type). The models built on these features are usually interpreted with SHAP, including RFs [154, 155], Support Vector Machines (SVMs) [155], CNNs [156], Feed Forward Neural Networks (FNNs) [156, 157], and Gradient Boosting (GB) [158, 159]. Additionally, LIME, PFI, or PDP are applied in the same fashion in [160, 156]. Other approaches include interpretable models [161, 162, 163].

In summary, rainfall [162, 164, 156], slope [161, 158], elevation, aspect [164], curvature [154], distance to road [158, 160], and NDVI [154, 160] are among the most important features for this task. Moreover, a higher NDVI is associated with a lower landslide probability [160], whereas the presence of mines can increase it [162].

The flood mapping approaches rely on features similar to those used for landslide susceptibility (with rainfall as a constant component) and mostly leverage tree-based models or CNNs with SHAP [165, 166, 167], PDP [167], or prototypes [168]. According to [165, 166], elevation and slope have the greatest influence on floods worldwide and in Turkey, while according to [167], rainfall, road density, and building density have more influence on urban floods. [168] detect floods by performing adaptive k-means clustering of the image pixels in the latent space of a U-Net encoder. This approach associates the prototypes with the cluster centers and enables the interpretation of the model decisions in terms of linguistic IF … THEN rules.
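The prototype idea of [168] can be illustrated with a minimal sketch: cluster latent pixel embeddings, treat the cluster centres as prototypes, and explain a pixel by its nearest prototype as an IF … THEN rule. The sketch uses plain k-means with a deterministic farthest-point initialization rather than the adaptive variant of [168], and the helper `rule_for` and the prototype labels are hypothetical.

```python
import numpy as np


def kmeans_prototypes(Z, k, n_iter=50):
    """Plain k-means on latent embeddings Z of shape (N, D); centres act as prototypes."""
    # Deterministic farthest-point initialization.
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = Z[assign == c].mean(axis=0)
    return centers, assign


def rule_for(z, centers, prototype_labels):
    """Express the nearest-prototype decision for one embedding as an IF ... THEN rule."""
    c = int(np.argmin(((centers - z) ** 2).sum(axis=1)))
    return f"IF pixel is closest to prototype {c} THEN class = {prototype_labels[c]}"
```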

Model-agnostic methods (SHAP, PFI, PDP) are mostly applied to explain fire susceptibility prediction models (RFs and FNNs, see [169, 170]), which, like the previous tasks, typically rely on landcover, weather, and DEM data. [170] shows that a climate fire index (derived with a DNN in other work), NDVI, and slope are critical indicators for Italy, while [169] finds that humidity, wind speed, and rainfall are the most important factors for Australia. Further, [171] reveals that fires are more severe in areas with higher elevation and in those where the dominant vegetation types are shrubs and open forests.

Various works explore geological hazards. For example, [172] uses a CNN to classify various disaster events (e.g., building damage, fire) from aerial imagery and reveals the salient regions with a weighted combination of LIME and SHAP attributions. Further, [173] utilizes Grad-CAM to identify the mistakes of a CNN model predicting volcano deformation patterns and uses t-SNE to compare the latent space representations of real and simulated data. Regarding other geological hazards, [174] explores tunnel geothermal disaster susceptibility based on land surface temperature, river density, and other geological factors. The authors predict the susceptibility with an ensemble of machine learning models and evaluate factor importance with PFI, PDP, and LIME. Their analysis reveals that land surface temperature, fault density, earthquake peak acceleration, and river density are among the critical indicators. In another study, [175] explores earthquake probability prediction with eXtreme Gradient Boosting (XGBoost) and explains the model predictions with SHAP.

Distinct approaches across EO tasks

Besides the common practices for applying xAI in RS described above, we also identified distinct modeling approaches for specific EO tasks. For instance, intrinsically interpretable CNN architectures from the explanation association category are introduced for estimating socioeconomic indicators. These semantic bottleneck models decompose the model prediction into a linear combination of human-understandable concepts, as described in Section IV-A. Consequently, they explain the complex relationship between the remote sensing scene and the socioeconomic indicators through a set of interpretable concepts. In [176], a semantic bottleneck model predicts environment scenicness, which has been shown to be an important proxy for socioeconomic indicators such as quality of life or health. The proposed model is jointly optimized to first predict intermediate concepts corresponding to landcover classes, followed by a linear layer that estimates scenicness from the predicted landcover classes. This architecture allows the authors to directly relate the scenicness prediction to landcover classes, which are human-understandable concepts. Similarly, for estimating living quality from aerial images, [177] propose a semantic bottleneck model whose interpretable concepts are different liveability dimensions, such as population statistics, building quality, physical environment, safety, and access to amenities.
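The decomposition that makes semantic bottleneck models interpretable can be sketched in a few lines: with predicted concepts $c$ and a linear head $(w, b)$, the prediction $\hat{y} = b + \sum_i w_i c_i$ splits exactly into per-concept contributions $w_i c_i$. The concept names and weights in the usage below are hypothetical placeholders, not values from [176] or [177].

```python
import numpy as np


def bottleneck_explanation(concepts, w, b, names):
    """Split a linear head's output into per-concept contributions w_i * c_i."""
    contributions = w * concepts
    prediction = contributions.sum() + b
    # Rank concepts by the magnitude of their contribution to the prediction.
    ranking = sorted(zip(names, contributions), key=lambda t: -abs(t[1]))
    return prediction, ranking
```

For instance, predicted landcover fractions of 0.6 (forest), 0.3 (water), and 0.1 (urban) with weights 2.0, 1.0, and -3.0 and a bias of 0.5 give a prediction of 1.7, dominated by the forest concept.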

Although interpretable-by-design models typically perform worse than DL architectures, they are leveraged in diverse contexts. Linear regression is used to support more complex methods and to evaluate the linear trend of a feature [142, 178]. GLMs are applied to small sample sizes [179], and Generalized Additive Models (GAMs) to larger datasets [180, 153, 161, 162]. [180] reduce the feature space with Principal Component Analysis (PCA), while advanced versions based on EBMs are introduced in [153, 161]. EBMs extend GAMs with pairwise feature interactions and use gradient boosting to train the feature functions one feature at a time. [162] follow this idea and use FNNs as function approximators. A decision tree with linear regression models at the terminal leaves is employed in [181] to gain scientific insights into the partitioning of precipitation into evapotranspiration and runoff. [182] apply LDA to SAR images; their bag-of-words approach uses superpixels as words for landcover mapping [182, 183]. Last but not least, [152] analyze the weights of the GP to find anomalous samples.
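The cyclic boosting idea behind EBMs can be sketched as follows, under strong simplifications: each feature's shape function is a lookup table over quantile bins, updated in round-robin fashion by fitting a per-bin constant to the current residuals. Pairwise interactions, bagging, and the tree learners of real EBM implementations are omitted, and the function names are hypothetical, so this illustrates the principle rather than substituting for a library implementation.

```python
import numpy as np


def fit_cyclic_gam(X, y, n_bins=8, n_rounds=200, lr=0.1):
    """Additive model f(x) = b + sum_j g_j(x_j) fitted by cyclic boosting on binned features."""
    n, d = X.shape
    # Interior quantile edges per feature; g_j is a lookup table over n_bins bins.
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    edges = [np.quantile(X[:, j], qs) for j in range(d)]
    bins = np.stack([np.digitize(X[:, j], edges[j]) for j in range(d)], axis=1)
    tables = np.zeros((d, n_bins))
    intercept = float(np.mean(y))
    pred = np.full(n, intercept)
    for _ in range(n_rounds):
        for j in range(d):  # round-robin over features
            residual = y - pred
            sums = np.bincount(bins[:, j], weights=residual, minlength=n_bins)
            counts = np.bincount(bins[:, j], minlength=n_bins)
            # Per-bin mean residual acts as a one-feature "tree"; shrink it by lr.
            step = lr * np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
            tables[j] += step
            pred += step[bins[:, j]]
    return intercept, edges, tables


def predict_cyclic_gam(X, intercept, edges, tables):
    """Sum the intercept and each feature's shape-function value."""
    out = np.full(len(X), intercept)
    for j in range(X.shape[1]):
        out += tables[j][np.digitize(X[:, j], edges[j])]
    return out
```

Each row of `tables` can be plotted against its bins to inspect the learned shape function of one feature, which is what makes this model family interpretable.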

Fuzzy logic-based models are another approach, mainly used to evaluate the trustworthiness of ML models. An Ordered Weighted Averaging (OWA) fusion function is presented in [184] for burned area mapping; it allows controlling whether the fusion results are affected by more false positives than false negatives, and vice versa. It also reveals whether a particular output is driven by a few highly relevant factors or by many factors of low relevance. The outputs of Fuzzy Logic Systems (FLSs) for tree monitoring [185] can also be easily validated. Last but not least, measure-, integral-, and data-centric indices based on the Choquet integral (an aggregation function defined with respect to a fuzzy measure) are introduced in [186, 187, 188, 189] to develop more understandable ensembles for landcover mapping.
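A minimal sketch of the OWA operator: the weights apply to the input values sorted in descending order, so weights concentrated at the front behave like a maximum (OR-like) and weights concentrated at the back like a minimum (AND-like). Yager's orness measure summarizes where the operator sits between these extremes. Loosely speaking, an OR-like fusion accepts a detection when any source supports it, which tends to trade false negatives for false positives; the specific weighting used in [184] is not reproduced here.

```python
import numpy as np


def owa(values, weights):
    """Ordered Weighted Averaging: weights apply to the values sorted in descending order."""
    ordered = np.sort(np.asarray(values, dtype=float))[::-1]
    w = np.asarray(weights, dtype=float)
    if w.shape != ordered.shape or (w < 0).any() or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative, sum to 1, and match the number of values")
    return float(ordered @ w)


def orness(weights):
    """Yager's orness: 1 for max (OR-like), 0 for min (AND-like), 0.5 for the arithmetic mean."""
    w = np.asarray(weights, dtype=float)
    n = len(w)
    if n < 2:
        raise ValueError("orness needs at least two weights")
    return float(np.dot(np.arange(n - 1, -1, -1), w) / (n - 1))
```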

xAI methods are not only used to gain insights into model behavior; some works also exploit them for classification or to enhance training [190, 191]. [191] leverage Grad-CAM to create masks that occlude the features the network has emphasized, encouraging it to exploit other features. In another study [164], the outputs of three NNs are merged through attention, with the outputs and the attributions of the DeepLIFT backpropagation method serving as the key values of the attention layer. In contrast, [190] directly classify objects on top of the CAM attribution map. Hence, their weakly supervised method requires only image-level labels rather than bounding boxes. They propose three distinct methods to segment the attribution maps into bounding boxes, each performing differently depending on the number, heterogeneity, and complexity of the objects. Furthermore, [192] use the same object detection strategy to evaluate the localization abilities of xAI methods.
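Turning an attribution map into a bounding box can be as simple as thresholding it relative to its maximum and taking the extent of the surviving pixels. The sketch below assumes a single object per map and a hypothetical relative threshold of 0.5; the three segmentation strategies proposed in [190] are more elaborate.

```python
import numpy as np


def cam_to_bbox(cam, rel_threshold=0.5):
    """Bounding box (row_min, col_min, row_max, col_max) of pixels >= rel_threshold * max."""
    mask = cam >= rel_threshold * cam.max()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])
```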

V-A2 Adapted xAI approaches

The xAI methods commonly used in RS were originally designed for classical computer vision datasets, such as ImageNet [193]. These are mostly composed of natural images, which, as described in Section I, differ significantly from remote sensing acquisitions. This raises the question of whether the utilized xAI methods are well suited to remote sensing data. In this respect, we identified several works that propose new approaches accounting for remote sensing data properties to produce better explanations. This is typically done by adapting existing xAI methods or by proposing new DL architectures tailored to RS applications.

Recently, as illustrated in Figure 6, there has been an increase in tailoring xAI methods to RS properties. Half of these publications appeared in the last year, and no novel method was identified before 2021. A strong focus lies on modifying the CAM and Grad-CAM methods for different RS applications and domains. For example, [194] exploit the fact that target objects in SAR images occupy only a small portion of the image and propose a new CAM method that, instead of upsampling the feature map of the convolutional layer to the input image, downsamples the input image to the size of the feature map of the last convolutional layer. This operation yields saliency maps that localize the targets in SAR images more precisely than the Grad-CAM method. Additionally, [192] introduce a CAM method that produces much more fine-grained saliency maps than prior CAM methods. Similar to Layer-CAM [89], they use shallow layers to obtain more fine-grained results, but, following the idea of Score-CAM [195], they rely on scores, which are less noisy than gradients. Another attempt to improve current CAM methods was proposed by [196], who use the attribution maps from all network layers and reduce their number by retaining only the maps that minimize the information loss according to the Kullback-Leibler (KL) divergence. Additionally, local attribution maps are generated by masking the image and weighting the maps by the corresponding bounding box and its prediction confidence. These local maps are smoothed with a Gaussian kernel to avoid sharp boundaries in the resulting CAM. This novel approach, called Crown-CAM, is evaluated with a localization metric and outperforms (augmented) Score-CAM and Eigen-CAM on a tree crown localization task. CAM variants for hyperspectral images are developed in [197].
For these, the saliency map is a 3D volume instead of a 2D image, with each voxel attributing a spectral channel at a given pixel, which provides both pixel-wise and spectrally cumulative attributions. Other proposed Grad-CAM adaptations are a median-pooling [198] and a pixel-wise [199] variant.
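Since several of these adaptations build on Score-CAM, a minimal sketch of its core loop may help: each activation map is upsampled, normalized to $[0, 1]$, used as a mask on the input, and weighted by the resulting increase in the target score. It assumes a `model` callable returning class scores and feature-map sizes that divide the image size; the original Score-CAM additionally normalizes the scores (e.g., with a softmax), which is omitted here.

```python
import numpy as np


def score_cam(model, x, activations, target):
    """Score-CAM-style saliency for a 2D input.

    model: callable mapping an (H, W) array to a vector of class scores.
    activations: (K, h, w) feature maps, with H and W multiples of h and w.
    """
    H, W = x.shape
    K, h, w = activations.shape
    base = model(np.zeros_like(x))[target]  # score of an all-zero baseline
    masks = np.zeros((K, H, W))
    weights = np.zeros(K)
    for k in range(K):
        # Nearest-neighbour upsampling to the input size, then min-max scaling to [0, 1].
        up = np.kron(activations[k], np.ones((H // h, W // w)))
        spread = up.max() - up.min()
        masks[k] = (up - up.min()) / spread if spread > 0 else 0.0
        # Channel weight: increase of the target score when the input is masked by this map.
        weights[k] = model(x * masks[k])[target] - base
    cam = np.maximum(np.tensordot(weights, masks, axes=1), 0.0)
    return cam / cam.max() if cam.max() > 0 else cam
```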

Regarding deep learning approaches, a prototype-based model for RS is proposed in [200], where the ProtoPNet architecture [66] is adapted to also consider the location of the features. The network is iteratively trained in three stages. First, the encoder and prototype layers are trained to learn the prototypes. Second, each prototype is replaced by the nearest latent training patch of the corresponding class. Lastly, only the output layer weights are trained to produce the final prediction. In contrast to ProtoPNet, the prototype similarity is scaled by a location value learned by the network, which accounts for the location of the prototypes in the image and makes them location-aware. Another approach is presented in [201], where a reconstruction objective is added to the loss function to enable Grad-CAM++ to localize multiple target objects within an aerial image scene more accurately.
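The prototype similarity of ProtoPNet is $\log\frac{d+1}{d+\epsilon}$, where $d$ is the squared $l_2$ distance between a latent patch and the prototype, max-pooled over all patches. The sketch below adds an optional per-location weight map as a hypothetical stand-in for the learned location value of [200]; how that value is parameterized in [200] is not reproduced.

```python
import numpy as np


def prototype_similarity(patches, prototype, location_weights=None, eps=1e-4):
    """Max-pooled ProtoPNet-style similarity between a latent map and one prototype.

    patches: (H, W, D) latent feature map; prototype: (D,).
    location_weights: optional (H, W) map, a hypothetical stand-in for a learned location value.
    Returns the best similarity and its (row, col) position.
    """
    d = ((patches - prototype) ** 2).sum(axis=-1)  # squared l2 distance per patch
    sim = np.log((d + 1.0) / (d + eps))
    if location_weights is not None:
        sim = sim * location_weights
    row, col = np.unravel_index(np.argmax(sim), sim.shape)
    return float(sim[row, col]), (int(row), int(col))
```

Down-weighting a location can move the best match elsewhere, which is the kind of location awareness the adapted architecture aims for.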

Finally, we identified one approach addressing model-agnostic methods. Specifically, [202] adapts the model-agnostic PFI method to incorporate spatial distances. As in the PFI method, the features are permuted and the mean decrease in predictive accuracy is assessed. Notably, the features are permuted across various predefined distances, revealing the spatial importance or sensitivity of the model.

V-B RQ2: Interpretation and evaluation of xAI explanations

V-B1 Understanding and validating explanations

The properties of remotely sensed data mentioned in Section I can hinder the intuitive understanding of the semantics of the objects or the individual pixels in the remote sensing scene. Therefore, an obstacle that arises when applying xAI in RS is explanation interpretation, as the relevant features often do not have a straightforward interpretation. We identify that this challenge is frequently tackled by transforming the raw features into interpretable indices used for model training [203, 144] or by associating domain knowledge with the explanation at the post-hoc stage [146, 121].

The creation of interpretable input spaces

Data preprocessing in ML usually generates meaningful features for models. Ensuring human-understandable features is essential for comprehending input-output relationships or gaining knowledge about the model. For instance, understanding xAI attributions of SAR images can be challenging. [199] address this by transforming the feature space into human-interpretable factors using a U-Net architecture. They derive three interpretable variables from the VH polarization backscatter coefficients and the VV polarization interferometric coherence of the Sentinel-1 images, providing insights into the temporal variance and the temporal minimum. While the temporal variance changes between different crops and landforms over time, the temporal minimum is specific to flooded rice fields due to their proximity to water. This facilitates the understanding of the attributions of the applied Grad-CAM method.
A well-known preprocessing technique is the creation of spectral indices, like the NDVI vegetation index. This approach leverages the known characteristics of the various spectral channels and yields a more accessible representation and interpretation of the features. We observe a widespread adoption of these indices in the reviewed studies.
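As an example of such an index, NDVI is computed per pixel as $\frac{\mathrm{NIR}-\mathrm{Red}}{\mathrm{NIR}+\mathrm{Red}}$. The minimal sketch below assumes reflectance arrays for the near-infrared and red bands (e.g., Sentinel-2 bands B8 and B4); the small `eps` term is our assumption to avoid division by zero.

```python
import numpy as np


def ndvi(nir, red, eps=1e-12):
    """NDVI = (NIR - Red) / (NIR + Red), computed per pixel."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)
```

Dense green vegetation reflects strongly in the near infrared and absorbs red light, so it yields values close to 1, while bare soil and water yield values near or below 0.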
Encoding the feature space with dimensionality reduction is commonly believed to decrease interpretability. However, [132] demonstrates that a useful interpretation of a complex, correlated feature space can still be maintained. The author employs structured PCA for feature space reduction and RF for classification. The xAI methods ALE, PDP, SHAP, and PFI are evaluated in both the original and the transformed feature space. The results indicate that the behavior of the features can be identified in the principal components, which allows extracting the relationships between the main feature groups.

Explanation interpretation with domain knowledge

When raw inputs are used for model training, domain knowledge is often utilized to reveal the semantics of the relevant features. It is usually derived from already established indicators for the task under study, by utilizing external geolocated data sources, or based on expert knowledge.

One task where existing indicators are commonly used is agriculture monitoring, where insights about crop phenology help uncover the semantics of the relevant features. Concretely, [146] associate the NDVI index with the time points that receive high attention from the transformer encoder to reveal the key phenological events for crop discrimination. Similarly, agronomic knowledge is considered in [121, 118] to reveal the phenological stages by which the crops are classified. Further, the SHAP values of the input features used in [138] for crop yield estimation are aggregated by observation acquisition date into the crop development and crop harvest stages. This aggregation reveals the criticality of the observations close to the harvest stage for yield prediction. Similar efforts for explanation interpretation can also be found for other tasks, such as landslide susceptibility prediction. For instance, [159] relate the spatial heterogeneity of the SHAP values to natural characteristics and human activities for the following factors: lithology, slope, elevation, rainfall, and NDVI. They argue that the differences in factor contributions can be attributed to local regional characteristics such as topography, geology, or vegetation.
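The stage-wise aggregation used in [138] can be sketched as follows, assuming a matrix of SHAP values with one column per acquisition date and a hypothetical mapping from phenological stages to day-of-year ranges. Each stage receives the summed mean absolute SHAP value of its dates; the text does not specify which summary [138] uses, so this is one plausible choice.

```python
import numpy as np


def aggregate_by_stage(shap_values, dates, stage_bounds):
    """Mean absolute SHAP mass per phenological stage.

    shap_values: (n_samples, n_dates) SHAP values of a per-date input feature.
    dates: (n_dates,) day of year of each acquisition.
    stage_bounds: dict mapping a stage name to an inclusive (start_doy, end_doy) range.
    """
    mean_abs = np.abs(shap_values).mean(axis=0)
    return {stage: float(mean_abs[(dates >= lo) & (dates <= hi)].sum())
            for stage, (lo, hi) in stage_bounds.items()}
```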

Regarding geolocated data sources, we found that landcover class labels are associated with the input observations to interpret the explanations. [204] identify the relationship between urban topology and average household income. For this purpose, the Grad-CAM attributions of the image pixels are related to their landcover classes. Commercial/residential units are associated with low income, while natural areas are linked to higher income.

In certain works, expert knowledge is used to validate or interpret the findings. For example, in [164], the most important drivers of landslides noted in field investigation reports are compared to the features ranked by explanation magnitude. LDA is leveraged in [182] for unsupervised sea ice classification, and closely related classes are identified by KL divergence. The LDA-derived topics and probabilities, together with the interclass distances and segmented images, enable experts to assess the physical relationship between these classes. For example, water bodies, melted snow, and water currents have a similar topic distribution and a substantial physical similarity: liquid water. In [205], a comprehensive approach integrates expert knowledge to interpret and guide the process of finding and validating dwelling styles and their evolution within ethnic communities. First, the building footprints are extracted from satellite imagery, and with the help of experts, the styles are classified into different types. Subsequently, xAI is applied, leveraging XGBoost and SHAP to determine feature importance, with the experts inferring the semantic meaning. In the last step, the experts guide the clustering of the styles by their proximity and the analysis of their geographical distribution. The results reveal the emergence of mixed ethnic styles inheriting from the three traditional styles, which can be correlated with migration records.

V-B2 Evaluation of xAI methods for RS

As indicated in Section IV-C, xAI evaluation remains an open challenge. Figure 7 illustrates that the majority of the literature relies on anecdotal evidence, often involving the visualization of arbitrarily chosen or cherry-picked examples. In addition to this informal evaluation, some authors [200, 206, 207] evaluate their methods on straightforward toy tasks where humans can easily identify the sought relationships.

[145] rely exclusively on a user survey in which experts (1) assess the importance of the DL model's features and (2) judge the importance assigned by the xAI method. Five crop modeling experts assigned importance scores to the features, which were subsequently compared to the post-hoc SHAP importance scores. Afterward, the experts categorized the model explanations into four categories ((strongly) agree, (strongly) disagree) and were asked to provide a justification. Overall, the study demonstrates that experts can understand the model explanations and that the explanations give them insights into the models. However, the task remains challenging and carries the potential for misconceptions about the model behavior.

Regarding quantitative evaluation, we identified 16 studies testing xAI methods for RS problems with functional metrics. These metrics mainly assess the explanation quality properties described in Section IV-C and the localization ability of the xAI methods. Backpropagation methods are evaluated most frequently, with a particular focus on CAM approaches.

The explanation quality properties are evaluated in [208, 197, 198, 209, 194, 210, 127]. [208] evaluate Saliency, Input*Gradient (I*G), IG, GuidedBackprop, (Guided) Grad-CAM, Occlusion, DeepLift, LIME, and Smooth Gradient (SmoothGrad) on landcover mapping tasks. The metrics used are max-sensitivity, file size, computation time, and Most Relevant First (MoRef). Max-sensitivity measures reliability as the maximum change in the explanation when the input is slightly perturbed, the file size serves as a proxy for explanation sparsity, and MoRef measures how fast the classification accuracy declines when the most relevant features are removed first. The results reveal no obvious choice for this task. While Occlusion, Grad-CAM, and LIME were the most reliable according to the max-sensitivity metric, they lack high-resolution explanations. Grad-CAM also emerged as the most computationally efficient choice. Faithfulness is also evaluated with MoRef in [197], where the performance of different CAM methods is compared. In addition, a class sensitivity metric is employed to measure the correlation between the attributions of different target classes. Further, [198] evaluate median-pooling Grad-CAM, Grad-CAM, Grad-CAM++, and SmoothGrad-CAM++ using the drop/increase in confidence when important regions of the image are occluded. Similar metrics are evaluated in [194], where two different occlusion tests are employed to benchmark various CAM approaches against the proposed Self-Matching CAM. While the methods perform similarly when the most influential pixels are perturbed, only Self-Matching CAM shows a minor drop in the prediction difference when all pixels except the most salient ones are occluded, demonstrating that its explanation focuses on the target object in the image. Further, [209] develop an attention NN and compare it with attention networks from the literature, CAM, Grad-CAM, and Layer-CAM, using max-sensitivity and the average % drop/increase in confidence.
In a different study, low faithfulness of Grad-CAM explanations is demonstrated in [210] with similar metrics for the task of real estate appraisal. This study also conducts model and data randomization tests, finding that Grad-CAM is sensitive to changes in the network weights and to label randomization. A similar evaluation of sensitivity to network weight changes is conducted in [127], where the goal is to assess how damage to the network weights affects the inference of CNN models. To evaluate this, the authors compare the CAM maps of the original model with those of the model with disabled weights. Their results indicate that the ResNet model exhibits the highest resilience to network changes, as the CAM map of its model with disabled weights most closely matches that of the original ResNet model.
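The MoRef idea can be sketched as a deletion curve: pixels are removed in order of decreasing attribution, and the target score is recorded after each step, so a faithful attribution produces a fast decline and hence a small mean score along the curve. The sketch assumes a `model` callable returning class scores and uses zero as a hypothetical fill value for removed pixels; the choice of fill value can strongly influence the result.

```python
import numpy as np


def most_relevant_first_curve(model, x, attribution, target, n_steps=10, fill_value=0.0):
    """Target score after removing pixels in order of decreasing attribution."""
    order = np.argsort(-attribution.ravel())
    x_cur = x.copy().ravel()
    scores = [model(x_cur.reshape(x.shape))[target]]
    chunk = int(np.ceil(order.size / n_steps))
    for s in range(n_steps):
        x_cur[order[s * chunk:(s + 1) * chunk]] = fill_value  # remove the next most relevant chunk
        scores.append(model(x_cur.reshape(x.shape))[target])
    return np.array(scores)
```

Comparing the mean of the curves obtained with two attribution maps indicates which one ranks the truly influential pixels first.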

The localization ability is evaluated in [211, 192, 196]. For instance, [211] use various CAM approaches to evaluate the ResNet model with respect to its ability to deal with large variance and its localization ability. In particular, segmentation maps were derived from the CAM, Grad-CAM, Grad-CAM++, SmoothGrad-CAM++, and Score-CAM approaches by thresholding their attributions. Grad-CAM showed superior performance in localization accuracy and in the ability to extract complex features from images with large variance. In [192], the localization capabilities of the new CSG-CAM method are compared to CAM, Grad-CAM, Grad-CAM++, and Score-CAM. To this end, a weakly supervised segmentation task driven by the xAI attribution maps is considered: if the attributions exceed a threshold for a certain class, the class is assumed to be present in the scene. Similarly, [196] evaluates the localization ability of different CAM approaches for tree crown detection.
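The threshold-based localization evaluation can be sketched as an intersection over union (IoU) between the thresholded attribution map and a ground-truth object mask. The relative threshold of 0.5 is a hypothetical choice; the studies above may use different thresholds or metrics.

```python
import numpy as np


def localization_iou(cam, gt_mask, rel_threshold=0.5):
    """IoU between the thresholded attribution map and a boolean ground-truth mask."""
    pred = cam >= rel_threshold * cam.max()
    inter = np.logical_and(pred, gt_mask).sum()
    union = np.logical_or(pred, gt_mask).sum()
    return inter / union if union > 0 else 0.0
```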

When it comes to the evaluation of other xAI methods, the SHAP explanations, despite their high usage, are quantitatively evaluated only in [212], where an FNN trained on the features with the highest SHAP values slightly outperforms the model trained on the complete feature set. Further, the attention weights are evaluated in [146] by inspecting drops in crop mapping accuracy when the transformer model is trained on a subset of the dates with the highest attention values. The results verify that the attention weights select the key dates for crop discrimination, as training the model with only the top 15 dates is sufficient to approximate the accuracy of the model trained on the complete dataset. Finally, [133] employ distinct metrics to ensure a certain quality of their generated counterfactuals: proximity (closeness to the original input instance, measured by $l_{2}$ distance), compactness (a small number of perturbations across time steps), stability (consistency for comparable input samples), and plausibility (adherence to the same data distribution).
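Two of these counterfactual metrics can be sketched directly. Proximity follows the $l_2$ definition above; for compactness, we assume one plausible operationalization, the fraction of time steps left (numerically) unchanged, with a hypothetical tolerance `tol`. The exact formulations in [133] may differ.

```python
import numpy as np


def proximity(x, x_cf):
    """l2 distance between the original and the counterfactual series (lower is closer)."""
    return float(np.linalg.norm(x - x_cf))


def compactness(x, x_cf, tol=1e-3):
    """Fraction of time steps left unchanged (higher means fewer perturbed steps)."""
    return float(np.mean(np.abs(x - x_cf) <= tol))
```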

V-C RQ3: xAI Objectives and Findings in RS

In this section, we analyze the motivation for applying xAI in RS according to the common objectives specified in Section IV-D. The frequency of the objectives displayed in Figure 8a reflects a trend similar to that in [13]. Namely, the objective to control is the most common, found in 127 works. It is followed, after a large gap, by the objective to discover, which is met in 51 works. Next come the objectives to improve and to justify, which are found in 27 and 5 works, respectively. Moreover, Figure 8b shows that the objective to discover has a distinctive distribution across the EO tasks: it frequently occurs in studies monitoring the atmosphere, vegetation, and the human environment interaction, whereas the other objectives are mostly identified in studies related to the three main EO tasks described in Section V-A.

The observed distribution of the EO tasks with the objective to discover indicates that these studies focus on knowledge extraction for more recent EO tasks, such as monitoring and understanding the driving factors behind extreme events related to climate change or natural hazards. These studies are of utmost importance, as their insights can enable early preventive measures for disaster management. For instance, the problem of uncovering the key drivers of wildfire susceptibility is tackled in [213]. Applying SHAP and PDP to the trained ML model reveals that soil moisture, humidity, temperature variables, wind speed, and NDVI are among the most important factors associated with wildfires. Regarding the monitoring of natural hazards, [214] use SHAP to identify that volcanic deposits, terrain properties, and vegetation types are strongly linked to vegetation vulnerability after volcanic eruptions. Concretely, increased vegetation vulnerability is associated with higher lapilli accumulations, with crops being the most susceptible and forests the least susceptible vegetation type [214].
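For readers unfamiliar with PDP, the computation behind a partial dependence curve is short: fix one feature to a grid value for every sample, predict, and average. The sketch below follows this common definition; the logistic "susceptibility model", its coefficients, and the feature order (temperature, soil moisture) are purely hypothetical and are not taken from [213].

```python
import numpy as np

def partial_dependence(predict, X: np.ndarray, feature: int, grid: np.ndarray) -> np.ndarray:
    """Average prediction when `feature` is set to each grid value for all samples."""
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value          # intervene on one feature, keep the others as observed
        pd_values.append(predict(X_mod).mean())
    return np.array(pd_values)

# Hypothetical susceptibility model: risk rises with temperature (column 0)
# and falls with soil moisture (column 1).
def predict(X):
    return 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 1.2 * X[:, 1])))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))               # standardized toy predictors
grid = np.linspace(-2, 2, 5)
pd_moisture = partial_dependence(predict, X, feature=1, grid=grid)
```

Because the values of the other features are kept as observed, PDP inherits the same caveat as SHAP discussed in Section VI-A1: for strongly correlated predictors, the modified samples may be unrealistic.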
Extracting novel scientific insights can also be valuable for gaining new knowledge about ambiguously defined concepts. One such concept is wilderness. Although monitoring wilderness areas can be an important indicator of sustainable development, there is no uniform definition of wilderness in the literature, where it is usually described through philosophical reflections. [215] aim to infer wilderness characteristics from Sentinel-2 images. By analyzing occlusion sensitivity maps, they reveal that wilderness is characterized by large areas of natural, undisturbed soils, in contrast to anthropogenic areas, which have specific edge shapes and lie close to impervious structures [215]. Besides yielding new knowledge, scientific insights can also be used to identify the reasons for inaccuracies in Earth system models. [216] provide insights into the structural deficits of a lightning model by predicting its error with a gradient boosting algorithm and interpreting this predictor with SHAP. The error is computed as the difference between the model output and satellite observations. Their analysis reveals potential deficits related to high convective precipitation and landcover heterogeneity [216].
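An occlusion sensitivity map can be sketched in a few lines: replace one window of the image with a fill value, re-run the classifier, and record the drop in the class score. The sketch assumes non-overlapping square windows, a zero fill value, and a toy classifier that only looks at the top-left quadrant; these settings and the helper name `occlusion_map` are hypothetical, and [215] may use different window sizes, strides, or fill values.

```python
import numpy as np

def occlusion_map(predict, image: np.ndarray, patch: int = 4, fill: float = 0.0) -> np.ndarray:
    """Hypothetical helper: score drop when each patch x patch window is set to `fill`.

    image has shape (H, W, C); predict maps one image to a scalar class score.
    Windows do not overlap (stride = patch).
    """
    h, w = image.shape[:2]
    base = predict(image)
    sensitivity = np.zeros((h, w))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch, :] = fill
            sensitivity[i:i + patch, j:j + patch] = base - predict(occluded)
    return sensitivity

# Hypothetical classifier whose score depends only on the top-left quadrant.
def predict(img):
    return float(img[:8, :8, :].mean())

image = np.ones((16, 16, 3))
sens = occlusion_map(predict, image, patch=4)
```

Only the four windows covering the top-left quadrant produce a score drop (0.25 each), so the map highlights exactly the region the toy classifier relies on.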

On the other hand, the studies with the objective to control mainly compare the inference mechanisms of various established ML models used for EO. Consequently, they are mostly conducted in landcover mapping. One example of such a study is presented in [217], where SHAP feature importance rankings are compared across several ML models trained on Sentinel-2 images, topography, phenology, and texture features. This study illustrates that the models consistently rank the red edge bands as relevant for predicting the agriculture class and the chlorophyll-sensitive bands as relevant for identifying deciduous trees. Conversely, the models disagree on the importance of the texture features.
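One simple way to quantify how consistently two models rank their features is the Spearman rank correlation between their global importance vectors (e.g., mean absolute SHAP values). The sketch below assumes no ties; the feature names and importance values are hypothetical and are not results from [217].

```python
import numpy as np

def spearman_rank_corr(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman correlation between two importance vectors (assumes no ties)."""
    ra = np.argsort(np.argsort(a))   # rank of each feature under model A
    rb = np.argsort(np.argsort(b))   # rank of each feature under model B
    n = len(a)
    d2 = np.sum((ra - rb) ** 2)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

# Hypothetical features and mean |SHAP| values for two models.
features = ["B5_red_edge", "B6_red_edge", "B4_red", "elevation", "glcm_contrast"]
importance_rf = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
importance_cnn = np.array([0.28, 0.26, 0.18, 0.05, 0.12])
rho = spearman_rank_corr(importance_rf, importance_cnn)
```

In this toy case, both models agree on the spectral bands but swap the last two features (elevation and the texture feature), giving a rank correlation of 0.9.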

Agricultural monitoring is the second most common task in which this objective is considered. These studies typically focus on assessing the reliability of the proposed models by quantifying the relevance of the multitemporal information. For instance, [147] use the gradient method to measure the temporal importance assigned by various DL models for crop classification. Their analysis reveals that the transformer and LSTM approaches ignore observations obscured by clouds and focus on a relatively small number of observations compared to the CNN models [147]. Following a similar approach, [121] evaluate the generalization capabilities of these models when inference is performed on years different from those used for training. Their experiments indicate that the LSTM model adapts better than the transformer model to changes in crop phenology induced by late plantation.
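The gradient-based temporal importance can be sketched as the norm of the input gradient per acquisition date. In the reviewed works, gradients are obtained by backpropagation through the DL model. To stay self-contained, the sketch below approximates them with central finite differences, which is a swapped-in technique and is only practical for small inputs. The toy "crop logit" that uses dates 3 and 7 is hypothetical.

```python
import numpy as np

def temporal_saliency(f, x: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    """Per-time-step importance: L1 norm over bands of the input gradient.

    The gradient of the scalar function f at x (shape: time_steps x bands) is
    approximated with central finite differences instead of backpropagation.
    """
    grad = np.zeros_like(x, dtype=float)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x, dtype=float)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return np.abs(grad).sum(axis=1)

# Hypothetical crop-class logit that only uses dates 3 and 7 (12 dates, 4 bands).
weights = np.zeros((12, 4))
weights[3] = 1.0
weights[7] = 0.5
f = lambda x: float(np.tanh((weights * x).sum()))

x = np.zeros((12, 4))
imp = temporal_saliency(f, x)
```

The resulting profile is zero for all dates except 3 and 7, mirroring how such analyses reveal that a model concentrates on a few observations.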

The objective to control also supports different types of studies that anticipate the model decisions in scenarios that can occur in practical applications. For instance, [218] investigate the impact of exposing a DNN, initially trained on cloud-free images, to cloudy images. They apply Grad-CAM to identify the regions crucial for the classifier in both image types. Their findings reveal different reasons why the network misclassifies these OOD examples, including the occlusion of structures by clouds and shadows, as well as the homogeneity or heterogeneity induced by different cloud types. Forecasting weather extremes constitutes another application where it is essential to verify that ML models generalize well enough to predict rare events. [219] provide a showcase for extreme rainfall downscaling. By interpreting a CNN with Grad-CAM, they show that the model is able to learn the most relevant meteorological features.
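Since Grad-CAM recurs throughout this section, its core computation is sketched below: channel weights are the spatially averaged gradients of the class score with respect to the last convolutional feature maps, and the heatmap is the ReLU of the weighted sum of those maps. The sketch assumes that the activations and gradients have already been extracted with a DL framework. It omits the usual upsampling to the input resolution, and the toy arrays are hypothetical.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from the last convolutional layer.

    activations, gradients: arrays of shape (K, H, W) holding the feature maps A^k
    and the gradients dy_c/dA^k of the class score y_c with respect to them.
    """
    alpha = gradients.mean(axis=(1, 2))                            # channel weights (global average pooling)
    cam = np.maximum(np.tensordot(alpha, activations, axes=1), 0)  # ReLU(sum_k alpha_k A^k)
    peak = cam.max()
    return cam / peak if peak > 0 else cam                         # scale to [0, 1] for display

# Toy example: channel 0 fires top-left and supports the class,
# channel 1 fires bottom-right and argues against it.
A = np.zeros((2, 4, 4))
A[0, :2, :2] = 1.0
A[1, 2:, 2:] = 1.0
G = np.stack([np.full((4, 4), 0.5), np.full((4, 4), -0.5)])
heatmap = grad_cam(A, G)
```

The ReLU keeps only regions with positive influence on the class, so the bottom-right region, which lowers the class score, does not appear in the heatmap.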

The studies with the objective to justify usually discuss why the insights into the model workings are relevant for policymakers. For instance, [143] argue that the interpretability of DL models in agricultural applications is critical to ensure fair payouts to farmers under the EU Common Agricultural Policy (CAP). By applying a perturbation approach, the authors find that the summer acquisitions and the red and near-infrared Sentinel-2 spectral bands carry essential information for the decisions of the RNN model used. Further, the human footprint index, which represents the human pressure on the landscape and can be a valuable metric for environmental assessments, is predicted from Landsat imagery in [220], and LRP is then leveraged to visually highlight the relevant features in the images. Besides policymakers, the approach presented in [134] can also assist individual users during the production phase of an ML model, as it justifies the model's validity for inference by providing example-based explanations: if the explanation example does not fit the input instance, the model is considered unreliable for the RS image classification task.
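Perturbation approaches of this kind measure how much a model's performance degrades when the information in one input is destroyed. The sketch below shows permutation feature importance (PFI), a widely used perturbation method listed in Table I. It is not necessarily the exact procedure of [143]. The "model" that thresholds an NDVI-like index of red and near-infrared reflectance, and the uninformative blue column, are hypothetical.

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in `metric` when each feature column is randomly shuffled."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between feature j and y
            drops[j] += baseline - metric(y, predict(X_perm))
    return drops / n_repeats

accuracy = lambda y, p: float(np.mean(y == p))

# Hypothetical data: columns are (red, NIR, blue) reflectances; the toy model
# labels a pixel as vegetation when an NDVI-like index is positive.
rng = np.random.default_rng(1)
X = rng.uniform(0.05, 0.5, size=(1000, 3))
predict = lambda X: ((X[:, 1] - X[:, 0]) / (X[:, 1] + X[:, 0]) > 0).astype(int)
y = predict(X)
imp = permutation_importance(predict, X, y, accuracy)
```

Shuffling red or NIR costs roughly a third of the accuracy, while shuffling the unused blue band costs nothing, which is the kind of evidence used to argue which bands drive the decisions.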

Lastly, a large part of the works with the objective to improve focuses on adapting existing xAI methods to RS tasks; these works are already described in Section V-A2. Among the other techniques for model improvement based on xAI insights, we found that data augmentation and model augmentation are commonly performed.

Data augmentation is applied by [221], who simulate synthetic data for training a CNN model to detect volcanic deformations [173]. By conducting an explainability analysis with Grad-CAM on real data, the authors identify that the model wrongly predicts volcanic deformations on, e.g., salt lakes and slope-induced signals, which are patterns that were not considered in the simulated data. These insights are used to improve the prediction performance by fine-tuning the last layer of the CNN model on a hybrid synthetic-real dataset that accounts for these patterns. In [209], an iterative model improvement for satellite onboard processing through a weakly supervised human-in-the-loop approach is proposed. In their scenario, the satellite carries a convolutional attention network for onboard object classification. First, uncertain explanations and images are identified by assessing the similarity of the attention maps across the attention blocks. To this end, an inconsistency metric is introduced that measures the similarity of the attribution maps, emphasizing commonly highlighted regions. If a sample scores poorly on this metric, experts on the ground refine it by labeling the incorrect pixels in the attention map. In the last step, a local model is retrained with the refined data, and the onboard model is updated. A final example of data augmentation is the work by [191], where, for every image, a mask based on the Grad-CAM output is used during training to occlude the regions with the highest activation, encouraging the network to explore other features in the image.
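The masking idea of [191] can be sketched as follows: the most strongly activated pixels of a (pre-computed, upsampled) Grad-CAM heatmap are zeroed out in the training image, so the network has to rely on other evidence. The quantile cutoff of 0.9, the zero fill, and the helper name `mask_top_activations` are hypothetical choices. They are not taken from [191].

```python
import numpy as np

def mask_top_activations(image: np.ndarray, cam: np.ndarray, quantile: float = 0.9) -> np.ndarray:
    """Hypothetical helper: zero out pixels whose CAM value is in the top (1 - quantile) share.

    image: (H, W, C); cam: (H, W) heatmap already upsampled to the image size.
    """
    cutoff = np.quantile(cam, quantile)
    keep = (cam < cutoff)[..., None]   # broadcast the 2-D mask over all channels
    return image * keep

# Toy example: a 10x10 heatmap increasing row by row, applied to a constant image.
cam = np.arange(100).reshape(10, 10) / 99.0
image = np.ones((10, 10, 3))
masked = mask_top_activations(image, cam, quantile=0.9)
```

In a training loop, such masked copies would be fed to the network in place of (or alongside) the original images. The choice of cutoff trades off how much evidence is hidden against how much of the scene remains usable.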

On the other hand, model augmentation is performed in [222], where xAI is initially used to compare the workings of the Vision Transformer (ViT) and CNN models for monocular height estimation. Upsampling the feature maps of the penultimate layer to the input image unveils that even though the ViT learns more disentangled representations than the CNN, a neuron in the penultimate layer of the ViT still encodes semantics of multiple objects. This is resolved by proposing a new transformer model that learns distinct latent representations per semantic class.

TABLE I: A complete list of all relevant papers in this review, aggregated by EO task and xAI category. (^†xy indicates a new method derived from xy; a full list of all acronyms used can be found in the glossary in the Appendix.)

EO Task	xAI Category	Paper, xAI Methods	Model	Evaluation: Toy Task	Evaluation: Anecdotal	Evaluation: Quantitative	Objective: Control	Objective: Improve	Objective: Discover	Objective: Justify
Agricultural Monitoring	Backpropagation	[199] PWGrad-CAM^†Grad-CAM	CNN		✓		✓
		[149] RAM	CNN		✓
	Backpropagation, Embedding Space	[147] Gradient, Attention, Activation Assessment	Transformer, CNN, ConvLSTM, LSTM		✓		✓
	Backpropagation, Embedding Space, Feature Selection	[121] MDI, Gradient, Attention, Activation Assessment	RF, Transformer, aLSTM		✓		✓
	Backpropagation, Joint Training	[135] Explanation Association, LRP	GAN		✓		✓	✓
	Backpropagation, Local Approximation	[144] SHAP, IG	LSTM		✓		✓
		[145] SHAP, IG	LSTM		✓		✓
	Embedding Space	[148] Attention	Transformer		✓		✓
	Feature Selection	[118] MDI, GFFS	RF		✓		✓
	Interpretable by Design	[153] EBM	EBM		✓		✓
	Interpretable by Design, Perturbation	[152] GP, Occlusion	GP		✓		✓
		[142] GLM, PFI, ALE, PDP	GLM, RF		✓
	Joint Training	[150] Prototype	CNN		✓		✓
	Local Approximation	[137] SHAP	GB		✓		✓
		[136] SHAP	GB		✓		✓
		[138] SHAP	GB		✓		✓
		[223] SHAP	RF		✓
		[139] SHAP	GB		✓		✓
		[140] SHAP	GLM, SVM, FNN, RF, kNN, GB		✓		✓
	Model Translation, Feature Selection	[117] MDI, Rule Extraction	RF, DT		✓		✓
	Perturbation	[202] Spatial Variable Importance Profiles^†PFI	LDA, kNN LDA, RF		✓			✓
		[141] PFI	RF		✓		✓
		[151]	GP		✓		✓
	Perturbation, Embedding Space	[143] Occlusion, Activation Assessment	LSTM		✓					✓
		[146] Occlusion, Attention	Transformer			✓	✓
Atmosphere Monitoring	Backpropagation	[224] IG	NN Ensemble		✓
	Backpropagation, Local Approximation, Perturbation	[225] SHAP, XRAI, Occlusion	CNN		✓
	Embedding Space	[226] Activation Assessment	FNN		✓
		[227] Attention	FNN		✓
	Embedding Space, Joint Training, Feature Selection	[122] Model Association, MDI, Activation Assessment	RF, NN+ML Ensemble, FNN		✓		✓
	Feature Selection	[123] MDI	Tree Ensemble		✓		✓
		[124] MDI	Tree Ensemble		✓		✓
		[125] MDI	Tree Ensemble		✓		✓
	Interpretable by Design	[228] GLM^†GLM	GLM		✓			✓
	Local Approximation	[229] SHAP	FNN		✓
		[230] SHAP	GLM, GB		✓
		[231] SHAP	GB		✓		✓
		[232] SHAP	GB		✓
	Local Approximation, Feature Selection	[115] MDI, SHAP	RF		✓		✓
	Perturbation	[233] PFI	aCNN		✓		✓
		[234] PFI, ALE	RF		✓		✓	✓
	Perturbation, Feature Selection	[120] MDI, PDP	RF		✓
	Perturbation, Local Approximation	[235] SHAP, LIME, Ceteris Paribus Profiles, PFI	RF		✓
		[236] SHAP, PDP	GB		✓		✓
Building Mapping	Backpropagation	[237] Grad-CAM	CNN		✓		✓
		[238] A*G, DeepLift	CNN		✓		✓
		[239] Grad-CAM	CoAtNet		✓		✓
		[211] CAM, Score-CAM, Grad-CAM++, SmoothGrad-CAM++, Grad-CAM	CNN			✓	✓
	Backpropagation, Embedding Space	[240] Attention, Grad-CAM	CNN		✓		✓
	Backpropagation, Embedding Space, Perturbation	[222] IG, Occlusion, Activation Assessment	Transformer, CNN		✓		✓	✓
	Local Approximation	[205] SHAP	GB		✓
	Local Approximation, Feature Selection	[114] MDI, SHAP	RF, FNN		✓			✓
Ecosystem Interactions	Interpretable by Design	[180] GAM	GAM		✓
	Interpretable by Design, Local Approximation	[178] GLM, SHAP	GLM, GB		✓
	Local Approximation	[241] SHAP	GB		✓		✓
		[242] SHAP	GB		✓
	Perturbation, Local Approximation	[243] SHAP, PFI	GB		✓		✓
Human Environment Interaction	Backpropagation	[204] GuidedGrad-CAM	CNN		✓
		[220] LRP	CNN		✓		✓			✓
	Backpropagation, Embedding Space, Perturbation	[210] Occlusion, Grad-CAM, Activation Assessment	GLM, NN Ensemble			✓				✓
	Interpretable by Design	[244] GLM	GLM		✓		✓
	Joint Training	[177] Explanation Association	CNN		✓		✓
		[176] Explanation Association	CNN		✓
	Local Approximation	[245] SHAP	RF		✓
		[246] SHAP	GB		✓
		[247] SHAP	GB		✓		✓
		[248] SHAP	GB		✓
		[249] SHAP, LIME	RF		✓
	Perturbation	[250] ALE	RF		✓
		[251] PFI	CNN		✓		✓
	Perturbation, Embedding Space	[215] Occlusion, Activation Assessment	CNN		✓
	Perturbation, Local Approximation	[252] SHAP, LIME, Occlusion	FNN		✓
Hydrology Monitoring	Backpropagation	[206] Gradient	GP	✓	✓			✓
		[253] Gradient	GP		✓		✓
		[254] Grad-CAM	CNN		✓	✓	✓	✓
		[255] Grad-CAM	CNN		✓		✓
		[256] Gradient	CNN		✓
		[257] Grad-CAM	CNN		✓			✓
	Backpropagation, Local Approximation	[258] SHAP, IG	CNN		✓		✓
	Backpropagation, Perturbation	[259] PDP, IG, DeepLift, Expected Gradients	LSTM		✓		✓
	Embedding Space	[260] Attention	CNN		✓		✓
		[207] smlp^†Attention	FNN	✓	✓			✓
	Interpretable by Design, Local Approximation, Perturbation	[181] Cubist, ALE, LIME, PFI	Cubist, GB		✓
	Local Approximation	[261] SHAP	FNN		✓		✓
		[262] SHAP	RF, GB, DT, GB		✓		✓
Landcover Mapping	Backpropagation	[197] 3DGrad-CAM^†Grad-CAM	CNN		✓	✓	✓	✓
		[218] Grad-CAM	CNN		✓		✓
		[192] CSG-CAM^†Grad-CAM, Score-CAM, Grad-CAM++, Grad-CAM	CNN		✓	✓		✓
		[201] ERC-CAM^†CAM	CNN		✓			✓
		[263] XRAI	CNN		✓		✓
		[264] Grad-CAM	CNN		✓	✓		✓
		[126] Grad-CAM	CNN		✓		✓
		[198] MPGrad-CAM^†Grad-CAM, Grad-CAM++, SmoothGrad-CAM++, Grad-CAM	CNN		✓	✓		✓
		[127] CAM	CNN		✓	✓	✓
	Backpropagation, Embedding Space	[209] CAM, Attention, Grad-CAM, Layer-CAM, Attention, Attention	CNN			✓		✓
	Backpropagation, Local Approximation, Perturbation	[208] IG, DeepLift, Gradient, GuidedBackprop, LIME, GuidedGrad-CAM, Grad-CAM, I*G, Occlusion	CNN		✓	✓	✓
	Counterfactuals	[133] GAN	CNN			✓	✓
	Embedding Space	[265] Activation Assessment	CNN, CapsuleNet		✓		✓
		[266] Activation Assessment	CNN		✓		✓
		[267] Attention	CNN		✓		✓
		[268] Attention, Activation Assessment	CNN		✓			✓
	Embedding Space, Local Approximation	[189] SHAP, Activation Assessment	NN Ensemble		✓		✓
		[129] LIME, Activation Assessment	CNN		✓			✓
		[187] SHAP, Activation Assessment	NN Ensemble		✓		✓	✓
		[188] SHAP, Activation Assessment	NN Ensemble		✓		✓
	Example-based	[134] WIK	CNN		✓		✓	✓		✓
	Interpretable by Design	[269] Rules	NN+Tree Ensemble		✓		✓
		[183] LDA	LDA		✓		✓
	Interpretable by Design, Embedding Space	[270] LDA, Activation Assessment	NN Ensemble		✓			✓
	Joint Training	[271] Prototype	CNN		✓			✓
	Local Approximation	[131] SHAP	RF		✓		✓
		[130] SHAP, LIME	CNN		✓		✓
		[217] SHAP	SVM, RF, GB, CNN		✓		✓
		[128] SHAP	CNN		✓		✓
		[272] LIME	CNN		✓		✓
	Local Approximation, Feature Selection	[111] MDI, SHAP	RF, CNN		✓		✓
		[112] MDI, LIME	RF		✓		✓
	Perturbation, Local Approximation	[132] SHAP, ALE, PDP, PFI	RF		✓			✓
	Perturbation, Local Approximation, Feature Selection	[110] MDI, SHAP, PFI	RF		✓		✓
Natural Hazard Monitoring	Backpropagation, Embedding Space	[164] DeepLift, Attention	NN Ensemble		✓			✓
	Backpropagation, Embedding Space, Perturbation	[221] Occlusion, Grad-CAM, Activation Assessment	CNN		✓		✓	✓
	Backpropagation, Local Approximation, Perturbation	[213] SHAP, IG, PDP	LSTM		✓
	Embedding Space, Local Approximation	[162] SHAP, Activation Assessment	GAMI-Net		✓		✓
	Interpretable by Design	[161] EBM	EBM		✓		✓
		[184] Fuzzy Rules	Rules		✓		✓			✓
		[163] SNN^†FNN	GAM		✓			✓
	Joint Training	[168] Prototype	CNN		✓			✓
	Local Approximation	[169] SHAP	FNN		✓
		[155] SHAP	SVM, RF		✓		✓
		[165] SHAP	RF, GB		✓		✓
		[157] SHAP	FNN		✓		✓
		[273] SHAP	GB		✓		✓
		[158] SHAP	GB		✓			✓
		[175] SHAP	NN+Tree Ensemble		✓		✓
		[166] SHAP	CNN		✓		✓
		[172] SHAP, LIME	CNN		✓		✓
		[154] SHAP	RF		✓		✓
		[159] SHAP	GB		✓
	Perturbation	[171] PFI, PDP	RF		✓
	Perturbation, Feature Selection	[119] MDI, PFI	GLM, SVM, FNN, RF, kNN, GLM, Tree Ensemble, NB		✓		✓
	Perturbation, Local Approximation	[156] SHAP, PDP, PFI	CNN, NN Ensemble, FNN		✓		✓
		[174] LIME, Occlusion, PDP	Tree Ensemble		✓		✓
		[170] SHAP, PFI	RF		✓		✓
		[160] LIME, PDP	RF		✓		✓
		[167] SHAP, PDP	GB		✓		✓
Soil Monitoring	Backpropagation, Perturbation	[274] SquareGrad, VarGrad, Gradient, PFI, SmoothGrad, I*G, IG	ConvLSTM		✓		✓
	Local Approximation	[275] SHAP	GB		✓		✓
		[276] SHAP	CNN		✓		✓
		[277] SHAP	SVM		✓		✓
	Perturbation	[278] PFI	RF		✓		✓
Surface Temperature Prediction	Backpropagation	[279] LRP	FNN		✓		✓
	Local Approximation	[280] SHAP	GB		✓
		[281] SHAP	GB		✓		✓
		[282] SHAP	GB		✓
		[283] SHAP	GB		✓
	Local Approximation, Feature Selection	[115] MDI, SHAP	RF		✓		✓
Target Mapping	Backpropagation	[194] Self-Matching-CAM^†Grad-CAM	CNN			✓		✓
		[284] CAM	CNN		✓		✓
		[190] CAM	CNN					✓
		[285] IG, Score-CAM	NN Ensemble		✓		✓
		[286] Grad-CAM	CNN		✓		✓
	Backpropagation, Embedding Space	[287] CAM, Activation Assessment	BagNet		✓		✓
	Backpropagation, Embedding Space, Perturbation	[191] Attention, Empirical Receptive Field, Grad-CAM, Activation Assessment	CNN		✓		✓	✓
	Backpropagation, Local Approximation	[288] GuidedBackprop, SHAP	CNN		✓			✓
		[289] CAM, LIME	CNN		✓			✓
	Embedding Space	[290] Attention	CNN					✓
		[291] Activation Assessment	VAE		✓		✓
		[292] Activation Assessment	BagNet		✓		✓
		[293] Attention	aCNN		✓		✓
	Local Approximation	[294] SHAP	CNN		✓			✓
		[295] LIME	CNN		✓		✓
		[296] LIME	CNN		✓			✓
	Perturbation	[297] Occlusion	NN Ensemble		✓		✓
Vegetation Monitoring	Backpropagation	[196] Eigen-CAM, AugScore-CAM, Crown-CAM^†Grad-CAM, Score-CAM	CNN			✓		✓
		[298] GuidedGrad-CAM	CNN		✓		✓
	Backpropagation, Local Approximation	[299] Deconvolution, Gradient, SHAP, GuidedGrad-CAM	CNN			✓		✓
	Interpretable by Design	[179] GLM	GLM		✓
		[185] Fuzzy Rules	FLS		✓		✓
	Joint Training	[300] Explanation Association	CNN		✓		✓
	Local Approximation	[212] SHAP	FNN			✓	✓	✓
		[203] SHAP	SVM, RF, GB		✓		✓
		[301] SHAP	Extra Tree		✓		✓
		[302] SHAP	RF		✓
	Local Approximation, Feature Selection	[109] MDI, LIME	GB		✓		✓
	Model Translation	[303] Rule Extraction	RF		✓
		[304] Rule Extraction	RF		✓			✓
	Perturbation	[305] PFI, PDP	RF		✓
	Perturbation, Local Approximation	[214] SHAP, PFI	GB		✓
		[306] SHAP, PFI	RF		✓
Weather Climate Prediction	Backpropagation	[307] LRP	CNN		✓		✓
		[308] LRP	FNN		✓		✓
		[309] LRP	FNN		✓
		[219] Grad-CAM	CNN		✓		✓
		[310] GuidedBackprop	CNN		✓			✓
	Backpropagation, Feature Selection	[113] MDI, Grad-CAM	RF, CNN		✓		✓
	Feature Selection	[311] TreeInterpreter	RF		✓
	Joint Training	[200] Prototype^†ProtoPNet	CNN	✓	✓			✓
	Local Approximation	[312] SHAP	GB		✓		✓
		[216] SHAP	GB		✓			✓
	Perturbation, Local Approximation	[313] SHAP, PFI	GLM, RF, SVM, FNN		✓		✓
Other	Backpropagation	[314] Gradient	FNN		✓		✓
	Feature Selection	[116] MDI	RF		✓		✓
	Interpretable by Design	[182] LDA	LDA		✓
	Interpretable by Design, Embedding Space	[270] LDA, Activation Assessment	NN Ensemble		✓			✓
	Joint Training	[315] Explanation Association	CNN		✓		✓
	Local Approximation	[316] SHAP	RF		✓		✓
		[317] SHAP	RF		✓		✓
		[318] SHAP	GB		✓
	Perturbation	[319] PFI, PDP	RF		✓

VI Discussion

In this section, we address research questions RQ4 and RQ5 and discuss the usage of xAI methods in RS, the adaptation of xAI methods to RS problems, as well as the evaluation of such methods (RQ4). Furthermore, we highlight the challenges, limitations, and emerging research directions in the field (RQ5).

xAI in RS is still a very young and dynamic field. Figure LABEL:fig:yearly_eo_xaif gives a broad overview of the development of the field in recent years and provides the reader with context for the usage of xAI in RS as well as the challenges faced. Figure LABEL:fig:yearly_eo_xaifa illustrates varying trends in the EO tasks, with vegetation, atmosphere, and natural hazard monitoring recently receiving more attention. Meanwhile, agricultural monitoring and target mapping show a constant trend, while the number of landcover mapping publications is decreasing. This suggests a varying time lag before new ML approaches are widely applied in the different RS tasks. Indeed, landcover mapping is one of the most established EO tasks, with a variety of benchmark datasets and models. While many approaches have already been applied to landcover mapping, other tasks are currently being explored, creating rising trends in these areas. Similarly, Figure LABEL:fig:yearly_eo_xaifb reveals increasing trends for local approximation and perturbation methods, while backpropagation methods stagnate. This stagnation may be linked to the trends in the EO tasks. Initial assessments with xAI often involve simple and easily operable methods, specifically local approximation and perturbation methods, which are model-agnostic. Hence, their rising usage reflects the growing adoption of xAI in RS tasks that are still being explored.

VI-A RQ4: Recommended practices in xAI for EO

VI-A1 The usage of xAI

To obtain meaningful and reliable explanations, xAI methods should only be applied to ML models that generalize well [320]. Furthermore, the robustness of ML models and xAI methods should be quantified, as it is common for different models and xAI methods to disagree [321]. To ensure consistent outcomes, it is advisable to consider a set of initializations of the xAI methods and different ML models, as is done when benchmarking the performance of an ML model. For instance, the works identified in our review used different model types [165, 203, 156, 254, 155, 121, 259], model seeds or configurations [145, 259], and xAI method seeds [233]. Furthermore, the outcomes of different xAI methods can be compared, as is done in [155, 181, 192, 238, 259, 209, 282, 196, 144, 142, 117, 198, 211, 178]. For example, [259] apply IG, Expected Gradients (EG), and DeepLift to exhaustively interpret different RNN-based architectures (RNN, LSTM, GRU), also varying their numbers of neurons. Other related works include [178], where the authors compare the results of gradient boosting and SHAP applied to an LR model, and [238], where the authors compare the use of I*G and DeepLift in several layers.
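A simple way to report such comparisons is a pairwise agreement matrix over explanations from different seeds, models, or xAI methods. The sketch below uses top-k feature agreement, i.e., the share of the k features with the largest absolute attribution that two explanations have in common. This is one of several possible disagreement metrics. The attribution vectors are hypothetical, and the helper names are chosen only for illustration.

```python
import numpy as np

def topk_agreement(attr_a: np.ndarray, attr_b: np.ndarray, k: int) -> float:
    """Hypothetical helper: share of the k most important features (by |attribution|) in common."""
    top_a = set(np.argsort(-np.abs(attr_a))[:k])
    top_b = set(np.argsort(-np.abs(attr_b))[:k])
    return len(top_a & top_b) / k

def pairwise_agreement(attributions: list, k: int) -> np.ndarray:
    """Hypothetical helper: symmetric agreement matrix for a list of explanations."""
    n = len(attributions)
    out = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = topk_agreement(attributions[i], attributions[j], k)
    return out

# Hypothetical attributions for 6 features from three runs (e.g., IG, EG, DeepLift).
runs = [
    np.array([0.9, 0.7, 0.1, 0.05, -0.6, 0.0]),
    np.array([0.8, 0.6, 0.2, 0.10, -0.5, 0.0]),
    np.array([0.1, 0.7, 0.9, 0.05, -0.6, 0.0]),
]
agreement = pairwise_agreement(runs, k=3)
```

Runs 0 and 1 agree fully on their top three features, while run 2 shares only two of them with each, a disagreement that would remain hidden if only one run were reported.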

As the results of this review show, the heavy and often exclusive use of the model-agnostic SHAP method, or similar methods, suggests that it is applied regardless of the problem. We want to draw attention to this trend, since no single method fits every problem [320]. Practitioners should keep in mind that each method makes certain assumptions (e.g., SHAP assumes feature independence, which simplifies the problem definition). These assumptions can often be violated, for example, with time series data, which harms the interpretation results. Therefore, the assumptions of the specific method need to be considered and aligned with the data structure, task, and underlying model. Additionally, each model-agnostic method represents one approach or paradigm for generating explanations, regardless of the trained model and input dataset. For specific tasks, specialized methods exist that fit their needs, and a method should not be used merely because it is popular and easily accessible [322]. Furthermore, the choice of the xAI method and its evaluation should be discussed in each paper; this should be as much a part of the scientific dialogue as model selection and evaluation. Numerous papers in this review did not state their reasons for selecting their xAI methods, which raises the question of whether the authors considered the specificities of the methods used. Many existing xAI methods, like SHAP, IG, or LRP, are only approximations of the underlying model and come with simplifying assumptions. We encourage authors to address this issue to lay the basis for understanding and scientifically evaluating the chosen approach. [242] provide good perspectives in this respect.
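To make the independence assumption tangible, the sketch below computes exact Shapley values with an interventional (marginal) value function: features outside a coalition are filled in from background rows, independently of the features inside it. SHAP explainers approximate values of this kind; this brute-force version is a hypothetical illustration that is feasible only for a handful of features. The toy model and data are also hypothetical.

```python
import itertools
import math
import numpy as np

def exact_shap(f, x: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Exact Shapley values with an interventional value function (hypothetical helper).

    Features outside a coalition S are taken from background rows *independently* of the
    features in S. For correlated inputs (e.g., neighbouring acquisition dates), this
    evaluates the model on unrealistic combinations.
    """
    d = len(x)

    def value(S):
        Z = background.copy()
        Z[:, list(S)] = x[list(S)]     # fix coalition features to the explained instance
        return f(Z).mean()

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            for S in itertools.combinations(others, size):
                w = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi

f = lambda X: X[:, 0] * X[:, 1] + X[:, 2]   # toy model with an interaction term
rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))
x = np.array([1.0, 2.0, -1.0])
phi = exact_shap(f, x, background)
```

The values satisfy the efficiency property (they sum to the prediction minus the mean background prediction). However, every term is computed on synthetic rows whose realism depends on how independent the features actually are.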

Some of the modified Backpropagation (BP) methods, like Guided BP, Deep Taylor Decomposition (DTD), and variants of LRP, have been shown to be unfaithful to the model [99, 323]. [99] demonstrate that perturbing the weights of a neural network does not significantly change the resulting explanations. Another work builds a foundational theoretical framework showing that many methods fail to provide class-sensitive explanations and highlight only low-level features [323]. Nevertheless, these methods are used in many works, as seen in Section V-A. If such a method is used, it is advisable to verify it with one of the sanity checks mentioned in Section IV-C. However, as pointed out in several comparisons of different methods [324, 325, 208], no single xAI technique satisfies all evaluation metrics and is an obvious choice for all EO problems. Rather, the methods should be chosen for the specific task and goal, taking into account aspects such as the need for global or local explanations, their model-agnostic or model-specific nature, their input type, the computation time required to generate explanations, the quality or level of detail of the explanations, and their suitability for the end user, among others. A short discussion on how to evaluate these methods follows in the next section. For methods that alter the input, like IG or SHAP, the baseline is a neutral input that should capture the absence of any meaningful feature, and all outcomes are compared to the model output for this input. Hence, the choice of the baseline is crucial, and some methods have a built-in baseline by construction. This is especially important when different methods need to be compared [326, 327].
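As an illustration of one common sanity check, a model-parameter randomization test, the sketch below compares the gradient saliency of a model with that of the same architecture after re-initializing all its weights. A rank correlation close to 1 would indicate that the explanation barely depends on what the model has learned. This is a simplified, non-cascading variant, and the one-hidden-layer ReLU network with random weights merely stands in for a trained model. All names and sizes are hypothetical.

```python
import numpy as np

def input_gradient(W1, w2, x):
    """Gradient of the toy network f(x) = w2 . relu(W1 x) with respect to x."""
    active = (W1 @ x > 0).astype(float)
    return (w2 * active) @ W1

def rank_corr(a, b):
    """Spearman correlation of two saliency vectors (Pearson correlation on ranks)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

rng = np.random.default_rng(0)
W1, w2 = rng.normal(size=(16, 10)), rng.normal(size=16)   # stand-in for a trained model
x = rng.normal(size=10)
saliency_trained = np.abs(input_gradient(W1, w2, x))

# Randomization check: re-initialize all weights; a faithful method should change its output.
W1_rand, w2_rand = rng.normal(size=(16, 10)), rng.normal(size=16)
saliency_random = np.abs(input_gradient(W1_rand, w2_rand, x))
similarity = rank_corr(saliency_trained, saliency_random)
```

In practice, the same comparison is run on a genuinely trained network, often randomizing layers one at a time from the output toward the input and tracking how quickly the explanation degrades.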

Often, practitioners are interested in causal relationships, and the explanations of black-box models might lead to unjustified causal claims [320]. xAI methods do not reveal causes: even when an explanation supports the hypothesis of the research objective, this does not mean that the hypothesis has been proven. Standard ML approaches approximate correlations from data; therefore, the model cannot assess the causal structure of the data. However, xAI can help experts find unknown confounders and correlations by giving them a tool to investigate the learned correlations. Users must be very careful when drawing such conclusions about underlying structures and be aware that observational data often lacks measurements of common confounders, has strong feature correlations, and usually has an unknown causal structure. Causal inference is a complementary research field to xAI and is not covered in this review. Nevertheless, we identified several works discussing the causality problem in xAI [214, 136, 241, 279].
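A minimal simulation shows how a confounder produces a strong learned association without any causal effect. In this hypothetical setup, an unobserved variable z drives both an observed feature x and the target y. A model fitted on observational data assigns x a clear effect, and an xAI method would rank it as important, yet setting x by intervention leaves y unchanged.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
z = rng.normal(size=n)          # unobserved confounder (hypothetical, e.g., regional climate)
x = z + rng.normal(size=n)      # observed feature with no causal effect on y
y = z + rng.normal(size=n)      # target driven only by z

# Observational fit: least-squares slope of y on x (about 0.5 in this setup).
slope_obs = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Interventional data: x is set independently of z (do(x)); y is unaffected.
x_do = rng.normal(size=n)
slope_do = np.cov(x_do, y)[0, 1] / np.var(x_do, ddof=1)
```

The observational slope of about 0.5 is a genuine correlation that the model learns correctly, but it says nothing about what would happen if x were changed, which is precisely the gap between explaining a model and explaining the world.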

VI-A2 Evaluation of xAI

As mentioned in Section V-B2, anecdotal evidence is frequently reported as the evaluation of the applied approaches. Cherry-picking and the qualitative evaluation of explanations are challenging for humans. Because human perception is mainly visual, humans are biased toward certain types of xAI explanations. Among other things, humans introduce cognitive distortions by paying more attention to negative examples and by looking for simple but complete explanations [328]. This is accompanied by other human biases, like confirmation bias, which is well known in psychology and favors explanations that fit expectations while contrary explanations are ignored [329]. These self-introduced biases can lead to wrong reasoning about the explanations and to favoring a particular type of visualization. Hence, it is hard to quantify results objectively through anecdotal evaluation, and a quantitative evaluation of xAI methods is recommended. Qualitative evaluation can be done in addition, but it should not be the only type of evaluation. When users are expected to interact with explanations, attention should be paid to creating understandable and appropriate explanations for the specific end users, their requirements, and their level of knowledge [330]; in this case, keeping these biases in mind is essential. In any case, cherry-picking tends to be insufficient for evaluating all types of explanations and, as the only assessment criterion, does not lead to a trustworthy evaluation of xAI.

Beyond evaluation based on anecdotal evidence, there also exist quantitative evaluation metrics (see Section IV-C) to assess explanation methods. However, only a few works listed in Section V-B2 evaluate their methods against these metrics. In contrast, more detailed quantitative evaluations of xAI have been conducted in climate science, where climate models can provide reliable ground truth. For instance, [325] evaluate seven backpropagation methods, among them three CAM-based methods. They use a climate model to obtain reliable ground truth data for their explanations and compare the xAI methods through functional evaluation by computing the following metrics: robustness, faithfulness, randomization, complexity, and localization. For their classification task, LRP and I*G perform best on the evaluated metrics. Some of these backpropagation methods are also compared in terms of fidelity in [324]. First, the methods are evaluated on a synthetically created dataset; then, the results are related to those on a climate simulation classification dataset. While Gradient, SmoothGrad, I*G, IG, and $\text{LRP}_{z}$ are very noisy and may not provide insights into the model, $\text{LRP}_{comp}$ and SHAP seem to highlight attributions in a more meaningful way. IG and SHAP are also evaluated with different baselines in [327]. The results confirm that the baseline needs to be chosen carefully, because a wrong baseline can lead to different interpretations, both in attribution magnitude and in the highlighted features. The authors also suggest evaluating all methods with the same baseline for comparison purposes.
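The effect of the baseline is easy to reproduce with Integrated Gradients (IG), which integrates the gradient along the straight path from the baseline $x'$ to the input $x$ and satisfies completeness: the attributions sum to $f(x) - f(x')$. The sketch uses a midpoint Riemann sum and a hypothetical two-feature model with an analytic gradient. The second baseline stands in for, e.g., a dataset mean and is equally hypothetical.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=200):
    """Midpoint Riemann approximation of Integrated Gradients along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)   # (steps, d) points between baseline and x
    avg_grad = np.mean([grad_f(p) for p in path], axis=0)
    return (x - baseline) * avg_grad

# Hypothetical model f(x) = x0**2 + 3*x1 with its analytic gradient.
f = lambda x: x[0] ** 2 + 3 * x[1]
grad_f = lambda x: np.array([2 * x[0], 3.0])

x = np.array([2.0, 1.0])
ig_zero = integrated_gradients(grad_f, x, baseline=np.zeros(2))            # -> about [4.0, 3.0]
ig_mean = integrated_gradients(grad_f, x, baseline=np.array([1.5, 1.0]))   # -> about [1.75, 0.0]
```

With the zero baseline, feature 1 receives an attribution of 3, whereas with the mean-like baseline it receives none, because it does not differ from the baseline. The same model and input thus yield different rankings, which is why methods should be compared on the same baseline.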

There is still no standard way to evaluate xAI, and discussions on how to do this properly are ongoing. Nonetheless, different toolboxes have been developed for comparing xAI methods using quantitative metrics [101, 331, 332]. Last but not least, only one study [145] evaluates the usefulness of the explanations to experts. However, with the emphasis on human-centered AI in the current research landscape [333], user studies are becoming important for quantifying the benefit of explanations and the understanding of end users. Because they remain largely unexplored in the context of xAI in RS, they pose a promising research direction.

VI-A3 xAI benchmark datasets

Real-world datasets lack a controlled ground truth for evaluating xAI methods, and a controlled environment is needed to lay the foundation for fully transparent datasets. Similar to [334], there are efforts to create synthetic datasets for EO tasks. [335] generate a fully synthetic dataset, leveraging local piecewise linear functions to create a non-linear response to inputs drawn from a Gaussian distribution. This method allows for creating a regional climate prediction task from Sea Surface Temperature (SST). They show that a simple FNN can approximate this function and evaluate different post-hoc xAI methods. Reliable ground truth data can also be obtained from simulations; for example, climate models [336] can be leveraged. Other approaches to generating fully synthetic RS images include [337, 338, 339]. While [337] simulate different RS sensors, atmospheres, and scenes, including different terrains, materials, and weather conditions, [338] add different compositions of vehicles, and [339] show that vehicle motion can be included in the simulation. A framework for generating synthetic EO datasets is presented in [340], demonstrating the efficiency of these datasets for ML on offshore wind farm detection. The framework aims to extract expert knowledge about the objects to be modeled in a machine-readable format (e.g., structure, relationships), which can be combined to create new datasets. However, no work so far has compared xAI methods on these synthetic RS datasets.
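The appeal of such benchmarks is that the true attribution is known by construction. The sketch below builds an additive response $y = \sum_i F_i(x_i)$ from random continuous piecewise-linear functions $F_i$ applied to correlated Gaussian inputs. Relative to a zero baseline, the exact contribution of feature $i$ is then $F_i(x_i)$, and attribution maps from a trained model can be scored against it. This follows the spirit of [335] only loosely: the additive form, the number of breakpoints, the slope distribution, the covariance, and the generator name `random_piecewise_linear` are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_piecewise_linear(n_breaks=5, lo=-3.0, hi=3.0):
    """Hypothetical generator: random continuous piecewise-linear F on [lo, hi] with F(0) = 0."""
    knots = np.sort(rng.uniform(lo, hi, n_breaks))
    xs = np.concatenate(([lo], knots, [hi]))
    slopes = rng.normal(size=len(xs) - 1)
    ys = np.cumsum(np.concatenate(([0.0], slopes * np.diff(xs))))
    ys -= np.interp(0.0, xs, ys)   # anchor F(0) = 0 so that a zero baseline means "no contribution"
    return lambda v: np.interp(v, xs, ys)

d, n = 8, 1000
cov = 0.5 * np.eye(d) + 0.5        # unit variances, pairwise correlation 0.5
X = rng.multivariate_normal(np.zeros(d), cov, size=n)
funcs = [random_piecewise_linear() for _ in range(d)]
contrib = np.column_stack([F(X[:, i]) for i, F in enumerate(funcs)])  # ground-truth attributions
y = contrib.sum(axis=1)            # additive target for the model to learn
```

A model is trained on (X, y), and each xAI method's attribution for sample n and feature i is compared with contrib[n, i], for instance via a correlation across features.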

VI-B RQ5: Challenges, limitations, and future directions

We identified several challenges that emerge from xAI, RS, or their combination. Figure LABEL:fig:chall_mindmap gives an overview of these challenges, color-coded by the area from which they emerge: challenges emerging from RS are shown in green and those emerging from xAI in blue.

VI-B1 Combination of xAI with related fields

Physics-aware ML

Physics-aware ML and xAI share the objective of enhancing model reliability, trustworthiness, and transparency, and both rely on domain expertise. On the one hand, xAI enables experts to discover new insights into physical processes. This is illustrated by studies such as [181], which explores runoff and evapotranspiration parameterization, and [216], where xAI is applied to a model predicting the errors of an Earth system model in order to form hypotheses about wrong model assumptions. On the other hand, integrating expert knowledge from physics helps to understand the models because the incorporated prior knowledge is human-understandable. The combination of both fields can therefore yield physically sound and interpretable models. We have identified several approaches relying on physics-aware features for knowledge extraction from SAR data. For instance, [341] introduce a CNN operating in the complex domain that aims to predict physical scattering properties from Polarimetric SAR (PolSAR) images. These properties are derived with the $H/A/\alpha$ (entropy/anisotropy/alpha) target decomposition and describe various urban, vegetation, and ocean surfaces. [270] inject similar physics-aware concepts into a CNN model for sea-ice classification and conduct an xAI analysis showing that this fusion provides a better separation of the classes based on their physical properties. For vehicle classification, [297] extract vehicle parts with the attributed scattering center (ASC) model and construct images of the extracted parts from their scattering centers, which are selected with k-means clustering. These part images are used as filters for the convolution of the original image. With occlusion analysis, the authors demonstrate that incorporating the vehicle parts in the model inputs improves model robustness, enabling practitioners to obtain intuitive explanations and to validate the model's workings with domain knowledge.
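For readers less familiar with the decomposition mentioned above, the standard Cloude-Pottier quantities (entropy $H$, anisotropy $A$, and mean alpha angle $\alpha$) can be computed from the eigen-decomposition of a 3x3 Hermitian coherency matrix $T$. The sketch below uses these textbook definitions; it is not the processing chain of [341] or [270], and the per-pixel coherency matrix (usually obtained by spatial averaging) is assumed to be given.

```python
import numpy as np


def h_a_alpha(T):
    """Entropy H, anisotropy A and mean alpha angle (degrees) of a 3x3 coherency matrix T."""
    eigvals, eigvecs = np.linalg.eigh(T)                 # Hermitian: real eigenvalues, ascending
    order = np.argsort(eigvals)[::-1]                    # descending: lambda_1 >= lambda_2 >= lambda_3
    lam = np.clip(eigvals[order], 0.0, None)
    vecs = eigvecs[:, order]
    p = lam / lam.sum()                                  # pseudo-probabilities of the mechanisms
    nz = p > 0
    H = float(-np.sum(p[nz] * np.log(p[nz])) / np.log(3))   # log base 3 keeps H in [0, 1]
    denom = lam[1] + lam[2]
    A = float((lam[1] - lam[2]) / denom) if denom > 0 else 0.0
    # alpha_i = arccos(|first Pauli component of eigenvector i|): 0 deg surface, 90 deg double bounce
    alpha_i = np.degrees(np.arccos(np.clip(np.abs(vecs[0, :]), 0.0, 1.0)))
    alpha = float(np.sum(p * alpha_i))
    return H, A, alpha
```

A single pure surface scatterer gives $H = 0$ and $\alpha = 0$, while a fully random medium ($T \propto I$) gives $H = 1$ and $A = 0$.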
In more recent work, [290] also proposes an intrinsically interpretable CNN model that first processes the vehicle parts and the original image with a CNN encoder. Next, an attention module uses the encoder output for the original image as queries, while the keys and values are derived from the encoder output for the target parts. Finally, a convolution operation is applied to factor the contribution of the vehicle parts into the class logits, thus explaining the model prediction in terms of the different parts. Together, these works suggest that physics-aware ML and xAI can complement each other. Further development of these approaches and their extension to other types of RS data, such as optical or hyperspectral images, is a promising research direction for model improvement and may also constitute an important step towards consolidating the use of explainable architectures.
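The query/key/value roles described for [290] correspond to standard scaled dot-product cross-attention. The following sketch shows only this generic mechanism, with hypothetical projection matrices and token shapes. Averaging the attention weights over image tokens is one simple, assumed way to read off a per-part relevance score, not necessarily how [290] derive their explanations.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def part_cross_attention(image_tokens, part_tokens, Wq, Wk, Wv):
    """Image tokens (queries) attend to part tokens (keys/values).

    image_tokens: (N, d) encoder features of the original image
    part_tokens:  (P, d) encoder features of the P target parts
    Returns the fused features (N, d_v) and the attention weights (N, P).
    """
    q = image_tokens @ Wq
    k = part_tokens @ Wk
    v = part_tokens @ Wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=1)   # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
d = 16
image_tokens = rng.normal(size=(49, d))      # e.g. a flattened 7x7 feature map
part_tokens = rng.normal(size=(4, d))        # e.g. 4 extracted vehicle parts
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
fused, weights = part_cross_attention(image_tokens, part_tokens, Wq, Wk, Wv)
part_relevance = weights.mean(axis=0)        # average attention received by each part
```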

Uncertainty

Uncertainty intervals give insight into the generalization and training process of a model, which is especially useful for extrapolation and anomaly detection. While explanations can give more detailed information about a model's internal representations, they lack an important metric: model confidence or explanation reliability. Explanations that are highly variable or inaccurate may lead to misinterpretations. Combining uncertainty quantification and xAI can therefore provide more information than either approach alone. For instance, [342] and [343] propose methods to provide uncertainty intervals around explanations. RS has not been excluded from the convergence of these two disciplines [344, 345]. By adapting a perturbation method for object detection (D-RISE) and combining it with Deep Gaussian models, [344] obtain attribution maps for model uncertainty. They show the efficiency of their approach on SAR object detection, where trustworthy predictions are especially needed since SAR images are hard for humans to interpret. Gaussian processes (GPs), which are uncertainty-aware and interpretable by design, are frequently applied in this context [206, 253, 152, 151]. Further, [206] analytically derive the sensitivity of a GP model's variance estimate to the input features, which can later be used for uncertainty evaluation and the selection of relevant features.
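To make the idea of variance sensitivity concrete: for a GP with an ARD squared-exponential kernel, the latent predictive variance at a test point is $\sigma^2(x_*) = k(x_*, x_*) - k_*^\top K^{-1} k_*$, and its gradient with respect to each input feature follows in closed form as $-2\,(\partial k_*/\partial x_*)^\top K^{-1} k_*$. The sketch below is a generic derivation under these assumptions (fixed hyperparameters, Gaussian noise, hypothetical function names) and is not necessarily the exact formulation used in [206].

```python
import numpy as np


def ard_rbf(A, B, lengthscales, signal_var=1.0):
    """ARD squared-exponential kernel matrix between the rows of A and B."""
    diff = (A[:, None, :] - B[None, :, :]) / lengthscales
    return signal_var * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))


def gp_variance_and_sensitivity(X, x_star, lengthscales, signal_var=1.0, noise_var=1e-2):
    """Latent predictive variance at x_star and its gradient w.r.t. each input feature."""
    K = ard_rbf(X, X, lengthscales, signal_var) + noise_var * np.eye(len(X))
    k_star = ard_rbf(X, x_star[None, :], lengthscales, signal_var)[:, 0]   # (n,)
    alpha = np.linalg.solve(K, k_star)                                      # K^{-1} k_*
    var = signal_var - k_star @ alpha                                       # k(x_*, x_*) = signal_var
    # d k_*i / d x_*d = -k_*i (x_*d - X_id) / l_d^2
    dk = -k_star[:, None] * (x_star[None, :] - X) / lengthscales ** 2      # (n, D)
    grad = -2.0 * dk.T @ alpha                                              # uses symmetry of K^{-1}
    return float(var), grad


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
lengthscales = np.array([0.5, 1.0, 2.0])
x_star = np.array([0.3, -0.2, 0.5])
var, grad = gp_variance_and_sensitivity(X, x_star, lengthscales)
```

Features with a large absolute gradient are those whose changes most affect the model's confidence at that point.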

Hence, complementing xAI methods with uncertainty constitutes an interesting research direction to ensure more reliable explanations and better interpretation results.

Causal inference

Understanding causality in xAI is fundamental for uncovering the cause-and-effect relationships within a model's decision-making process. Techniques like causal inference, counterfactual reasoning, and causal graphical models are employed to trace the causal relationships between input features and model predictions [346]. These methods aim not merely to highlight correlations but to uncover direct causal links, enabling a better comprehension of how the AI system arrives at its decisions. Moreover, integrating causal reasoning into interpretable ML models, such as causal Bayesian networks or causal decision trees, has shown promise in elucidating causal links between input features and predictions [347]. In RS, causality involves understanding the cause-and-effect dynamics between environmental processes and the measurements acquired by RS instruments, with the aim of monitoring natural resources and ecosystems. Approaches such as structural equation modeling, directed acyclic graphs, and Granger causality have been used to untangle causal relationships within RS datasets [24]. These methods aim to identify causal pathways between environmental variables, allowing scientists to comprehend how changes in one variable may cause alterations in others. However, causality for explaining trained models in RS has yet to be explored extensively, leaving many opportunities in environmental studies, land-use planning, disaster management, and climate change research.
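As an illustration of the last of these approaches, a bivariate Granger test asks whether past values of a driver improve the prediction of a target beyond what the target's own past already provides, comparing a restricted and an unrestricted autoregression with an F-statistic. The following is a minimal least-squares sketch with an assumed lag order and synthetic data; in practice, a dedicated statistics package and stationarity checks would be used. Note that Granger causality is a predictive notion and does not by itself establish a structural causal effect.

```python
import numpy as np


def lagged(series, p):
    """Matrix whose k-th column holds series[t - k] for t = p, ..., n - 1."""
    n = len(series)
    return np.column_stack([series[p - k:n - k] for k in range(1, p + 1)])


def granger_f(driver, target_series, p=2):
    """F-statistic for H0: p lags of `driver` add no information about `target_series`."""
    target = target_series[p:]
    restricted = np.column_stack([np.ones(len(target)), lagged(target_series, p)])
    unrestricted = np.column_stack([restricted, lagged(driver, p)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return float(np.sum((target - design @ coef) ** 2))

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / dof)


# Synthetic example: x drives y with a one-step delay, but not the other way round
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()
f_xy = granger_f(x, y)   # large: past x helps predict y
f_yx = granger_f(y, x)   # small: past y does not help predict x
```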

VI-B2 Challenges from RS properties

xAI methods developed for CV do not account for RS image properties such as the existence of different sources, scales, geographic relationships, and temporal dependencies. The scale in RS describes the resolution and scope of the input data [8]. While the scope determines the geographic extent, the resolution specifies the degree to which information can be captured, defining, among others, the shape, granularity, and boundaries of objects. Additionally, some concepts or objects, like landcover or mountains, have no distinct boundaries or areas; instead, their transitions are continuous and their shapes irregular. Hence, spatial resolution inherently has semantic implications, since the presence or absence of information in the data influences the ability to distinguish and interpret specific features or objects within the scene. This remains a challenge for RS and xAI [16]: most methods struggle with fine-grained features, highlight an overly large spatial extent, or produce very noisy maps. New methods are constantly being developed to overcome these shortcomings. For instance, the well-known Grad-CAM suffers from these problems, since its maps are computed at the coarse resolution of the last convolutional layer, and many studies aim to enhance its capabilities. Recent works leverage lower-level feature representations to create much more fine-grained saliency maps [196, 192]. [202] addresses the challenge of scale by decoupling the xAI method from spatial scale, resulting in a permutation method that attributes importance across various distances. The scaling problem can also be tackled by the model itself: a scale-invariant autoencoder has recently been proposed in [348]. Scale also has a spectral dimension, since the scale of RS observations determines the spectral resolution. Every sensor captures reflectance data across different wavelengths, extracting different ground-level information. To date, a comprehensive xAI evaluation for multispectral and hyperspectral RS images that takes into account the spectral property distinguishing them from conventional images is missing.
Additionally, the current literature does not consider the varying spectra and spectral resolutions across platforms. [197] use a 3D visualization technique originating from medical imaging, together with a CAM variant adapted for hyperspectral images, to show the importance of different spectral bands per pixel. This provides insights into both the spatial and the spectral attribution, an advantage over current CAM variants, which only visualize spatial attribution despite the importance of different spectral bands for specific objects and environments.
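To make the resolution issue concrete, the following sketch computes a Grad-CAM map from the feature maps of a convolutional layer and the gradients of a class score with respect to them (assumed to be provided by an autodiff framework), then upsamples it to input resolution, which produces the blocky maps discussed above. For contrast, a simple per-band gradient-times-input score illustrates one possible way to attribute relevance to spectral bands; it is a generic assumption, not the CAM variant of [197].

```python
import numpy as np


def grad_cam(feature_maps, grads):
    """Grad-CAM from feature maps A (K, h, w) and gradients dy_c/dA of the same shape."""
    weights = grads.mean(axis=(1, 2))                                  # alpha_k: averaged gradients
    cam = np.maximum(np.tensordot(weights, feature_maps, axes=1), 0)   # ReLU(sum_k alpha_k A_k)
    return cam / cam.max() if cam.max() > 0 else cam


def upsample_nearest(cam, factor):
    """Nearest-neighbour upsampling: each coarse cell becomes a factor x factor block."""
    return np.kron(cam, np.ones((factor, factor)))


def band_attribution(x, input_grads):
    """Per-band relevance of a (B, H, W) input: |gradient x input| summed over space."""
    return np.abs(x * input_grads).sum(axis=(1, 2))


A = np.array([[[1.0, 0.0], [0.0, 0.0]],
              [[0.0, 0.0], [0.0, 1.0]]])
G = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
cam = grad_cam(A, G)                    # [[1, 0], [0, 0]]
cam_full = upsample_nearest(cam, 16)    # 32 x 32 map made of 16 x 16 blocks
```

Even on a 32 x 32 input, a 2 x 2 activation grid can only localize relevance to one of four large blocks, which is the granularity problem described above.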

Another challenge is the topology of the data [16]. Geographic relations can be a hidden confounder that is not captured in the data. For example, social structures are strongly tied to location, and symbiotic relationships exist within and between the biosphere, lithosphere, cryosphere, and hydrosphere. This makes analyzing RS data and applying xAI to it difficult. Therefore, choosing an appropriate set of training features is essential. One approach to tackling this challenge focuses on location awareness [200], where the prototypes are specific to locations in the input. This assumes a static location in the input image and considers the relative location within the input. While this approach works for global images or images of the same location, it does not account for changing locations. Moreover, there is a trade-off between the complexity or size of a model and the resolution of the RS images: working with static locations and high-resolution images would require huge resources for the model. Hence, such an approach cannot combine the flexibility of varying input locations with location awareness. A different strategy that facilitates the learning of spatial relations with a U-Net model is presented in [224] for monitoring exposure to surface ozone. A grid cell consisting of high-level features for a geographical area is used as input to the U-Net model. This setup allows the use of the IG attribution method to uncover the contributions of neighboring areas to surface ozone pollution. Currently, there is increasing interest in integrating topological data analysis (TDA) [349] with ML [350, 351]. [352] attempts to discover the topology of the data in an unsupervised way by mapping each pixel through a lens function, dividing the pixels into subsets, and clustering within these subsets. Edges are drawn between the resulting clusters if they share at least one pixel.
Groups are then formed according to the number of pixels in each subset and their connectivity, which classifies the pixels into different categories. This approach is also applied in [353] to classify the data and visualize it hierarchically in a dendrogram.
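The procedure described for [352] follows the general Mapper construction from TDA. A minimal sketch is given below, in which the lens function, the overlapping interval cover, and the single-linkage clustering with a fixed radius are all our assumptions rather than the settings of [352] or [353].

```python
import numpy as np


def _connected_components(points, radius):
    """Single-linkage clusters: points closer than `radius` are linked (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    for i, j in zip(*np.nonzero(np.triu(d < radius, k=1))):
        parent[find(i)] = find(j)
    labels = np.array([find(i) for i in range(len(points))])
    return [np.flatnonzero(labels == lab) for lab in np.unique(labels)]


def mapper(pixels, lens, n_intervals=4, overlap=0.3, radius=0.5):
    """Minimal Mapper graph over pixel feature vectors.

    pixels: (N, B) spectral vectors; lens: (N,) lens value per pixel (hypothetical choice).
    Returns the nodes (arrays of pixel indices) and the edges between nodes sharing a pixel.
    """
    lo, hi = lens.min(), lens.max()
    length = (hi - lo) / n_intervals
    nodes = []
    for i in range(n_intervals):
        start = lo + i * length - overlap * length
        end = lo + (i + 1) * length + overlap * length
        idx = np.flatnonzero((lens >= start) & (lens <= end))    # overlapping cover element
        if idx.size:
            nodes += [idx[c] for c in _connected_components(pixels[idx], radius)]
    edges = {(a, b) for a in range(len(nodes)) for b in range(a + 1, len(nodes))
             if np.intersect1d(nodes[a], nodes[b]).size > 0}
    return nodes, edges


# Pixels along a spectral gradient form a chain of overlapping clusters
line_pixels = np.linspace(0.0, 1.0, 50)[:, None] * np.ones(3)
nodes, edges = mapper(line_pixels, line_pixels[:, 0], radius=0.1)
```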

Furthermore, RS data are often sequential. Since most xAI methods were developed for natural images, they do not account for temporal dependencies [354]. However, besides the standard methods, several approaches designed for sequential data exist in the literature [355]; one of them is the attention mechanism, which we found to be frequently used for such problems. The only other approach identified in this review that explicitly tackles temporal dependencies is presented in [228]. The authors learn regression coefficients per feature, time point, and location. Additionally, the spatiotemporal dependencies are disentangled with a random-effect component whose latent variables follow a temporal Markovian process. This approach is evaluated for air quality estimation, where the regression coefficients are used to reveal the temporal importance of various meteorological and landcover features. Apart from these approaches, the remaining works typically rely on methods such as backpropagation-based attribution that do not explicitly consider temporal relations. For instance, [256] aggregate the saliency maps of individual time steps into cyclical saliency maps to identify sea surface temperature regions relevant for river flow prediction. These are computed as averages of individual saliency maps over months, seasons, or calendar years. The authors argue that such aggregation is meaningful in the climate context, as it summarizes the spatial importance across different time frames while improving robustness to gradient fluctuations and noise. A similar strategy is followed in [149], where a CNN is applied along the temporal dimension and CAM is used for attribution.
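The aggregation described for [256] amounts to grouping per-time-step saliency maps by calendar period and averaging them. A small sketch, assuming saliency maps stacked as a (T, H, W) array with a month index per time step (hypothetical names, toy data), is:

```python
import numpy as np


def cyclical_saliency(saliency, month, period="month"):
    """Average per-time-step saliency maps (T, H, W) by calendar month (1-12) or season."""
    if period == "month":
        groups = month
    elif period == "season":
        groups = (month % 12) // 3            # 0 = DJF, 1 = MAM, 2 = JJA, 3 = SON
    else:
        raise ValueError(f"unknown period: {period}")
    return {int(g): saliency[groups == g].mean(axis=0) for g in np.unique(groups)}


month = np.tile(np.arange(1, 13), 2)                        # two years of monthly time steps
saliency = month[:, None, None] * np.ones((24, 2, 2))       # toy maps whose value equals the month
monthly = cyclical_saliency(saliency, month)
seasonal = cyclical_saliency(saliency, month, period="season")
```

Averaging over many time steps is what smooths out gradient noise in individual maps, at the cost of hiding year-to-year variability.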

VI-B3 Towards interpretable Deep Neural Networks

Initial research on interpreting NNs focused on developing saliency map methods. In Section V-A, we have seen that this trend is also prominent in RS, as methods like Grad-CAM are among the most used xAI approaches for explaining DL models. Saliency maps highlight the relevant features without providing additional insights into how these features are used by the model [356]. Moreover, the highlighted features usually do not correspond to high-level concepts that humans can easily interpret [61]. In recent years, these limitations have been addressed with alternative explanation paradigms such as concept-based explanations and intrinsically interpretable DNNs [357].

The concept-based explanations described in Section IV-A3 enable a global interpretation of the model's workings in terms of intuitive high-level concepts understandable to humans. Despite the growing number of these approaches in CV [61, 62, 358, 356, 359], we did not identify any works in RS. We attribute this gap to the properties of RS data described in Section VI-B2. First, the concepts defined in CV range from simple primitives like colors and textures to more complex visual patterns like object parts [61, 62]. While simple visual primitives can also be used as concepts in RS, defining complex visual patterns requires modeling the topological relations inherent in RS data. In addition, concepts in CV are typically extracted from an external probe dataset, and a recent study [360] recommends using a probe dataset with a distribution similar to that of the dataset used for the learning task. Hence, to ensure faithful concept explanations in RS, the probe dataset should conform to the topological relations of the dataset on which the ML model was trained.

Even though concept-based explanations can be more intuitive than saliency maps, they are still post-hoc methods and, as described in Section IV-C, are not always faithful to the model's workings. Moreover, these methods often approximate the underlying model behavior and do not make the model itself more transparent. While the term interpretable models mainly refers to simple and easily understandable ML models, we want to emphasize the efforts to make DNNs inherently interpretable. Current xAI literature is moving in this direction, either by introducing layers into the neural network, like the attention mechanism, that can offer insights into the relevant features, or by designing intrinsically interpretable DNNs.

While transformers were developed for one-dimensional sequential inputs, their CV counterparts, ViTs [92], adapt this structure to the two dimensions of an image by splitting the image into smaller patches. Spatio-temporal inputs are common in the EO domain. The Earthformer [361] handles spatiotemporal inputs through cuboid attention, i.e., by applying self-attention within local cuboids into which the input tensor is decomposed, and then attending to global vectors that summarize the overall state of the system. Another ViT adaptation to the RS domain is shown in [362], where a temporal transformer encoder followed by a spatial one is used to process all patches composing the time series. Additional ideas for ViTs relying on related concepts and suitable for RS space-time data can be found in [363, 364, 365, 366]. The attention mechanisms in these new transformer architectures have not yet been explored further in the EO literature, which could be a promising direction for xAI based on latent space analysis. Note that attention mechanisms are used not only in transformers but also in CNNs and RNNs [209, 267, 268, 260, 240, 191, 290, 121].
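The core idea of cuboid attention, restricting self-attention to local space-time blocks, can be sketched with plain array reshapes. This simplified single-head version uses hypothetical projection matrices, assumes the tensor dimensions are divisible by the cuboid size, and omits the global vectors and the other architectural components of the actual Earthformer [361]; it only illustrates the decomposition.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def to_cuboids(x, size):
    """Split a (T, H, W, C) tensor into non-overlapping cuboids of shape size = (bt, bh, bw)."""
    T, H, W, C = x.shape
    bt, bh, bw = size
    x = x.reshape(T // bt, bt, H // bh, bh, W // bw, bw, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)           # (nT, nH, nW, bt, bh, bw, C)
    return x.reshape(-1, bt * bh * bw, C)          # (n_cuboids, tokens_per_cuboid, C)


def from_cuboids(cubes, shape, size):
    """Inverse of to_cuboids."""
    T, H, W, C = shape
    bt, bh, bw = size
    x = cubes.reshape(T // bt, H // bh, W // bw, bt, bh, bw, C)
    x = x.transpose(0, 3, 1, 4, 2, 5, 6)           # back to (nT, bt, nH, bh, nW, bw, C)
    return x.reshape(T, H, W, C)


def cuboid_self_attention(x, size, Wq, Wk, Wv):
    """Single-head self-attention restricted to tokens inside the same local cuboid."""
    cubes = to_cuboids(x, size)
    q, k, v = cubes @ Wq, cubes @ Wk, cubes @ Wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    out = softmax(scores, axis=-1) @ v
    return from_cuboids(out, x.shape, size)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 4, 2))                  # (T, H, W, C)
Wq, Wk, Wv = (rng.normal(size=(2, 2)) for _ in range(3))
y = cuboid_self_attention(x, (2, 2, 2), Wq, Wk, Wv)
```

Because each token only attends within its cuboid, the per-cuboid attention matrices are small and can be inspected locally, which is one reason such architectures may lend themselves to latent-space analysis.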

Other intrinsically interpretable DNN approaches for generating ante-hoc explanations recently explored in RS are prototype networks, BagNets, and Graph Neural Networks. As described in Section IV-A3, prototype neural networks enforce a reasoning process that classifies input examples based on their similarity to prototypical image parts of a given class. In this review, we have identified such approaches in [200] and [168], which have already been described in Sections V-A2 and V-A1, respectively. The BagNet [87] is a more interpretable CNN model that, in contrast to traditional CNNs, classifies only small, local receptive fields. The class evidence from each receptive field is then combined by addition at the image level to provide a prediction. The magnitude of a receptive field's activation thus corresponds to the importance of its features, and an attribution map can be compiled directly from the activations. Examples of this methodology applied to SAR vehicle detection are [287, 292]. A similar approach integrates NNs into GAMs for landslide susceptibility prediction, where NNs approximate the GAM's spline functions. The resulting model retains the interpretability of a GAM while leveraging the capabilities of NNs [162]. Another study equips a DL model with an interpretable-by-design component: a linear regression model that is jointly trained on the same targets via backpropagation [122].
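The additive structure that makes BagNets interpretable can be illustrated with a toy version: a hypothetical linear classifier scores each non-overlapping local patch, and the image-level logits are the average (a scaled sum) of these patch scores, so the per-patch scores directly form the attribution map. Real BagNets use a deep network with small, overlapping receptive fields; the linear patch classifier here is only an assumption for illustration.

```python
import numpy as np


def bagnet_logits(image, patch, W, b):
    """BagNet-style prediction: score local patches independently, then average.

    image: (H, W_img, C); W: (patch * patch * C, n_classes) hypothetical patch classifier.
    Returns image-level logits and the per-patch evidence map (nH, nW, n_classes).
    """
    H, Wd, C = image.shape
    nH, nW = H // patch, Wd // patch
    patches = image[:nH * patch, :nW * patch].reshape(nH, patch, nW, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(nH, nW, -1)
    evidence = patches @ W + b                      # local class evidence per receptive field
    return evidence.mean(axis=(0, 1)), evidence


rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 3))
n_classes, patch = 5, 8
W = rng.normal(size=(patch * patch * 3, n_classes))
b = np.zeros(n_classes)
logits, evidence = bagnet_logits(image, patch, W, b)
heatmap = evidence[..., int(np.argmax(logits))]   # 4 x 4 attribution map for the predicted class
```

Since a patch only influences its own evidence, the heatmap is an exact decomposition of the logit rather than a post-hoc approximation.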

In contrast to the NN approaches above, graphs provide an intuitive representation of many real-world problems in terms of objects and their relations. GNNs have emerged as a popular paradigm for learning high-level object representations by passing messages between neighboring objects [367]. This reasoning process enables GNNs to represent complex patterns of relationships between objects and generalizes their applicability to arbitrary geometric structures, as opposed to CNNs, which operate on regular grids such as two-dimensional images or one-dimensional sequences like text [368]. Moreover, Graph Convolutional Networks (GCNs) [369], which perform message passing with a convolution operation, enable efficient semi-supervised learning when not all objects in the graph are labeled. These properties have fostered the use of GNNs in RS, especially for hyperspectral image classification. Concretely, to overcome the limited ability of popular CNN approaches to capture the topological relations and irregular object shapes inherent to hyperspectral data, various GCN approaches have been proposed [370, 371, 372], demonstrating the potential of GNNs for improving hyperspectral image classification results. Recent works have also tackled label scarcity with GCN-based approaches, such as [373, 374]. Although GNNs produce an intuitive representation of the problem, they are still considered black-box models [375]. A promising research direction is adapting novel xAI methods for interpreting GNNs [376] to the approaches used in RS.
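For reference, the layer-wise propagation rule commonly used for GCNs [369] is $H^{(l+1)} = \sigma(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)})$ with $\tilde{A} = A + I$ and $\tilde{D}$ its degree matrix. The sketch below implements one such layer with a ReLU nonlinearity. In hyperspectral applications, nodes might be pixels or superpixels carrying spectral features, but the toy graph and weights here are purely illustrative.

```python
import numpy as np


def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W) for adjacency A (N, N), features H (N, F)."""
    A_hat = A + np.eye(A.shape[0])                       # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))        # inverse square-root node degrees
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)


A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])                          # nodes 0 and 1 linked, node 2 isolated
H = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
W = np.eye(2)
out = gcn_layer(A, H, W)
```

Each output row mixes a node's features with those of its neighbors, while an isolated node keeps only its own transformed features.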

VI-B4 Restricted availability of labeled data

The large number of active spaceborne platforms like Sentinel-2 generates vast amounts of data. However, only a small portion of these data can be labeled. The lack of labels is a challenge for the typical knowledge extraction pipeline in RS, which is based on training supervised ML algorithms on a labeled dataset. Hence, one recently popular approach to tackling this issue is Self-Supervised Learning (SSL). SSL learns latent representations from unlabeled data, which are useful for efficiently solving downstream learning tasks for which only limited labels are available. SSL is achieved by optimizing a NN to solve a pretext task with pseudo-labels [377]. We have identified a promising research direction for improving model interpretability: combining traditional ML modeling approaches with recent DL approaches. A pretext task for SAR sea-ice classification is proposed in [270], where the pseudo-labels are derived from a mixture of LDA topics. The topics are based on physical scattering properties extracted from unlabeled SAR images. Thus, the model provides an interpretable representation of the sea-ice types in terms of their physical properties (e.g., water bodies and floating sea ice are represented with similar sets of properties that match the actual semantic definitions of these classes). SSL pretraining also preserves the physical consistency of the features throughout the NN layers. As a result, different-looking images of the same class are positioned close to each other in the latent space, leading to improved classification results compared to a supervised CNN. [271] leverage contrastive learning to create prototypes for landcover classification. Their online learning approach fuses the images with their enhanced counterparts at different resolutions. After a ResNet encodes them into a feature space, they are mapped onto a unit sphere where the prototypes are clustered.
Then, the resulting codes of the prototypes corresponding to the differently enhanced images are swapped to compute the loss. Another unsupervised prototype approach is provided in [150].
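The swapped-prediction step described for [271] resembles SwAV-style training, in which soft codes are obtained by assigning the normalized features of one view to the prototypes under an equal-partition constraint (Sinkhorn-Knopp iterations), and the other view must predict these codes. The following numpy sketch shows only the loss computation under that assumption. The temperature, epsilon, and number of iterations are illustrative values, and neither gradients nor the encoder are included.

```python
import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def sinkhorn(scores, eps=0.05, n_iters=3):
    """Soft codes (B, K) that spread the batch evenly over the K prototypes."""
    Q = np.exp((scores - scores.max()) / eps).T        # (K, B); constant shift for stability
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=1, keepdims=True)
        Q /= K                                         # each prototype receives mass 1/K
        Q /= Q.sum(axis=0, keepdims=True)
        Q /= B                                         # each sample carries mass 1/B
    return (Q * B).T                                   # rows sum to 1


def log_softmax(z, temp):
    z = z / temp
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def swapped_prediction_loss(z1, z2, prototypes, temp=0.1):
    """Each view predicts the prototype code computed from the other view."""
    C = l2_normalize(prototypes)                       # prototypes on the unit sphere
    s1, s2 = l2_normalize(z1) @ C.T, l2_normalize(z2) @ C.T
    q1, q2 = sinkhorn(s1), sinkhorn(s2)
    ce_12 = -np.sum(q2 * log_softmax(s1, temp), axis=1)   # view 1 predicts code of view 2
    ce_21 = -np.sum(q1 * log_softmax(s2, temp), axis=1)   # view 2 predicts code of view 1
    return float(0.5 * np.mean(ce_12 + ce_21))


rng = np.random.default_rng(0)
z1 = rng.normal(size=(32, 16))                         # features of view 1 (e.g. original image)
z2 = z1 + 0.1 * rng.normal(size=(32, 16))              # features of view 2 (e.g. enhanced image)
prototypes = rng.normal(size=(10, 16))
loss = swapped_prediction_loss(z1, z2, prototypes)
```

Because the codes are tied to a small set of prototypes, each image can afterwards be described by the prototypes it is assigned to, which is what makes this family of approaches attractive for interpretability.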

VI-B5 Lack of standardized and objective evaluation of xAI

As mentioned in Section VI-A, assessing interpretability is essential for xAI approaches [84]. However, there is no agreed definition of a sufficient explanation or interpretation. Furthermore, no standardized and objective evaluation for xAI methods has been established, posing a challenge for potential practitioners outside of the xAI domain. Although metrics and tools for evaluation exist, incorporating xAI into an ML pipeline would require RS practitioners to acquire xAI expertise in order to evaluate the methods they use properly. An alternative to evaluation metrics is conducting user studies, which assess how useful the interpretation appears to the participants. However, user studies can introduce subjectivity because preferences for interpretability vary between groups. Moreover, conducting a well-designed user study involves considerable effort, cost, and domain knowledge, which again would need to be acquired by the practitioner. Although xAI frameworks have recently been developed [378, 379], some are tailored to a specific type of method or problem [378], while others focus more on the life cycle of the systems than on evaluation [379]. In summary, the evaluation of xAI lacks a standardized methodology, potentially limiting non-experts in applying the methods. This circumstance might have contributed to the large amount of anecdotal evidence we encountered in this review.

VII Conclusion

This paper provides a detailed overview of the state of the art of xAI in RS by conducting a systematic review of the existing work in the field. First, we collected a large set of publications by executing extensive search queries in established literature databases covering the last six years. Subsequently, we introduced a categorization of the existing xAI methods to structure these works. Our analysis reveals that a substantial amount of work is motivated by assessing the trustworthiness of ML models for traditional EO tasks like landcover mapping and agricultural monitoring. More recently, xAI has been increasingly used to discover scientific insights into critical EO problems related to climate change, extreme events, or urbanization. Although established xAI methods like SHAP and Grad-CAM dominate, we observe the increasing development of xAI methods adapted to the specifics of RS data. Further, we summarize works combining xAI with other fields, like physics and uncertainty, that produce scientifically sound ML models, ultimately enhancing the quality of the extracted explanations.

These highlights clearly illustrate that xAI in RS is a young field with high potential to augment the ML knowledge extraction process. At the same time, comparing the practices identified in this review with the latest developments in xAI and the current limitations in EO hints at the fundamental challenges that need to be addressed in future work. One research direction of paramount importance is the development of interpretable DNNs to address the shortcomings of the widely used saliency methods in RS. Further, as most studies conduct only anecdotal evaluations, future work on quantitative evaluation and user studies is needed to verify the reliability of xAI outputs. Moreover, the adapted methods indicate that traditional xAI approaches do not conform to the properties of RS data. Therefore, we encourage future work on developing models that address the drawbacks related to scale, topology, and temporal dependencies in RS data. Another challenge is the lack of labeled data, which is currently tackled by combining SSL with xAI to design approaches that outperform supervised ML models while offering intuitive model interpretability. In summary, we hope that the insights provided in this review enable researchers to better understand the state of the art in this field and promote the development of novel methods by tackling the proposed research directions.

-A Search queries

Because of how search is implemented in the databases, they cannot handle deeply nested search queries. For this reason, we spelled out the combinations explicitly, repeating words such as AI in "explain* AI" and "interpret* AI".

-A1 IEEE

In the IEEE search, the abstract, title, and keywords are addressed with the search fields "Abstract", "Document Title", and "Index Terms", respectively. Unfortunately, a single search query cannot contain more than 8 wildcards ("*"). Hence, we split the queries into several parts and merged the results in a second step. The search query cannot include the year or the language, so the results were filtered using the filter options of the website.


("Abstract":earth observation OR "Abstract":remote sensing OR "Abstract":earth science OR "Abstract":satellite data OR "Abstract":satellite image OR "Abstract":aerial image OR "Abstract":aerial data OR "Abstract":airborne image OR "Abstract":airborne data OR "Abstract":radar image OR "Abstract":radar data OR "Abstract":spaceborne image OR "Abstract":spaceborne data OR "Abstract":LiDAR OR "Abstract":SAR OR "Abstract":UAV OR "Abstract":Sentinel OR "Abstract":Landsat OR "Abstract":MODIS OR "Abstract":gaofen OR "Abstract":ceres) AND ("Abstract":"xai" OR "Abstract":"interpret* model" OR "Abstract":"interpret* deep learning" OR "Abstract":"interpret* machine learning" OR "Abstract":"interpret* artificial intelligence" OR "Abstract":"interpret* dl" OR "Abstract":"interpret* ml")

("Abstract":earth observation OR "Abstract":remote sensing OR "Abstract":earth science OR "Abstract":satellite data OR "Abstract":satellite image OR "Abstract":aerial image OR "Abstract":aerial data OR "Abstract":airborne image OR "Abstract":airborne data OR "Abstract":radar image OR "Abstract":radar data OR "Abstract":spaceborne image OR "Abstract":spaceborne data OR "Abstract":LiDAR OR "Abstract":SAR OR "Abstract":UAV OR "Abstract":Sentinel OR "Abstract":Landsat OR "Abstract":MODIS OR "Abstract":gaofen OR "Abstract":ceres) AND ("Abstract":"interpret* AI" OR "Abstract":"explain* model" OR "Abstract":"explain* deep learning" OR "Abstract":"explain* machine learning" OR "Abstract":"explain* artificial intelligence" OR "Abstract":"explain* dl" OR "Abstract":"explain* ml" OR "Abstract":"explain* AI")

("Index Terms":earth observation OR "Index Terms":remote sensing OR "Index Terms":earth science OR "Index Terms":satellite data OR "Index Terms":satellite image OR "Index Terms":aerial image OR "Index Terms":aerial data OR "Index Terms":airborne image OR "Index Terms":airborne data OR "Index Terms":radar image OR "Index Terms":radar data OR "Index Terms":spaceborne image OR "Index Terms":spaceborne data OR "Index Terms":LiDAR OR "Index Terms":SAR OR "Index Terms":UAV OR "Index Terms":Sentinel OR "Index Terms":Landsat OR "Index Terms":MODIS OR "Index Terms":gaofen OR "Index Terms":ceres) AND ("Index Terms":"xai" OR "Index Terms":"interpret* model" OR "Index Terms":"interpret* deep learning" OR "Index Terms":"interpret* machine learning" OR "Index Terms":"interpret* artificial intelligence" OR "Index Terms":"interpret* dl" OR "Index Terms":"interpret* ml")

("Index Terms":earth observation OR "Index Terms":remote sensing OR "Index Terms":earth science OR "Index Terms":satellite data OR "Index Terms":satellite image OR "Index Terms":aerial image OR "Index Terms":aerial data OR "Index Terms":airborne image OR "Index Terms":airborne data OR "Index Terms":radar image OR "Index Terms":radar data OR "Index Terms":spaceborne image OR "Index Terms":spaceborne data OR "Index Terms":LiDAR OR "Index Terms":SAR OR "Index Terms":UAV OR "Index Terms":Sentinel OR "Index Terms":Landsat OR "Index Terms":MODIS OR "Index Terms":gaofen OR "Index Terms":ceres) AND ("Index Terms":"interpret* AI" OR "Index Terms":"explain* model" OR "Index Terms":"explain* deep learning" OR "Index Terms":"explain* machine learning" OR "Index Terms":"explain* artificial intelligence" OR "Index Terms":"explain* dl" OR "Index Terms":"explain* ml" OR "Index Terms":"explain* AI")

("Document Title":earth observation OR "Document Title":remote sensing OR "Document Title":earth science OR "Document Title":satellite data OR "Document Title":satellite image OR "Document Title":aerial image OR "Document Title":aerial data OR "Document Title":airborne image OR "Document Title":airborne data OR "Document Title":radar image OR "Document Title":radar data OR "Document Title":spaceborne image OR "Document Title":spaceborne data OR "Document Title":LiDAR OR "Document Title":SAR OR "Document Title":UAV OR "Document Title":Sentinel OR "Document Title":Landsat OR "Document Title":MODIS OR "Document Title":gaofen OR "Document Title":ceres) AND ("Document Title":"xai" OR "Document Title":"interpret* model" OR "Document Title":"interpret* deep learning" OR "Document Title":"interpret* machine learning" OR "Document Title":"interpret* artificial intelligence" OR "Document Title":"interpret* dl" OR "Document Title":"interpret* ml")

("Document Title":earth observation OR "Document Title":remote sensing OR "Document Title":earth science OR "Document Title":satellite data OR "Document Title":satellite image OR "Document Title":aerial image OR "Document Title":aerial data OR "Document Title":airborne image OR "Document Title":airborne data OR "Document Title":radar image OR "Document Title":radar data OR "Document Title":spaceborne image OR "Document Title":spaceborne data OR "Document Title":LiDAR OR "Document Title":SAR OR "Document Title":UAV OR "Document Title":Sentinel OR "Document Title":Landsat OR "Document Title":MODIS OR "Document Title":gaofen OR "Document Title":ceres) AND ("Document Title":"interpret* AI" OR "Document Title":"explain* model" OR "Document Title":"explain* deep learning" OR "Document Title":"explain* machine learning" OR "Document Title":"explain* artificial intelligence" OR "Document Title":"explain* dl" OR "Document Title":"explain* ml" OR "Document Title":"explain* AI")
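Because of the 8-wildcard limit, the IEEE queries above were split by hand. A small helper, sketched here under the assumption that only the xAI term list contains wildcards (as in the queries above), could automate such a split by greedily packing terms into queries that stay within the limit. The helper and constant names are hypothetical, and the greedy split may differ from the manual split used above.

```python
RS_TERMS = ["earth observation", "remote sensing", "earth science", "satellite data",
            "satellite image", "aerial image", "aerial data", "airborne image", "airborne data",
            "radar image", "radar data", "spaceborne image", "spaceborne data", "LiDAR", "SAR",
            "UAV", "Sentinel", "Landsat", "MODIS", "gaofen", "ceres"]
XAI_TERMS = ["xai"] + [f"{prefix}* {suffix}" for prefix in ("interpret", "explain")
                       for suffix in ("model", "deep learning", "machine learning",
                                      "artificial intelligence", "dl", "ml", "AI")]


def build_ieee_queries(field, rs_terms, xai_terms, max_wildcards=8):
    """Split the xAI terms over several queries so that each has at most max_wildcards '*'."""
    rs_part = " OR ".join(f'"{field}":{t}' for t in rs_terms)
    chunks, current = [], []
    for term in xai_terms:
        n_wild = sum(t.count("*") for t in current) + term.count("*")
        if current and n_wild > max_wildcards:       # start a new query when the limit is exceeded
            chunks.append(current)
            current = []
        current.append(term)
    chunks.append(current)
    return [f"({rs_part}) AND (" + " OR ".join(f'"{field}":"{t}"' for t in chunk) + ")"
            for chunk in chunks]


queries = build_ieee_queries("Abstract", RS_TERMS, XAI_TERMS)
```

The same call with "Index Terms" or "Document Title" as the field reproduces the structure of the other query pairs.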

-A2 Scopus

Of the databases used, Scopus offers the most expressive search engine, so the publication year and language could be specified directly in the search query. The field code for searching abstract, title, and keywords is “TITLE-ABS-KEY”.


TITLE-ABS-KEY((earth observation) OR (remote sensing) OR (earth science) OR (satellite data) OR (satellite image) OR (aerial image) OR (aerial data) OR (airborne image) OR (airborne data) OR (radar image) OR (radar data) OR (spaceborne image) OR (spaceborne data) OR (LiDAR) OR (SAR) OR (UAV) OR (Sentinel) OR (Landsat) OR (MODIS) OR (gaofen) OR (ceres)) AND TITLE-ABS-KEY("xai" OR "interpret* model" OR "interpret* deep learning" OR "interpret* machine learning" OR "interpret* artificial intelligence" OR "interpret* dl" OR "interpret* ml" OR "interpret* AI" OR "explain* model" OR "explain* deep learning" OR "explain* machine learning" OR "explain* artificial intelligence" OR "explain* dl" OR "explain* ml" OR "explain* AI") AND ( PUBYEAR > 2016 ) AND ( LIMIT-TO ( LANGUAGE,"English" ) )
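The Scopus query is the conjunction of two OR-lists, one of EO terms and one of xAI terms, followed by the year and language restrictions. The following Python sketch assembles the same string from the two term lists, which makes the structure explicit and keeps the lists reusable across databases. The helper `build_scopus_query` and its parameters are illustrative assumptions, not part of any Scopus API.

```python
EO_TERMS = [
    "earth observation", "remote sensing", "earth science", "satellite data",
    "satellite image", "aerial image", "aerial data", "airborne image",
    "airborne data", "radar image", "radar data", "spaceborne image",
    "spaceborne data", "LiDAR", "SAR", "UAV", "Sentinel", "Landsat", "MODIS",
    "gaofen", "ceres",
]
XAI_TERMS = [
    "xai", "interpret* model", "interpret* deep learning",
    "interpret* machine learning", "interpret* artificial intelligence",
    "interpret* dl", "interpret* ml", "interpret* AI", "explain* model",
    "explain* deep learning", "explain* machine learning",
    "explain* artificial intelligence", "explain* dl", "explain* ml",
    "explain* AI",
]


def build_scopus_query(eo_terms, xai_terms, after_year=2016, language="English"):
    """Hypothetical helper: join both term lists into one Scopus query string."""
    # EO terms are wrapped in parentheses (unquoted); xAI terms are quoted phrases.
    eo_part = " OR ".join(f"({term})" for term in eo_terms)
    xai_part = " OR ".join(f'"{term}"' for term in xai_terms)
    return (
        f"TITLE-ABS-KEY({eo_part}) AND TITLE-ABS-KEY({xai_part}) "
        f"AND ( PUBYEAR > {after_year} ) "
        f'AND ( LIMIT-TO ( LANGUAGE,"{language}" ) )'
    )


print(build_scopus_query(EO_TERMS, XAI_TERMS))
```

Keeping the term lists as data means that adding or removing a term changes every generated query consistently, rather than requiring manual edits to long query strings.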

-A3 Springer

SpringerLink does not support restricting the search to abstract, title, and keywords, so a full-text search was performed instead. Because the query also cannot include the publication year or language, the results were filtered using the filter options on the website.


(earth observation OR remote sensing OR earth science OR satellite data OR satellite image OR aerial image OR aerial data OR airborne image OR airborne data OR radar image OR radar data OR spaceborne image OR spaceborne data OR LiDAR OR SAR OR UAV OR Sentinel OR Landsat OR MODIS OR gaofen OR ceres) AND ("xai" OR "interpret* model" OR "interpret* deep learning" OR "interpret* machine learning" OR "interpret* artificial intelligence" OR "interpret* dl" OR "interpret* ml" OR "interpret* AI" OR "explain* model" OR "explain* deep learning" OR "explain* machine learning" OR "explain* artificial intelligence" OR "explain* dl" OR "explain* ml" OR "explain* AI")
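Since the Springer query carries no year or language restriction, those criteria had to be applied to the result list. The source applied them through the website's filters; the Python sketch below shows an equivalent offline step on exported results. It assumes the same criteria as the Scopus query (publication year after 2016, English) and a hypothetical record format with `title`, `year`, and `language` fields. The records are placeholders, not actual search results.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical exported search result; the field names are assumptions."""
    title: str
    year: int
    language: str


def post_filter(records, after_year=2016, language="English"):
    """Keep records published after `after_year` in the given language.

    Mirrors the Scopus clauses PUBYEAR > 2016 and LIMIT-TO(LANGUAGE,"English").
    """
    return [r for r in records if r.year > after_year and r.language == language]


# Placeholder records for illustration only.
results = [
    Record("Paper A", 2021, "English"),
    Record("Paper B", 2015, "English"),
    Record("Paper C", 2022, "German"),
]
for record in post_filter(results):
    print(record.year, record.title)
```

Only "Paper A" survives: "Paper B" is too old and "Paper C" is not in English.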

-A4 Search query, ML in EO

This search query was executed in all databases for comparison purposes; its results are shown in Figure 1.


[earth observation OR remote sensing OR earth science OR ((satellite OR aerial OR airborne OR spaceborne OR radar) AND (image OR data)) OR LiDAR OR SAR OR UAV OR Sentinel OR Landsat OR MODIS OR gaofen OR ceres]
AND
[deep learning OR machine learning OR artificial intelligence OR dl OR ml OR ai]

-B Models, xAI categories and methods

The figure in this appendix shows how often each xAI category, method, and model is mentioned throughout this review.

-C Glossary

We have categorized related EO tasks into groups to provide a better overview. Here, we provide a glossary of these groups and the tasks they include.

Agricultural Monitoring: All crop-related tasks, such as crop yield prediction, crop type classification, irrigation scheme classification, and crop lodging detection.
Atmosphere Monitoring: The prediction of atmospheric phenomena, such as air quality, aerosol optical depth, and dust storm indices.
Building Mapping: All tasks related to buildings and urban structures, such as building footprint classification and building damage mapping.
Ecosystem Interactions: Interactions of ecosystems with other Earth system components, e.g., the atmosphere and the hydrosphere. This includes ecosystem CO2 exchange and sun-induced fluorescence prediction.
Human Environment Interaction: The monitoring of human structures and the environment, such as human footprint estimation, socioeconomic status estimation, or well-being prediction.
Hydrology Monitoring: Tasks related to hydrology, such as runoff forecasting, water quality, streamflow prediction, and water segmentation, but excluding floods.
Landcover Mapping: The most common EO task group, comprising mainly land cover classification but also related tasks such as slum mapping.
Natural Hazard Monitoring: Monitoring of natural hazards, such as landslides, wildfires, floods, earthquakes, and volcanoes.
Soil Monitoring: Monitoring of soil properties, such as soil texture, respiration, moisture, and salinity.
Surface Temperature Prediction: The prediction of the Earth’s surface temperature.
Target Mapping: Tasks related to the mapping of specific targets, such as vehicles and objects.
Vegetation Monitoring: Monitoring of vegetation other than crops, such as vegetation regeneration, tree monitoring, tree classification, and tree mapping.
Weather and Climate Prediction: Forecasting of weather and climate variables, such as precipitation, temperature, and drought.
Other: All tasks that do not fit into the other groups, including change detection, urban mobility, sea ice classification, mosquito modeling, and satellite product quality tasks.

Acronyms

AI: Artificial Intelligence
ALE: Accumulated Local Effects
ASC: Attribute Scattering Center
BP: Backpropagation
CAM: Class Activation Mapping
CAP: Common Agricultural Policy
CNN: Convolutional Neural Network
CV: Computer Vision
DeepLIFT: Deep Learning Important FeaTures
DEM: Digital Elevation Model
DL: Deep Learning
DNN: Deep Neural Network
DSM: Digital Surface Model
DTD: Deep Taylor Decomposition
EBM: Explainable Boosting Machines
EG: Expected Gradients
EO: Earth Observation
FLS: Fuzzy Logic System
FNN: Feed Forward Neural Network
GAM: Generalized Additive Model
GAN: Generative Adversarial Network
GB: Gradient Boosting
GCN: Graph Convolutional Networks
GLM: Generalized Linear Model
GNN: Graph Neural Networks
GP: Gaussian Process
Grad-CAM: Gradient-weighted Class Activation Mapping
GRU: Gated Recurrent Unit
I*G: Input*Gradient
IG: Integrated Gradients
IML: Interpretable Machine Learning
KL: Kullback-Leibler
kNN: k-Nearest-Neighbor
LDA: Latent Dirichlet Allocation
LiDAR: Light Detection and Ranging
LIME: Local Interpretable Model-agnostic Explanations
LR: Linear Regression
LRP: Layer-wise Relevance Propagation
LSTM: Long Short-Term Memory neural network
MDI: Mean Decrease in Impurity
ML: Machine Learning
MLP: Multilayer Perceptron
MoRef: Most Relevant First
NDVI: Normalized Difference Vegetation Index
NN: Neural Network
OOD: Out-Of-Distribution
OWA: Ordered Weighted Averaging
PCA: Principal Component Analysis
PDP: Partial Dependence Plot
PFI: Permutation Feature Importance
PolSAR: Polarimetric SAR
RAM: Regression Activation Mapping
RF: Random Forest
RGB: Red-Green-Blue
RNN: Recurrent Neural Network
RS: Remote Sensing
SAR: Synthetic Aperture Radar
SHAP: SHapley Additive exPlanations
SmoothGrad: Smooth Gradient
SOTA: state-of-the-art
SSL: Self-Supervised Learning
SST: Sea Surface Temperature
SVM: Support Vector Machine
TCAV: Testing with Concept Activation Vectors
TDA: Topological Data Analysis
t-SNE: t-distributed Stochastic Neighbor Embedding
UAV: Unmanned Aerial Vehicles
UMAP: Uniform Manifold Approximation and Projection
ViT: Vision Transformer
xAI: explainable AI
XGBoost: eXtreme Gradient Boosting

References

[1] Gustau Camps-Valls, Devis Tuia, Xiao Xiang Zhu and Markus Reichstein “Deep Learning for the Earth Sciences: A Comprehensive Approach to Remote Sensing, Climate Science, and Geosciences” In Deep Learning for the Earth Sciences 1, 2021
[2] Xiao Xiang Zhu et al. “Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources” In IEEE Geoscience and Remote Sensing Magazine 5.4 Institute of Electrical and Electronics Engineers Inc., 2017, pp. 8–36 DOI: 10.1109/MGRS.2017.2762307
[3] Markus Reichstein et al. “Deep Learning and Process Understanding for Data-Driven Earth System Science” In Nature 566.7743 Nature Publishing Group, 2019, pp. 195–204 DOI: 10.1038/s41586-019-0912-1
[4] Hannah Ruschemeier “AI as a challenge for legal regulation–the scope of application of the artificial intelligence act proposal” In ERA Forum, 2023, pp. 1–16 Springer
[5] Sarvam P TerKonda and Eric M Fish “Artificial intelligence viewed through the lens of state regulation” In Intelligence-Based Medicine Elsevier, 2023, pp. 100088
[6] Devis Tuia et al. “Toward a Collective Agenda on AI for Earth Science Data Analysis” In IEEE Geoscience and Remote Sensing Magazine 9.2, 2021, pp. 88–104 DOI: 10.1109/MGRS.2020.3043504
[7] Esther Rolf, Konstantin Klemmer, Caleb Robinson and Hannah Kerner “Mission Critical – Satellite Data Is a Distinct Modality in Machine Learning” arXiv, 2024 arXiv:2402.01444 [cs]
[8] Michael F. Goodchild “Scale in GIS: An Overview” In Geomorphology 130.1, Scale Issues in Geomorphology, 2011, pp. 5–9 DOI: 10.1016/j.geomorph.2010.10.004
[9] Thomas Lillesand, Ralph W Kiefer and Jonathan Chipman “Remote sensing and image interpretation” John Wiley & Sons, 2015
[10] Gabrielle Ras, Ning Xie, Marcel Gerven and Derek Doran “Explainable Deep Learning: A Field Guide for the Uninitiated” In Journal of Artificial Intelligence Research 73, 2022, pp. 329–396 DOI: 10.1613/jair.1.13200
[11] Dang Minh, H Xiang Wang, Y Fen Li and Tan N Nguyen “Explainable artificial intelligence: a comprehensive review” In Artificial Intelligence Review Springer, 2022, pp. 1–66
[12] Timo Speith “A review of taxonomies of explainable artificial intelligence (XAI) methods” In 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022, pp. 2239–2250
[13] Caroline M. Gevaert “Explainable AI for Earth Observation: A Review Including Societal and Regulatory Perspectives” In International Journal of Applied Earth Observation and Geoinformation 112, 2022, pp. 102869 DOI: 10.1016/j.jag.2022.102869
[14] Carolin Leluschko and Christoph Tholen “Goals and Stakeholder Involvement in XAI for Remote Sensing: A Structured Literature Review” In Artificial Intelligence XL, Lecture Notes in Computer Science Cham: Springer Nature Switzerland, 2023, pp. 519–525 DOI: 10.1007/978-3-031-47994-6_47
[15] Ola Hall, Mattias Ohlsson and Thorsteinn Rögnvaldsson “A Review of Explainable AI in the Satellite Data, Deep Machine Learning, and Human Poverty Domain” In Patterns 3.10, 2022, pp. 100600 DOI: 10.1016/j.patter.2022.100600
[16] Jin Xing and Renee Sieber “The Challenges of Integrating Explainable Artificial Intelligence into GeoAI” In Transactions in GIS 27.3, 2023, pp. 626–645 DOI: 10.1111/tgis.13045
[17] R. Roscher, B. Bohn, M.F. Duarte and J. Garcke “Explain It to Me – Facing Remote Sensing Challenges in the Bio- and Geosciences with Explainable Machine Learning” In ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences V-3-2020, 2020, pp. 817–824 DOI: 10.5194/isprs-Annals-V-3-2020-817-2020
[18] Ribana Roscher, Bastian Bohn, Marco F. Duarte and Jochen Garcke “Explainable Machine Learning for Scientific Insights and Discoveries” In IEEE Access 8 Institute of Electrical and Electronics Engineers Inc., 2020, pp. 42200–42216 DOI: 10.1109/ACCESS.2020.2976199
[19] Christoph Molnar “Interpretable Machine Learning: A Guide for Making Black Box Models Explainable”, 2019 URL: https://christophm.github.io/interpretable-ml-book
[20] Anuj Karpatne et al. “Machine Learning for the Geosciences: Challenges and Opportunities” In IEEE Transactions on Knowledge and Data Engineering 31.8 IEEE Computer Society, 2019, pp. 1544–1554 DOI: 10.1109/TKDE.2018.2861006
[21] A. Mamalakis, I. Ebert-Uphoff and E.A. Barnes “Explainable Artificial Intelligence in Meteorology and Climate Science: Model Fine-Tuning, Calibrating Trust and Learning New Science” In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 13200 LNAI, 2022, pp. 315–339 DOI: 10.1007/978-3-031-04083-2_16
[22] George Em Karniadakis et al. “Physics-Informed Machine Learning” In Nature Reviews Physics 3.6, 2021, pp. 422–440 DOI: 10.1038/s42254-021-00314-5
[23] Jakob Gawlikowski et al. “A Survey of Uncertainty in Deep Neural Networks” arXiv, 2022 DOI: 10.48550/arXiv.2107.03342
[24] Adrian Perez-Suay and Gustau Camps-Valls “Causal Inference in Geoscience and Remote Sensing from Observational Data” In IEEE Transactions on Geoscience and Remote Sensing 57.3 Institute of Electrical and Electronics Engineers Inc., 2019, pp. 1502–1513 DOI: 10.1109/TGRS.2018.2867002
[25] Jakob Runge et al. “Inferring Causation from Time Series in Earth System Sciences” In Nature Communications 10.1 Nature Publishing Group, 2019, pp. 2553 DOI: 10.1038/s41467-019-10105-3
[26] Matthew J. Page et al. “PRISMA 2020 Explanation and Elaboration: Updated Guidance and Exemplars for Reporting Systematic Reviews” In BMJ 372 British Medical Journal Publishing Group, 2021, pp. n160 DOI: 10.1136/bmj.n160
[27] Michael Gusenbauer and Neal Haddaway “Which Academic Search Systems Are Suitable for Systematic Reviews or Meta-Analyses? Evaluating Retrieval Qualities of Google Scholar, PubMed and 26 Other Resources” In Research Synthesis Methods 11, 2020, pp. 181–217 DOI: 10.1002/jrsm.1378
[28] Giulia Vilone and Luca Longo “Explainable Artificial Intelligence: A Systematic Review” arXiv, 2020 arXiv:2006.00093 [cs]
[29] Hans Hersbach et al. “The ERA5 Global Reanalysis” In Quarterly Journal of the Royal Meteorological Society 146.730, 2020, pp. 1999–2049 DOI: 10.1002/qj.3803
[30] Amina Adadi and Mohammed Berrada “Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)” In IEEE Access 6, 2018, pp. 52138–52160 DOI: 10.1109/ACCESS.2018.2870052
[31] Andreas Holzinger et al. “Explainable AI methods-a brief overview” In xxAI-Beyond Explainable AI: International Workshop, Held in Conjunction with ICML 2020, July 18, 2020, Vienna, Austria, Revised and Extended Papers, 2022, pp. 13–38 Springer
[32] Wojciech Samek et al. “Explaining deep neural networks and beyond: A review of methods and applications” In Proceedings of the IEEE 109.3 IEEE, 2021, pp. 247–278
[33] Scott M. Lundberg et al. “From Local Explanations to Global Understanding with Explainable AI for Trees” In Nature Machine Intelligence 2.1 Nature Publishing Group, 2020, pp. 56–67 DOI: 10.1038/s42256-019-0138-9
[34] Mattia Setzu et al. “GLocalX - From Local to Global Explanations of Black Box AI Models” In Artificial Intelligence 294, 2021, pp. 103457 DOI: 10.1016/j.artint.2021.103457
[35] Huan Liu “Feature Selection” In Encyclopedia of Machine Learning Boston, MA: Springer US, 2010, pp. 402–406 DOI: 10.1007/978-0-387-30164-8_306
[36] Dumitru Erhan, Aaron Courville and Yoshua Bengio “Understanding Representations Learned in Deep Architectures”, 2010
[37] K Simonyan, A Vedaldi and A Zisserman “Deep inside convolutional networks: visualising image classification models and saliency maps” In Proceedings of the International Conference on Learning Representations (ICLR), 2014
[38] Mukund Sundararajan, Ankur Taly and Qiqi Yan “Axiomatic Attribution for Deep Networks” In Proceedings of the 34th International Conference on Machine Learning PMLR, 2017, pp. 3319–3328
[39] Matthew D. Zeiler, Graham W. Taylor and Rob Fergus “Adaptive Deconvolutional Networks for Mid and High Level Feature Learning” In 2011 International Conference on Computer Vision, 2011, pp. 2018–2025 DOI: 10.1109/ICCV.2011.6126474
[40] Bolei Zhou et al. “Learning deep features for discriminative localization” In Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2921–2929
[41] Ramprasaath R. Selvaraju et al. “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization” In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017
[42] Sebastian Bach et al. “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation” In PloS one 10.7 Public Library of Science San Francisco, CA USA, 2015, pp. e0130140
[43] Matthew D Zeiler and Rob Fergus “Visualizing and understanding convolutional networks” In Computer Vision–ECCV 2014, 2014, pp. 818–833 Springer
[44] Jerome H Friedman “Greedy function approximation: a gradient boosting machine” In Annals of statistics Institute of Mathematical Statistics, 2001, pp. 1189–1232
[45] Daniel W Apley and Jingyu Zhu “Visualizing the effects of predictor variables in black box supervised learning models” In Journal of the Royal Statistical Society Series B: Statistical Methodology 82.4 Oxford University Press, 2020, pp. 1059–1086
[46] Marco Tulio Ribeiro, Sameer Singh and Carlos Guestrin “"Why Should I Trust You?": Explaining the Predictions of Any Classifier” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining San Francisco California USA: ACM, 2016, pp. 1135–1144 DOI: 10.1145/2939672.2939778
[47] Scott M Lundberg and Su-In Lee “A Unified Approach to Interpreting Model Predictions” In Advances in Neural Information Processing Systems 30 Curran Associates, Inc., 2017
[48] W.James Murdoch and Arthur Szlam “Automatic Rule Extraction from Long Short Term Memory Networks” In International Conference on Learning Representations, 2017 URL: https://openreview.net/forum?id=SJvYgH9xe
[49] Michael Harradon, Jeff Druce and Brian Ruttenberg “Causal learning and explanation of deep neural networks via autoencoded activations” In arXiv preprint arXiv:1802.00541, 2018
[50] Nicholas Frosst and Geoffrey Hinton “Distilling a neural network into a soft decision tree” In arXiv preprint arXiv:1711.09784, 2017
[51] Sarah Tan et al. “Considerations when learning additive explanations for black-box models” In arXiv preprint arXiv:1801.08640, 2018
[52] Quanshi Zhang, Yu Yang, Haotian Ma and Ying Nian Wu “Interpreting cnns via decision trees” In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 6261–6270
[53] Quanshi Zhang et al. “Interpreting CNN knowledge via an explanatory graph” In Proceedings of the AAAI conference on artificial intelligence 32.1, 2018
[54] Quanshi Zhang, Ruiming Cao, Ying Nian Wu and Song-Chun Zhu “Growing interpretable part graphs on convnets via multi-shot learning” In Proceedings of the AAAI Conference on Artificial Intelligence 31.1, 2017
[55] Trevor Hastie, Robert Tibshirani and Jerome H Friedman “The elements of statistical learning: data mining, inference, and prediction” Springer, 2009
[56] Annette J Dobson “Introduction to generalized linear models” Chapman & Hall CRC, 2018
[57] Trevor J Hastie “Generalized additive models” In Statistical models in S Routledge, 1992, pp. 249–307
[58] David M Blei, Andrew Y Ng and Michael I Jordan “Latent dirichlet allocation” In Journal of machine Learning research 3.Jan, 2003, pp. 993–1022
[59] Dzmitry Bahdanau, Kyunghyun Cho and Yoshua Bengio “Neural machine translation by jointly learning to align and translate” In arXiv preprint arXiv:1409.0473, 2014
[60] Ashish Vaswani et al. “Attention is all you need” In Advances in neural information processing systems 30, 2017
[61] Been Kim et al. “Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav)” In International conference on machine learning, 2018, pp. 2668–2677 PMLR
[62] Amirata Ghorbani, James Wexler, James Y Zou and Been Kim “Towards automatic concept-based explanations” In Advances in Neural Information Processing Systems 32, 2019
[63] Pang Wei Koh et al. “Concept bottleneck models” In International conference on machine learning, 2020, pp. 5338–5348 PMLR
[64] Diego Marcos, Sylvain Lobry and Devis Tuia “Semantically Interpretable Activation Maps: what-where-how explanations within CNNs” In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019, pp. 4207–4215 IEEE
[65] Jacob Bien and Robert Tibshirani “Prototype selection for interpretable classification”, 2011
[66] Chaofan Chen et al. “This looks like that: deep learning for interpretable image recognition” In Advances in neural information processing systems 32, 2019
[67] Meike Nauta, Ron Van Bree and Christin Seifert “Neural prototype trees for interpretable fine-grained image recognition” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 14933–14943
[68] Jinkyu Kim et al. “Textual explanations for self-driving vehicles” In Proceedings of the European conference on computer vision (ECCV), 2018, pp. 563–578
[69] Sandra Wachter, Brent Mittelstadt and Chris Russell “Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR” In Harvard Journal of Law & Technology (Harvard JOLT) 31, 2017, pp. 841
[70] Tomas Mikolov et al. “Distributed Representations of Words and Phrases and Their Compositionality” In Advances in Neural Information Processing Systems 26 Curran Associates, Inc., 2013
[71] Maximilian Kohlbrenner et al. “Towards best practice in explaining neural network decisions with LRP” In 2020 International Joint Conference on Neural Networks (IJCNN), 2020, pp. 1–7 IEEE
[72] Grégoire Montavon et al. “Layer-wise relevance propagation: an overview” In Explainable AI: interpreting, explaining and visualizing deep learning Springer, 2019, pp. 193–209
[73] Leila Arras et al. “Explaining and Interpreting LSTMs” In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11700 LNCS Springer Verlag, 2019, pp. 211–238 DOI: 10.1007/978-3-030-28954-6_11
[74] Aaron Fisher, Cynthia Rudin and Francesca Dominici “All Models Are Wrong, but Many Are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously” In Journal of Machine Learning Research 20.177, 2019, pp. 1–81
[75] Lloyd S Shapley “A Value for n-Person Games” In Contributions to the Theory of Games II Princeton: Princeton University Press, 1953, pp. 307–317
[76] Scott M Lundberg, Gabriel G Erion and Su-In Lee “Consistent individualized feature attribution for tree ensembles” In arXiv preprint arXiv:1802.03888, 2018
[77] M.Gethsiyal Augasta and T. Kathirvalavakumar “Reverse Engineering the Neural Networks for Rule Extraction in Classification Problems” In Neural Processing Letters 35.2, 2012, pp. 131–150 DOI: 10.1007/s11063-011-9207-8
[78] Zhi-Hua Zhou and Yuan Jiang “Medical Diagnosis with C4.5 Rule Preceded by Artificial Neural Network Ensemble” In IEEE Transactions on Information Technology in Biomedicine 7.1, 2003, pp. 37–42 DOI: 10.1109/TITB.2003.808498
[79] Xuan Liu, Xiaoguang Wang and Stan Matwin “Improving the Interpretability of Deep Neural Networks with Knowledge Distillation” In 2018 IEEE International Conference on Data Mining Workshops (ICDMW), 2018, pp. 905–912 DOI: 10.1109/ICDMW.2018.00132
[80] David Alvarez-Melis and Tommi S. Jaakkola “A Causal Framework for Explaining the Predictions of Black-Box Sequence-to-Sequence Models” arXiv, 2017 arXiv:1707.01943 [cs]
[81] Bernadette Bouchon-Meunier and Christophe Marsala “Learning Fuzzy Decision Rules” In Fuzzy Sets in Approximate Reasoning and Information Systems, The Handbooks of Fuzzy Sets Series Boston, MA: Springer US, 1999, pp. 279–304 DOI: 10.1007/978-1-4615-5243-7_5
[82] Lotfi A Zadeh “Fuzzy sets” In Information and control 8.3 Elsevier, 1965, pp. 338–353
[83] Leo Breiman, Jerome H Friedman, Richard A Olshen and Charles J Stone “Classification and regression trees” In Wadsworth, Belmont, CA, 1984
[84] Zachary C Lipton “The Mythos of Model Interpretability: In Machine Learning, the Concept of Interpretability Is Both Important and Slippery.” In Queue 16.3 ACM New York, NY, USA, 2018, pp. 31–57
[85] Laurens Van der Maaten and Geoffrey Hinton “Visualizing data using t-SNE.” In Journal of machine learning research 9.11, 2008
[86] Leland McInnes, John Healy, Nathaniel Saul and Lukas Großberger “UMAP: Uniform Manifold Approximation and Projection” In Journal of Open Source Software 3.29, 2018
[87] Wieland Brendel and Matthias Bethge “Approximating CNNs with Bag-of-local-Features Models Works Surprisingly Well on ImageNet” In International Conference on Learning Representations, 2019
[88] Tim Miller “Contrastive Explanation: A Structural-Model Approach” In The Knowledge Engineering Review 36 Cambridge University Press, 2021, pp. e14 DOI: 10.1017/S0269888921000102
[89] Peng-Tao Jiang et al. “LayerCAM: Exploring Hierarchical Class Activation Maps for Localization” In IEEE Transactions on Image Processing 30, 2021, pp. 5875–5888 DOI: 10.1109/TIP.2021.3089943
[90] Avanti Shrikumar, Peyton Greenside and Anshul Kundaje “Learning important features through propagating activation differences” In International conference on machine learning, 2017, pp. 3145–3153 PMLR
[91] Gianni Brauwers and Flavius Frasincar “A general survey on attention mechanisms in deep learning” In IEEE Transactions on Knowledge and Data Engineering IEEE, 2021
[92] Alexey Dosovitskiy et al. “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale” In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021 OpenReview.net, 2021 URL: https://openreview.net/forum?id=YicbFdNTTy
[93] Petar Veličković et al. “Graph Attention Networks” In 6th International Conference on Learning Representations, 2018
[94] Kevin Clark, Urvashi Khandelwal, Omer Levy and Christopher D. Manning “What Does BERT Look at? An Analysis of BERT’s Attention” In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP Florence, Italy: Association for Computational Linguistics, 2019, pp. 276–286 DOI: 10.18653/v1/W19-4828
[95] Mathilde Caron et al. “Emerging properties in self-supervised vision transformers” In Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 9650–9660
[96] Sneha Chaudhari, Varun Mithal, Gungor Polatkan and Rohan Ramanath “An Attentive Survey of Attention Models” In ACM Transactions on Intelligent Systems and Technology (TIST) 12.5 ACM New York, NY, 2021, pp. 1–32
[97] Finale Doshi-Velez and Been Kim “Towards a rigorous science of interpretable machine learning” In arXiv preprint arXiv:1702.08608, 2017
[98] Meike Nauta et al. “From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI” In ACM Comput. Surv. New York, NY, USA: Association for Computing Machinery, 2023 DOI: 10.1145/3583558
[99] Julius Adebayo et al. “Sanity Checks for Saliency Maps” In Advances in Neural Information Processing Systems 31 Curran Associates, Inc., 2018
[100] Umang Bhatt, Adrian Weller and José M.F. Moura “Evaluating and Aggregating Feature-based Model Explanations” In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20 International Joint Conferences on Artificial Intelligence Organization, 2020, pp. 3016–3022 DOI: 10.24963/ijcai.2020/417
[101] Anna Hedström et al. “Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond” In Journal of Machine Learning Research 24.34, 2023, pp. 1–11
[102] Chih-Kuan Yeh et al. “On the (in) fidelity and sensitivity of explanations” In Advances in Neural Information Processing Systems 32, 2019
[103] Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans and Been Kim “A benchmark for interpretability methods in deep neural networks” In Advances in neural information processing systems 32, 2019
[104] Yao Rong et al. “A consistent and efficient evaluation strategy for attribution methods” In arXiv preprint arXiv:2202.00449, 2022
[105] Harini Suresh, Kathleen M Lewis, John Guttag and Arvind Satyanarayan “Intuitively assessing ml model reliability through example-based explanations and editing model inputs” In 27th International Conference on Intelligent User Interfaces, 2022, pp. 767–781
[106] Ahmed Alqaraawi et al. “Evaluating saliency map explanations for convolutional neural networks: a user study” In Proceedings of the 25th International Conference on Intelligent User Interfaces, 2020, pp. 275–285
[107] Yao Rong et al. “Towards Human-Centered Explainable AI: A Survey of User Studies for Model Explanations” In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, pp. 1–20 DOI: 10.1109/TPAMI.2023.3331846
[108] Leander Weber, Sebastian Lapuschkin, Alexander Binder and Wojciech Samek “Beyond explaining: Opportunities and challenges of XAI-based model improvement” In Information Fusion 92 Elsevier, 2023, pp. 154–176
[109] Laura N. Sotomayor, Matthew J. Cracknell and Robert Musk “Supervised Machine Learning for Predicting and Interpreting Dynamic Drivers of Plantation Forest Productivity in Northern Tasmania, Australia” In Computers and Electronics in Agriculture 209, 2023 DOI: 10.1016/j.compag.2023.107804
[110] Haifei Chen, Liping Yang and Qiusheng Wu “Enhancing Land Cover Mapping and Monitoring: An Interactive and Explainable Machine Learning Approach Using Google Earth Engine” In Remote Sensing 15.18, 2023 DOI: 10.3390/rs15184585
[111] T. Fisher et al. “Uncertainty-Aware Interpretable Deep Learning for Slum Mapping and Monitoring” In Remote Sensing 14.13, 2022 DOI: 10.3390/rs14133072
[112] Sadeeka Layomi Jayasinghe and Lalit Kumar “Causes of Tea Land Dynamics in Sri Lanka between 1995 and 2030” In Regional Environmental Change 23.4, 2023, pp. 127 DOI: 10.1007/s10113-023-02123-1
[113] Z. Li et al. “Advancing Satellite Precipitation Retrievals with Data Driven Approaches: Is Black Box Model Explainable?” In Earth and Space Science 8.2, 2021 DOI: 10.1029/2020EA001423
[114] S.S. Matin and B. Pradhan “Earthquake-Induced Building-Damage Mapping Using Explainable AI (XAI)” In Sensors 21.13, 2021 DOI: 10.3390/s21134489
[115] Carles Milà et al. “Estimating Daily Air Temperature and Pollution in Catalonia: A Comprehensive Spatiotemporal Modelling of Multiple Exposures” In Environmental Pollution 337, 2023 DOI: 10.1016/j.envpol.2023.122501
[116] Oladimeji Mudele et al. “Modeling the Temporal Population Distribution of Ae. Aegypti Mosquito Using Big Earth Observation Data” In IEEE Access 8, 2020, pp. 14182–14194 DOI: 10.1109/ACCESS.2020.2966080
[117] S.J. Newman and R.T. Furbank “Explainable Machine Learning Models of Major Crop Traits from Satellite-Monitored Continent-Wide Field Trial Data” In Nature Plants 7.10, 2021, pp. 1354–1363 DOI: 10.1038/s41477-021-01001-0
[118] A. Orynbaikyzy, U. Gessner, B. Mack and C. Conrad “Crop Type Classification Using Fusion of Sentinel-1 and Sentinel-2 Data: Assessing the Impact of Feature Selection, Optical Data Availability, and Parcel Sizes on the Accuracies” In Remote Sensing 12.17, 2020 DOI: 10.3390/RS12172779
[119] J.N.S. Rubí and Paulo R.L. Gondim “A Performance Comparison of Machine Learning Models for Wildfire Occurrence Risk Prediction in the Brazilian Federal District Region” In Environment Systems and Decisions, 2023 DOI: 10.1007/s10669-023-09921-2
[120] Yarui Wu, Honglei Liu, Shuangyue Liu and Chunhui Lou “Estimate of Near-Surface NO2 Concentrations in Fenwei Plain, China, Based on TROPOMI Data and Random Forest Model” In Environmental Monitoring and Assessment 195.11, 2023, pp. 1379 DOI: 10.1007/s10661-023-11993-1
[121] J. Xu et al. “Towards Interpreting Multi-Temporal Deep Learning Models in Crop Mapping” In Remote Sensing of Environment 264, 2021 DOI: 10.1016/j.rse.2021.112599
[122] X. Yan et al. “A Spatial-Temporal Interpretable Deep Learning Model for Improving Interpretability and Predictive Accuracy of Satellite-Based PM2.5” In Environmental Pollution 273, 2021 DOI: 10.1016/j.envpol.2021.116459
[123] B. Chen et al. “Estimation of Atmospheric PM10 Concentration in China Using an Interpretable Deep Learning Model and Top-of-the-Atmosphere Reflectance Data from China’s New Generation Geostationary Meteorological Satellite, FY-4A” In Journal of Geophysical Research: Atmospheres 127.9, 2022 DOI: 10.1029/2021JD036393
[124] Bin Chen et al. “Estimation of Near-Surface Ozone Concentration and Analysis of Main Weather Situation in China Based on Machine Learning Model and Himawari-8 TOAR Data” In Science of the Total Environment 864 Elsevier B.V., 2023 DOI: 10.1016/j.scitotenv.2022.160928
[125] Bin Chen et al. “Exploring High-Resolution near-Surface CO Concentrations Based on Himawari-8 Top-of-Atmosphere Radiation Data: Assessing the Distribution of City-Level CO Hotspots in China” In Atmospheric Environment 312, 2023 DOI: 10.1016/j.atmosenv.2023.120021
[126] Ozlem Sen and Hacer Yalim Keles “On the Evaluation of CNN Models in Remote-Sensing Scene Classification Domain” In PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science 88.6, 2020, pp. 477–492 DOI: 10.1007/s41064-020-00129-6
[127] Bhavan Vasu and Andreas Savakis “Resilience and Plasticity of Deep Network Interpretations for Aerial Imagery” In IEEE Access 8, 2020, pp. 127491–127506 DOI: 10.1109/ACCESS.2020.3008323
[128] Anastasios Temenos et al. “Interpretable Deep Learning Framework for Land Use and Land Cover Classification in Remote Sensing Using SHAP” In IEEE Geoscience and Remote Sensing Letters Institute of Electrical and Electronics Engineers Inc., 2023, pp. 1–1 DOI: 10.1109/LGRS.2023.3251652
[129] S.-C. Hung, H.-C. Wu and M.-H. Tseng “Remote Sensing Scene Classification and Explanation Using RSSCNet and LIME” In Applied Sciences (Switzerland) 10.18, 2020 DOI: 10.3390/app10186151
[130] S.N. Elliott, A.J.B. Shields, E.M. Klaehn and I. Tien “Identifying Critical Infrastructure in Imagery Data Using Explainable Convolutional Neural Networks” In Remote Sensing 14.21, 2022 DOI: 10.3390/rs14215331
[131] Antonio Manuel Burgueño et al. “Scalable Approach for High-Resolution Land Cover: A Case Study in the Mediterranean Basin” In Journal of Big Data 10.1, 2023 DOI: 10.1186/s40537-023-00770-z
[132] Alexander Brenning “Interpreting Machine-Learning Models in Transformed Feature Space with an Application to Remote-Sensing Classification” In Machine Learning 112.9, 2023, pp. 3455–3471 DOI: 10.1007/s10994-023-06327-8
[133] Cassio F. Dantas, Thalita F. Drumond, Diego Marcos and Dino Ienco “Counterfactual Explanations for Remote Sensing Time Series Data: An Application to Land Cover Classification” In Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track, Lecture Notes in Computer Science Cham: Springer Nature Switzerland, 2023, pp. 20–36 DOI: 10.1007/978-3-031-43430-3_2
[134] Shin-nosuke Ishikawa et al. “Example-Based Explainable AI and Its Application for Remote Sensing Image Classification” In International Journal of Applied Earth Observation and Geoinformation 118 Elsevier B.V., 2023 DOI: 10.1016/j.jag.2023.103215
[135] Pattathal V. Arun and Arnon Karnieli “Learning of Physically Significant Features from Earth Observation Data: An Illustration for Crop Classification and Irrigation Scheme Detection” In Neural Computing and Applications 34.13, 2022, pp. 10929–10948 DOI: 10.1007/s00521-022-07019-5
[136] L. Han et al. “An Explainable XGBoost Model Improved by SMOTE-ENN Technique for Maize Lodging Detection Based on Multi-Source Unmanned Aerial Vehicle Images” In Computers and Electronics in Agriculture 194, 2022 DOI: 10.1016/j.compag.2022.106804
[137] Camilla Broms et al. “Combined Analysis of Satellite and Ground Data for Winter Wheat Yield Forecasting” In Smart Agricultural Technology 3, 2023, pp. 100107 DOI: 10.1016/j.atech.2022.100107
[138] Florian Huber, Artem Yushchenko, Benedikt Stratmann and Volker Steinhage “Extreme Gradient Boosting for Yield Estimation Compared with Deep Learning Approaches” In Computers and Electronics in Agriculture 202, 2022, pp. 107346 DOI: 10.1016/j.compag.2022.107346
[139] Edward J. Jones et al. “Identifying Causes of Crop Yield Variability with Interpretive Machine Learning” In Computers and Electronics in Agriculture 192, 2022, pp. 106632 DOI: 10.1016/j.compag.2021.106632
[140] Harpinder Singh, Ajay Roy, R.K. Setia and Brijendra Pateriya “Simulation of Multispectral Data Using Hyperspectral Data for Crop Stress Studies” In Lecture Notes in Electrical Engineering 970 Springer Science and Business Media Deutschland GmbH, 2023, pp. 43–52 DOI: 10.1007/978-981-19-7698-8_5
[141] Patrick Filippi, Brett M. Whelan, R.Willem Vervoort and Thomas F.A. Bishop “Identifying Crop Yield Gaps with Site- and Season-Specific Data-Driven Models of Yield Potential” In Precision Agriculture 23.2, 2022, pp. 578–601 DOI: 10.1007/s11119-021-09850-7
[142] Hari Sankar Nayak et al. “Interpretable Machine Learning Methods to Explain On-Farm Yield Variability of High Productivity Wheat in Northwest India” In Field Crops Research 287, 2022, pp. 108640 DOI: 10.1016/j.fcr.2022.108640
[143] Manuel Campos-Taberner et al. “Understanding Deep Learning in Land Use Classification Based on Sentinel-2 Time Series” In Scientific Reports 10.1 Nature Publishing Group, 2020, pp. 17188 DOI: 10.1038/s41598-020-74215-5
[144] Anna Mateo-Sanchis et al. “Interpretable Long-Short Term Memory Networks for Crop Yield Estimation” In IEEE Geoscience and Remote Sensing Letters, 2023, pp. 1–1 DOI: 10.1109/LGRS.2023.3244064
[145] Dilli Paudel et al. “Interpretability of Deep Learning Models for Crop Yield Forecasting” In Computers and Electronics in Agriculture 206, 2023, pp. 107663 DOI: 10.1016/j.compag.2023.107663
[146] Ivica Obadic, Ribana Roscher, Dario Augusto Borges Oliveira and Xiao Xiang Zhu “Exploring Self-Attention for Crop-type Classification Explainability” arXiv, 2022 DOI: 10.48550/arXiv.2210.13167
[147] Marc Rußwurm and Marco Körner “Self-Attention for Raw Optical Satellite Time Series Classification” In ISPRS Journal of Photogrammetry and Remote Sensing 169, 2020, pp. 421–435 DOI: 10.1016/j.isprsjprs.2020.06.006
[148] Vivien Sainte Fare Garnot, Loic Landrieu, Sebastien Giordano and Nesrine Chehata “Satellite Image Time Series Classification With Pixel-Set Encoders and Temporal Self-Attention” In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 12322–12331 DOI: 10.1109/CVPR42600.2020.01234
[149] A. Wolanin et al. “Estimating and Understanding Crop Yields with Explainable Deep Learning in the Indian Wheat Belt” In Environmental Research Letters 15.2, 2020 DOI: 10.1088/1748-9326/ab68ac
[150] Laura Elena Cue La Rosa, Dario Augusto Borges Oliveira and Pedram Ghamisi “Learning Crop Type Mapping from Regional Label Proportions in Large-Scale SAR and Optical Imagery” In IEEE Transactions on Geoscience and Remote Sensing, 2023, pp. 1–1 DOI: 10.1109/TGRS.2023.3321156
[151] A. Mateo-Sanchis et al. “Learning Main Drivers of Crop Progress and Failure in Europe with Interpretable Machine Learning” In International Journal of Applied Earth Observation and Geoinformation 104, 2021 DOI: 10.1016/j.jag.2021.102574
[152] Laura Martínez-Ferrer, Maria Piles and Gustau Camps-Valls “Crop Yield Estimation and Interpretability With Gaussian Processes” In IEEE Geoscience and Remote Sensing Letters 18.12, 2021, pp. 2043–2047 DOI: 10.1109/LGRS.2020.3016140
[153] Mehmet Furkan Celik et al. “Explainable Artificial Intelligence for Cotton Yield Prediction with Multisource Data” In IEEE Geoscience and Remote Sensing Letters 20, 2023, pp. 1–5 DOI: 10.1109/LGRS.2023.3303643
[154] Johnny Vega, Fabio Humberto Sepúlveda-Murillo and Melissa Parra “Landslide Modeling in a Tropical Mountain Basin Using Machine Learning Algorithms and Shapley Additive Explanations” In Air, Soil and Water Research 16, 2023, pp. 11786221231195824 DOI: 10.1177/11786221231195824
[155] H.A.H. Al-Najjar et al. “A Novel Method Using Explainable Artificial Intelligence (XAI)-Based Shapley Additive Explanations for Spatial Landslide Prediction Using Time-Series SAR Dataset” In Gondwana Research, 2022 DOI: 10.1016/j.gr.2022.08.004
[156] Saeed Alqadhi, Javed Mallick and Meshel Alkahtani “Integrated Deep Learning with Explainable Artificial Intelligence for Enhanced Landslide Management” In Natural Hazards, 2023 DOI: 10.1007/s11069-023-06260-y
[157] Ashok Dahal and Luigi Lombardo “Explainable Artificial Intelligence in Geoscience: A Glimpse into the Future of Landslide Susceptibility Modeling” Earth and Space Science Open Archive, 2022 DOI: 10.1002/essoar.10512130.1
[158] Muhammad Sakib Khan Inan and Istiakur Rahman “Explainable AI Integrated Feature Selection for Landslide Susceptibility Mapping Using TreeSHAP” In SN Computer Science 4.5, 2023, pp. 482 DOI: 10.1007/s42979-023-01960-5
[159] Junyi Zhang et al. “Insights into Geospatial Heterogeneity of Landslide Susceptibility Based on the SHAP-XGBoost Model” In Journal of Environmental Management 332, 2023, pp. 117357 DOI: 10.1016/j.jenvman.2023.117357
[160] D. Sun et al. “Assessment of Landslide Susceptibility along Mountain Highways Based on Different Machine Learning Algorithms and Mapping Units by Hybrid Factors Screening and Sample Optimization” In Gondwana Research, 2023 DOI: 10.1016/j.gr.2022.07.013
[161] A.E. Maxwell, M. Sharma and K.A. Donaldson “Explainable Boosting Machines for Slope Failure Spatial Predictive Modeling” In Remote Sensing 13.24, 2021 DOI: 10.3390/rs13244991
[162] Haoran Fang et al. “A New Approach to Spatial Landslide Susceptibility Prediction in Karst Mining Areas Based on Explainable Artificial Intelligence” In Sustainability (Switzerland) 15.4 MDPI, 2023 DOI: 10.3390/su15043094
[163] Khaled Youssef, Kevin Shao, Seulgi Moon and Louis-Serge Bouchard “Landslide Susceptibility Modeling by Interpretable Neural Network” arXiv, 2022 DOI: 10.48550/arXiv.2201.06837
[164] Cheng Chen and Lei Fan “An Attribution Deep Learning Interpretation Model for Landslide Susceptibility Mapping in the Three Gorges Reservoir Area” In IEEE Transactions on Geoscience and Remote Sensing, 2023, pp. 1–1 DOI: 10.1109/TGRS.2023.3323668
[165] Halit Enes Aydin and Muzaffer Can Iban “Predicting and Analyzing Flood Susceptibility Using Boosting-Based Ensemble Machine Learning Algorithms with SHapley Additive exPlanations” In Natural Hazards, 2022 DOI: 10.1007/s11069-022-05793-y
[166] Junfei Liu, Kai Liu and Ming Wang “A Residual Neural Network Integrated with a Hydrological Model for Global Flood Susceptibility Mapping Based on Remote Sensing Datasets” In Remote Sensing 15.9, 2023 DOI: 10.3390/rs15092447
[167] Mo Wang et al. “An XGBoost-SHAP Approach to Quantifying Morphological Impact on Urban Flooding Susceptibility” In Ecological Indicators 156, 2023, pp. 111137 DOI: 10.1016/j.ecolind.2023.111137
[168] Ziyang Zhang et al. “An Interpretable Deep Semantic Segmentation Method for Earth Observation” In 2022 IEEE 11th International Conference on Intelligent Systems (IS), 2022, pp. 1–8 DOI: 10.1109/IS57118.2022.10019621
[169] Abolfazl Abdollahi and Biswajeet Pradhan “Explainable Artificial Intelligence (XAI) for Interpreting the Contributing Factors Feed into the Wildfire Susceptibility Prediction Model” In Science of the Total Environment 879, 2023 DOI: 10.1016/j.scitotenv.2023.163004
[170] Roberto Cilli et al. “Explainable Artificial Intelligence (XAI) Detects Wildfire Occurrence in the Mediterranean Countries of Southern Europe” In Scientific Reports 12.1 Nature Publishing Group, 2022, pp. 16349 DOI: 10.1038/s41598-022-20347-9
[171] Alan H. Taylor et al. “Spatial Patterns of Nineteenth Century Fire Severity Persist after Fire Exclusion and a Twenty-First Century Wildfire in a Mixed Conifer Forest Landscape, Southern Cascades, USA” In Landscape Ecology 35.12, 2020, pp. 2777–2790 DOI: 10.1007/s10980-020-01118-1
[172] Nandini Saini, Chiranjoy Chattopadhyay and Debasis Das “E2AlertNet: An Explainable, Efficient, and Lightweight Model for Emergency Alert from Aerial Imagery” In Remote Sensing Applications: Society and Environment 29 Elsevier B.V., 2023 DOI: 10.1016/j.rsase.2022.100896
[173] Teo Beker, Qian Song and Xiao Xiang Zhu “An Analysis of the Gap between Hybrid and Real Data for Volcanic Deformation Detection” In IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium, 2023, pp. 825–828 DOI: 10.1109/IGARSS52108.2023.10281964
[174] Zhe Chen et al. “Tunnel Geothermal Disaster Susceptibility Evaluation Based on Interpretable Ensemble Learning: A Case Study in Ya’an–Changdu Section of the Sichuan–Tibet Traffic Corridor” In Engineering Geology 313 Elsevier B.V., 2023 DOI: 10.1016/j.enggeo.2023.106985
[175] Ratiranjan Jena et al. “Explainable Artificial Intelligence (XAI) Model for Earthquake Spatial Probability Assessment in Arabian Peninsula” In Remote Sensing 15.9, 2023 DOI: 10.3390/rs15092248
[176] A. Levering, D. Marcos and D. Tuia “On the Relation between Landscape Beauty and Land Cover: A Case Study in the U.K. at Sentinel-2 Resolution with Interpretable AI” In ISPRS Journal of Photogrammetry and Remote Sensing 177, 2021, pp. 194–203 DOI: 10.1016/j.isprsjprs.2021.04.020
[177] Alex Levering, Diego Marcos, Jasper van Vliet and Devis Tuia “Predicting the Liveability of Dutch Cities with Aerial Images and Semantic Intermediate Concepts” In Remote Sensing of Environment 287, 2023, pp. 113454 DOI: 10.1016/j.rse.2023.113454
[178] Songyan Zhu et al. “Investigating Impacts of Ambient Air Pollution on the Terrestrial Gross Primary Productivity (GPP) from Remote Sensing” In IEEE Geoscience and Remote Sensing Letters 19, 2022, pp. 1–5 DOI: 10.1109/LGRS.2022.3163775
[179] Callie B. Lambert, Lynn M. Resler, Yang Shao and David R. Butler “Vegetation Change as Related to Terrain Factors at Two Glacier Forefronts, Glacier National Park, Montana, U.S.A.” In Journal of Mountain Science 17.1, 2020, pp. 1–15 DOI: 10.1007/s11629-019-5603-8
[180] Huimin Zhou et al. “Relative Importance of Climatic Variables, Soil Properties and Plant Traits to Spatial Variability in Net CO2 Exchange across Global Forests and Grasslands” In Agricultural and Forest Meteorology 307, 2021, pp. 108506 DOI: 10.1016/j.agrformet.2021.108506
[181] Shujie Cheng et al. “Improved Understanding of How Catchment Properties Control Hydrological Partitioning Through Machine Learning” In Water Resources Research 58.4, 2022, pp. e2021WR031412 DOI: 10.1029/2021WR031412
[182] C. Karmakar, C.O. Dumitru, G. Schwarz and M. Datcu “Feature-Free Explainable Data Mining in SAR Images Using Latent Dirichlet Allocation” In IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 2021, pp. 676–689 DOI: 10.1109/JSTARS.2020.3039012
[183] Chandrabali Karmakar and Mihai Datcu “A Framework for Interactive Visual Interpretation of Remote Sensing Data” In IEEE Geoscience and Remote Sensing Letters 19, 2022, pp. 1–5 DOI: 10.1109/LGRS.2022.3161959
[184] D. Stroppiana et al. “A Fully Automatic, Interpretable and Adaptive Machine Learning Approach to Map Burned Area from Remote Sensing” In ISPRS International Journal of Geo-Information 10.8, 2021 DOI: 10.3390/ijgi10080546
[185] Hugo Leon-Garza et al. “A Big Bang-Big Crunch Type-2 Fuzzy Logic System for Explainable Semantic Segmentation of Trees in Satellite Images Using HSV Color Space” In 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2020, pp. 1–7 DOI: 10.1109/FUZZ48607.2020.9177611
[186] Bryce Murray et al. “Explainable AI for Understanding Decisions and Data-Driven Optimization of the Choquet Integral” In 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2018, pp. 1–8 DOI: 10.1109/FUZZ-IEEE.2018.8491501
[187] Muhammad Aminul Islam et al. “Enabling Explainable Fusion in Deep Learning With Fuzzy Integral Neural Networks” In IEEE Transactions on Fuzzy Systems 28.7, 2020, pp. 1291–1300 DOI: 10.1109/TFUZZ.2019.2917124
[188] B.J. Murray et al. “Information Fusion-2-Text: Explainable Aggregation via Linguistic Protoforms” In Communications in Computer and Information Science 1239 CCIS, 2020, pp. 114–127 DOI: 10.1007/978-3-030-50153-2_9
[189] Derek T. Anderson et al. “Fuzzy Choquet Integration of Deep Convolutional Neural Networks for Remote Sensing” In Computational Intelligence for Pattern Recognition Cham: Springer International Publishing, 2018, pp. 1–28 DOI: 10.1007/978-3-319-89629-8_1
[190] Yansheng Li, Yongjun Zhang, Xin Huang and Alan L. Yuille “Deep Networks under Scene-Level Supervision for Multi-Class Geospatial Object Detection from Remote Sensing Images” In ISPRS Journal of Photogrammetry and Remote Sensing 146, 2018, pp. 182–196 DOI: 10.1016/j.isprsjprs.2018.09.014
[191] Wei Xiong et al. “An Interpretable Fusion Siamese Network for Multi-modality Remote Sensing Ship Image Retrieval” In IEEE Transactions on Circuits and Systems for Video Technology, 2022, pp. 1–1 DOI: 10.1109/TCSVT.2022.3224068
[192] Xianpeng Guo et al. “Visual Explanations with Detailed Spatial Information for Remote Sensing Image Classification via Channel Saliency” In International Journal of Applied Earth Observation and Geoinformation 118, 2023, pp. 103244 DOI: 10.1016/j.jag.2023.103244
[193] Jia Deng et al. “ImageNet: A Large-Scale Hierarchical Image Database” In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255
[194] Zhenpeng Feng, Mingzhe Zhu, Ljubiša Stanković and Hongbing Ji “Self-Matching CAM: A Novel Accurate Visual Explanation of CNNs for SAR Image Interpretation” In Remote Sensing 13.9 Multidisciplinary Digital Publishing Institute, 2021, pp. 1772 DOI: 10.3390/rs13091772
[195] Haofan Wang et al. “Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 24–25
[196] Seyed Mojtaba Marvasti-Zadeh, Devin Goodsman, Nilanjan Ray and Nadir Erbilgin “Crown-CAM: Interpretable Visual Explanations for Tree Crown Detection in Aerial Images” In IEEE Geoscience and Remote Sensing Letters 20, 2023, pp. 1–5 DOI: 10.1109/LGRS.2023.3271649
[197] G. De Lucia, M. Lapegna and D. Romano “Towards Explainable AI for Hyperspectral Image Classification in Edge Computing Environments” In Computers and Electrical Engineering 103, 2022 DOI: 10.1016/j.compeleceng.2022.108381
[198] Wei Song et al. “Median-Pooling Grad-CAM: An Efficient Inference Level Visual Explanation for CNN Networks in Remote Sensing Image Classification” In MultiMedia Modeling Cham: Springer International Publishing, 2021, pp. 134–146
[199] Ji Ge et al. “Interpretable Deep Learning Method Combining Temporal Backscattering Coefficients and Interferometric Coherence for Rice Area Mapping” In IEEE Geoscience and Remote Sensing Letters 20, 2023, pp. 1–5 DOI: 10.1109/LGRS.2023.3321770
[200] Elizabeth A. Barnes, Randal J. Barnes, Zane K. Martin and Jamin K. Rader “This Looks Like That There: Interpretable Neural Networks for Image Tasks When Location Matters” In Artificial Intelligence for the Earth Systems 1.3, 2022, pp. e220001 DOI: 10.1175/AIES-D-22-0001.1
[201] X. Huang et al. “Better Visual Interpretation for Remote Sensing Scene Classification” In IEEE Geoscience and Remote Sensing Letters 19, 2022 DOI: 10.1109/LGRS.2021.3132920
[202] Alexander Brenning “Spatial Machine-Learning Model Diagnostics: A Model-Agnostic Distance-Based Approach” In International Journal of Geographical Information Science 37.3, 2023, pp. 584–606 DOI: 10.1080/13658816.2022.2131789
[203] Giuseppina Andresini, Annalisa Appice and Donato Malerba “SILVIA: An eXplainable Framework to Map Bark Beetle Infestation in Sentinel-2 Images” In IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2023, pp. 1–17 DOI: 10.1109/JSTARS.2023.3312521
[204] J.L. Abitbol and M. Karsai “Interpretable Socioeconomic Status Inference from Aerial Imagery through Urban Patterns” In Nature Machine Intelligence 2.11, 2020, pp. 684–692 DOI: 10.1038/s42256-020-00243-5
[205] Xun Li et al. “Explainable Dimensionality Reduction (XDR) to Unbox AI ‘Black Box’ Models: A Study of AI Perspectives on the Ethnic Styles of Village Dwellings” In Humanities and Social Sciences Communications 10.1 Palgrave, 2023, pp. 1–13 DOI: 10.1057/s41599-023-01505-4
[206] Katalin Blix, Gustau Camps-Valls and Robert Jenssen “Gaussian Process Sensitivity Analysis for Oceanic Chlorophyll Estimation” In IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 10.4, 2017, pp. 1265–1277 DOI: 10.1109/JSTARS.2016.2641583
[207] Wenyi Zhang et al. “Feature Importance Measure of a Multilayer Perceptron Based on the Presingle-Connection Layer” In Knowledge and Information Systems, 2023 DOI: 10.1007/s10115-023-01959-7
[208] I. Kakogeorgiou and K. Karantzalos “Evaluating Explainable Artificial Intelligence Methods for Multi-Label Deep Learning Classification Tasks in Remote Sensing” In International Journal of Applied Earth Observation and Geoinformation 103, 2021 DOI: 10.1016/j.jag.2021.102520
[209] Taewoo Kim et al. “Federated Onboard-Ground Station Computing with Weakly Supervised Cascading Pyramid Attention Network for Satellite Image Analysis” In IEEE Access 10, 2022, pp. 117315–117333 DOI: 10.1109/ACCESS.2022.3219879
[210] Jan-Peter Kucklick and Oliver Müller “Tackling the Accuracy-Interpretability Trade-off: Interpretable Deep Learning Models for Satellite Image-Based Real Estate Appraisal” In ACM Transactions on Management Information Systems 14.1, 2023 DOI: 10.1145/3567430
[211] Qi Su et al. “Which CAM Is Better for Extracting Geographic Objects? A Perspective From Principles and Experiments” In IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15, 2022, pp. 5623–5635 DOI: 10.1109/JSTARS.2022.3188493
[212] A. Abdollahi and B. Pradhan “Urban Vegetation Mapping from Aerial Imagery Using Explainable AI (XAI)” In Sensors 21.14, 2021 DOI: 10.3390/s21144738
[213] Spyros Kondylatos et al. “Wildfire Danger Prediction and Understanding with Deep Learning” In Geophysical Research Letters, 2022, pp. e2022GL099368 DOI: 10.1029/2022GL099368
[214] S. Biass et al. “Insights into the Vulnerability of Vegetation to Tephra Fallouts from Interpretable Machine Learning and Big Earth Observation Data” In Natural Hazards and Earth System Sciences 22.9, 2022, pp. 2829–2855 DOI: 10.5194/nhess-22-2829-2022
[215] Timo T. Stomberg, Taylor Stone, Johannes Leonhardt and Ribana Roscher “Exploring Wilderness Using Explainable Machine Learning in Satellite Imagery” arXiv, 2022 DOI: 10.48550/arXiv.2203.00379
[216] Sam J. Silva, Christoph A. Keller and Joseph Hardin “Using an Explainable Machine Learning Approach to Characterize Earth System Model Errors: Application of SHAP Analysis to Modeling Lightning Flash Occurrence” In Journal of Advances in Modeling Earth Systems 14.4, 2022, pp. e2021MS002881 DOI: 10.1029/2021MS002881
[217] B. Hosseiny, A.M. Abdi and S. Jamali “Urban Land Use and Land Cover Classification with Interpretable Machine Learning – A Case Study Using Sentinel-2 and Auxiliary Data” In Remote Sensing Applications: Society and Environment 28, 2022 DOI: 10.1016/j.rsase.2022.100843
[218] Jakob Gawlikowski, Patrick Ebel, Michael Schmitt and Xiao Xiang Zhu “Explaining the Effects of Clouds on Remote Sensing Scene Classification” In IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15, 2022, pp. 9976–9986 DOI: 10.1109/JSTARS.2022.3221788
[219] Neelesh Rampal et al. “High-Resolution Downscaling with Interpretable Deep Learning: Rainfall Extremes over New Zealand” In Weather and Climate Extremes 38, 2022, pp. 100525 DOI: 10.1016/j.wace.2022.100525
[220] Patrick W Keys, Elizabeth A Barnes and Neil H Carter “A Machine-Learning Approach to Human Footprint Index Estimation with Applications to Sustainable Development” In Environmental Research Letters 16.4, 2021, pp. 044061 DOI: 10.1088/1748-9326/abe00a
[221] Teo Beker et al. “Deep Learning for Subtle Volcanic Deformation Detection with InSAR Data in Central Volcanic Zone” In IEEE Transactions on Geoscience and Remote Sensing 61, 2023, pp. 1–20 DOI: 10.1109/TGRS.2023.3318469
[222] Zhitong Xiong, Sining Chen, Yilei Shi and Xiao Xiang Zhu “Disentangled Latent Transformer for Interpretable Monocular Height Estimation” arXiv, 2022 DOI: 10.48550/arXiv.2201.06357
[223] Miltiadis Iatrou, Christos Karydas, Xanthi Tseni and Spiros Mourelatos “Representation Learning with a Variational Autoencoder for Predicting Nitrogen Requirement in Rice” In Remote Sensing 14.23 MDPI, 2022 DOI: 10.3390/rs14235978
[224] Nana Luo et al. “Explainable and Spatial Dependence Deep Learning Model for Satellite-Based O3 Monitoring in China” In Atmospheric Environment, 2022, pp. 119370 DOI: 10.1016/j.atmosenv.2022.119370
[225] Julio J. Valdés and Antonio Pou “Explainable AI Applied to the Analysis of the Climatic Behavior of 11 Years of Meteosat Water Vapor Images” In 2022 IEEE Symposium Series on Computational Intelligence (SSCI), 2022, pp. 846–853 DOI: 10.1109/SSCI51031.2022.10022301
[226] X. Yan et al. “New Interpretable Deep Learning Model to Monitor Real-Time PM2.5 Concentrations from Satellite Data” In Environment International 144, 2020 DOI: 10.1016/j.envint.2020.106060
[227] X. Yan, Z. Zang, C. Zhao and L. Husi “Understanding Global Changes in Fine-Mode Aerosols during 2008–2017 Using Statistical Methods and Deep Learning Approach” In Environment International 149, 2021 DOI: 10.1016/j.envint.2021.106392
[228] Paolo Maranzano, Philipp Otto and Alessandro Fassò “Adaptive LASSO Estimation for Functional Hidden Dynamic Geostatistical Models” In Stochastic Environmental Research and Risk Assessment 37.9, 2023, pp. 3615–3637 DOI: 10.1007/s00477-023-02466-5
[229] Zohre Ebrahimi-Khusfi et al. “Determining the Contribution of Environmental Factors in Controlling Dust Pollution during Cold and Warm Months of Western Iran Using Different Data Mining Algorithms and Game Theory” In Ecological Indicators 132, 2021, pp. 108287 DOI: 10.1016/j.ecolind.2021.108287
[230] Yanchuan Shao et al. “Estimation of Daily NO2 with Explainable Machine Learning Model in China, 2007–2020” In Atmospheric Environment 314, 2023 DOI: 10.1016/j.atmosenv.2023.120111
[231] Shuai Wang et al. “Estimating Particulate Matter Concentrations and Meteorological Contributions in China during 2000–2020” In Chemosphere 330, 2023 DOI: 10.1016/j.chemosphere.2023.138742
[232] L. Zipfel, H. Andersen and J. Cermak “Machine-Learning Based Analysis of Liquid Water Path Adjustments to Aerosol Perturbations in Marine Boundary Layer Clouds Using Satellite Observations” In Atmosphere 13.4, 2022 DOI: 10.3390/atmos13040586
[233] Lu Liang et al. “Integrating Low-Cost Sensor Monitoring, Satellite Mapping, and Geospatial Artificial Intelligence for Intra-Urban Air Pollution Predictions” In Environmental Pollution 331, 2023 DOI: 10.1016/j.envpol.2023.121832
[234] Yang Zhen and Guo** Shi “Evaluation of MACC Total Aerosol Optical Depth and Its Correction Model Based on the Random Forest Regression” In Theoretical and Applied Climatology 152.3-4, 2023, pp. 1243–1258 DOI: 10.1007/s00704-023-04455-8
[235] Julio J. Valdés and Antonio Pou “A Machine Learning - Explainable AI Approach to Tropospheric Dynamics Analysis Using Water Vapor Meteosat Images” In 2021 IEEE Symposium Series on Computational Intelligence (SSCI), 2021, pp. 1–8 DOI: 10.1109/SSCI50451.2021.9660188
[236] S. Zhang et al. “A Data-Augmentation Approach to Deriving Long-Term Surface SO2 across Northern China: Implications for Interpretable Machine Learning” In Science of the Total Environment 827, 2022 DOI: 10.1016/j.scitotenv.2022.154278
[237] C.-S. Cheng, A.H. Behzadan and A. Noshadravan “Uncertainty-Aware Convolutional Neural Network for Explainable Artificial Intelligence-Assisted Disaster Damage Assessment” In Structural Control and Health Monitoring 29.10, 2022 DOI: 10.1002/stc.3019
[238] Erfan Hasanpour Zaryabi et al. “Unboxing the Black Box of Attention Mechanisms in Remote Sensing Big Data Using XAI” In Remote Sensing 14.24 MDPI, 2022 DOI: 10.3390/rs14246254
[239] Seyd Teymoor Seydi, Mahdi Hasanlou, Jocelyn Chanussot and Pedram Ghamisi “BDD-Net+: A Building Damage Detection Framework Based on Modified Coat-Net” In IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 16, 2023, pp. 4232–4247 DOI: 10.1109/JSTARS.2023.3267847
[240] Seyd Teymoor Seydi, Mahdi Hasanlou, Jocelyn Chanussot and Pedram Ghamisi “Leveraging Involution and Convolution in an Explainable Building Damage Detection Framework” In European Journal of Remote Sensing Taylor & Francis, 2023, pp. 2252166 DOI: 10.1080/22797254.2023.2252166
[241] Adrià Descals et al. “Local Interpretation of Machine Learning Models in Remote Sensing with SHAP: The Case of Global Climate Constraints on Photosynthesis Phenology” In International Journal of Remote Sensing 44.10, 2023, pp. 3160–3173 DOI: 10.1080/01431161.2023.2217982
[242] L. Li et al. “Exploring the Individualized Effect of Climatic Drivers on MODIS Net Primary Productivity through an Explainable Machine Learning Framework” In Remote Sensing 14.17, 2022 DOI: 10.3390/rs14174401
[243] Songyan Zhu et al. “Explainable Machine Learning Confirms the Global Terrestrial CO2 Fertilization Effect from Space” In IEEE Geoscience and Remote Sensing Letters 20, 2023, pp. 1–5 DOI: 10.1109/LGRS.2023.3298373
[244] Sanja Scepanovic, Sagar Joglekar, Stephen Law and Daniele Quercia “Jane Jacobs in the Sky: Predicting Urban Vitality with Open Satellite Data” In Proceedings of the ACM on Human-Computer Interaction 5.CSCW1, 2021, pp. 1–25 DOI: 10.1145/3449257
[245] Liujia Chen et al. “Measuring Impacts of Urban Environmental Elements on Housing Prices Based on Multisource Data—A Case Study of Shanghai, China” In ISPRS International Journal of Geo-Information 9.2 Multidisciplinary Digital Publishing Institute, 2020, pp. 106 DOI: 10.3390/ijgi9020106
[246] Zhongao Ding et al. “Residential Greenness and Cardiac Conduction Abnormalities: Epidemiological Evidence and an Explainable Machine Learning Modeling Study” In Chemosphere 339, 2023 DOI: 10.1016/j.chemosphere.2023.139671
[247] M. Kim and G. Kim “Modeling and Predicting Urban Expansion in South Korea Using Explainable Artificial Intelligence (XAI) Model” In Applied Sciences (Switzerland) 12.18, 2022 DOI: 10.3390/app12189169
[248] Sanja Scepanovic et al. “MedSat: A Public Health Dataset for England Featuring Medical Prescriptions and Satellite Imagery” In Thirty-Seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023
[249] A. Temenos et al. “Novel Insights in Spatial Epidemiology Utilizing Explainable AI (XAI) and Remote Sensing” In Remote Sensing 14.13, 2022 DOI: 10.3390/rs14133074
[250] Siqin Wang et al. “Unpacking the Inter- and Intra-Urban Differences of the Association between Health and Exposure to Heat and Air Quality in Australia Using Global and Local Machine Learning Models” In Science of The Total Environment 871, 2023, pp. 162005 DOI: 10.1016/j.scitotenv.2023.162005
[251] P. Wójcik and K. Andruszek “Predicting Intra-Urban Well-Being from Space with Nonlinear Machine Learning” In Regional Science Policy and Practice 14.4, 2022, pp. 891–913 DOI: 10.1111/rsp3.12478
[252] Salvin S. Prasad et al. “Enhanced Joint Hybrid Deep Neural Network Explainable Artificial Intelligence Model for 1-Hr Ahead Solar Ultraviolet Index Prediction” In Computer Methods and Programs in Biomedicine 241, 2023 DOI: 10.1016/j.cmpb.2023.107737
[253] Katalin Blix, Ana Belen Ruescas, Juan Emmanuel Johnson and Gustau Camps-Valls “Learning Relevant Features of Optical Water Types” In IEEE Geoscience and Remote Sensing Letters 19, 2022, pp. 1–5 DOI: 10.1109/LGRS.2021.3072049
[254] Lifu Chen et al. “Towards Transparent Deep Learning for Surface Water Detection from SAR Imagery” In International Journal of Applied Earth Observation and Geoinformation 118, 2023 DOI: 10.1016/j.jag.2023.103287
[255] S.M. Hong et al. “Monitoring the Vertical Distribution of HABs Using Hyperspectral Imagery and Deep Learning Models” In Science of the Total Environment 794, 2021 DOI: 10.1016/j.scitotenv.2021.148592
[256] Yumin Liu, Kate Duffy, Jennifer G. Dy and Auroop R. Ganguly “Explainable Deep Learning for Insights in El Niño and River Flows” In Nature Communications 14.1 Nature Research, 2023 DOI: 10.1038/s41467-023-35968-5
[257] Neelesh Rampal, Tom Shand, Adam Wooler and Christo Rautenbach “Interpretable Deep Learning Applied to Rip Current Detection and Localization” In Remote Sensing 14.23 MDPI, 2022 DOI: 10.3390/rs14236048
[258] Elif Ozlem Yilmaz, Hasan Tonbul and Taskin Kavzoglu “Marine Mucilage Mapping with Explained Deep Learning Model Using Water-Related Spectral Indices: A Case Study of Dardanelles Strait, Turkey” In Stochastic Environmental Research and Risk Assessment, 2023 DOI: 10.1007/s00477-023-02560-8
[259] Xin **g, Jungang Luo, Ganggang Zuo and Xue Yang “Interpreting Runoff Forecasting of Long Short-Term Memory Network: An Investigation Using the Integrated Gradient Method on Runoff Data from the Han River Basin” In Journal of Hydrology: Regional Studies 50, 2023, pp. 101549 DOI: 10.1016/j.ejrh.2023.101549
[260] J. Pyo et al. “Cyanobacteria Cell Prediction Using Interpretable Deep Learning Model with Observed, Numerical, and Sensing Data Assemblage” In Water Research 203, 2021 DOI: 10.1016/j.watres.2021.117483
[261] Ziming Hu et al. “Water Storage Changes (2003–2020) in the Ordos Basin, China, Explained by GRACE Data and Interpretable Deep Learning” In Hydrogeology Journal, 2023 DOI: 10.1007/s10040-023-02713-7
[262] Vahideh Saeidi et al. “Water Depth Estimation from Sentinel-2 Imagery Using Advanced Machine Learning Methods and Explainable Artificial Intelligence” In Geomatics, Natural Hazards and Risk 14.1, 2023 DOI: 10.1080/19475705.2023.2225691
[263] S.-C. Hung, H.-C. Wu and M.-H. Tseng “Integrating Image Quality Enhancement Methods and Deep Learning Techniques for Remote Sensing Scene Classification” In Applied Sciences (Switzerland) 11.24, 2021 DOI: 10.3390/app112411659
[264] Minsu Jeon et al. “Recursive Visual Explanations Mediation Scheme Based on DropAttention Model with Multiple Episodes Pool” In IEEE Access 11, 2023, pp. 4306–4321 DOI: 10.1109/ACCESS.2023.3235332
[265] Jianda Cheng et al. “PolSAR Image Land Cover Classification Based on Hierarchical Capsule Network” In Remote Sensing 13.16 MDPI AG, 2021 DOI: 10.3390/rs13163132
[266] Daniel Guidici and Matthew L. Clark “One-Dimensional Convolutional Neural Network Land-Cover Classification of Multi-Seasonal Hyperspectral Imagery in the San Francisco Bay Area, California” In Remote Sensing 9.6 Multidisciplinary Digital Publishing Institute, 2017, pp. 629 DOI: 10.3390/rs9060629
[267] Dino Ienco, Yawogan Jean Eudes Gbodjo, Raffaele Gaetano and Roberto Interdonato “Weakly Supervised Learning for Land Cover Mapping of Satellite Image Time Series via Attention-Based CNN” In IEEE Access 8, 2020, pp. 179547–179560 DOI: 10.1109/ACCESS.2020.3024133
[268] N. Méger et al. “Explaining a Deep SpatioTemporal Land Cover Classifier with Attention and Redescription Mining” 43, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives, 2022, pp. 673–680 DOI: 10.5194/isprs-archives-XLIII-B3-2022-673-2022
[269] J. Feng, D. Wang and Z. Gu “Bidirectional Flow Decision Tree for Reliable Remote Sensing Image Scene Classification” In Remote Sensing 14.16, 2022 DOI: 10.3390/rs14163943
[270] Z. Huang et al. “Physically Explainable CNN for SAR Image Classification” In ISPRS Journal of Photogrammetry and Remote Sensing 190, 2022, pp. 25–37 DOI: 10.1016/j.isprsjprs.2022.05.008
[271] Siteng Ma et al. “Multicrop Fusion Strategy Based on Prototype Assignment for Remote Sensing Image Scene Classification” In IEEE Transactions on Geoscience and Remote Sensing 60, 2022, pp. 1–12 DOI: 10.1109/TGRS.2022.3216831
[272] Muskan Verma, Nayan Gupta, Bhavishya Tolani and Rishabh Kaushal “Explainable Custom CNN Architecture for Land Use Classification Using Satellite Images” In 2021 Sixth International Conference on Image Information Processing (ICIIP) 6, 2021, pp. 304–309 DOI: 10.1109/ICIIP53038.2021.9702698
[273] Muzaffer Can Iban and Suleyman Sefa Bilgilioglu “Snow Avalanche Susceptibility Mapping Using Novel Tree-Based Machine Learning Algorithms (XGBoost, NGBoost, and LightGBM) with eXplainable Artificial Intelligence (XAI) Approach” In Stochastic Environmental Research and Risk Assessment, 2023 DOI: 10.1007/s00477-023-02392-6
[274] Feini Huang et al. “Interpreting Conv-LSTM for Spatio-Temporal Soil Moisture Prediction in China” In Agriculture (Switzerland) 13.5, 2023 DOI: 10.3390/agriculture13050971
[275] Ravil I. Mukhamediev et al. “Soil Salinity Estimation for South Kazakhstan Based on SAR Sentinel-1 and Landsat-8,9 OLI Data with Machine Learning Models” In Remote Sensing 15.17, 2023 DOI: 10.3390/rs15174269
[276] B. Pradhan et al. “A New Method to Evaluate Gold Mineralisation-Potential Mapping Using Deep Learning and an Explainable Artificial Intelligence (XAI) Model” In Remote Sensing 14.18, 2022 DOI: 10.3390/rs14184486
[277] Yanan Zhou et al. “Identification of Soil Texture Classes under Vegetation Cover Based on Sentinel-2 Data with SVM and SHAP Techniques” In IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15, 2022, pp. 3758–3770 DOI: 10.1109/JSTARS.2022.3164140
[278] I.A. Smorkalov “Soil Respiration Variability: Contributions of Space and Time Estimated Using the Random Forest Algorithm” In Russian Journal of Ecology 53.4, 2022, pp. 295–307 DOI: 10.1134/S1067413622040051
[279] Z.M. Labe and E.A. Barnes “Predicting Slowdowns in Decadal Climate Warming Trends with Explainable Neural Networks” In Geophysical Research Letters 49.9, 2022 DOI: 10.1029/2022GL098173
[280] Yuhong Hu, Chaofan Wu, Michael E. Meadows and Meili Feng “Pixel Level Spatial Variability Modeling Using SHAP Reveals the Relative Importance of Factors Influencing LST” In Environmental Monitoring and Assessment 195.3, 2023, pp. 407 DOI: 10.1007/s10661-023-10950-2
[281] Minjun Kim, Dongbeom Kim and Geunhan Kim “Examining the Relationship between Land Use/Land Cover (LULC) and Land Surface Temperature (LST) Using Explainable Artificial Intelligence (XAI) Models: A Case Study of Seoul, South Korea” In International Journal of Environmental Research and Public Health 19.23 MDPI, 2022 DOI: 10.3390/ijerph192315926
[282] Pinyang Luo et al. “Understanding the Relationship between 2D/3D Variables and Land Surface Temperature in Plain and Mountainous Cities: Relative Importance and Interaction Effects” In Building and Environment, 2023, pp. 110959 DOI: 10.1016/j.buildenv.2023.110959
[283] Yanting Shen et al. “Using GeoAI to Reveal the Contribution of Urban Park Green Space Features to Mitigate the Heat Island Effect” 2, Proceedings of the International Conference on Education and Research in Computer Aided Architectural Design in Europe, 2023, pp. 49–58
[284] Zhonghao Li et al. “Automatic Bridge Detection of SAR Images Based on Interpretable Deep Learning Algorithm” 2562, Journal of Physics: Conference Series, 2023 DOI: 10.1088/1742-6596/2562/1/012013
[285] Ru Luo et al. “Glassboxing Deep Learning to Enhance Aircraft Detection from SAR Imagery” In Remote Sensing 13.18 MDPI, 2021 DOI: 10.3390/rs13183650
[286] Shoulin Yin et al. “G2Grad-CAMRL: An Object Detection and Interpretation Model Based on Gradient-Weighted Class Activation Mapping and Reinforcement Learning in Remote Sensing Images” In IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2023, pp. 1–16 DOI: 10.1109/JSTARS.2023.3241405
[287] Peng Li, Cunqian Feng, Xiaowei Hu and Zixiang Tang “SAR-BagNet: An Ante-hoc Interpretable Recognition Model Based on Deep Network for SAR Image” In Remote Sensing 14.9 Multidisciplinary Digital Publishing Institute, 2022, pp. 2150 DOI: 10.3390/rs14092150
[288] Bowen Peng and Bo Peng “Clutter-Invariant Regularization for DNN-based SAR Target Recognition” In 2023 6th International Conference on Electronics Technology (ICET), 2023, pp. 1456–1461 DOI: 10.1109/ICET58434.2023.10211978
[289] Mingzhe Zhu et al. “LIME-Based Data Selection Method for SAR Images Generation Using GAN” In Remote Sensing 14.1 Multidisciplinary Digital Publishing Institute, 2022, pp. 204 DOI: 10.3390/rs14010204
[290] Sijia Feng et al. “PAN: Part Attention Network Integrating Electromagnetic Characteristics for Interpretable SAR Vehicle Target Recognition” In IEEE Transactions on Geoscience and Remote Sensing, 2023, pp. 1–1 DOI: 10.1109/TGRS.2023.3256399
[291] Dandan Guo, Bo Chen, Meixi Zheng and Hongwei Liu “SAR Automatic Target Recognition Based on Supervised Deep Variational Autoencoding Model” In IEEE Transactions on Aerospace and Electronic Systems 57.6, 2021, pp. 4313–4328 DOI: 10.1109/TAES.2021.3096868
[292] Peng Li et al. “SAR-AD-BagNet: An Interpretable Model for SAR Image Recognition Based on Adversarial Defense” In IEEE Geoscience and Remote Sensing Letters, 2022, pp. 1–1 DOI: 10.1109/LGRS.2022.3230243
[293] Min Zhou et al. “Local Attention Networks for Occluded Airplane Detection in Remote Sensing Images” In IEEE Geoscience and Remote Sensing Letters 17.3, 2020, pp. 381–385 DOI: 10.1109/LGRS.2019.2924822
[294] H. Kawauchi and T. Fuse “SHAP-Based Interpretable Object Detection Method for Satellite Imagery” In Remote Sensing 14.9, 2022 DOI: 10.3390/rs14091970
[295] Mandeep, Husanbir Singh Pannu and Avleen Malhi “Deep Learning-Based Explainable Target Classification for Synthetic Aperture Radar Images” In 2020 13th International Conference on Human System Interaction (HSI), 2020, pp. 34–39 DOI: 10.1109/HSI49210.2020.9142658
[296] Amir Hosein Oveis et al. “LIME-Assisted Automatic Target Recognition with SAR Images: Toward Incremental Learning and Explainability” In IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 16, 2023, pp. 9175–9192 DOI: 10.1109/JSTARS.2023.3318675
[297] Sijia Feng et al. “SAR Target Classification Based on Integration of ASC Parts Model and Deep Learning Algorithm” In IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 2021, pp. 10213–10225 DOI: 10.1109/JSTARS.2021.3116979
[298] Masanori Onishi and Takeshi Ise “Explainable Identification and Mapping of Trees Using UAV RGB Image and Deep Learning” In Scientific Reports 11.1 Nature Publishing Group, 2021, pp. 903 DOI: 10.1038/s41598-020-79653-9
[299] Yunzhe Jia et al. “Studying and Exploiting the Relationship between Model Accuracy and Explanation Quality” In Machine Learning and Knowledge Discovery in Databases. Research Track Cham: Springer International Publishing, 2021, pp. 699–714
[300] T.-A. Nguyen, B. Kellenberger and D. Tuia “Mapping Forest in the Swiss Alps Treeline Ecotone with Explainable Deep Learning” In Remote Sensing of Environment 281, 2022 DOI: 10.1016/j.rse.2022.113217
[301] Patricia Arrogante-Funes et al. “Assessment of the Regeneration of Landslides Areas Using Unsupervised and Supervised Methods and Explainable Machine Learning Models” In Landslides, 2023 DOI: 10.1007/s10346-023-02154-z
[302] Wantong Li et al. “Widespread Increasing Vegetation Sensitivity to Soil Moisture” In Nature Communications 13.1 Nature Publishing Group, 2022, pp. 3959 DOI: 10.1038/s41467-022-31667-9
[303] Chuanpeng Zhao et al. “Toward a Better Understanding of Coastal Salt Marsh Mapping: A Case from China Using Dual-Temporal Images” In Remote Sensing of Environment 295, 2023 DOI: 10.1016/j.rse.2023.113664
[304] Chuanpeng Zhao et al. “Identifying Mangroves through Knowledge Extracted from Trained Random Forest Models: An Interpretable Mangrove Mapping Approach (IMMA)” In ISPRS Journal of Photogrammetry and Remote Sensing 201, 2023, pp. 209–225 DOI: 10.1016/j.isprsjprs.2023.05.025
[305] Josepha Schiller, Clemens Jänicke, Moritz Reckling and Masahiro Ryo “Higher Crop Diversity in Less Diverse Landscapes”, 2023 DOI: 10.21203/rs.3.rs-3410387/v1
[306] M. Müller et al. “Features Predisposing Forest to Bark Beetle Outbreaks and Their Dynamics during Drought” In Forest Ecology and Management 523, 2022, pp. 120480 DOI: 10.1016/j.foreco.2022.120480
[307] Kyle A. Hilburn, Imme Ebert-Uphoff and Steven D. Miller “Development and Interpretation of a Neural-Network-Based Synthetic Radar Reflectivity Estimator Using GOES-R Satellite Observations” In Journal of Applied Meteorology and Climatology 60.1 American Meteorological Society, 2020, pp. 3–21 DOI: 10.1175/JAMC-D-20-0084.1
[308] Zane K. Martin, Elizabeth A. Barnes and Eric Maloney “Using Simple, Explainable Neural Networks to Predict the Madden-Julian Oscillation” In Journal of Advances in Modeling Earth Systems 14.5, 2022 DOI: 10.1029/2021MS002774
[309] Kirsten J. Mayer and Elizabeth A. Barnes “Subseasonal Forecasts of Opportunity Identified by an Explainable Neural Network” In Geophysical Research Letters 48.10, 2021 DOI: 10.1029/2020GL092092
[310] Dangfu Yang et al. “Predictor Selection for CNN-based Statistical Downscaling of Monthly Precipitation” In Advances in Atmospheric Sciences, 2023 DOI: 10.1007/s00376-022-2119-x
[311] S.A. Upadhyaya, P.-E. Kirstetter, R.J. Kuligowski and M. Searls “Classifying Precipitation from GEO Satellite Observations: Diagnostic Model” In Quarterly Journal of the Royal Meteorological Society 147.739, 2021, pp. 3318–3334 DOI: 10.1002/qj.4130
[312] Jacob Mardian, Catherine Champagne, Barrie Bonsal and Aaron Berg “Understanding the Drivers of Drought Onset and Intensification in the Canadian Prairies: Insights from Explainable Artificial Intelligence (XAI)” In Journal of Hydrometeorology American Meteorological Society, 2023 DOI: 10.1175/JHM-D-23-0036.1
[313] Nafsika Antoniadou et al. “Comparison of Data-Driven Methods for Linking Extreme Precipitation Events to Local and Large-Scale Meteorological Variables” In Stochastic Environmental Research and Risk Assessment 37.11, 2023, pp. 4337–4357 DOI: 10.1007/s00477-023-02511-3
[314] E.S. Maddy and S.A. Boukabara “MIIDAPS-AI: An Explainable Machine-Learning Algorithm for Infrared and Microwave Remote Sensing and Data Assimilation Preprocessing - Application to LEO and GEO Sensors” In IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 2021, pp. 8566–8576 DOI: 10.1109/JSTARS.2021.3104389
[315] L. Bergamasco, S. Saha, F. Bovolo and L. Bruzzone “An Explainable Convolutional Autoencoder Model for Unsupervised Change Detection” 43, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives, 2020, pp. 1513–1519 DOI: 10.5194/isprs-archives-XLIII-B2-2020-1513-2020
[316] Hyunglok Kim et al. “True Global Error Maps for SMAP, SMOS, and ASCAT Soil Moisture Data Based on Machine Learning and Triple Collocation Analysis” In Remote Sensing of Environment 298, 2023 DOI: 10.1016/j.rse.2023.113776
[317] Dae-Seong Lee, Da-Yeong Lee and Young-Seuk Park “Interpretable Machine Learning Approach to Analyze the Effects of Landscape and Meteorological Factors on Mosquito Occurrences in Seoul, South Korea” In Environmental Science and Pollution Research, 2022 DOI: 10.1007/s11356-022-22099-5
[318] Felix Wagner et al. “Using Explainable Machine Learning to Understand How Urban Form Shapes Sustainable Mobility” In Transportation Research Part D: Transport and Environment 111, 2022, pp. 103442 DOI: 10.1016/j.trd.2022.103442
[319] P. Taconet et al. “Data-Driven and Interpretable Machine-Learning Modeling to Explore the Fine-Scale Environmental Determinants of Malaria Vectors Biting Rates in Rural Burkina Faso” In Parasites and Vectors 14.1, 2021 DOI: 10.1186/s13071-021-04851-x
[320] Christoph Molnar et al. “Pitfalls to Avoid When Interpreting Machine Learning Models” In XXAI: Extending Explainable AI beyond Deep Models and Classifiers, ICML 2020 Workshop, 2020
[321] Satyapriya Krishna et al. “The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective” arXiv, 2022 DOI: 10.48550/arXiv.2202.01602
[322] Cynthia Rudin “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead” In Nature Machine Intelligence 1.5 Nature Publishing Group, 2019, pp. 206–215 DOI: 10.1038/s42256-019-0048-x
[323] Leon Sixt, Maximilian Granz and Tim Landgraf “When Explanations Lie: Why Many Modified BP Attributions Fail” In 37th International Conference on Machine Learning, ICML 2020 International Machine Learning Society (IMLS), 2020, pp. 8993–9004 DOI: 10.48550/arxiv.1912.09818
[324] Antonios Mamalakis, Elizabeth A. Barnes and Imme Ebert-Uphoff “Investigating the Fidelity of Explainable Artificial Intelligence Methods for Applications of Convolutional Neural Networks in Geoscience” In Artificial Intelligence for the Earth Systems 1.4, 2022, pp. e220012 DOI: 10.1175/AIES-D-22-0012.1
[325] Philine Bommer et al. “Finding the Right XAI Method – A Guide for the Evaluation and Ranking of Explainable AI Methods in Climate Science” arXiv, 2023 DOI: 10.48550/arXiv.2303.00652
[326] Johannes Haug, Stefan Zürn, Peter El-Jiz and Gjergji Kasneci “On Baselines for Local Feature Attributions” arXiv, 2021 DOI: 10.48550/arXiv.2101.00905
[327] Antonios Mamalakis, Elizabeth A. Barnes and Imme Ebert-Uphoff “Carefully Choose the Baseline: Lessons Learned from Applying XAI Attribution Methods for Regression Tasks in Geoscience” In Artificial Intelligence for the Earth Systems 2.1 American Meteorological Society, 2023 DOI: 10.1175/AIES-D-22-0058.1
[328] Astrid Bertrand, Rafik Belloum, James R. Eagan and Winston Maxwell “How Cognitive Biases Affect XAI-assisted Decision-making: A Systematic Review” In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’22 New York, NY, USA: Association for Computing Machinery, 2022, pp. 78–91 DOI: 10.1145/3514094.3534164
[329] Raymond S. Nickerson “Confirmation Bias: A Ubiquitous Phenomenon in Many Guises” In Review of General Psychology 2.2, 1998, pp. 175–220 DOI: 10.1037/1089-2680.2.2.175
[330] Markus Langer et al. “What Do We Want from Explainable Artificial Intelligence (XAI)? – A Stakeholder Perspective on XAI and a Conceptual Model Guiding Interdisciplinary XAI Research” In Artificial Intelligence 296, 2021, pp. 103473 DOI: 10.1016/j.artint.2021.103473
[331] Mohamed Karim Belaid, Eyke Hüllermeier, Maximilian Rabus and Ralf Krestel “Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark” arXiv, 2022 DOI: 10.48550/arXiv.2207.14160
[332] Chirag Agarwal et al. “OpenXAI: Towards a Transparent Evaluation of Model Explanations” arXiv, 2023 DOI: 10.48550/arXiv.2206.11104
[333] Wei Xu, Marvin J. Dainoff, Liezhong Ge and Zaifeng Gao “Transitioning to Human Interaction with AI Systems: New Challenges and Opportunities for HCI Professionals to Enable Human-Centered AI” In International Journal of Human–Computer Interaction 39.3 Taylor & Francis, 2023, pp. 494–518 DOI: 10.1080/10447318.2022.2041900
[334] Leila Arras, Ahmed Osman and Wojciech Samek “CLEVR-XAI: A Benchmark Dataset for the Ground Truth Evaluation of Neural Network Explanations” In Information Fusion 81, 2022, pp. 14–40 DOI: 10.1016/j.inffus.2021.11.008
[335] Antonios Mamalakis, Imme Ebert-Uphoff and Elizabeth A. Barnes “Neural Network Attribution Methods for Problems in Geoscience: A Novel Synthetic Benchmark Dataset” In Environmental Data Science 1, 2022, pp. e8 DOI: 10.1017/eds.2022.7
[336] Z.M. Labe and E.A. Barnes “Detecting Climate Signals Using Explainable AI with Single-Forcing Large Ensembles” In Journal of Advances in Modeling Earth Systems 13.6, 2021 DOI: 10.1029/2021MS002464
[337] Emily E. Berkson et al. “Synthetic Data Generation to Mitigate the Low/No-Shot Problem in Machine Learning” In 2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), 2019, pp. 1–7 DOI: 10.1109/AIPR47015.2019.9174596
[338] Jacob Shermeyer et al. “RarePlanes: Synthetic Data Takes Flight” In 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 207–217 DOI: 10.1109/WACV48630.2021.00025
[339] Sanghui Han et al. “Efficient Generation of Image Chips for Training Deep Learning Algorithms” In Automatic Target Recognition XXVII 10202 SPIE, 2017, pp. 15–23 DOI: 10.1117/12.2261702
[340] T. Hoeser and C. Kuenzer “SyntEO: Synthetic Dataset Generation for Earth Observation and Deep Learning – Demonstrated for Offshore Wind Farm Detection” In ISPRS Journal of Photogrammetry and Remote Sensing 189, 2022, pp. 163–184 DOI: 10.1016/j.isprsjprs.2022.04.029
[341] Juanping Zhao et al. “Contrastive-Regulated CNN in the Complex Domain: A Method to Learn Physical Scattering Signatures from Flexible PolSAR Images” In IEEE Transactions on Geoscience and Remote Sensing 57.12, 2019, pp. 10116–10135 DOI: 10.1109/TGRS.2019.2931620
[342] Charlie Marx, Yuyang Wang, Youngsuk Park and Stefano Ermon “But Are You Sure? An Uncertainty-Aware Perspective on Explainable AI”
[343] Dylan Slack, Anna Hilgard, Sameer Singh and Himabindu Lakkaraju “Reliable Post Hoc Explanations: Modeling Uncertainty in Explainability” In Advances in Neural Information Processing Systems 34 Curran Associates, Inc., 2021, pp. 9391–9404 URL: https://proceedings.neurips.cc/paper/2021/hash/4e246a381baf2ce038b3b0f82c7d6fb4-Abstract.html
[344] Zhongling Huang et al. “Uncertainty Exploration: Towards Explainable SAR Target Detection” In IEEE Transactions on Geoscience and Remote Sensing, 2023, pp. 1–1 DOI: 10.1109/TGRS.2023.3247898
[345] Nicholas Blomerus et al. “Feedback-Assisted Automatic Target and Clutter Discrimination Using a Bayesian Convolutional Neural Network for Improved Explainability in SAR Applications” In Remote Sensing 14.23 MDPI, 2022 DOI: 10.3390/rs14236096
[346] Judea Pearl “Causality” Cambridge University Press, 2009
[347] Bernhard Schölkopf et al. “On Causal and Anticausal Learning” In International Conference on Machine Learning, 2012 URL: https://api.semanticscholar.org/CorpusID:17675972
[348] Colorado J. Reed et al. “Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning” In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4088–4099
[349] Gunnar Carlsson “Topology and Data” In Bulletin of the American Mathematical Society 46.2, 2009, pp. 255–308 DOI: 10.1090/S0273-0979-09-01249-X
[350] Gunnar Carlsson and Rickard Brüel Gabrielsson “Topological Approaches to Deep Learning” In Topological Data Analysis 15 Cham: Springer International Publishing, 2020, pp. 119–146 DOI: 10.1007/978-3-030-43408-3_5
[351] Felix Hensel, Michael Moor and Bastian Rieck “A Survey of Topological Machine Learning Methods” In Frontiers in Artificial Intelligence 4, 2021
[352] Ludovic Duponchel “When Remote Sensing Meets Topological Data Analysis” In Journal of Spectral Imaging, 2018, pp. a1 DOI: 10.1255/jsi.2018.a1
[353] Juan Ramirez, Tristan Armitage, Trevor Bihl and Ryan Kramer “Topological Learning for Semi-Supervised Anomaly Detection in Hyperspectral Imagery” 2019-July, Proceedings of the IEEE National Aerospace Electronics Conference, NAECON, 2019, pp. 560–564 DOI: 10.1109/NAECON46414.2019.9058127
[354] Aya Abdelsalam Ismail, Mohamed Gunady, Héctor Corrada Bravo and Soheil Feizi “Benchmarking Deep Learning Interpretability in Time Series Predictions” In Advances in Neural Information Processing Systems 33, 2020, pp. 1–12 arXiv:2010.13924
[355] Andreas Theissler, Francesco Spinnato, Udo Schlegel and Riccardo Guidotti “Explainable AI for Time Series Classification: A Review, Taxonomy and Research Directions” In IEEE Access 10, 2022, pp. 100700–100724 DOI: 10.1109/ACCESS.2022.3207765
[356] Reduan Achtibat et al. “From ‘Where’ to ‘What’: Towards Human-Understandable Explanations through Concept Relevance Propagation” In arXiv preprint arXiv:2206.03208, 2022
[357] Andreas Holzinger et al. “xxAI-beyond explainable artificial intelligence” In International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers, 2020, pp. 3–10 Springer
[358] Chih-Kuan Yeh et al. “On completeness-aware concept-based explanations in deep neural networks” In Advances in neural information processing systems 33, 2020, pp. 20554–20565
[359] Ying Ji, Yu Wang and Jien Kato “Spatial-temporal Concept based Explanation of 3D ConvNets” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 15444–15453
[360] Vikram V Ramaswamy, Sunnie SY Kim, Ruth Fong and Olga Russakovsky “Overlooked Factors in Concept-Based Explanations: Dataset Choice, Concept Learnability, and Human Capability” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 10932–10941
[361] Zhihan Gao et al. “Earthformer: Exploring Space-Time Transformers for Earth System Forecasting” arXiv, 2023 DOI: 10.48550/arXiv.2207.05833
[362] Michail Tarasiou, Erik Chavez and Stefanos Zafeiriou “ViTs for SITS: Vision Transformers for Satellite Image Time Series” arXiv, 2023 DOI: 10.48550/arXiv.2301.04944
[363] Anurag Arnab et al. “ViViT: A Video Vision Transformer” In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 6836–6846
[364] Haoqi Fan et al. “Multiscale Vision Transformers” In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 6824–6835
[365] Ze Liu et al. “Video Swin Transformer” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 3202–3211
[366] Daniel Neimark, Omri Bar, Maya Zohar and Dotan Asselmann “Video Transformer Network” In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3163–3172
[367] Petar Veličković “Everything is connected: Graph neural networks” In Current Opinion in Structural Biology 79 Elsevier, 2023, pp. 102538
[368] Jie Zhou et al. “Graph neural networks: A review of methods and applications” In AI open 1 Elsevier, 2020, pp. 57–81
[369] Thomas N Kipf and Max Welling “Semi-supervised classification with graph convolutional networks” In arXiv preprint arXiv:1609.02907, 2016
[370] Danfeng Hong et al. “Graph convolutional networks for hyperspectral image classification” In IEEE Transactions on Geoscience and Remote Sensing 59.7 IEEE, 2020, pp. 5966–5978
[371] Sheng Wan et al. “Dual Interactive Graph Convolutional Networks for Hyperspectral Image Classification” In IEEE Transactions on Geoscience and Remote Sensing 60 IEEE, 2021 DOI: 10.1109/TGRS.2021.3075223
[372] Linzhou Yu et al. “Two-Branch Deeper Graph Convolutional Network for Hyperspectral Image Classification” In IEEE Transactions on Geoscience and Remote Sensing 61 IEEE, 2023, pp. 1–14
[373] Lichao Mou, Xiaoqiang Lu, Xuelong Li and Xiao Xiang Zhu “Nonlocal graph convolutional networks for hyperspectral image classification” In IEEE Transactions on Geoscience and Remote Sensing 58.12 IEEE, 2020, pp. 8246–8257
[374] Wentao Yu et al. “Hyperspectral Image Classification With Contrastive Graph Convolutional Network” In IEEE Transactions on Geoscience and Remote Sensing 61 IEEE, 2023, pp. 1–15
[375] Ninghao Liu, Qizhang Feng and Xia Hu “Interpretability in Graph Neural Networks” In Graph Neural Networks: Foundations, Frontiers, and Applications Singapore: Springer Singapore, 2022, pp. 121–147
[376] Hao Yuan, Haiyang Yu, Shurui Gui and Shuiwang Ji “Explainability in graph neural networks: A taxonomic survey” In IEEE transactions on pattern analysis and machine intelligence 45.5 IEEE, 2022, pp. 5782–5799
[377] Yi Wang et al. “Self-supervised learning in remote sensing: A review” In arXiv preprint arXiv:2206.13188, 2022
[378] Frederik Pahde, Maximilian Dreyer, Wojciech Samek and Sebastian Lapuschkin “Reveal to Revise: An Explainable AI Life Cycle for Iterative Bias Correction of Deep Models” In Medical Image Computing and Computer Assisted Intervention – MICCAI 2023, Lecture Notes in Computer Science Cham: Springer Nature Switzerland, 2023, pp. 596–606 DOI: 10.1007/978-3-031-43895-0_56
[379] Sina Mohseni, Niloofar Zarei and Eric D. Ragan “A Multidisciplinary Survey and Framework for Design and Evaluation of Explainable AI Systems” In ACM Transactions on Interactive Intelligent Systems 11.3-4, 2021, pp. 24:1–24:45 DOI: 10.1145/3387166

Acknowledgments

The work of A. Höhl was funded by the project ML4Earth by the German Federal Ministry for Economic Affairs and Climate Action under grant number 50EE2201C. The work of I. Obadic is funded by the Munich Center for Machine Learning. M.Á. Fernández-Torres acknowledges the support from the European Research Council (ERC) under the ERC Synergy Grant USMILE (grant agreement 855187) and the European Union’s Horizon 2020 research and innovation program within the project ‘XAIDA: Extreme Events - Artificial Intelligence for Detection and Attribution,’ under grant agreement No 101003469. The work of M.Á. Fernández-Torres and X. Zhu is supported by the German Federal Ministry of Education and Research (BMBF) in the framework of the international future AI lab “AI4EO – Artificial Intelligence for Earth Observation: Reasoning, Uncertainties, Ethics and Beyond” (grant number: 01DD20001). H. Najjar acknowledges support through a scholarship from the University of Kaiserslautern-Landau. The authors are responsible for the content of this publication.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.