Search | arXiv e-print repository

doi 10.1007/978-3-031-35891-3_13

The Thousand Faces of Explainable AI Along the Machine Learning Life Cycle: Industrial Reality and Current State of Research

Authors: Thomas Decker, Ralf Gross, Alexander Koebler, Michael Lebacher, Ronald Schnitzer, Stefan H. Weber

Abstract: In this paper, we investigate the practical relevance of explainable artificial intelligence (XAI) with a special focus on the producing industries and relate them to the current state of academic XAI research. Our findings are based on an extensive series of interviews regarding the role and applicability of XAI along the Machine Learning (ML) lifecycle in current industrial practice and its expected relevance in the future. The interviews were conducted among a great variety of roles and key stakeholders from different industry sectors. On top of that, we outline the state of XAI research by providing a concise review of the relevant literature. This enables us to provide an encompassing overview covering the opinions of the surveyed persons as well as the current state of academic research. By comparing our interview results with the current research approaches we reveal several discrepancies. While a multitude of different XAI approaches exists, most of them are centered around the model evaluation phase and data scientists. Their versatile capabilities for other stages are currently either not sufficiently explored or not popular among practitioners. In line with existing work, our findings also confirm that more efforts are needed to enable also non-expert users' interpretation and understanding of opaque AI models with existing methods and frameworks.

Submitted 11 October, 2023; originally announced October 2023.

Comments: International Conference on Human-Computer Interaction 2023

arXiv:2305.18078 [pdf]

The mechanism underlying successful deep learning

Authors: Yarden Tzach, Yuval Meir, Ofek Tevet, Ronit D. Gross, Shiri Hodassman, Roni Vardi, Ido Kanter

Abstract: Deep architectures consist of tens or hundreds of convolutional layers (CLs) that terminate with a few fully connected (FC) layers and an output layer representing the possible labels of a complex classification task. According to the existing deep learning (DL) rationale, the first CL reveals localized features from the raw data, whereas the subsequent layers progressively extract higher-level features required for refined classification. This article presents an efficient three-phase procedure for quantifying the mechanism underlying successful DL. First, a deep architecture is trained to maximize the success rate (SR). Next, the weights of the first several CLs are fixed and only the concatenated new FC layer connected to the output is trained, resulting in SRs that progress with the layers. Finally, the trained FC weights are silenced, except for those emerging from a single filter, enabling the quantification of the functionality of this filter using a correlation matrix between input labels and averaged output fields, hence a well-defined set of quantifiable features is obtained. Each filter essentially selects a single output label independent of the input label, which seems to prevent high SRs; however, it counterintuitively identifies a small subset of possible output labels. This feature is an essential part of the underlying DL mechanism and is progressively sharpened with layers, resulting in enhanced signal-to-noise ratios and SRs. Quantitatively, this mechanism is exemplified by the VGG-16, VGG-6, and AVGG-16. The proposed mechanism underlying DL provides an accurate tool for identifying each filter's quality and is expected to direct additional procedures to improve the SR, computational complexity, and latency of DL.

Submitted 29 May, 2023; originally announced May 2023.

Comments: 33 pages, 8 figures

arXiv:2303.05800 [pdf]

doi 10.1038/S41598-023-40566-Y

Enhancing the accuracies by performing pooling decisions adjacent to the output layer

Authors: Yuval Meir, Yarden Tzach, Ronit D. Gross, Ofek Tevet, Roni Vardi, Ido Kanter

Abstract: Learning classification tasks of ($2^n \times 2^n$) inputs typically consist of $\le n$ ($2 \times 2$) max-pooling (MP) operators along the entire feedforward deep architecture. Here we show, using the CIFAR-10 database, that pooling decisions adjacent to the last convolutional layer significantly enhance accuracies. In particular, average accuracies of the advanced-VGG with m layers (A-VGGm) architectures are 0.936, 0.940, 0.954, 0.955, and 0.955 for m=6, 8, 14, 13, and 16, respectively. The results indicate A-VGG8s' accuracy is superior to VGG16s', and that the accuracies of A-VGG13 and A-VGG16 are equal, and comparable to that of Wide-ResNet16. In addition, replacing the three fully connected (FC) layers with one FC layer, A-VGG6 and A-VGG14, or with several linear activation FC layers, yielded similar accuracies. These significantly enhanced accuracies stem from training the most influential input-output routes, in comparison to the inferior routes selected following multiple MP decisions along the deep architecture. In addition, accuracies are sensitive to the order of the non-commutative MP and average pooling operators adjacent to the output layer, varying the number and location of training routes. The results call for the reexamination of previously proposed deep architectures and their accuracies by utilizing the proposed pooling strategy adjacent to the output layer.

Submitted 31 August, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

Comments: 29 pages, 3 figures, 1 table, and Supplementary Information

Journal ref: Sci Rep 13, 13385 (2023)

arXiv:2211.11106 [pdf]

doi 10.1038/s41598-023-32559-8

Efficient shallow learning as an alternative to deep learning

Authors: Yuval Meir, Ofek Tevet, Yarden Tzach, Shiri Hodassman, Ronit D. Gross, Ido Kanter

Abstract: The realization of complex classification tasks requires training of deep learning (DL) architectures consisting of tens or even hundreds of convolutional and fully connected hidden layers, which is far from the reality of the human brain. According to the DL rationale, the first convolutional layer reveals localized patterns in the input and large-scale patterns in the following layers, until it reliably characterizes a class of inputs. Here, we demonstrate that with a fixed ratio between the depths of the first and second convolutional layers, the error rates of the generalized shallow LeNet architecture, consisting of only five layers, decay as a power law with the number of filters in the first convolutional layer. The extrapolation of this power law indicates that the generalized LeNet can achieve small error rates that were previously obtained for the CIFAR-10 database using DL architectures. A power law with a similar exponent also characterizes the generalized VGG-16 architecture. However, this results in a significantly increased number of operations required to achieve a given error rate with respect to LeNet. This power law phenomenon governs various generalized LeNet and VGG-16 architectures, hinting at its universal behavior and suggesting a quantitative hierarchical time-space complexity among machine learning architectures. Additionally, the conservation law along the convolutional layers, which is the square-root of their size times their depth, is found to asymptotically minimize error rates. The efficient shallow learning that is demonstrated in this study calls for further quantitative examination using various databases and architectures and its accelerated implementation using future dedicated hardware developments.

Submitted 23 November, 2022; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: 26 pages, 4 figures (improved figures resolution)

Report number: https://www.nature.com/articles/s41598-023-32559-8

Journal ref: Sci Rep 13, 5423 (2023)

arXiv:2201.10350 [pdf, other]

Noise sensitivity from fractional query algorithms and the axis-aligned Laplacian

Authors: Renan Gross

Abstract: We introduce the notion of classical fractional query algorithms, which generalize decision trees in the average-case setting, and can potentially perform better than them. We show that the limiting run-time complexity of a natural class of these algorithms obeys the non-linear partial differential equation $\min_{k}\partial^{2}u/\partial x_{k}^{2}=-2$, and that the individual bit revealment satisfies the Schramm-Steif bound for Fourier weight, connecting noise sensitivity with PDEs. We discuss relations with other decision tree results.

Submitted 25 January, 2022; originally announced January 2022.

Comments: 25 pages, 3 figures

arXiv:2107.13157 [pdf, other]

Retinal Microvasculature as Biomarker for Diabetes and Cardiovascular Diseases

Authors: Anusua Trivedi, Jocelyn Desbiens, Ron Gross, Sunil Gupta, Rahul Dodhia, Juan Lavista Ferres

Abstract: Purpose: To demonstrate that retinal microvasculature per se is a reliable biomarker for Diabetic Retinopathy (DR) and, by extension, cardiovascular diseases. Methods: Deep Learning Convolutional Neural Networks (CNN) applied to color fundus images for semantic segmentation of the blood vessels and severity classification on both vascular and full images. Vessel reconstruction through harmonic descriptors is also used as a smoothing and de-noising tool. The mathematical background of the theory is also outlined. Results: For diabetic patients, at least 93.8% of DR No-Refer vs. Refer classification can be related to vasculature defects. As for the Non-Sight Threatening vs. Sight Threatening case, the ratio is as high as 96.7%. Conclusion: In the case of DR, most of the disease biomarkers are related topologically to the vasculature. Translational Relevance: Experiments conducted on eye blood vasculature reconstruction as a biomarker show a strong correlation between vasculature shape and later stages of DR.

Submitted 28 July, 2021; originally announced July 2021.

arXiv:2007.03742 [pdf, other]

Meta-active Learning in Probabilistically-Safe Optimization

Authors: Mariah L. Schrum, Mark Connolly, Eric Cole, Mihir Ghetiya, Robert Gross, Matthew C. Gombolay

Abstract: Learning to control a safety-critical system with latent dynamics (e.g. for deep brain stimulation) requires taking calculated risks to gain information as efficiently as possible. To address this problem, we present a probabilistically-safe, meta-active learning approach to efficiently learn system dynamics and optimal configurations. We cast this problem as meta-learning an acquisition function, which is represented by a Long-Short Term Memory Network (LSTM) encoding sampling history. This acquisition function is meta-learned offline to learn high quality sampling strategies. We employ a mixed-integer linear program as our policy with the final, linearized layers of our LSTM acquisition function directly encoded into the objective to trade off expected information gain (e.g., improvement in the accuracy of the model of system dynamics) with the likelihood of safe control. We set a new state-of-the-art in active learning for control of a high-dimensional system with altered dynamics (i.e., a damaged aircraft), achieving a 46% increase in information gain and a 20% speedup in computation time over baselines. Furthermore, we demonstrate our system's ability to learn the optimal parameter settings for deep brain stimulation in a rat's brain while avoiding unwanted side effects (i.e., triggering seizures), outperforming prior state-of-the-art approaches with a 58% increase in information gain. Additionally, our algorithm achieves a 97% likelihood of terminating in a safe state while losing only 15% of information gain.

Submitted 7 July, 2020; originally announced July 2020.

Comments: 9 pages

arXiv:1909.12067 [pdf, other]

Concentration on the Boolean hypercube via pathwise stochastic analysis

Authors: Ronen Eldan, Renan Gross

Abstract: We develop a new technique for proving concentration inequalities which relate between the variance and influences of Boolean functions. Using this technique, we 1. Settle a conjecture of Talagrand [Tal97] proving that $$\int_{\left\{ -1,1\right\} ^{n}}\sqrt{h_{f}\left(x\right)}d\mu\geq C\cdot\mathrm{var}\left(f\right)\cdot\left(\log\left(\frac{1}{\sum\mathrm{Inf}_{i}^{2}\left(f\right)}\right)\right)^{1/2},$$ where $h_{f}\left(x\right)$ is the number of edges at $x$ along which $f$ changes its value, and $\mathrm{Inf}_{i}\left(f\right)$ is the influence of the $i$-th coordinate. 2. Strengthen several classical inequalities concerning the influences of a Boolean function, showing that near-maximizers must have large vertex boundaries. An inequality due to Talagrand states that for a Boolean function $f$, $\mathrm{var}\left(f\right)\leq C\sum_{i=1}^{n}\frac{\mathrm{Inf}_{i}\left(f\right)}{1+\log\left(1/\mathrm{Inf}_{i}\left(f\right)\right)}$. We give a lower bound for the size of the vertex boundary of functions saturating this inequality. As a corollary, we show that for sets that satisfy the edge-isoperimetric inequality or the Kahn-Kalai-Linial inequality up to a constant, a constant proportion of the mass is in the inner vertex boundary. 3. Improve a quantitative relation between influences and noise stability given by Keller and Kindler. Our proofs rely on techniques based on stochastic calculus, and bypass the use of hypercontractivity common to previous proofs.

Submitted 12 March, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

Comments: 48 pages, 2 figures

arXiv:1901.05674 [pdf, other]

Easy to Fool? Testing the Anti-evasion Capabilities of PDF Malware Scanners

Authors: Saeed Ehteshamifar, Antonio Barresi, Thomas R. Gross, Michael Pradel

Abstract: Malware scanners try to protect users from opening malicious documents by statically or dynamically analyzing documents. However, malware developers may apply evasions that conceal the maliciousness of a document. Given the variety of existing evasions, systematically assessing the impact of evasions on malware scanners remains an open challenge. This paper presents a novel methodology for testing the capability of malware scanners to cope with evasions. We apply the methodology to malicious Portable Document Format (PDF) documents and present an in-depth study of how current PDF evasions affect 41 state-of-the-art malware scanners. The study is based on a framework for creating malicious PDF documents that use one or more evasions. Based on such documents, we measure how effective different evasions are at concealing the maliciousness of a document. We find that many static and dynamic scanners can be easily fooled by relatively simple evasions and that the effectiveness of different evasions varies drastically. Our work not only is a call to arms for improving current malware scanners, but by providing a large-scale corpus of malicious PDF documents with evasions, we directly support the development of improved tools to detect document-based malware. Moreover, our methodology paves the way for a quantitative evaluation of evasions in other kinds of malware.

Submitted 22 January, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

Comments: 14 pages, 8 figures

arXiv:1806.07268 [pdf, other]

doi 10.1007/978-3-030-31978-6_7

Beyond Local Nash Equilibria for Adversarial Networks

Authors: Frans A. Oliehoek, Rahul Savani, Jose Gallego, Elise van der Pol, Roderich Groß

Abstract: Save for some special cases, current training methods for Generative Adversarial Networks (GANs) are at best guaranteed to converge to a 'local Nash equilibrium' (LNE). Such LNEs, however, can be arbitrarily far from an actual Nash equilibrium (NE), which implies that there are no guarantees on the quality of the found generator or classifier. This paper proposes to model GANs explicitly as finite games in mixed strategies, thereby ensuring that every LNE is an NE. With this formulation, we propose a solution method that is proven to monotonically converge to a resource-bounded Nash equilibrium (RB-NE): by increasing computational resources we can find better solutions. We empirically demonstrate that our method is less prone to typical GAN problems such as mode collapse, and produces solutions that are less exploitable than those produced by GANs and MGANs, and closely resemble theoretical predictions about NEs.

Submitted 26 July, 2018; v1 submitted 18 June, 2018; originally announced June 2018.

Comments: Supersedes arXiv:1712.00679; v2 includes Fictitious GAN in the related work and refers to Danskin (1981)

Journal ref: Published in Benelearn/BANIC 2018

arXiv:1712.00679 [pdf, other]

GANGs: Generative Adversarial Network Games

Authors: Frans A. Oliehoek, Rahul Savani, Jose Gallego-Posada, Elise van der Pol, Edwin D. de Jong, Roderich Gross

Abstract: Generative Adversarial Networks (GAN) have become one of the most successful frameworks for unsupervised generative modeling. As GANs are difficult to train, much research has focused on this. However, very little of this research has directly exploited game-theoretic techniques. We introduce Generative Adversarial Network Games (GANGs), which explicitly model a finite zero-sum game between a generator ($G$) and classifier ($C$) that use mixed strategies. The size of these games precludes exact solution methods, therefore we define resource-bounded best responses (RBBRs), and a resource-bounded Nash Equilibrium (RB-NE) as a pair of mixed strategies such that neither $G$ nor $C$ can find a better RBBR. The RB-NE solution concept is richer than the notion of 'local Nash equilibria' in that it captures not only failures of escaping local optima of gradient descent, but applies to any approximate best response computations, including methods with random restarts. To validate our approach, we solve GANGs with the Parallel Nash Memory algorithm, which provably monotonically converges to an RB-NE. We compare our results to standard GAN setups, and demonstrate that our method deals well with typical GAN problems such as mode collapse, partial mode coverage and forgetting.

Submitted 17 December, 2017; v1 submitted 2 December, 2017; originally announced December 2017.

Comments: 9 pages, 5 figures

arXiv:1707.01227 [pdf, other]

Exponential random graphs behave like mixtures of stochastic block models

Authors: Ronen Eldan, Renan Gross

Abstract: We study the behavior of exponential random graphs in both the sparse and the dense regime. We show that exponential random graphs are approximate mixtures of graphs with independent edges whose probability matrices are critical points of an associated functional, thereby satisfying a certain matrix equation. In the dense regime, every solution to this equation is close to a block matrix, concluding that the exponential random graph behaves roughly like a mixture of stochastic block models. We also show existence and uniqueness of solutions to this equation for several families of exponential random graphs, including the case where the subgraphs are counted with positive weights and the case where all weights are small in absolute value. In particular, this generalizes some of the results in a paper by Chatterjee and Diaconis from the dense regime to the sparse regime and strengthens their bounds from the cut-metric to the one-metric.

Submitted 19 April, 2018; v1 submitted 5 July, 2017; originally announced July 2017.

arXiv:1603.04904 [pdf, other]

doi 10.1007/s11721-016-0126-1

Turing learning: a metric-free approach to inferring behavior and its application to swarms

Authors: Wei Li, Melvin Gauci, Roderich Gross

Abstract: We propose Turing Learning, a novel system identification method for inferring the behavior of natural or artificial systems. Turing Learning simultaneously optimizes two populations of computer programs, one representing models of the behavior of the system under investigation, and the other representing classifiers. By observing the behavior of the system as well as the behaviors produced by the models, two sets of data samples are obtained. The classifiers are rewarded for discriminating between these two sets, that is, for correctly categorizing data samples as either genuine or counterfeit. Conversely, the models are rewarded for 'tricking' the classifiers into categorizing their data samples as genuine. Unlike other methods for system identification, Turing Learning does not require predefined metrics to quantify the difference between the system and its models. We present two case studies with swarms of simulated robots and prove that the underlying behaviors cannot be inferred by a metric-based system identification method. By contrast, Turing Learning infers the behaviors with high accuracy. It also produces a useful by-product - the classifiers - that can be used to detect abnormal behavior in the swarm. Moreover, we show that Turing Learning also successfully infers the behavior of physical robot swarms. The results show that collective behaviors can be directly inferred from motion trajectories of individuals in the swarm, which may have significant implications for the study of animal collectives. Furthermore, Turing Learning could prove useful whenever a behavior is not easily characterizable using metrics, making it suitable for a wide range of applications.

Submitted 30 September, 2016; v1 submitted 15 March, 2016; originally announced March 2016.

Comments: camera-ready version

ACM Class: I.2.4; I.2.6; I.2.9; I.2.11; I.5.1

Journal ref: Swarm Intelligence, 10(3):211-243, 2016

arXiv:1407.0549 [pdf, other]

doi 10.3929/ethz-a-010171214

Lockdown: Dynamic Control-Flow Integrity

Authors: Mathias Payer, Antonio Barresi, Thomas R. Gross

Abstract: Applications written in low-level languages without type or memory safety are especially prone to memory corruption. Attackers gain code execution capabilities through such applications despite all currently deployed defenses by exploiting memory corruption vulnerabilities. Control-Flow Integrity (CFI) is a promising defense mechanism that restricts open control-flow transfers to a static set of well-known locations. We present Lockdown, an approach to dynamic CFI that protects legacy, binary-only executables and libraries. Lockdown adaptively learns the control-flow graph of a running process using information from a trusted dynamic loader. The sandbox component of Lockdown restricts interactions between different shared objects to imported and exported functions by enforcing fine-grained CFI checks. Our prototype implementation shows that dynamic CFI results in low performance overhead.

Submitted 2 July, 2014; originally announced July 2014.

Comments: ETH Technical Report

arXiv:1106.6223 [pdf, ps, other]

doi 10.1007/s11047-012-9322-0

Why 'GSA: A Gravitational Search Algorithm' Is Not Genuinely Based on the Law of Gravity

Authors: Melvin Gauci, Tony J. Dodd, Roderich Gross

Submitted 30 June, 2011; originally announced June 2011.

Showing 1–15 of 15 results for author: Gross, R