Search | arXiv e-print repository

doi 10.1117/12.2679927

Multisensor Data Fusion for Automatized Insect Monitoring (KInsecta)

Authors: Martin Tschaikner, Danja Brandt, Henning Schmidt, Felix Bießmann, Teodor Chiaburu, Ilona Schrimpf, Thomas Schrimpf, Alexandra Stadel, Frank Haußer, Ingeborg Beckers

Abstract: Insect populations are declining globally, making systematic monitoring essential for conservation. Most classical methods involve death traps and counter insect conservation. This paper presents a multisensor approach that uses AI-based data fusion for insect classification. The system is designed as low-cost setup and consists of a camera module and an optical wing beat sensor as well as environ… ▽ More Insect populations are declining globally, making systematic monitoring essential for conservation. Most classical methods involve death traps and counter insect conservation. This paper presents a multisensor approach that uses AI-based data fusion for insect classification. The system is designed as low-cost setup and consists of a camera module and an optical wing beat sensor as well as environmental sensors to measure temperature, irradiance or daytime as prior information. The system has been tested in the laboratory and in the field. First tests on a small very unbalanced data set with 7 species show promising results for species classification. The multisensor system will support biodiversity and agriculture studies. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Journal ref: Remote Sensing for Agriculture, Ecosystems, and Hydrology XXV, SPIE 12727 (2023) 1272702

arXiv:2404.14830 [pdf, other]

CoProNN: Concept-based Prototypical Nearest Neighbors for Explaining Vision Models

Authors: Teodor Chiaburu, Frank Haußer, Felix Bießmann

Abstract: Mounting evidence in explainability for artificial intelligence (XAI) research suggests that good explanations should be tailored to individual tasks and should relate to concepts relevant to the task. However, building task specific explanations is time consuming and requires domain expertise which can be difficult to integrate into generic XAI methods. A promising approach towards designing usef… ▽ More Mounting evidence in explainability for artificial intelligence (XAI) research suggests that good explanations should be tailored to individual tasks and should relate to concepts relevant to the task. However, building task specific explanations is time consuming and requires domain expertise which can be difficult to integrate into generic XAI methods. A promising approach towards designing useful task specific explanations with domain experts is based on compositionality of semantic concepts. Here, we present a novel approach that enables domain experts to quickly create concept-based explanations for computer vision tasks intuitively via natural language. Leveraging recent progress in deep generative methods we propose to generate visual concept-based prototypes via text-to-image methods. These prototypes are then used to explain predictions of computer vision models via a simple k-Nearest-Neighbors routine. The modular design of CoProNN is simple to implement, it is straightforward to adapt to novel tasks and allows for replacing the classification and text-to-image models as more powerful models are released. The approach can be evaluated offline against the ground-truth of predefined prototypes that can be easily communicated also to domain experts as they are based on visual concepts. We show that our strategy competes very well with other concept-based XAI approaches on coarse grained image classification tasks and may even outperform those methods on more demanding fine grained tasks. We demonstrate the effectiveness of our method for human-machine collaboration settings in qualitative and quantitative user studies. All code and experimental data can be found in our GitHub $\href{https://github.com/TeodorChiaburu/beexplainable}{repository}$. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: 24 pages, 9 figures, 2 tables, accepted at WCXAI 2024 Valletta

arXiv:2404.11461 [pdf, other]

Using Game Engines and Machine Learning to Create Synthetic Satellite Imagery for a Tabletop Verification Exercise

Authors: Johannes Hoster, Sara Al-Sayed, Felix Biessmann, Alexander Glaser, Kristian Hildebrand, Igor Moric, Tuong Vy Nguyen

Abstract: Satellite imagery is regarded as a great opportunity for citizen-based monitoring of activities of interest. Relevant imagery may however not be available at sufficiently high resolution, quality, or cadence -- let alone be uniformly accessible to open-source analysts. This limits an assessment of the true long-term potential of citizen-based monitoring of nuclear activities using publicly availab… ▽ More Satellite imagery is regarded as a great opportunity for citizen-based monitoring of activities of interest. Relevant imagery may however not be available at sufficiently high resolution, quality, or cadence -- let alone be uniformly accessible to open-source analysts. This limits an assessment of the true long-term potential of citizen-based monitoring of nuclear activities using publicly available satellite imagery. In this article, we demonstrate how modern game engines combined with advanced machine-learning techniques can be used to generate synthetic imagery of sites of interest with the ability to choose relevant parameters upon request; these include time of day, cloud cover, season, or level of activity onsite. At the same time, resolution and off-nadir angle can be adjusted to simulate different characteristics of the satellite. While there are several possible use-cases for synthetic imagery, here we focus on its usefulness to support tabletop exercises in which simple monitoring scenarios can be examined to better understand verification capabilities enabled by new satellite constellations and very short revisit times. △ Less

Submitted 23 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: Annual Meeting of the Institute of Nuclear Materials Management (INMM), Vienna

arXiv:2404.07754 [pdf, other]

Generating Synthetic Satellite Imagery With Deep-Learning Text-to-Image Models -- Technical Challenges and Implications for Monitoring and Verification

Authors: Tuong Vy Nguyen, Alexander Glaser, Felix Biessmann

Abstract: Novel deep-learning (DL) architectures have reached a level where they can generate digital media, including photorealistic images, that are difficult to distinguish from real data. These technologies have already been used to generate training data for Machine Learning (ML) models, and large text-to-image models like DALL-E 2, Imagen, and Stable Diffusion are achieving remarkable results in reali… ▽ More Novel deep-learning (DL) architectures have reached a level where they can generate digital media, including photorealistic images, that are difficult to distinguish from real data. These technologies have already been used to generate training data for Machine Learning (ML) models, and large text-to-image models like DALL-E 2, Imagen, and Stable Diffusion are achieving remarkable results in realistic high-resolution image generation. Given these developments, issues of data authentication in monitoring and verification deserve a careful and systematic analysis: How realistic are synthetic images? How easily can they be generated? How useful are they for ML researchers, and what is their potential for Open Science? In this work, we use novel DL models to explore how synthetic satellite images can be created using conditioning mechanisms. We investigate the challenges of synthetic satellite image generation and evaluate the results based on authenticity and state-of-the-art metrics. Furthermore, we investigate how synthetic data can alleviate the lack of data in the context of ML methods for remote-sensing. Finally we discuss implications of synthetic satellite imagery in the context of monitoring and verification. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: https://resources.inmm.org/annual-meeting-proceedings/generating-synthetic-satellite-imagery-deep-learning-text-image-models

Journal ref: Presented at the Annual Meeting of the Institute of Nuclear Materials Management (INMM), Vienna, 2023

arXiv:2401.02465 [pdf, other]

Interpretable Time Series Models for Wastewater Modeling in Combined Sewer Overflows

Authors: Teodor Chiaburu, Felix Biessmann

Abstract: Climate change poses increasingly complex challenges to our society. Extreme weather events such as floods, wild fires or droughts are becoming more frequent, spontaneous and difficult to foresee or counteract. In this work we specifically address the problem of sewage water polluting surface water bodies after spilling over from rain tanks as a consequence of heavy rain events. We investigate to… ▽ More Climate change poses increasingly complex challenges to our society. Extreme weather events such as floods, wild fires or droughts are becoming more frequent, spontaneous and difficult to foresee or counteract. In this work we specifically address the problem of sewage water polluting surface water bodies after spilling over from rain tanks as a consequence of heavy rain events. We investigate to what extent state-of-the-art interpretable time series models can help predict such critical water level points, so that the excess can promptly be redistributed across the sewage network. Our results indicate that modern time series models can contribute to better waste water management and prevention of environmental pollution from sewer systems. All the code and experiments can be found in our repository: https://github.com/TeodorChiaburu/RIWWER_TimeSeries. △ Less

Submitted 4 January, 2024; originally announced January 2024.

Comments: 8 pages, 5 figures, 2 tables, presented at iSCSi 2023 Lisbon

arXiv:2308.04444 [pdf, other]

Changes in Policy Preferences in German Tweets during the COVID Pandemic

Authors: Felix Biessmann

Abstract: Online social media have become an important forum for exchanging political opinions. In response to COVID measures citizens expressed their policy preferences directly on these platforms. Quantifying political preferences in online social media remains challenging: The vast amount of content requires scalable automated extraction of political preferences -- however fine grained political preferen… ▽ More Online social media have become an important forum for exchanging political opinions. In response to COVID measures citizens expressed their policy preferences directly on these platforms. Quantifying political preferences in online social media remains challenging: The vast amount of content requires scalable automated extraction of political preferences -- however fine grained political preference extraction is difficult with current machine learning (ML) technology, due to the lack of data sets. Here we present a novel data set of tweets with fine grained political preference annotations. A text classification model trained on this data is used to extract policy preferences in a German Twitter corpus ranging from 2019 to 2022. Our results indicate that in response to the COVID pandemic, expression of political opinions increased. Using a well established taxonomy of policy preferences we analyse fine grained political views and highlight changes in distinct political categories. These analyses suggest that the increase in policy preference expression is dominated by the categories pro-welfare, pro-education and pro-governmental administration efficiency. All training data and code used in this study are made publicly available to encourage other researchers to further improve automated policy preference extraction methods. We hope that our findings contribute to a better understanding of political statements in online social media and to a better assessment of how COVID measures impact political preferences. △ Less

Submitted 31 July, 2023; originally announced August 2023.

Comments: International Conference on Social Informatics, 2022

arXiv:2302.12139 [pdf, other]

Automated Extraction of Fine-Grained Standardized Product Information from Unstructured Multilingual Web Data

Authors: Alexander Flick, Sebastian Jäger, Ivana Trajanovska, Felix Biessmann

Abstract: Extracting structured information from unstructured data is one of the key challenges in modern information retrieval applications, including e-commerce. Here, we demonstrate how recent advances in machine learning, combined with a recently published multilingual data set with standardized fine-grained product category information, enable robust product attribute extraction in challenging transfer… ▽ More Extracting structured information from unstructured data is one of the key challenges in modern information retrieval applications, including e-commerce. Here, we demonstrate how recent advances in machine learning, combined with a recently published multilingual data set with standardized fine-grained product category information, enable robust product attribute extraction in challenging transfer learning settings. Our models can reliably predict product attributes across online shops, languages, or both. Furthermore, we show that our models can be used to match product taxonomies between online retailers. △ Less

Submitted 23 February, 2023; originally announced February 2023.

Comments: ECIR 2023 Demo Track

arXiv:2207.10733 [pdf, other]

GreenDB -- A Dataset and Benchmark for Extraction of Sustainability Information of Consumer Goods

Authors: Sebastian Jäger, Alexander Flick, Jessica Adriana Sanchez Garcia, Kaspar von den Driesch, Karl Brendel, Felix Biessmann

Abstract: The production, ship**, usage, and disposal of consumer goods have a substantial impact on greenhouse gas emissions and the depletion of resources. Machine Learning (ML) can help to foster sustainable consumption patterns by accounting for sustainability aspects in product search or recommendations of modern retail platforms. However, the lack of large high quality publicly available product dat… ▽ More The production, ship**, usage, and disposal of consumer goods have a substantial impact on greenhouse gas emissions and the depletion of resources. Machine Learning (ML) can help to foster sustainable consumption patterns by accounting for sustainability aspects in product search or recommendations of modern retail platforms. However, the lack of large high quality publicly available product data with trustworthy sustainability information impedes the development of ML technology that can help to reach our sustainability goals. Here we present GreenDB, a database that collects products from European online shops on a weekly basis. As proxy for the products' sustainability, it relies on sustainability labels, which are evaluated by experts. The GreenDB schema extends the well-known schema.org Product definition and can be readily integrated into existing product catalogs. We present initial results demonstrating that ML models trained with our data can reliably (F1 score 96%) predict the sustainability label of products. These contributions can help to complement existing e-commerce experiences and ultimately encourage users to more sustainable consumption patterns. △ Less

Submitted 16 August, 2022; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: Presented at DataPerf Workshop at the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, 2022

arXiv:2206.07497 [pdf, other]

Towards ML Methods for Biodiversity: A Novel Wild Bee Dataset and Evaluations of XAI Methods for ML-Assisted Rare Species Annotations

Authors: Teodor Chiaburu, Felix Biessmann, Frank Hausser

Abstract: Insects are a crucial part of our ecosystem. Sadly, in the past few decades, their numbers have worryingly decreased. In an attempt to gain a better understanding of this process and monitor the insects populations, Deep Learning may offer viable solutions. However, given the breadth of their taxonomy and the typical hurdles of fine grained analysis, such as high intraclass variability compared to… ▽ More Insects are a crucial part of our ecosystem. Sadly, in the past few decades, their numbers have worryingly decreased. In an attempt to gain a better understanding of this process and monitor the insects populations, Deep Learning may offer viable solutions. However, given the breadth of their taxonomy and the typical hurdles of fine grained analysis, such as high intraclass variability compared to low interclass variability, insect classification remains a challenging task. There are few benchmark datasets, which impedes rapid development of better AI models. The annotation of rare species training data, however, requires expert knowledge. Explainable Artificial Intelligence (XAI) could assist biologists in these annotation tasks, but choosing the optimal XAI method is difficult. Our contribution to these research challenges is threefold: 1) a dataset of thoroughly annotated images of wild bees sampled from the iNaturalist database, 2) a ResNet model trained on the wild bee dataset achieving classification scores comparable to similar state-of-the-art models trained on other fine-grained datasets and 3) an investigation of XAI methods to support biologists in annotation tasks. △ Less

Submitted 15 June, 2022; originally announced June 2022.

Comments: 6 pages, 7 figures, 1 table submitted to CVPR 2022 All the code and the link to the dataset can be found at https://github.com/TeodorChiaburu/beexplainable

arXiv:2205.02908 [pdf, other]

GreenDB: Toward a Product-by-Product Sustainability Database

Authors: Sebastian Jäger, Jessica Greene, Max Jakob, Ruben Korenke, Tilman Santarius, Felix Biessmann

Abstract: The production, ship**, usage, and disposal of consumer goods have a substantial impact on greenhouse gas emissions and the depletion of resources. Modern retail platforms rely heavily on Machine Learning (ML) for their search and recommender systems. Thus, ML can potentially support efforts towards more sustainable consumption patterns, for example, by accounting for sustainability aspects in p… ▽ More The production, ship**, usage, and disposal of consumer goods have a substantial impact on greenhouse gas emissions and the depletion of resources. Modern retail platforms rely heavily on Machine Learning (ML) for their search and recommender systems. Thus, ML can potentially support efforts towards more sustainable consumption patterns, for example, by accounting for sustainability aspects in product search or recommendations. However, leveraging ML potential for reaching sustainability goals requires data on sustainability. Unfortunately, no open and publicly available database integrates sustainability information on a product-by-product basis. In this work, we present the GreenDB, which fills this gap. Based on search logs of millions of users, we prioritize which products users care about most. The GreenDB schema extends the well-known schema.org Product definition and can be readily integrated into existing product catalogs to improve sustainability information available for search and recommendation experiences. We present our proof of concept implementation of a scra** system that creates the GreenDB dataset. △ Less

Submitted 29 July, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

arXiv:2204.10416 [pdf, other]

doi 10.1016/j.pmcj.2023.101779

CycleSense: Detecting Near Miss Incidents in Bicycle Traffic from Mobile Motion Sensors

Authors: Ahmet-Serdar Karakaya, Thomas Ritter, Felix Biessmann, David Bermbach

Abstract: In cities worldwide, cars cause health and traffic problems whichcould be partly mitigated through an increased modal share of bicycles. Many people, however, avoid cycling due to a lack of perceived safety. For city planners, addressing this is hard as they lack insights intowhere cyclists feel safe and where they do not. To gain such insights,we have in previous work proposed the crowdsourcing p… ▽ More In cities worldwide, cars cause health and traffic problems whichcould be partly mitigated through an increased modal share of bicycles. Many people, however, avoid cycling due to a lack of perceived safety. For city planners, addressing this is hard as they lack insights intowhere cyclists feel safe and where they do not. To gain such insights,we have in previous work proposed the crowdsourcing platform SimRa,which allows cyclists to record their rides and report near miss incidentsvia a smartphone app. In this paper, we present CycleSense, a combination of signal pro-cessing and Machine Learning techniques, which partially automatesthe detection of near miss incidents, thus making the reporting of nearmiss incidents easier. Using the SimRa data set, we evaluate CycleSenseby comparing it to a baseline method used by SimRa and show that itsignificantly improves incident detection. △ Less

Submitted 14 March, 2023; v1 submitted 21 April, 2022; originally announced April 2022.

arXiv:2112.12444 [pdf, other]

More Than Words: Towards Better Quality Interpretations of Text Classifiers

Authors: Muhammad Bilal Zafar, Philipp Schmidt, Michele Donini, Cédric Archambeau, Felix Biessmann, Sanjiv Ranjan Das, Krishnaram Kenthapadi

Abstract: The large size and complex decision mechanisms of state-of-the-art text classifiers make it difficult for humans to understand their predictions, leading to a potential lack of trust by the users. These issues have led to the adoption of methods like SHAP and Integrated Gradients to explain classification decisions by assigning importance scores to input tokens. However, prior work, using differen… ▽ More The large size and complex decision mechanisms of state-of-the-art text classifiers make it difficult for humans to understand their predictions, leading to a potential lack of trust by the users. These issues have led to the adoption of methods like SHAP and Integrated Gradients to explain classification decisions by assigning importance scores to input tokens. However, prior work, using different randomization tests, has shown that interpretations generated by these methods may not be robust. For instance, models making the same predictions on the test set may still lead to different feature importance rankings. In order to address the lack of robustness of token-based interpretability, we explore explanations at higher semantic levels like sentences. We use computational metrics and human subject studies to compare the quality of sentence-based interpretations against token-based ones. Our experiments show that higher-level feature attributions offer several advantages: 1) they are more robust as measured by the randomization tests, 2) they lead to lower variability when using approximation-based methods like SHAP, and 3) they are more intelligible to humans in situations where the linguistic coherence resides at a higher granularity level. Based on these findings, we show that token-based interpretability, while being a convenient first choice given the input interfaces of the ML models, is not the most effective one in all situations. △ Less

Submitted 23 December, 2021; originally announced December 2021.

arXiv:2107.02033 [pdf, other]

Quality Metrics for Transparent Machine Learning With and Without Humans In the Loop Are Not Correlated

Authors: Felix Biessmann, Dionysius Refiano

Abstract: The field explainable artificial intelligence (XAI) has brought about an arsenal of methods to render Machine Learning (ML) predictions more interpretable. But how useful explanations provided by transparent ML methods are for humans remains difficult to assess. Here we investigate the quality of interpretable computer vision algorithms using techniques from psychophysics. In crowdsourced annotati… ▽ More The field explainable artificial intelligence (XAI) has brought about an arsenal of methods to render Machine Learning (ML) predictions more interpretable. But how useful explanations provided by transparent ML methods are for humans remains difficult to assess. Here we investigate the quality of interpretable computer vision algorithms using techniques from psychophysics. In crowdsourced annotation tasks we study the impact of different interpretability approaches on annotation accuracy and task time. We compare these quality metrics with classical XAI, automated quality metrics. Our results demonstrate that psychophysical experiments allow for robust quality assessment of transparency in machine learning. Interestingly the quality metrics computed without humans in the loop did not provide a consistent ranking of interpretability methods nor were they representative for how useful an explanation was for humans. These findings highlight the potential of methods from classical psychophysics for modern machine learning applications. We hope that our results provide convincing arguments for evaluating interpretability in its natural habitat, human-ML interaction, if the goal is to obtain an authentic assessment of interpretability. △ Less

Submitted 1 July, 2021; originally announced July 2021.

Comments: Proceedings of the ICML Workshop on Theoretical Foundations, Criticism, and Application Trends of Explainable AI held in conjunction with the 38th International Conference on Machine Learning (ICML), a non-peer-reviewed longer version was previously published as preprint here arXiv:1912.05011

arXiv:2106.11394 [pdf, other]

A Turing Test for Transparency

Authors: Felix Biessmann, Viktor Treu

Abstract: A central goal of explainable artificial intelligence (XAI) is to improve the trust relationship in human-AI interaction. One assumption underlying research in transparent AI systems is that explanations help to better assess predictions of machine learning (ML) models, for instance by enabling humans to identify wrong predictions more efficiently. Recent empirical evidence however shows that expl… ▽ More A central goal of explainable artificial intelligence (XAI) is to improve the trust relationship in human-AI interaction. One assumption underlying research in transparent AI systems is that explanations help to better assess predictions of machine learning (ML) models, for instance by enabling humans to identify wrong predictions more efficiently. Recent empirical evidence however shows that explanations can have the opposite effect: When presenting explanations of ML predictions humans often tend to trust ML predictions even when these are wrong. Experimental evidence suggests that this effect can be attributed to how intuitive, or human, an AI or explanation appears. This effect challenges the very goal of XAI and implies that responsible usage of transparent AI methods has to consider the ability of humans to distinguish machine generated from human explanations. Here we propose a quantitative metric for XAI methods based on Turing's imitation game, a Turing Test for Transparency. A human interrogator is asked to judge whether an explanation was generated by a human or by an XAI method. Explanations of XAI methods that can not be detected by humans above chance performance in this binary classification task are passing the test. Detecting such explanations is a requirement for assessing and calibrating the trust relationship in human-AI interaction. We present experimental results on a crowd-sourced text classification task demonstrating that even for basic ML models and XAI approaches most participants were not able to differentiate human from machine generated explanations. We discuss ethical and practical implications of our results for applications of transparent ML. △ Less

Submitted 21 June, 2021; originally announced June 2021.

Comments: Published in Proceedings of the ICML Workshop on Theoretical Foundations, Criticism, and Application Trends of Explainable AI held in conjunction with the 38th International Conference on Machine Learning (ICML)

arXiv:2006.08368 [pdf]

Sensor Artificial Intelligence and its Application to Space Systems -- A White Paper

Authors: Anko Börner, Heinz-Wilhelm Hübers, Odej Kao, Florian Schmidt, Sören Becker, Joachim Denzler, Daniel Matolin, David Haber, Sergio Lucia, Wojciech Samek, Rudolph Triebel, Sascha Eichstädt, Felix Biessmann, Anna Kruspe, Peter Jung, Manon Kok, Guillermo Gallego, Ralf Berger

Abstract: Information and communication technologies have accompanied our everyday life for years. A steadily increasing number of computers, cameras, mobile devices, etc. generate more and more data, but at the same time we realize that the data can only partially be analyzed with classical approaches. The research and development of methods based on artificial intelligence (AI) made enormous progress in t… ▽ More Information and communication technologies have accompanied our everyday life for years. A steadily increasing number of computers, cameras, mobile devices, etc. generate more and more data, but at the same time we realize that the data can only partially be analyzed with classical approaches. The research and development of methods based on artificial intelligence (AI) made enormous progress in the area of interpretability of data in recent years. With growing experience, both, the potential and limitations of these new technologies are increasingly better understood. Typically, AI approaches start with the data from which information and directions for action are derived. However, the circumstances under which such data are collected and how they change over time are rarely considered. A closer look at the sensors and their physical properties within AI approaches will lead to more robust and widely applicable algorithms. This holistic approach which considers entire signal chains from the origin to a data product, "Sensor AI", is a highly relevant topic with great potential. It will play a decisive role in autonomous driving as well as in areas of automated production, predictive maintenance or space research. The goal of this white paper is to establish "Sensor AI" as a dedicated research topic. We want to exchange knowledge on the current state-of-the-art on Sensor AI, to identify synergies among research groups and thus boost the collaboration in this key technology for science and industry. △ Less

Submitted 9 June, 2020; originally announced June 2020.

Comments: 4 pages. 1st Workshop on Sensor Artificial Intelligence, Apr. 2020, Berlin, Germany

arXiv:1912.05011 [pdf, other]

A psychophysics approach for quantitative comparison of interpretable computer vision models

Authors: Felix Biessmann, Dionysius Irza Refiano

Abstract: The field of transparent Machine Learning (ML) has contributed many novel methods aiming at better interpretability for computer vision and ML models in general. But how useful the explanations provided by transparent ML methods are for humans remains difficult to assess. Most studies evaluate interpretability in qualitative comparisons, they use experimental paradigms that do not allow for direct… ▽ More The field of transparent Machine Learning (ML) has contributed many novel methods aiming at better interpretability for computer vision and ML models in general. But how useful the explanations provided by transparent ML methods are for humans remains difficult to assess. Most studies evaluate interpretability in qualitative comparisons, they use experimental paradigms that do not allow for direct comparisons amongst methods or they report only offline experiments with no humans in the loop. While there are clear advantages of evaluations with no humans in the loop, such as scalability, reproducibility and less algorithmic bias than with humans in the loop, these metrics are limited in their usefulness if we do not understand how they relate to other metrics that take human cognition into account. Here we investigate the quality of interpretable computer vision algorithms using techniques from psychophysics. In crowdsourced annotation tasks we study the impact of different interpretability approaches on annotation accuracy and task time. In order to relate these findings to quality measures for interpretability without humans in the loop we compare quality metrics with and without humans in the loop. Our results demonstrate that psychophysical experiments allow for robust quality assessment of transparency in machine learning. Interestingly the quality metrics computed without humans in the loop did not provide a consistent ranking of interpretability methods nor were they representative for how useful an explanation was for humans. These findings highlight the potential of methods from classical psychophysics for modern machine learning applications. We hope that our results provide convincing arguments for evaluating interpretability in its natural habitat, human-ML interaction, if the goal is to obtain an authentic assessment of interpretability. △ Less

Submitted 24 November, 2019; originally announced December 2019.

arXiv:1901.08558 [pdf]

Quantifying Interpretability and Trust in Machine Learning Systems

Authors: Philipp Schmidt, Felix Biessmann

Abstract: Decisions by Machine Learning (ML) models have become ubiquitous. Trusting these decisions requires understanding how algorithms take them. Hence interpretability methods for ML are an active focus of research. A central problem in this context is that both the quality of interpretability methods as well as trust in ML predictions are difficult to measure. Yet evaluations, comparisons and improvem… ▽ More Decisions by Machine Learning (ML) models have become ubiquitous. Trusting these decisions requires understanding how algorithms take them. Hence interpretability methods for ML are an active focus of research. A central problem in this context is that both the quality of interpretability methods as well as trust in ML predictions are difficult to measure. Yet evaluations, comparisons and improvements of trust and interpretability require quantifiable measures. Here we propose a quantitative measure for the quality of interpretability methods. Based on that we derive a quantitative measure of trust in ML decisions. Building on previous work we propose to measure intuitive understanding of algorithmic decisions using the information transfer rate at which humans replicate ML model predictions. We provide empirical evidence from crowdsourcing experiments that the proposed metric robustly differentiates interpretability methods. The proposed metric also demonstrates the value of interpretability for ML assisted human decision making: in our experiments providing explanations more than doubled productivity in annotation tasks. However unbiased human judgement is critical for doctors, judges, policy makers and others. Here we derive a trust metric that identifies when human decisions are overly biased towards ML predictions. Our results complement existing qualitative work on trust and interpretability by quantifiable measures that can serve as objectives for further improving methods in this field of research. △ Less

Submitted 20 January, 2019; originally announced January 2019.

Comments: In AAAI-19 Workshop on Network Interpretability for Deep Learning

arXiv:1609.00585 [pdf, other]

Doubly stochastic large scale kernel learning with the empirical kernel map

Authors: Nikolaas Steenbergen, Sebastian Schelter, Felix Bießmann

Abstract: With the rise of big data sets, the popularity of kernel methods declined and neural networks took over again. The main problem with kernel methods is that the kernel matrix grows quadratically with the number of data points. Most attempts to scale up kernel methods solve this problem by discarding data points or basis functions of some approximation of the kernel map. Here we present a simple yet… ▽ More With the rise of big data sets, the popularity of kernel methods declined and neural networks took over again. The main problem with kernel methods is that the kernel matrix grows quadratically with the number of data points. Most attempts to scale up kernel methods solve this problem by discarding data points or basis functions of some approximation of the kernel map. Here we present a simple yet effective alternative for scaling up kernel methods that takes into account the entire data set via doubly stochastic optimization of the emprical kernel map. The algorithm is straightforward to implement, in particular in parallel execution settings; it leverages the full power and versatility of classical kernel functions without the need to explicitly formulate a kernel map approximation. We provide empirical evidence that the algorithm works on large data sets. △ Less

Submitted 14 September, 2016; v1 submitted 2 September, 2016; originally announced September 2016.

arXiv:1608.02195 [pdf, other]

Automating Political Bias Prediction

Authors: Felix Biessmann

Abstract: Every day media generate large amounts of text. An unbiased view on media reports requires an understanding of the political bias of media content. Assistive technology for estimating the political bias of texts can be helpful in this context. This study proposes a simple statistical learning approach to predict political bias from text. Standard text features extracted from speeches and manifesto… ▽ More Every day media generate large amounts of text. An unbiased view on media reports requires an understanding of the political bias of media content. Assistive technology for estimating the political bias of texts can be helpful in this context. This study proposes a simple statistical learning approach to predict political bias from text. Standard text features extracted from speeches and manifestos of political parties are used to predict political bias in terms of political party affiliation and in terms of political views. Results indicate that political bias can be predicted with above chance accuracy. Mistakes of the model can be interpreted with respect to changes of policies of political actors. Two approaches are presented to make the results more interpretable: a) discriminative text features are related to the political orientation of a party and b) sentiment features of texts are correlated with a measure of political power. Political power appears to be strongly correlated with positive sentiment of a text. To highlight some potential use cases a web application shows how the model can be used for texts for which the political bias is not clear such as news articles. △ Less

Submitted 7 August, 2016; originally announced August 2016.

arXiv:1206.6388 [pdf]

Canonical Trends: Detecting Trend Setters in Web Data

Authors: Felix Biessmann, Jens-Michalis Papaioannou, Mikio Braun, Andreas Harth

Abstract: Much information available on the web is copied, reused or rephrased. The phenomenon that multiple web sources pick up certain information is often called trend. A central problem in the context of web data mining is to detect those web sources that are first to publish information which will give rise to a trend. We present a simple and efficient method for finding trends dominating a pool of web… ▽ More Much information available on the web is copied, reused or rephrased. The phenomenon that multiple web sources pick up certain information is often called trend. A central problem in the context of web data mining is to detect those web sources that are first to publish information which will give rise to a trend. We present a simple and efficient method for finding trends dominating a pool of web sources and identifying those web sources that publish the information relevant to a trend before others. We validate our approach on real data collected from influential technology news feeds. △ Less

Submitted 27 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

Showing 1–20 of 20 results for author: Biessmann, F