-
Human-AI collectives produce the most accurate differential diagnoses
Authors:
N. Zöller,
J. Berger,
I. Lin,
N. Fu,
J. Komarneni,
G. Barabucci,
K. Laskowski,
V. Shia,
B. Harack,
E. A. Chu,
V. Trianni,
R. H. J. M. Kurvers,
S. M. Herzog
Abstract:
Artificial intelligence systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased - shortcomings that may reflect LLMs' inherent limitations and thus may not be remedied…
▽ More
Artificial intelligence systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased - shortcomings that may reflect LLMs' inherent limitations and thus may not be remedied by more sophisticated architectures, more data, or more human feedback. Relying solely on LLMs for complex, high-stakes decisions is therefore problematic. Here we present a hybrid collective intelligence system that mitigates these risks by leveraging the complementary strengths of human experience and the vast information processed by LLMs. We apply our method to open-ended medical diagnostics, combining 40,762 differential diagnoses made by physicians with the diagnoses of five state-of-the art LLMs across 2,133 medical cases. We show that hybrid collectives of physicians and LLMs outperform both single physicians and physician collectives, as well as single LLMs and LLM ensembles. This result holds across a range of medical specialties and professional experience, and can be attributed to humans' and LLMs' complementary contributions that lead to different kinds of errors. Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
How A/B testing changes the dynamics of information spreading on a social network
Authors:
Matteo Ottaviani,
Stefan M. Herzog,
Pietro Leonardo Nickl,
Philipp Lorenz-Spreen
Abstract:
A/B testing methodology is generally performed by private companies to increase user engagement and satisfaction about online features. Their usage is far from being transparent and may undermine user autonomy (e.g. polarizing individual opinions, mis- and dis- information spreading). For our analysis we leverage a crucial case study dataset (i.e. Upworthy) where news headlines were allocated to u…
▽ More
A/B testing methodology is generally performed by private companies to increase user engagement and satisfaction about online features. Their usage is far from being transparent and may undermine user autonomy (e.g. polarizing individual opinions, mis- and dis- information spreading). For our analysis we leverage a crucial case study dataset (i.e. Upworthy) where news headlines were allocated to users and reshuffled for optimizing clicks. Our centre of focus is to determine how and under which conditions A/B testing affects the distribution of content on the collective level, specifically on different social network structures. In order to achieve that, we set up an agent-based model reproducing social interaction and an individual decision-making model. Our preliminary results indicate that A/B testing has a substantial influence on the qualitative dynamics of information dissemination on a social network. Moreover, our modeling framework promisingly embeds conjecturing policy (e.g. nudging, boosting) interventions.
△ Less
Submitted 2 May, 2024;
originally announced May 2024.
-
Aardvark Weather: end-to-end data-driven weather forecasting
Authors:
Anna Vaughan,
Stratis Markou,
Will Tebbutt,
James Requeima,
Wessel P. Bruinsma,
Tom R. Andersson,
Michael Herzog,
Nicholas D. Lane,
J. Scott Hosking,
Richard E. Turner
Abstract:
Machine learning is revolutionising medium-range weather prediction. However it has only been applied to specific and individual components of the weather prediction pipeline. Consequently these data-driven approaches are unable to be deployed without input from conventional operational numerical weather prediction (NWP) systems, which is computationally costly and does not support end-to-end opti…
▽ More
Machine learning is revolutionising medium-range weather prediction. However it has only been applied to specific and individual components of the weather prediction pipeline. Consequently these data-driven approaches are unable to be deployed without input from conventional operational numerical weather prediction (NWP) systems, which is computationally costly and does not support end-to-end optimisation. In this work, we take a radically different approach and replace the entire NWP pipeline with a machine learning model. We present Aardvark Weather, the first end-to-end data-driven forecasting system which takes raw observations as input and provides both global and local forecasts. These global forecasts are produced for 24 variables at multiple pressure levels at one-degree spatial resolution and 24 hour temporal resolution, and are skillful with respect to hourly climatology at five to seven day lead times. Local forecasts are produced for temperature, mean sea level pressure, and wind speed at a geographically diverse set of weather stations, and are skillful with respect to an IFS-HRES interpolation baseline at multiple lead-times. Aardvark, by virtue of its simplicity and scalability, opens the door to a new paradigm for performing accurate and efficient data-driven medium-range weather forecasting.
△ Less
Submitted 30 March, 2024;
originally announced April 2024.
-
You Cannot Escape Me: Detecting Evasions of SIEM Rules in Enterprise Networks
Authors:
Rafael Uetz,
Marco Herzog,
Louis Hackländer,
Simon Schwarz,
Martin Henze
Abstract:
Cyberattacks have grown into a major risk for organizations, with common consequences being data theft, sabotage, and extortion. Since preventive measures do not suffice to repel attacks, timely detection of successful intruders is crucial to stop them from reaching their final goals. For this purpose, many organizations utilize Security Information and Event Management (SIEM) systems to centrally…
▽ More
Cyberattacks have grown into a major risk for organizations, with common consequences being data theft, sabotage, and extortion. Since preventive measures do not suffice to repel attacks, timely detection of successful intruders is crucial to stop them from reaching their final goals. For this purpose, many organizations utilize Security Information and Event Management (SIEM) systems to centrally collect security-related events and scan them for attack indicators using expert-written detection rules. However, as we show by analyzing a set of widespread SIEM detection rules, adversaries can evade almost half of them easily, allowing them to perform common malicious actions within an enterprise network without being detected. To remedy these critical detection blind spots, we propose the idea of adaptive misuse detection, which utilizes machine learning to compare incoming events to SIEM rules on the one hand and known-benign events on the other hand to discover successful evasions. Based on this idea, we present AMIDES, an open-source proof-of-concept adaptive misuse detection system. Using four weeks of SIEM events from a large enterprise network and more than 500 hand-crafted evasions, we show that AMIDES successfully detects a majority of these evasions without any false alerts. In addition, AMIDES eases alert analysis by assessing which rules were evaded. Its computational efficiency qualifies AMIDES for real-world operation and hence enables organizations to significantly reduce detection blind spots with moderate effort.
△ Less
Submitted 19 December, 2023; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Latent Noise Segmentation: How Neural Noise Leads to the Emergence of Segmentation and Grou**
Authors:
Ben Lonnqvist,
Zhengqing Wu,
Michael H. Herzog
Abstract:
Humans are able to segment images effortlessly without supervision using perceptual grou**. In this work, we propose a counter-intuitive computational approach to solving unsupervised perceptual grou** and segmentation: that they arise \textit{because} of neural noise, rather than in spite of it. We (1) mathematically demonstrate that under realistic assumptions, neural noise can be used to se…
▽ More
Humans are able to segment images effortlessly without supervision using perceptual grou**. In this work, we propose a counter-intuitive computational approach to solving unsupervised perceptual grou** and segmentation: that they arise \textit{because} of neural noise, rather than in spite of it. We (1) mathematically demonstrate that under realistic assumptions, neural noise can be used to separate objects from each other; (2) that adding noise in a DNN enables the network to segment images even though it was never trained on any segmentation labels; and (3) that segmenting objects using noise results in segmentation performance that aligns with the perceptual grou** phenomena observed in humans, and is sample-efficient. We introduce the Good Gestalt (GG) datasets -- six datasets designed to specifically test perceptual grou**, and show that our DNN models reproduce many important phenomena in human perception, such as illusory contours, closure, continuity, proximity, and occlusion. Finally, we (4) show that our model improves performance on our GG datasets compared to other tested unsupervised models by $24.9\%$. Together, our results suggest a novel unsupervised segmentation method requiring few assumptions, a new explanation for the formation of perceptual grou**, and a novel potential benefit of neural noise.
△ Less
Submitted 15 April, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Frequency-Based Vulnerability Analysis of Deep Learning Models against Image Corruptions
Authors:
Harshitha Machiraju,
Michael H. Herzog,
Pascal Frossard
Abstract:
Deep learning models often face challenges when handling real-world image corruptions. In response, researchers have developed image corruption datasets to evaluate the performance of deep neural networks in handling such corruptions. However, these datasets have a significant limitation: they do not account for all corruptions encountered in real-life scenarios. To address this gap, we present MU…
▽ More
Deep learning models often face challenges when handling real-world image corruptions. In response, researchers have developed image corruption datasets to evaluate the performance of deep neural networks in handling such corruptions. However, these datasets have a significant limitation: they do not account for all corruptions encountered in real-life scenarios. To address this gap, we present MUFIA (Multiplicative Filter Attack), an algorithm designed to identify the specific types of corruptions that can cause models to fail. Our algorithm identifies the combination of image frequency components that render a model susceptible to misclassification while preserving the semantic similarity to the original image. We find that even state-of-the-art models trained to be robust against known common corruptions struggle against the low visibility-based corruptions crafted by MUFIA. This highlights the need for more comprehensive approaches to enhance model robustness against a wider range of real-world image corruptions.
△ Less
Submitted 12 June, 2023;
originally announced June 2023.
-
CLAD: A Contrastive Learning based Approach for Background Debiasing
Authors:
Ke Wang,
Harshitha Machiraju,
Oh-Hyeon Choung,
Michael Herzog,
Pascal Frossard
Abstract:
Convolutional neural networks (CNNs) have achieved superhuman performance in multiple vision tasks, especially image classification. However, unlike humans, CNNs leverage spurious features, such as background information to make decisions. This tendency creates different problems in terms of robustness or weak generalization performance. Through our work, we introduce a contrastive learning-based…
▽ More
Convolutional neural networks (CNNs) have achieved superhuman performance in multiple vision tasks, especially image classification. However, unlike humans, CNNs leverage spurious features, such as background information to make decisions. This tendency creates different problems in terms of robustness or weak generalization performance. Through our work, we introduce a contrastive learning-based approach (CLAD) to mitigate the background bias in CNNs. CLAD encourages semantic focus on object foregrounds and penalizes learning features from irrelavant backgrounds. Our method also introduces an efficient way of sampling negative samples. We achieve state-of-the-art results on the Background Challenge dataset, outperforming the previous benchmark with a margin of 4.1\%. Our paper shows how CLAD serves as a proof of concept for debiasing of spurious features, such as background and texture (in supplementary material).
△ Less
Submitted 6 October, 2022;
originally announced October 2022.
-
A comment on Guo et al. [arXiv:2206.11228]
Authors:
Ben Lonnqvist,
Harshitha Machiraju,
Michael H. Herzog
Abstract:
In a recent article, Guo et al. [arXiv:2206.11228] report that adversarially trained neural representations in deep networks may already be as robust as corresponding primate IT neural representations. While we find the paper's primary experiment illuminating, we have doubts about the interpretation and phrasing of the results presented in the paper.
In a recent article, Guo et al. [arXiv:2206.11228] report that adversarially trained neural representations in deep networks may already be as robust as corresponding primate IT neural representations. While we find the paper's primary experiment illuminating, we have doubts about the interpretation and phrasing of the results presented in the paper.
△ Less
Submitted 2 August, 2022;
originally announced August 2022.
-
Empirical Advocacy of Bio-inspired Models for Robust Image Recognition
Authors:
Harshitha Machiraju,
Oh-Hyeon Choung,
Michael H. Herzog,
Pascal Frossard
Abstract:
Deep convolutional neural networks (DCNNs) have revolutionized computer vision and are often advocated as good models of the human visual system. However, there are currently many shortcomings of DCNNs, which preclude them as a model of human vision. There are continuous attempts to use features of the human visual system to improve the robustness of neural networks to data perturbations. We provi…
▽ More
Deep convolutional neural networks (DCNNs) have revolutionized computer vision and are often advocated as good models of the human visual system. However, there are currently many shortcomings of DCNNs, which preclude them as a model of human vision. There are continuous attempts to use features of the human visual system to improve the robustness of neural networks to data perturbations. We provide a detailed analysis of such bio-inspired models and their properties. To this end, we benchmark the robustness of several bio-inspired models against their most comparable baseline DCNN models. We find that bio-inspired models tend to be adversarially robust without requiring any special data augmentation. Additionally, we find that bio-inspired models beat adversarially trained models in the presence of more real-world common corruptions. Interestingly, we also find that bio-inspired models tend to use both low and mid-frequency information, in contrast to other DCNN models. We find that this mix of frequency information makes them robust to both adversarial perturbations and common corruptions.
△ Less
Submitted 18 May, 2022;
originally announced May 2022.
-
Bio-inspired Robustness: A Review
Authors:
Harshitha Machiraju,
Oh-Hyeon Choung,
Pascal Frossard,
Michael. H Herzog
Abstract:
Deep convolutional neural networks (DCNNs) have revolutionized computer vision and are often advocated as good models of the human visual system. However, there are currently many shortcomings of DCNNs, which preclude them as a model of human vision. For example, in the case of adversarial attacks, where adding small amounts of noise to an image, including an object, can lead to strong misclassifi…
▽ More
Deep convolutional neural networks (DCNNs) have revolutionized computer vision and are often advocated as good models of the human visual system. However, there are currently many shortcomings of DCNNs, which preclude them as a model of human vision. For example, in the case of adversarial attacks, where adding small amounts of noise to an image, including an object, can lead to strong misclassification of that object. But for humans, the noise is often invisible. If vulnerability to adversarial noise cannot be fixed, DCNNs cannot be taken as serious models of human vision. Many studies have tried to add features of the human visual system to DCNNs to make them robust against adversarial attacks. However, it is not fully clear whether human vision inspired components increase robustness because performance evaluations of these novel components in DCNNs are often inconclusive. We propose a set of criteria for proper evaluation and analyze different models according to these criteria. We finally sketch future efforts to make DCCNs one step closer to the model of human vision.
△ Less
Submitted 16 March, 2021;
originally announced March 2021.
-
Training a Fast Object Detector for LiDAR Range Images Using Labeled Data from Sensors with Higher Resolution
Authors:
Manuel Herzog,
Klaus Dietmayer
Abstract:
In this paper, we describe a strategy for training neural networks for object detection in range images obtained from one type of LiDAR sensor using labeled data from a different type of LiDAR sensor. Additionally, an efficient model for object detection in range images for use in self-driving cars is presented. Currently, the highest performing algorithms for object detection from LiDAR measureme…
▽ More
In this paper, we describe a strategy for training neural networks for object detection in range images obtained from one type of LiDAR sensor using labeled data from a different type of LiDAR sensor. Additionally, an efficient model for object detection in range images for use in self-driving cars is presented. Currently, the highest performing algorithms for object detection from LiDAR measurements are based on neural networks. Training these networks using supervised learning requires large annotated datasets. Therefore, most research using neural networks for object detection from LiDAR point clouds is conducted on a very small number of publicly available datasets. Consequently, only a small number of sensor types are used. We use an existing annotated dataset to train a neural network that can be used with a LiDAR sensor that has a lower resolution than the one used for recording the annotated dataset. This is done by simulating data from the lower resolution LiDAR sensor based on the higher resolution dataset. Furthermore, improvements to models that use LiDAR range images for object detection are presented. The results are validated using both simulated sensor data and data from an actual lower resolution sensor mounted to a research vehicle. It is shown that the model can detect objects from 360° range images in real time.
△ Less
Submitted 5 December, 2019; v1 submitted 8 May, 2019;
originally announced May 2019.
-
Why Do Developers Get Password Storage Wrong? A Qualitative Usability Study
Authors:
Alena Naiakshina,
Anastasia Danilova,
Christian Tiefenau,
Marco Herzog,
Sergej Dechand,
Matthew Smith
Abstract:
Passwords are still a mainstay of various security systems, as well as the cause of many usability issues. For end-users, many of these issues have been studied extensively, highlighting problems and informing design decisions for better policies and motivating research into alternatives. However, end-users are not the only ones who have usability problems with passwords! Developers who are tasked…
▽ More
Passwords are still a mainstay of various security systems, as well as the cause of many usability issues. For end-users, many of these issues have been studied extensively, highlighting problems and informing design decisions for better policies and motivating research into alternatives. However, end-users are not the only ones who have usability problems with passwords! Developers who are tasked with writing the code by which passwords are stored must do so securely. Yet history has shown that this complex task often fails due to human error with catastrophic results. While an end-user who selects a bad password can have dire consequences, the consequences of a developer who forgets to hash and salt a password database can lead to far larger problems. In this paper we present a first qualitative usability study with 20 computer science students to discover how developers deal with password storage and to inform research into aiding developers in the creation of secure password systems.
△ Less
Submitted 30 August, 2017; v1 submitted 29 August, 2017;
originally announced August 2017.
-
Machine and Component Residual Life Estimation through the Application of Neural Networks
Authors:
M. A. Herzog,
T. Marwala,
P. S. Heyns
Abstract:
This paper concerns the use of neural networks for predicting the residual life of machines and components. In addition, the advantage of using condition-monitoring data to enhance the predictive capability of these neural networks was also investigated. A number of neural network variations were trained and tested with the data of two different reliability-related datasets. The first dataset re…
▽ More
This paper concerns the use of neural networks for predicting the residual life of machines and components. In addition, the advantage of using condition-monitoring data to enhance the predictive capability of these neural networks was also investigated. A number of neural network variations were trained and tested with the data of two different reliability-related datasets. The first dataset represents the renewal case where the failed unit is repaired and restored to a good-as-new condition. Data was collected in the laboratory by subjecting a series of similar test pieces to fatigue loading with a hydraulic actuator. The average prediction error of the various neural networks being compared varied from 431 to 841 seconds on this dataset, where test pieces had a characteristic life of 8,971 seconds. The second dataset was collected from a group of pumps used to circulate a water and magnetite solution within a plant. The data therefore originated from a repaired system affected by reliability degradation. When optimized, the multi-layer perceptron neural networks trained with the Levenberg-Marquardt algorithm and the general regression neural network produced a sum-of-squares error within 11.1% of each other. The potential for using neural networks for residual life prediction and the advantage of incorporating condition-based data into the model were proven for both examples.
△ Less
Submitted 10 May, 2007;
originally announced May 2007.