Vortex-capturing multiscale spaces for the Ginzburg-Landau equation

Authors: Maria Blum, Christian Döding, Patrick Henning

Abstract: This paper considers minimizers of the Ginzburg-Landau energy functional in particular multiscale spaces which are based on finite elements. The spaces are constructed by localized orthogonal decomposition techniques and their usage for solving the Ginzburg-Landau equation was first suggested in [Dörich, Henning, SINUM 2024]. In this work we further explore their approximation properties and give an analytical explanation for why vortex structures of energy minimizers can be captured more accurately in these spaces. We quantify the necessary mesh resolution in terms of the Ginzburg-Landau parameter $κ$ and a stabilization parameter $β\ge 0$ that is used in the construction of the multiscale spaces. Furthermore, we analyze how $κ$ affects the necessary locality of the multiscale basis functions and we prove that the choice $β=0$ typically yields the highest accuracy. Our findings are supported by numerical experiments.

Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

MSC Class: 65N12; 65N15; 65N30; 35Q56

arXiv:2405.02593 [pdf]

An Interdisciplinary Perspective of the Built-Environment Microbiome

Authors: John S. McAlister, Michael J. Blum, Yana Bromberg, Nina H. Fefferman, Qiang He, Eric Lofgren, Debra L. Miller, Courtney Schreiner, K. Selcuk Candan, Heather Szabo-Rogers, J. Michael Reed

Abstract: The built environment provides an excellent setting for interdisciplinary research on the dynamics of microbial communities. The system is simplified compared to many natural settings, and to some extent the entire environment can be manipulated, from architectural design, to materials use, air flow, human traffic, and capacity to disrupt microbial communities through cleaning. Here we provide an overview of the ecology of the microbiome in the built environment. We address niche space and refugia, population and community (metagenomic) dynamics, spatial ecology within a building, including the major microbial transmission mechanisms, as well as evolution. We also address the landscape ecology connecting microbiomes between physically separated buildings. At each stage we pay particular attention to the actual and potential interface between disciplines, such as ecology, epidemiology, materials science, and human social behavior. We end by identifying some opportunities for future interdisciplinary research on the microbiome of the built environment.

Submitted 4 May, 2024; originally announced May 2024.

Comments: 23 pages

arXiv:2403.17101 [pdf]

AI Consciousness is Inevitable: A Theoretical Computer Science Perspective

Authors: Lenore Blum, Manuel Blum

Abstract: We look at consciousness through the lens of Theoretical Computer Science, a branch of mathematics that studies computation under resource limitations. From this perspective, we develop a formal machine model for consciousness. The model is inspired by Alan Turing's simple yet powerful model of computation and Bernard Baars' theater model of consciousness. Though extremely simple, the model aligns at a high level with many of the major scientific theories of human and animal consciousness, supporting our claim that machine consciousness is inevitable.

Submitted 10 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

MSC Class: 68T01 ACM Class: F.1; I.2

arXiv:2402.19076 [pdf, other]

Pointing out the Shortcomings of Relation Extraction Models with Semantically Motivated Adversarials

Authors: Gennaro Nolano, Moritz Blum, Basil Ell, Philipp Cimiano

Abstract: In recent years, large language models have achieved state-of-the-art performance across various NLP tasks. However, investigations have shown that these models tend to rely on shortcut features, leading to inaccurate predictions and causing the models to be unreliable at generalization to out-of-distribution (OOD) samples. For instance, in the context of relation extraction (RE), we would expect a model to identify the same relation independently of the entities involved in it. For example, consider the sentence "Leonardo da Vinci painted the Mona Lisa" expressing the created(Leonardo_da_Vinci, Mona_Lisa) relation. If we substitute "Leonardo da Vinci" with "Barack Obama", then the sentence still expresses the created relation. A robust model is supposed to detect the same relation in both cases. In this work, we describe several semantically-motivated strategies to generate adversarial examples by replacing entity mentions and investigate how state-of-the-art RE models perform under pressure. Our analyses show that the performance of these models significantly deteriorates on the modified datasets (avg. of -48.5% in F1), which indicates that these models rely to a great extent on shortcuts, such as surface forms (or patterns therein) of entities, without making full use of the information present in the sentences.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2311.02226 [pdf]

The Case for Controls: Identifying outbreak risk factors through case-control comparisons

Authors: Nina H. Fefferman, Michael J. Blum, Lydia Bourouiba, Nathaniel L. Gibson, Qiang He, Debra L. Miller, Monica Papes, Dana K. Pasquale, Connor Verheyen, Sadie J. Ryan

Abstract: Investigations of infectious disease outbreaks often focus on identifying place- and context-dependent factors responsible for emergence and spread, resulting in phenomenological narratives ill-suited to developing generalizable predictive and preventive measures. We contend that case-control hypothesis testing is a more powerful framework for epidemiological investigation. The approach, widely used in medical research, involves identifying counterfactuals, with case-control comparisons drawn to test hypotheses about the conditions that manifest outbreaks. Here we outline the merits of applying a case-control framework as epidemiological study design. We first describe a framework for iterative multidisciplinary interrogation to discover minimally sufficient sets of factors that can lead to disease outbreaks. We then lay out how case-control comparisons can respectively center on pathogen(s), factor(s), or landscape(s) with vignettes focusing on pathogen transmission. Finally, we consider how adopting case-control approaches can promote evidence-based decision making for responding to and preventing outbreaks.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2309.13130 [pdf, other]

Insights from an OTTR-centric Ontology Engineering Methodology

Authors: Moritz Blum, Basil Ell, Philipp Cimiano

Abstract: OTTR is a language for representing ontology modeling patterns, which enables building ontologies or knowledge bases by instantiating templates. Thereby, particularities of the ontological representation language are hidden from the domain experts, and ontology engineers can, to some extent, separate the process of deciding what information to model from deciding how to model the information, e.g., which design patterns to use. Certain decisions can thus be postponed for the benefit of focusing on one of these processes. To date, only a few works on ontology engineering where ontology templates are applied are described in the literature. In this paper, we outline our methodology and report findings from our ontology engineering activities in the domain of Material Science. In these activities, OTTR templates play a key role. Our ontology engineering process is bottom-up, as we begin modeling activities from existing data that is then, via templates, fed into a knowledge graph, and it is top-down, as we first focus on which data to model and postpone the decision of how to model the data. We find, among other things, that OTTR templates are especially useful as a means of communication with domain experts. Furthermore, we find that because OTTR templates encapsulate modeling decisions, the engineering process becomes flexible, meaning that design decisions can be changed at little cost.

Submitted 22 September, 2023; originally announced September 2023.

Comments: Paper accepted at the 14th Workshop on Ontology Design and Patterns (WOP 2023)

arXiv:2308.10410 [pdf, other]

Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts

Authors: Fan Gao, Hang Jiang, Rui Yang, Qingcheng Zeng, Jinghui Lu, Moritz Blum, Dairui Liu, Tianwei She, Yuang Jiang, Irene Li

Abstract: Educational materials such as survey articles in specialized fields like computer science traditionally require tremendous expert inputs and are therefore expensive to create and update. Recently, Large Language Models (LLMs) have achieved significant success across various general tasks. However, their effectiveness and limitations in the education domain are yet to be fully explored. In this work, we examine the proficiency of LLMs in generating succinct survey articles specific to the niche field of NLP in computer science, focusing on a curated list of 99 topics. Automated benchmarks reveal that GPT-4 surpasses its predecessors, including GPT-3.5, PaLM2, and LLaMa2 by margins ranging from 2% to 20% in comparison to the established ground truth. We compare both human and GPT-based evaluation scores and provide in-depth analysis. While our findings suggest that GPT-created surveys are more contemporary and accessible than human-authored ones, certain limitations were observed. Notably, GPT-4, despite often delivering outstanding content, occasionally exhibited lapses like missing details or factual errors. Finally, we compared the rating behavior between humans and GPT-4 and found systematic bias in using GPT evaluation.

Submitted 23 May, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

Journal ref: ACL 2024 Findings

arXiv:2304.11651 [pdf, other]

Evaluating Digital Library Search Systems by using Formal Process Modelling

Authors: Christin Katharina Kreutz, Martin Blum, Philipp Schaer, Ralf Schenkel, Benjamin Weyers

Abstract: Evaluations of digital library information systems are typically centred on users correctly, efficiently, and quickly performing predefined tasks. Additionally, users generally enjoy working with the evaluated system, and completed questionnaires show an interface's excellent user experience. However, such evaluations do not explicitly consider comparing or connecting user-specific information-seeking behaviour with digital library system capabilities and thus overlook actual user needs or further system requirements. We aim to close this gap by introducing the usage of formalisations of users' task conduction strategies to compare their information needs with the capabilities of such information systems. We observe users' strategies in the scope of expert finding and paper search. We propose and investigate using the business process model notation to formalise task conduction strategies and the SchenQL digital library interface as an example system. We conduct interviews in a qualitative evaluation with 13 participants from various backgrounds from which we derive models. We discovered that the formalisations are suitable and helpful to mirror the strategies back to users and to compare users' ideal task conductions with capabilities of information systems. We conclude that using formal models for qualitative digital library studies is a suitable means to identify current limitations and depict users' task conduction strategies. Our published dataset containing the evaluation data can be reused to investigate other digital library systems' fit for depicting users' ideal task solutions.

Submitted 23 April, 2023; originally announced April 2023.

Comments: 10 pages + references, publication accepted at JCDL'23

arXiv:2303.17075 [pdf]

Viewpoint: A Theoretical Computer Science Perspective on Consciousness and Artificial General Intelligence

Authors: Lenore Blum, Manuel Blum

Abstract: We have defined the Conscious Turing Machine (CTM) for the purpose of investigating a Theoretical Computer Science (TCS) approach to consciousness. For this, we have hewn to the TCS demand for simplicity and understandability. The CTM is consequently and intentionally a simple machine. It is not a model of the brain, though its design has greatly benefited - and continues to benefit - from neuroscience and psychology. The CTM is a model of and for consciousness. Although it is developed to understand consciousness, the CTM offers a thoughtful and novel guide to the creation of an Artificial General Intelligence (AGI). For example, the CTM has an enormous number of powerful processors, some with specialized expertise, others unspecialized but poised to develop an expertise. For whatever problem must be dealt with, the CTM has an excellent way to utilize those processors that have the required knowledge, ability, and time to work on the problem, even if it is not aware of which ones these may be.

Submitted 29 March, 2023; originally announced March 2023.

arXiv:2209.10971 [pdf]

doi 10.1021/acs.jpcc.2c03851

Tuning SMSI Kinetics on Pt-loaded TiO$_2$(110) by Choosing the Pressure: A Combined UHV / Near-Ambient Pressure XPS Study

Authors: Philip Petzoldt, Moritz Eder, Sonia Mackewicz, Monika Blum, Tim Kratky, Sebastian Günther, Martin Tschurl, Ueli Heiz, Barbara A. J. Lechner

Abstract: Pt catalyst particles on reducible oxide supports often change their activity significantly at elevated temperatures due to the strong metal-support interaction (SMSI), which induces the formation of an encapsulation layer around the noble metal particles. However, the impact of oxidizing and reducing treatments at elevated pressures on this encapsulation layer remains controversial, partly due to the 'pressure gap' between surface science studies and applied catalysis. In the present work, we employ synchrotron-based near-ambient pressure X-ray photoelectron spectroscopy (NAP-XPS) to study the effect of O$_2$ and H$_2$ on the SMSI-state of well-defined Pt/TiO$_2$(110) catalysts at pressures of up to 0.1 Torr. By tuning the O$_2$ pressure, we can either selectively oxidize the TiO$_2$ support or both the support and the Pt particles. Catalyzed by metallic Pt, the encapsulating oxide overlayer grows rapidly in 1x10$^{-5}$ Torr O$_2$, but orders of magnitude less effectively at higher O$_2$ pressures, where Pt is in an oxidic state. While the oxidation/reduction of Pt particles is reversible, they remain embedded in the support once encapsulation has occurred.

Submitted 22 September, 2022; originally announced September 2022.

arXiv:2206.13942 [pdf]

A Theoretical Computer Science Perspective on Free Will

Authors: Manuel Blum, Lenore Blum

Abstract: We consider the paradoxical concept of free will from the perspective of Theoretical Computer Science (TCS), a branch of mathematics concerned with understanding the underlying principles of computation and complexity, including the implications and surprising consequences of resource limitations.

Submitted 15 May, 2024; v1 submitted 25 June, 2022; originally announced June 2022.

Comments: arXiv admin note: text overlap with arXiv:2107.13704

MSC Class: 68-02 ACM Class: I.2; F.0

arXiv:2205.06513 [pdf, other]

doi 10.1145/3529372.3533282

SchenQL: A query language for bibliographic data with aggregations and domain-specific functions

Authors: Christin Katharina Kreutz, Martin Blum, Ralf Schenkel

Abstract: Current search interfaces of digital libraries are not suitable to satisfy complex or convoluted information needs directly, when it comes to cases such as "Find authors who only recently started working on a topic". They might offer possibilities to obtain this information only by requiring vast user interaction. We present SchenQL, a web interface of a domain-specific query language on bibliographic metadata, which offers information search and exploration by query formulation and navigation in the system. Our system focuses on supporting aggregation of data and providing specialised domain dependent functions while being suitable for domain experts as well as casual users of digital libraries.

Submitted 13 May, 2022; originally announced May 2022.

Comments: Accepted at JCDL'22 as a demo, 5 pages, 4 figures

arXiv:2107.13704 [pdf]

doi 10.1073/pnas.2115934119

A Theory of Consciousness from a Theoretical Computer Science Perspective: Insights from the Conscious Turing Machine

Authors: Lenore Blum, Manuel Blum

Abstract: The quest to understand consciousness, once the purview of philosophers and theologians, is now actively pursued by scientists of many stripes. We examine consciousness from the perspective of theoretical computer science (TCS), a branch of mathematics concerned with understanding the underlying principles of computation and complexity, including the implications and surprising consequences of resource limitations. In the spirit of Alan Turing's simple yet powerful definition of a computer, the Turing Machine (TM), and the perspective of computational complexity theory, we formalize a modified version of the Global Workspace Theory (GWT) of consciousness originated by cognitive neuroscientist Bernard Baars and further developed by him, Stanislas Dehaene, Jean-Pierre Changeux and others. We are not looking for a complex model of the brain nor of cognition, but for a simple computational model of (the admittedly complex concept of) consciousness. We do this by defining the Conscious Turing Machine (CTM), also called a conscious AI, and then we define consciousness and related notions in the CTM. While these are only mathematical (TCS) definitions, we suggest why the CTM has the feeling of consciousness. The TCS perspective provides a simple formal framework to employ tools from computational complexity theory and machine learning to help us understand consciousness and related concepts. Previously we explored high level explanations for the feelings of pain and pleasure in the CTM. Here we consider three examples related to vision (blindsight, inattentional blindness, and change blindness), followed by discussions of dreams, free will, and altered states of consciousness.

Submitted 5 July, 2022; v1 submitted 28 July, 2021; originally announced July 2021.

Comments: arXiv admin note: text overlap with arXiv:2011.09850

MSC Class: 68-02 ACM Class: I.2; F.0

arXiv:2101.02770 [pdf]

doi 10.1063/5.0044162

Simultaneous ambient pressure X-ray photoelectron spectroscopy and grazing incidence X-ray scattering in gas environments

Authors: H. Kersell, P. Chen, H. Martins, Q. Lu, F. Brausse, B. -H. Liu, M. Blum, S. Roy, B. Rude, A. Kilcoyne, H. Bluhm, S. Nemšák

Abstract: We have developed an experimental system to simultaneously observe surface structure, morphology, composition, chemical state, and chemical activity for samples in gas phase environments. This is accomplished by simultaneously measuring X-ray photoelectron spectroscopy (XPS) and grazing incidence X-ray scattering (GIXS) in gas pressures as high as the multi-Torr regime, while also recording mass spectrometry. Scattering patterns reflect near-surface sample structures from the nano- to the meso-scale. The grazing incidence geometry provides tunable depth sensitivity while scattered X-rays are detected across a broad range of angles using a newly designed pivoting-UHV-manipulator for detector positioning. At the same time, XPS and mass spectrometry can be measured, all from the same sample spot and in ambient conditions. To demonstrate the capabilities of this system, we measured the chemical state, composition, and structure of Ag-behenate on a Si(001) wafer in vacuum and in O$_2$ atmosphere at various temperatures. These simultaneous structural, chemical, and gas phase product probes enable detailed insights into the interplay between structure and chemical state for samples in gas phase environments. The compact size of our pivoting-UHV-manipulator makes it possible to retrofit this technique into existing spectroscopic instruments installed at synchrotron beamlines. Because many synchrotron facilities are planning or undergoing upgrades to diffraction limited storage rings with transversely coherent beams, a newly emerging set of coherent X-ray scattering experiments can greatly benefit from the concepts we present here.

Submitted 14 January, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

Comments: 21 pages, 4 figures

arXiv:2011.09850 [pdf]

A Theoretical Computer Science Perspective on Consciousness

Authors: Manuel Blum, Lenore Blum

Abstract: The quest to understand consciousness, once the purview of philosophers and theologians, is now actively pursued by scientists of many stripes. This paper studies consciousness from the perspective of theoretical computer science. It formalizes the Global Workspace Theory (GWT) originated by cognitive neuroscientist Bernard Baars and further developed by him, Stanislas Dehaene, and others. Our major contribution lies in the precise formal definition of a Conscious Turing Machine (CTM), also called a Conscious AI. We define the CTM in the spirit of Alan Turing's simple yet powerful definition of a computer, the Turing Machine (TM). We are not looking for a complex model of the brain nor of cognition but for a simple model of (the admittedly complex concept of) consciousness. After formally defining CTM, we give a formal definition of consciousness in CTM. We then suggest why the CTM has the feeling of consciousness. The reasonableness of the definitions and explanations can be judged by how well they agree with commonly accepted intuitive concepts of human consciousness, the breadth of related concepts that the model explains easily and naturally, and the extent of its agreement with scientific evidence.

Submitted 23 August, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

Comments: 33 pages; 10 figures

MSC Class: 68T01; 68T40; 68242; ACM Class: F.0; I.2

arXiv:1906.00029 [pdf, other]

Human-Usable Password Schemas: Beyond Information-Theoretic Security

Authors: Elan Rosenfeld, Santosh Vempala, Manuel Blum

Abstract: Password users frequently employ passwords that are too simple, or they just reuse passwords for multiple websites. A common complaint is that utilizing secure passwords is too difficult. One possible solution to this problem is to use a password schema. Password schemas are deterministic functions which map challenges (typically the website name) to responses (passwords). Previous work has been done on developing and analyzing publishable schemas, but these analyses have been information-theoretic, not complexity-theoretic; they consider an adversary with infinite computing power. We perform an analysis with respect to adversaries having currently achievable computing capabilities, assessing the realistic practical security of such schemas. We prove for several specific schemas that a computer is no worse off than an infinite adversary and that it can successfully extract all information from leaked challenges and their respective responses, known as challenge-response pairs. We also show that any schema that hopes to be secure against adversaries with bounded computation should obscure information in a very specific way, by introducing many possible constraints with each challenge-response pair. These surprising results put the analyses of password schemas on a more solid and practical footing.

Submitted 31 May, 2019; originally announced June 2019.

arXiv:1806.04549 [pdf, other]

doi 10.1109/IJCNN.2018.8489493

Early Seizure Detection with an Energy-Efficient Convolutional Neural Network on an Implantable Microcontroller

Authors: Maria Hügle, Simon Heller, Manuel Watter, Manuel Blum, Farrokh Manzouri, Matthias Dümpelmann, Andreas Schulze-Bonhage, Peter Woias, Joschka Boedecker

Abstract: Implantable, closed-loop devices for automated early detection and stimulation of epileptic seizures are promising treatment options for patients with severe epilepsy that cannot be treated with traditional means. Most approaches for early seizure detection in the literature are, however, not optimized for implementation on ultra-low power microcontrollers required for long-term implantation. In this paper we present a convolutional neural network for the early detection of seizures from intracranial EEG signals, designed specifically for this purpose. In addition, we investigate approximations to comply with hardware limits while preserving accuracy. We compare our approach to three previously proposed convolutional neural networks and a feature-based SVM classifier with respect to detection accuracy, latency and computational needs. Evaluation is based on a comprehensive database with long-term EEG recordings. The proposed method outperforms the other detectors with a median sensitivity of 0.96, false detection rate of 10.1 per hour and median detection delay of 3.7 seconds, while being the only approach suited to be realized on a low power microcontroller due to its parsimonious use of computational and memory resources.

Submitted 12 June, 2018; originally announced June 2018.

Comments: Accepted at IJCNN 2018

arXiv:1802.04887 [pdf]

doi 10.1287/deca.2015.0321

Probabilistic Warnings in National Security Crises: Pearl Harbor Revisited

Authors: David M. Blum, M. Elisabeth Pate-Cornell

Abstract: Imagine a situation where a group of adversaries is preparing an attack on the United States or U.S. interests. An intelligence analyst has observed some signals, but the situation is rapidly changing. The analyst faces the decision to alert a principal decision maker that an attack is imminent, or to wait until more is known about the situation. This warning decision is based on the analyst's observation and evaluation of signals, independent or correlated, and on her updating of the prior probabilities of possible scenarios and their outcomes. The warning decision also depends on the analyst's assessment of the crisis' dynamics and perception of the preferences of the principal decision maker, as well as the lead time needed for an appropriate response. This article presents a model to support this analyst's dynamic warning decision. As with most problems involving warning, the key is to manage the tradeoffs between false positives and false negatives given the probabilities and the consequences of intelligence failures of both types. The model is illustrated by revisiting the case of the attack on Pearl Harbor in December 1941. It shows that the radio silence of the Japanese fleet carried considerable information (Sir Arthur Conan Doyle's "dog in the night" problem), which was misinterpreted at the time. Even though the probabilities of different attacks were relatively low, their consequences were such that the Bayesian dynamic reasoning described here may have provided valuable information to key decision makers.

Submitted 13 February, 2018; originally announced February 2018.

Journal ref: Decision Analysis 13:1 (2015) 1-25

arXiv:1707.01254 [pdf, other]

Regression approaches for Approximate Bayesian Computation

Authors: Michael GB Blum

Abstract: This book chapter introduces regression approaches and regression adjustment for Approximate Bayesian Computation (ABC). Regression adjustment adjusts parameter values after rejection sampling in order to account for the imperfect match between simulations and observations. Imperfect match between simulations and observations can be more pronounced when there are many summary statistics, a phenomenon coined as the curse of dimensionality. Because of this imperfect match, credibility intervals obtained with regression approaches can be inflated compared to true credibility intervals. The chapter presents the main concepts underlying regression adjustment. A theorem that compares theoretical properties of posterior distributions obtained with and without regression adjustment is presented. Last, a practical application of regression adjustment in population genetics shows that regression adjustment shrinks posterior distributions compared to rejection approaches, which is a solution to avoid inflated credibility intervals.

Submitted 5 July, 2017; originally announced July 2017.

Comments: Book chapter, published in Handbook of Approximate Bayesian Computation 2018

arXiv:1707.01204 [pdf]

The Complexity of Human Computation: A Concrete Model with an Application to Passwords

Authors: Manuel Blum, Santosh Vempala

Abstract: What can humans compute in their heads? We are thinking of a variety of Crypto Protocols, games like Sudoku, Crossword Puzzles, Speed Chess, and so on. The intent of this paper is to apply the ideas and methods of theoretical computer science to better understand what humans can compute in their heads. For example, can a person compute a function in their head so that an eavesdropper with a powerful computer --- who sees the responses to random input --- still cannot infer responses to new inputs? To address such questions, we propose a rigorous model of human computation and associated measures of complexity. We apply the model and measures first and foremost to the problem of (1) humanly computable password generation, and then consider related problems of (2) humanly computable "one-way functions" and (3) humanly computable "pseudorandom generators". The theory of Human Computability developed here plays by different rules than standard computability, and this takes some getting used to. For reasons to be made clear, the polynomial versus exponential time divide of modern computability theory is irrelevant to human computation. In human computability, the step-counts for both humans and computers must be more concrete. Specifically, we restrict the adversary to at most 10^24 (Avogadro number of) steps. An alternate view of this work is that it deals with the analysis of algorithms and counting steps for the case that inputs are small as opposed to the usual case of inputs large-in-the-limit.

Submitted 4 July, 2017; originally announced July 2017.

arXiv:1601.04096 [pdf, other]

Goodness-of-fit statistics for approximate Bayesian computation

Authors: Louisiane Lemaire, Flora Jay, I-Hung Lee, Katalin Csilléry, Michael G. B. Blum

Abstract: Approximate Bayesian computation is a statistical framework that uses numerical simulations to calibrate and compare models. Instead of computing likelihood functions, Approximate Bayesian computation relies on numerical simulations, which makes it applicable to complex models in ecology and evolution. As usual for statistical modeling, evaluating goodness-of-fit is a fundamental step for Approximate Bayesian Computation. Here, we introduce a goodness-of-fit approach based on hypothesis-testing. We introduce two test statistics based on the mean distance between numerical summaries of the data and simulated ones. One test statistic relies on summaries simulated with the prior predictive distribution whereas the other one relies on simulations from the posterior predictive distribution. For different coalescent models, we find that the statistics are well calibrated, meaning that the type I error can be controlled. However, the statistical power of the two statistics is extremely variable across models ranging from 20% to 100%. The difference of power between the two statistics is negligible in models of demographic inference but substantial in an additional and purely statistical example. When analyzing resequencing data to evaluate models of human demography, the two statistics confirm that an out-of-Africa bottleneck cannot be rejected for Asiatic and European data. We also consider two speciation models in the context of a butterfly species complex. One goodness-of-fit statistic indicates a poor fit for both models, and the numerical summaries causing the poor fit were identified using posterior predictive checks. Statistical tests for goodness-of-fit should foster evaluation of model fit in Approximate Bayesian Computation. The test statistic based on simulations from the prior predictive distribution is implemented in the gfit function of the R abc package.

Submitted 15 January, 2016; originally announced January 2016.

arXiv:1504.04543 [pdf, other]

Detecting genomic signatures of natural selection with principal component analysis: application to the 1000 Genomes data

Authors: Nicolas Duforet-Frebourg, Keurcien Luu, Guillaume Laval, Eric Bazin, Michael G. B. Blum

Abstract: To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis. We show that the common Fst index of genetic differentiation between populations can be viewed as a proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1) after removal of recently admixed individuals resulting in 850 individuals coming from Africa, Asia, and Europe. The number of genetic variants is of the order of 36 million obtained with a low-coverage sequencing depth (3X). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and non-coding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially when defining populations is difficult.

Submitted 18 November, 2015; v1 submitted 8 April, 2015; originally announced April 2015.

arXiv:1404.0024 [pdf, other]

Towards Human Computable Passwords

Authors: Jeremiah Blocki, Manuel Blum, Anupam Datta, Santosh Vempala

Abstract: An interesting challenge for the cryptography community is to design authentication protocols that are so simple that a human can execute them without relying on a fully trusted computer. We propose several candidate authentication protocols for a setting in which the human user can only receive assistance from a semi-trusted computer --- a computer that stores information and performs computations correctly but does not provide confidentiality. Our schemes use a semi-trusted computer to store and display public challenges $C_i\in[n]^k$. The human user memorizes a random secret mapping $σ:[n]\rightarrow\mathbb{Z}_d$ and authenticates by computing responses $f(σ(C_i))$ to a sequence of public challenges where $f:\mathbb{Z}_d^k\rightarrow\mathbb{Z}_d$ is a function that is easy for the human to evaluate. We prove that any statistical adversary needs to sample $m=\tilde{Ω}(n^{s(f)})$ challenge-response pairs to recover $σ$, for a security parameter $s(f)$ that depends on two key properties of $f$. To obtain our results, we apply the general hypercontractivity theorem to lower bound the statistical dimension of the distribution over challenge-response pairs induced by $f$ and $σ$. Our lower bounds apply to arbitrary functions $f$ (not just to functions that are easy for a human to evaluate), and generalize recent results of Feldman et al. As an application, we propose a family of human computable password functions $f_{k_1,k_2}$ in which the user needs to perform $2k_1+2k_2+1$ primitive operations (e.g., adding two digits or remembering $σ(i)$), and we show that $s(f) = \min\{k_1+1, (k_2+1)/2\}$. For these schemes, we prove that forging passwords is equivalent to recovering the secret mapping. Thus, our human computable password schemes can maintain strong security guarantees even after an adversary has observed the user log in to many different accounts.
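Worked instance of the two formulas in the abstract above. The values $k_1=1$ and $k_2=3$ are chosen here purely for illustration and are not taken from the paper. The user would perform $2k_1+2k_2+1 = 2+6+1 = 9$ primitive operations, and the security parameter would be $s(f)=\min\{1+1,\,(3+1)/2\}=\min\{2,2\}=2$, so under these assumed values the stated lower bound reads $m=\tilde{Ω}(n^{2})$ challenge-response pairs.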

Submitted 9 September, 2016; v1 submitted 31 March, 2014; originally announced April 2014.

Comments: Fixed bug in definition of Q^{f,j} and modified proofs accordingly

arXiv:1402.5321 [pdf, other]

doi 10.1093/molbev/msu182

Genome scans for detecting footprints of local adaptation using a Bayesian factor model

Authors: N. Duforet-Frebourg, E. Bazin, M. G. B. Blum

Abstract: A central part of population genomics consists of finding genomic regions implicated in local adaptation. Population genomic analyses are based on genotyping numerous molecular markers and looking for outlier loci in terms of patterns of genetic differentiation. One of the most common approaches for selection scans is based on statistics that measure population differentiation such as $F_{ST}$. However, there are important caveats with approaches related to $F_{ST}$ because they require grouping individuals into populations and they additionally assume a particular model of population structure. Here we implement a more flexible individual-based approach based on Bayesian factor models. Factor models capture population structure with latent variables called factors, which can describe clustering of individuals into populations or isolation-by-distance patterns. Using hierarchical Bayesian modeling, we both infer population structure and identify outlier loci that are candidates for local adaptation. As outlier loci, the hierarchical factor model searches for loci that are atypically related to population structure as measured by the latent factors. In a model of population divergence, we show that the factor model can achieve a 2-fold or more reduction of false discovery rate compared to the software BayeScan or compared to a $F_{ST}$ approach. We analyze the data of the Human Genome Diversity Panel to provide an example of how factor models can be used to detect local adaptation with a large number of SNPs. The Bayesian factor model is implemented in the open-source PCAdapt software.

Submitted 29 July, 2014; v1 submitted 21 February, 2014; originally announced February 2014.

Comments: This work has been partially supported by the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01), Molecular Biology and Evolution 2014

MSC Class: 62P10

arXiv:1310.1137 [pdf, ps, other]

doi 10.1145/2517312.2517319

GOTCHA Password Hackers!

Authors: Jeremiah Blocki, Manuel Blum, Anupam Datta

Abstract: We introduce GOTCHAs (Generating panOptic Turing Tests to Tell Computers and Humans Apart) as a way of preventing automated offline dictionary attacks against user selected passwords. A GOTCHA is a randomized puzzle generation protocol, which involves interaction between a computer and a human. Informally, a GOTCHA should satisfy two key properties: (1) The puzzles are easy for the human to solve. (2) The puzzles are hard for a computer to solve even if it has the random bits used by the computer to generate the final puzzle --- unlike a CAPTCHA. Our main theorem demonstrates that GOTCHAs can be used to mitigate the threat of offline dictionary attacks against passwords by ensuring that a password cracker must receive constant feedback from a human being while mounting an attack. Finally, we provide a candidate construction of GOTCHAs based on Inkblot images. Our construction relies on the usability assumption that users can recognize the phrases that they originally used to describe each Inkblot image --- a much weaker usability assumption than previous password systems based on Inkblots which required users to recall their phrase exactly. We conduct a user study to evaluate the usability of our GOTCHA construction. We also generate a GOTCHA challenge where we encourage artificial intelligence and security researchers to try to crack several passwords protected with our scheme.

Submitted 3 October, 2013; originally announced October 2013.

Comments: 2013 ACM Workshop on Artificial Intelligence and Security (AISec)

arXiv:1302.5122 [pdf, ps, other]

Naturally Rehearsing Passwords

Authors: Jeremiah Blocki, Manuel Blum, Anupam Datta

Abstract: We introduce quantitative usability and security models to guide the design of password management schemes --- systematic strategies to help users create and remember multiple passwords. In the same way that security proofs in cryptography are based on complexity-theoretic assumptions (e.g., hardness of factoring and discrete logarithm), we quantify usability by introducing usability assumptions. In particular, password management relies on assumptions about human memory, e.g., that a user who follows a particular rehearsal schedule will successfully maintain the corresponding memory. These assumptions are informed by research in cognitive science and validated through empirical studies. Given rehearsal requirements and a user's visitation schedule for each account, we use the total number of extra rehearsals that the user would have to do to remember all of his passwords as a measure of the usability of the password scheme. Our usability model leads us to a key observation: password reuse benefits users not only by reducing the number of passwords that the user has to memorize, but more importantly by increasing the natural rehearsal rate for each password. We also present a security model which accounts for the complexity of password management with multiple accounts and associated threats, including online, offline, and plaintext password leak attacks. Observing that current password management schemes are either insecure or unusable, we present Shared Cues --- a new scheme in which the underlying secret is strategically shared across accounts to ensure that most rehearsal requirements are satisfied naturally while simultaneously providing strong security. The construction uses the Chinese Remainder Theorem to achieve these competing goals.

Submitted 9 September, 2013; v1 submitted 20 February, 2013; originally announced February 2013.

arXiv:1301.3166 [pdf, other]

Diagnostic tools of approximate Bayesian computation using the coverage property

Authors: D. Prangle, M. G. B. Blum, G. Popovic, S. A. Sisson

Abstract: Approximate Bayesian computation (ABC) is an approach for sampling from an approximate posterior distribution in the presence of a computationally intractable likelihood function. A common implementation is based on simulating model, parameter and dataset triples, (m,θ,y), from the prior, and then accepting as samples from the approximate posterior, those pairs (m,θ) for which y, or a summary of y, is "close" to the observed data. Closeness is typically determined through a distance measure and a kernel scale parameter, ε. Appropriate choice of ε is important to producing a good quality approximation. This paper proposes diagnostic tools for the choice of ε based on assessing the coverage property, which asserts that credible intervals have the correct coverage levels. We provide theoretical results on coverage for both model and parameter inference, and adapt these into diagnostics for the ABC context. We re-analyse a study on human demographic history to determine whether the adopted posterior approximation was appropriate. R code implementing the proposed methodology is freely available in the package "abc."

Submitted 14 January, 2013; originally announced January 2013.

Comments: Figures 8-13 are Supplementary Information Figures S1-S6

arXiv:1209.5242 [pdf, other]

doi 10.1111/evo.12342

Non-stationary patterns of isolation-by-distance: inferring measures of local genetic differentiation with Bayesian kriging

Authors: Nicolas Duforet-Frebourg, Michael G. B. Blum

Abstract: Patterns of isolation-by-distance arise when population differentiation increases with increasing geographic distances. Patterns of isolation-by-distance are usually caused by local spatial dispersal, which explains why differences of allele frequencies between populations accumulate with distance. However, spatial variations of demographic parameters such as migration rate or population density can generate non-stationary patterns of isolation-by-distance where the rate at which genetic differentiation accumulates varies across space. To characterize non-stationary patterns of isolation-by-distance, we infer local genetic differentiation based on Bayesian kriging. Local genetic differentiation for a sampled population is defined as the average genetic differentiation between the sampled population and fictive neighboring populations. To avoid defining populations in advance, the method can also be applied at the scale of individuals making it relevant for landscape genetics. Inference of local genetic differentiation relies on a matrix of pairwise similarity or dissimilarity between populations or individuals such as matrices of FST between pairs of populations. Simulation studies show that maps of local genetic differentiation can reveal barriers to gene flow but also other patterns such as continuous variations of gene flow across habitat. The potential of the method is illustrated with 2 data sets: genome-wide SNP data for human Swedish populations and AFLP markers for alpine plant species. The software LocalDiff implementing the method is available at http://membres-timc.imag.fr/Michael.Blum/LocalDiff.html

Submitted 7 January, 2014; v1 submitted 24 September, 2012; originally announced September 2012.

Comments: In press, Evolution 2014

MSC Class: 62P10

arXiv:1202.3819 [pdf, ps, other]

doi 10.1214/12-STS406

A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation

Authors: M. G. B. Blum, M. A. Nunes, D. Prangle, S. A. Sisson

Abstract: Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics, rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three nonmutually exclusive classes consisting of best subset selection methods, projection techniques and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets.

Submitted 11 June, 2013; v1 submitted 16 February, 2012; originally announced February 2012.

Comments: Published at http://dx.doi.org/10.1214/12-STS406 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS406

Journal ref: Statistical Science 2013, Vol. 28, No. 2, 189-208

arXiv:1106.2793 [pdf, other]

abc: an R package for Approximate Bayesian Computation (ABC)

Authors: Katalin Csilléry, Olivier François, Michael GB Blum

Abstract: Many recent statistical applications involve inference under complex models, where it is computationally prohibitive to calculate likelihoods but possible to simulate data. Approximate Bayesian Computation (ABC) is devoted to these complex models because it bypasses evaluations of the likelihood function using comparisons between observed and simulated summary statistics. We introduce the R abc package that implements several ABC algorithms for performing parameter estimation and model selection. In particular, the recently developed non-linear heteroscedastic regression methods for ABC are implemented. The abc package also includes a cross-validation tool for measuring the accuracy of ABC estimates, and to calculate the misclassification probabilities when performing model selection. The main functions are accompanied by appropriate summary and plotting tools. Considering an example of demographic inference with population genetics data, we show the potential of the R package. R is already widely used in bioinformatics and several fields of biology. The R abc package will make the ABC algorithms available to the large number of R users. abc is a freely available R package under the GPL license, and it can be downloaded at http://cran.r-project.org/web/packages/abc/index.html.

Submitted 14 June, 2011; originally announced June 2011.

arXiv:0904.0635 [pdf, ps, other]

Approximate Bayesian Computation: a nonparametric perspective

Authors: Michael Blum

Abstract: Approximate Bayesian Computation is a family of likelihood-free inference techniques that are well-suited to models defined in terms of a stochastic generating mechanism. In a nutshell, Approximate Bayesian Computation proceeds by computing summary statistics s_obs from the data and simulating summary statistics for different values of the parameter theta. The posterior distribution is then approximated by an estimator of the conditional density g(theta|s_obs). In this paper, we derive the asymptotic bias and variance of the standard estimators of the posterior distribution which are based on rejection sampling and linear adjustment. Additionally, we introduce an original estimator of the posterior distribution based on quadratic adjustment and we show that its bias contains fewer terms than the estimator with linear adjustment. Although we find that the estimators with adjustment are not universally superior to the estimator based on rejection sampling, we find that they can achieve better performance when there is a nearly homoscedastic relationship between the summary statistics and the parameter of interest. To make this relationship as homoscedastic as possible, we propose to use transformations of the summary statistics. In different examples borrowed from the population genetics and epidemiological literature, we show the potential of the methods with adjustment and of the transformations of the summary statistics. Supplemental materials containing the details of the proofs are available online.
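The rejection-plus-linear-adjustment procedure summarized in the abstract above can be sketched as follows. This is a minimal illustration and not the paper's implementation: it assumes a scalar parameter and a scalar summary statistic, a uniform acceptance kernel (keep the closest fraction of simulations), and an unweighted least-squares fit. The function names `abc_linear_adjust`, `toy_prior`, and `toy_simulate`, and the toy Gaussian model itself, are hypothetical and chosen only for the example.

```python
import random
import statistics


def abc_linear_adjust(s_obs, prior_sample, simulate,
                      n_sim=5000, accept_frac=0.1, seed=0):
    """Rejection ABC followed by a linear regression adjustment (sketch)."""
    rng = random.Random(seed)
    # 1. Draw theta from the prior and simulate a summary statistic for each draw.
    draws = []
    for _ in range(n_sim):
        theta = prior_sample(rng)
        draws.append((theta, simulate(theta, rng)))
    # 2. Rejection step: keep the simulations whose summary is closest to s_obs.
    draws.sort(key=lambda pair: abs(pair[1] - s_obs))
    n_keep = max(2, int(accept_frac * n_sim))
    accepted = draws[:n_keep]
    theta_acc = [t for t, _ in accepted]
    ds = [s - s_obs for _, s in accepted]
    # 3. Least-squares slope of theta on (s - s_obs) among accepted draws.
    mean_t = statistics.fmean(theta_acc)
    mean_d = statistics.fmean(ds)
    sxx = sum((d - mean_d) ** 2 for d in ds)
    sxy = sum((d - mean_d) * (t - mean_t) for d, t in zip(ds, theta_acc))
    slope = sxy / sxx if sxx > 0 else 0.0
    # 4. Adjustment: shift each accepted theta to where it would sit at s = s_obs.
    theta_adj = [t - slope * d for t, d in zip(theta_acc, ds)]
    return theta_acc, theta_adj


def toy_prior(rng):
    # Hypothetical flat prior on [-10, 10].
    return rng.uniform(-10.0, 10.0)


def toy_simulate(theta, rng, n_obs=20):
    # Hypothetical model: summary = sample mean of n_obs draws from N(theta, 1).
    return rng.gauss(theta, 1.0 / n_obs ** 0.5)


theta_acc, theta_adj = abc_linear_adjust(1.5, toy_prior, toy_simulate)
print(statistics.fmean(theta_adj), statistics.pstdev(theta_acc), statistics.pstdev(theta_adj))
```

In this toy setting the relationship between parameter and summary is close to linear and homoscedastic, which is the regime where the abstract says adjustment can help; the adjusted sample should be noticeably tighter than the plain rejection sample.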

Submitted 31 May, 2010; v1 submitted 3 April, 2009; originally announced April 2009.

arXiv:0810.0896 [pdf, ps, other]

doi 10.1093/biostatistics/kxq022

HIV with contact-tracing: a case study in Approximate Bayesian Computation

Authors: Michael G. B. Blum, Viet Chi Tran

Abstract: Missing data is a recurrent issue in epidemiology where the infection process may be partially observed. Approximate Bayesian Computation, an alternative to data imputation methods such as Markov Chain Monte Carlo integration, is proposed for making inference in epidemiological models. It is a likelihood-free method that relies exclusively on numerical simulations. ABC consists in computing a distance between simulated and observed summary statistics and weighting the simulations according to this distance. We propose an original extension of ABC to path-valued summary statistics, corresponding to the cumulated number of detections as a function of time. For a standard compartmental model with Susceptible, Infectious and Recovered individuals (SIR), we show that the posterior distributions obtained with ABC and MCMC are similar. In a refined SIR model well-suited to the HIV contact-tracing data in Cuba, we perform a comparison between ABC with full and binned detection times. For the Cuban data, we evaluate the efficiency of the detection system and predict the evolution of the HIV-AIDS disease. In particular, the percentage of undetected infectious individuals is found to be of the order of 40%.

Submitted 31 May, 2010; v1 submitted 6 October, 2008; originally announced October 2008.

Journal ref: Biostatistics 11, 4 (2010) 644-660

arXiv:0809.4178 [pdf, ps, other]

doi 10.1007/s11222-009-9116-0

Non-linear regression models for Approximate Bayesian Computation

Authors: M. G. B. Blum, O. Francois

Abstract: Approximate Bayesian inference on the basis of summary statistics is well-suited to complex problems for which the likelihood is either mathematically or computationally intractable. However, the methods that use rejection suffer from the curse of dimensionality when the number of summary statistics is increased. Here we propose a machine-learning approach to the estimation of the posterior density by introducing two innovations. The new method fits a nonlinear conditional heteroscedastic regression of the parameter on the summary statistics, and then adaptively improves estimation using importance sampling. The new algorithm is compared to the state-of-the-art approximate Bayesian methods, and achieves considerable reduction of the computational burden in two examples of inference in statistical genetics and in a queueing model.

Submitted 23 February, 2009; v1 submitted 24 September, 2008; originally announced September 2008.

Comments: 4 figures; version 3 minor changes; to appear in Statistics and Computing

Journal ref: Statistics and Computing, 20: 63-73 (2010)

arXiv:math/0702415 [pdf, ps, other]

doi 10.1214/105051606000000547

The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance

Authors: Michael G. B. Blum, Olivier François, Svante Janson

Abstract: For two decades, the Colless index has been the most frequently used statistic for assessing the balance of phylogenetic trees. In this article, this statistic is studied under the Yule and uniform model of phylogenetic trees. The main tool of analysis is a coupling argument with another well-known index called the Sackin statistic. Asymptotics for the mean, variance and covariance of these two statistics are obtained, as well as their limiting joint distribution for large phylogenies. Under the Yule model, the limiting distribution arises as a solution of a functional fixed point equation. Under the uniform model, the limiting distribution is the Airy distribution. The cornerstone of this study is the fact that the probabilistic models for phylogenetic trees are strongly related to the random permutation and the Catalan models for binary search trees.

Submitted 14 February, 2007; originally announced February 2007.

Comments: Published at http://dx.doi.org/10.1214/105051606000000547 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP0203

MSC Class: 05C05 (Primary) 60F05; 60C05; 92D15 (Secondary)

Journal ref: Annals of Applied Probability 2006, Vol. 16, No. 4, 2195-2214
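
Illustration (not part of the listed paper): the two balance statistics named in the abstract above have simple standard definitions on a rooted binary tree. The Colless index sums, over internal nodes, the absolute difference between the leaf counts of the two subtrees; the Sackin index sums the depths of all leaves. A minimal Python sketch follows, assuming a hypothetical encoding in which an internal node is a pair `(left, right)` and anything else is a leaf; the function name `tree_balance` is likewise an assumption.

```python
def tree_balance(tree):
    """Return (leaf_count, colless, sackin) for a rooted binary tree.

    Assumed encoding: an internal node is a 2-tuple (left, right);
    any non-tuple value is a leaf.
    """
    if not isinstance(tree, tuple):
        return 1, 0, 0
    left, right = tree
    n_left, colless_left, sackin_left = tree_balance(left)
    n_right, colless_right, sackin_right = tree_balance(right)
    n = n_left + n_right
    # Colless: add the leaf-count imbalance at this internal node.
    colless = colless_left + colless_right + abs(n_left - n_right)
    # Sackin: every leaf below this node gains one unit of depth.
    sackin = sackin_left + sackin_right + n
    return n, colless, sackin


caterpillar = ((("a", "b"), "c"), "d")  # maximally unbalanced on 4 leaves
balanced = (("a", "b"), ("c", "d"))     # perfectly balanced on 4 leaves

caterpillar_stats = tree_balance(caterpillar)
balanced_stats = tree_balance(balanced)
```

On four leaves the caterpillar tree gives a Colless index of 3 and a Sackin index of 9, while the balanced tree gives 0 and 8, so both statistics grow as the tree becomes less balanced.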

arXiv:q-bio/0604016 [pdf, ps, other]

A mean-field analysis of community structure in social and kin networks

Authors: E. Durand, M. G. B. Blum, O. Francois

Abstract: We provide a mean-field analysis of community structure of social and biological networks assuming that actors are able to evaluate some tree-derived distance to the other actors and tend to aggregate with the less distant. We show that such networks have small components, and give exact descriptions for the probability distribution of a typical community size and the number of communities. In particular, we show that the probability distribution of the community size is well-approximated by a power-law distribution with exponent two. We illustrate the robustness of the mean-field analysis by comparing its predictions on previously studied social networks and biological data.

Submitted 13 April, 2006; originally announced April 2006.

Comments: 31 pages, 4 figures

arXiv:cond-mat/0506701 [pdf, ps, other]

doi 10.1103/PhysRevLett.95.196102

Fluorescence and phosphorescence from individual C$_{60}$ molecules excited by local electron tunneling

Authors: Elizabeta Ćavar, Marie-Christine Blüm, Marina Pivetta, François Patthey, Majed Chergui, Wolf-Dieter Schneider

Abstract: Using the highly localized current of electrons tunneling through a double barrier Scanning Tunneling Microscope (STM) junction, we excite luminescence from a selected C$_{60}$ molecule in the surface layer of fullerene nanocrystals grown on an ultrathin NaCl film on Au(111). In the observed luminescence fluorescence and phosphorescence spectra, pure electronic as well as vibronically induced transitions of an individual C$_{60}$ molecule are identified, leading to unambiguous chemical recognition on the single-molecular scale.

Submitted 27 June, 2005; originally announced June 2005.

Showing 1–36 of 36 results for author: Blum, M