Search | arXiv e-print repository

Breeding Programs Optimization with Reinforcement Learning

Authors: Omar G. Younis, Luca Corinzia, Ioannis N. Athanasiadis, Andreas Krause, Joachim M. Buhmann, Matteo Turchetta

Abstract: Crop breeding is crucial in improving agricultural productivity while potentially decreasing land usage, greenhouse gas emissions, and water consumption. However, breeding programs are challenging due to long turnover times, high-dimensional decision spaces, long-term objectives, and the need to adapt to rapid climate change. This paper introduces the use of Reinforcement Learning (RL) to optimize… ▽ More Crop breeding is crucial in improving agricultural productivity while potentially decreasing land usage, greenhouse gas emissions, and water consumption. However, breeding programs are challenging due to long turnover times, high-dimensional decision spaces, long-term objectives, and the need to adapt to rapid climate change. This paper introduces the use of Reinforcement Learning (RL) to optimize simulated crop breeding programs. RL agents are trained to make optimal crop selection and cross-breeding decisions based on genetic information. To benchmark RL-based breeding algorithms, we introduce a suite of Gym environments. The study demonstrates the superiority of RL techniques over standard practices in terms of genetic gain when simulated in silico using real-world genomic maize data. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning

arXiv:2405.19614 [pdf, other]

TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM

Authors: Peifeng Jiang, Hong Liu, Xia Li, Ti Wang, Fabian Zhang, Joachim M. Buhmann

Abstract: The limited robustness of 3D Gaussian Splatting (3DGS) to motion blur and camera noise, along with its poor real-time performance, restricts its application in robotic SLAM tasks. Upon analysis, the primary causes of these issues are the density of views with motion blur and the cumulative errors in dense pose estimation from calculating losses based on noisy original images and rendering results,… ▽ More The limited robustness of 3D Gaussian Splatting (3DGS) to motion blur and camera noise, along with its poor real-time performance, restricts its application in robotic SLAM tasks. Upon analysis, the primary causes of these issues are the density of views with motion blur and the cumulative errors in dense pose estimation from calculating losses based on noisy original images and rendering results, which increase the difficulty of 3DGS rendering convergence. Thus, a cutting-edge 3DGS-based SLAM system is introduced, leveraging the efficiency and flexibility of 3DGS to achieve real-time performance while remaining robust against sensor noise, motion blur, and the challenges posed by long-session SLAM. Central to this approach is the Fusion Bridge module, which seamlessly integrates tracking-centered ORB Visual Odometry with map**-centered online 3DGS. Precise pose initialization is enabled by this module through joint optimization of re-projection and rendering loss, as well as strategic view selection, enhancing rendering convergence in large-scale scenes. Extensive experiments demonstrate state-of-the-art rendering quality and localization accuracy, positioning this system as a promising solution for real-world robotics applications that require stable, near-real-time performance. Our project is available at https://ZeldaFromHeaven.github.io/TAMBRIDGE/ △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2404.12352 [pdf, other]

Point-In-Context: Understanding Point Cloud via In-Context Learning

Authors: Mengyuan Liu, Zhongbin Fang, Xia Li, Joachim M. Buhmann, Xiangtai Li, Chen Change Loy

Abstract: With the emergence of large-scale models trained on diverse datasets, in-context learning has emerged as a promising paradigm for multitasking, notably in natural language processing and image processing. However, its application in 3D point cloud tasks remains largely unexplored. In this work, we introduce Point-In-Context (PIC), a novel framework for 3D point cloud understanding via in-context l… ▽ More With the emergence of large-scale models trained on diverse datasets, in-context learning has emerged as a promising paradigm for multitasking, notably in natural language processing and image processing. However, its application in 3D point cloud tasks remains largely unexplored. In this work, we introduce Point-In-Context (PIC), a novel framework for 3D point cloud understanding via in-context learning. We address the technical challenge of effectively extending masked point modeling to 3D point clouds by introducing a Joint Sampling module and proposing a vanilla version of PIC called Point-In-Context-Generalist (PIC-G). PIC-G is designed as a generalist model for various 3D point cloud tasks, with inputs and outputs modeled as coordinates. In this paradigm, the challenging segmentation task is achieved by assigning label points with XYZ coordinates for each category; the final prediction is then chosen based on the label point closest to the predictions. To break the limitation by the fixed label-coordinate assignment, which has poor generalization upon novel classes, we propose two novel training strategies, In-Context Labeling and In-Context Enhancing, forming an extended version of PIC named Point-In-Context-Segmenter (PIC-S), targeting improving dynamic context labeling and model training. By utilizing dynamic in-context labels and extra in-context pairs, PIC-S achieves enhanced performance and generalization capability in and across part segmentation datasets. PIC is a general framework so that other tasks or datasets can be seamlessly introduced into our PIC through a unified data format. We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks and segmenting multi-datasets. Our PIC-S is capable of generalizing unseen datasets and performing novel part segmentation by customizing prompts. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: Project page: https://fanglaosi.github.io/Point-In-Context_Pages. arXiv admin note: text overlap with arXiv:2306.08659

arXiv:2402.06974 [pdf, other]

Hypernetwork-Driven Model Fusion for Federated Domain Generalization

Authors: Marc Bartholet, Taehyeon Kim, Ami Beuret, Se-Young Yun, Joachim M. Buhmann

Abstract: Federated Learning (FL) faces significant challenges with domain shifts in heterogeneous data, degrading performance. Traditional domain generalization aims to learn domain-invariant features, but the federated nature of model averaging often limits this due to its linear aggregation of local learning. To address this, we propose a robust framework, coined as hypernetwork-based Federated Fusion (h… ▽ More Federated Learning (FL) faces significant challenges with domain shifts in heterogeneous data, degrading performance. Traditional domain generalization aims to learn domain-invariant features, but the federated nature of model averaging often limits this due to its linear aggregation of local learning. To address this, we propose a robust framework, coined as hypernetwork-based Federated Fusion (hFedF), using hypernetworks for non-linear aggregation, facilitating generalization to unseen domains. Our method employs client-specific embeddings and gradient alignment techniques to manage domain generalization effectively. Evaluated in both zero-shot and few-shot settings, hFedF demonstrates superior performance in handling domain shifts. Comprehensive comparisons on PACS, Office-Home, and VLCS datasets show that hFedF consistently achieves the highest in-domain and out-of-domain accuracy with reliable predictions. Our study contributes significantly to the under-explored field of Federated Domain Generalization (FDG), setting a new benchmark for performance in this area. △ Less

Submitted 28 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

arXiv:2312.14329 [pdf, other]

Invariant Anomaly Detection under Distribution Shifts: A Causal Perspective

Authors: João B. S. Carvalho, Mengtao Zhang, Robin Geyer, Carlos Cotrini, Joachim M. Buhmann

Abstract: Anomaly detection (AD) is the machine learning task of identifying highly discrepant abnormal samples by solely relying on the consistency of the normal training samples. Under the constraints of a distribution shift, the assumption that training samples and test samples are drawn from the same distribution breaks down. In this work, by leveraging tools from causal inference we attempt to increase… ▽ More Anomaly detection (AD) is the machine learning task of identifying highly discrepant abnormal samples by solely relying on the consistency of the normal training samples. Under the constraints of a distribution shift, the assumption that training samples and test samples are drawn from the same distribution breaks down. In this work, by leveraging tools from causal inference we attempt to increase the resilience of anomaly detection models to different kinds of distribution shifts. We begin by elucidating a simple yet necessary statistical property that ensures invariant representations, which is critical for robust AD under both domain and covariate shifts. From this property, we derive a regularization term which, when minimized, leads to partial distribution invariance across environments. Through extensive experimental evaluation on both synthetic and real-world tasks, covering a range of six different AD methods, we demonstrated significant improvements in out-of-distribution performance. Under both covariate and domain shift, models regularized with our proposed term showed marked increased robustness. Code is available at: https://github.com/JoaoCarv/invariant-anomaly-detection. △ Less

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2308.09189 [pdf, ps, other]

Regularizing Adversarial Imitation Learning Using Causal Invariance

Authors: Ivan Ovinnikov, Joachim M. Buhmann

Abstract: Imitation learning methods are used to infer a policy in a Markov decision process from a dataset of expert demonstrations by minimizing a divergence measure between the empirical state occupancy measures of the expert and the policy. The guiding signal to the policy is provided by the discriminator used as part of an versarial optimization procedure. We observe that this model is prone to absorbi… ▽ More Imitation learning methods are used to infer a policy in a Markov decision process from a dataset of expert demonstrations by minimizing a divergence measure between the empirical state occupancy measures of the expert and the policy. The guiding signal to the policy is provided by the discriminator used as part of an versarial optimization procedure. We observe that this model is prone to absorbing spurious correlations present in the expert data. To alleviate this issue, we propose to use causal invariance as a regularization principle for adversarial training of these models. The regularization objective is applicable in a straightforward manner to existing adversarial imitation frameworks. We demonstrate the efficacy of the regularized formulation in an illustrative two-dimensional setting as well as a number of high-dimensional robot locomotion benchmark tasks. △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: Published at the ICML 2023 Workshop on Spurious Correlations, Invariance, and Stability

arXiv:2306.09035 [pdf, other]

Improving Explainability of Disentangled Representations using Multipath-Attribution Map**s

Authors: Lukas Klein, João B. S. Carvalho, Mennatallah El-Assady, Paolo Penna, Joachim M. Buhmann, Paul F. Jaeger

Abstract: Explainable AI aims to render model behavior understandable by humans, which can be seen as an intermediate step in extracting causal relations from correlative patterns. Due to the high risk of possible fatal decisions in image-based clinical diagnostics, it is necessary to integrate explainable AI into these safety-critical systems. Current explanatory methods typically assign attribution scores… ▽ More Explainable AI aims to render model behavior understandable by humans, which can be seen as an intermediate step in extracting causal relations from correlative patterns. Due to the high risk of possible fatal decisions in image-based clinical diagnostics, it is necessary to integrate explainable AI into these safety-critical systems. Current explanatory methods typically assign attribution scores to pixel regions in the input image, indicating their importance for a model's decision. However, they fall short when explaining why a visual feature is used. We propose a framework that utilizes interpretable disentangled representations for downstream-task prediction. Through visualizing the disentangled representations, we enable experts to investigate possible causation effects by leveraging their domain knowledge. Additionally, we deploy a multi-path attribution map** for enriching and validating explanations. We demonstrate the effectiveness of our approach on a synthetic benchmark suite and two medical datasets. We show that the framework not only acts as a catalyst for causal relation extraction but also enhances model robustness by enabling shortcut detection without the need for testing under distribution shifts. △ Less

Submitted 15 June, 2023; originally announced June 2023.

Journal ref: Proceedings of The 5th International Conference on Medical Imaging with Deep Learning, PMLR 172:689-712, 2022

arXiv:2306.08659 [pdf, other]

Explore In-Context Learning for 3D Point Cloud Understanding

Authors: Zhongbin Fang, Xiangtai Li, Xia Li, Joachim M. Buhmann, Chen Change Loy, Mengyuan Liu

Abstract: With the rise of large-scale models trained on broad data, in-context learning has become a new learning paradigm that has demonstrated significant potential in natural language processing and computer vision tasks. Meanwhile, in-context learning is still largely unexplored in the 3D point cloud domain. Although masked modeling has been successfully applied for in-context learning in 2D vision, di… ▽ More With the rise of large-scale models trained on broad data, in-context learning has become a new learning paradigm that has demonstrated significant potential in natural language processing and computer vision tasks. Meanwhile, in-context learning is still largely unexplored in the 3D point cloud domain. Although masked modeling has been successfully applied for in-context learning in 2D vision, directly extending it to 3D point clouds remains a formidable challenge. In the case of point clouds, the tokens themselves are the point cloud positions (coordinates) that are masked during inference. Moreover, position embedding in previous works may inadvertently introduce information leakage. To address these challenges, we introduce a novel framework, named Point-In-Context, designed especially for in-context learning in 3D point clouds, where both inputs and outputs are modeled as coordinates for each task. Additionally, we propose the Joint Sampling module, carefully designed to work in tandem with the general point sampling operator, effectively resolving the aforementioned technical issues. We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks. △ Less

Submitted 27 December, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: Project page: https://github.com/fanglaosi/Point-In-Context

arXiv:2210.03649 [pdf, other]

How to Enable Uncertainty Estimation in Proximal Policy Optimization

Authors: Eugene Bykovets, Yannick Metz, Mennatallah El-Assady, Daniel A. Keim, Joachim M. Buhmann

Abstract: While deep reinforcement learning (RL) agents have showcased strong results across many domains, a major concern is their inherent opaqueness and the safety of such systems in real-world use cases. To overcome these issues, we need agents that can quantify their uncertainty and detect out-of-distribution (OOD) states. Existing uncertainty estimation techniques, like Monte-Carlo Dropout or Deep Ens… ▽ More While deep reinforcement learning (RL) agents have showcased strong results across many domains, a major concern is their inherent opaqueness and the safety of such systems in real-world use cases. To overcome these issues, we need agents that can quantify their uncertainty and detect out-of-distribution (OOD) states. Existing uncertainty estimation techniques, like Monte-Carlo Dropout or Deep Ensembles, have not seen widespread adoption in on-policy deep RL. We posit that this is due to two reasons: concepts like uncertainty and OOD states are not well defined compared to supervised learning, especially for on-policy RL methods. Secondly, available implementations and comparative studies for uncertainty estimation methods in RL have been limited. To overcome the first gap, we propose definitions of uncertainty and OOD for Actor-Critic RL algorithms, namely, proximal policy optimization (PPO), and present possible applicable measures. In particular, we discuss the concepts of value and policy uncertainty. The second point is addressed by implementing different uncertainty estimation methods and comparing them across a number of environments. The OOD detection performance is evaluated via a custom evaluation benchmark of in-distribution (ID) and OOD states for various RL environments. We identify a trade-off between reward and OOD detection performance. To overcome this, we formulate a Pareto optimization problem in which we simultaneously optimize for reward and OOD detection performance. We show experimentally that the recently proposed method of Masksembles strikes a favourable balance among the survey methods, enabling high-quality uncertainty estimation and OOD detection while matching the performance of original RL agents. △ Less

Submitted 7 October, 2022; originally announced October 2022.

ACM Class: I.2; I.2.6; I.2.8; I.2.9; I.2.10

arXiv:2209.12590 [pdf, other]

Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs

Authors: Đorđe Miladinović, Kumar Shridhar, Kushal Jain, Max B. Paulus, Joachim M. Buhmann, Mrinmaya Sachan, Carl Allen

Abstract: In principle, applying variational autoencoders (VAEs) to sequential data offers a method for controlled sequence generation, manipulation, and structured representation learning. However, training sequence VAEs is challenging: autoregressive decoders can often explain the data without utilizing the latent space, known as posterior collapse. To mitigate this, state-of-the-art models weaken the pow… ▽ More In principle, applying variational autoencoders (VAEs) to sequential data offers a method for controlled sequence generation, manipulation, and structured representation learning. However, training sequence VAEs is challenging: autoregressive decoders can often explain the data without utilizing the latent space, known as posterior collapse. To mitigate this, state-of-the-art models weaken the powerful decoder by applying uniformly random dropout to the decoder input. We show theoretically that this removes pointwise mutual information provided by the decoder input, which is compensated for by utilizing the latent space. We then propose an adversarial training strategy to achieve information-based stochastic dropout. Compared to uniform dropout on standard text benchmark datasets, our targeted approach increases both sequence modeling performance and the information captured in the latent space. △ Less

Submitted 16 December, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2208.10481 [pdf, other]

BARReL: Bottleneck Attention for Adversarial Robustness in Vision-Based Reinforcement Learning

Authors: Eugene Bykovets, Yannick Metz, Mennatallah El-Assady, Daniel A. Keim, Joachim M. Buhmann

Abstract: Robustness to adversarial perturbations has been explored in many areas of computer vision. This robustness is particularly relevant in vision-based reinforcement learning, as the actions of autonomous agents might be safety-critic or impactful in the real world. We investigate the susceptibility of vision-based reinforcement learning agents to gradient-based adversarial attacks and evaluate a pot… ▽ More Robustness to adversarial perturbations has been explored in many areas of computer vision. This robustness is particularly relevant in vision-based reinforcement learning, as the actions of autonomous agents might be safety-critic or impactful in the real world. We investigate the susceptibility of vision-based reinforcement learning agents to gradient-based adversarial attacks and evaluate a potential defense. We observe that Bottleneck Attention Modules (BAM) included in CNN architectures can act as potential tools to increase robustness against adversarial attacks. We show how learned attention maps can be used to recover activations of a convolutional layer by restricting the spatial activations to salient regions. Across a number of RL environments, BAM-enhanced architectures show increased robustness during inference. Finally, we discuss potential future research directions. △ Less

Submitted 22 August, 2022; originally announced August 2022.

Comments: 5 pages, 2 figures, 3 tables

ACM Class: I.2.6; I.2.8; I.2.9; I.2.10; I.5.4

arXiv:2206.12444 [pdf, other]

Gated Domain Units for Multi-source Domain Generalization

Authors: Simon Föll, Alina Dubatovka, Eugen Ernst, Siu Lun Chau, Martin Maritsch, Patrik Okanovic, Gudrun Thäter, Joachim M. Buhmann, Felix Wortmann, Krikamol Muandet

Abstract: The phenomenon of distribution shift (DS) occurs when a dataset at test time differs from the dataset at training time, which can significantly impair the performance of a machine learning model in practical settings due to a lack of knowledge about the data's distribution at test time. To address this problem, we postulate that real-world distributions are composed of latent Invariant Elementary… ▽ More The phenomenon of distribution shift (DS) occurs when a dataset at test time differs from the dataset at training time, which can significantly impair the performance of a machine learning model in practical settings due to a lack of knowledge about the data's distribution at test time. To address this problem, we postulate that real-world distributions are composed of latent Invariant Elementary Distributions (I.E.D) across different domains. This assumption implies an invariant structure in the solution space that enables knowledge transfer to unseen domains. To exploit this property for domain generalization, we introduce a modular neural network layer consisting of Gated Domain Units (GDUs) that learn a representation for each latent elementary distribution. During inference, a weighted ensemble of learning machines can be created by comparing new observations with the representations of each elementary distribution. Our flexible framework also accommodates scenarios where explicit domain information is not present. Extensive experiments on image, text, and graph data show consistent performance improvement on out-of-training target domains. These findings support the practicality of the I.E.D assumption and the effectiveness of GDUs for domain generalisation. △ Less

Submitted 16 May, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

arXiv:2103.11713 [pdf, other]

Spatially Dependent U-Nets: Highly Accurate Architectures for Medical Imaging Segmentation

Authors: João B. S. Carvalho, João A. Santinha, Đorđe Miladinović, Joachim M. Buhmann

Abstract: In clinical practice, regions of interest in medical imaging often need to be identified through a process of precise image segmentation. The quality of this image segmentation step critically affects the subsequent clinical assessment of the patient status. To enable high accuracy, automatic image segmentation, we introduce a novel deep neural network architecture that exploits the inherent spati… ▽ More In clinical practice, regions of interest in medical imaging often need to be identified through a process of precise image segmentation. The quality of this image segmentation step critically affects the subsequent clinical assessment of the patient status. To enable high accuracy, automatic image segmentation, we introduce a novel deep neural network architecture that exploits the inherent spatial coherence of anatomical structures and is well equipped to capture long-range spatial dependencies in the segmented pixel/voxel space. In contrast to the state-of-the-art solutions based on convolutional layers, our approach leverages on recently introduced spatial dependency layers that have an unbounded receptive field and explicitly model the inductive bias of spatial coherence. Our method performs favourably to commonly used U-Net and U-Net++ architectures as demonstrated by improved Dice and Jaccardscore in three different medical segmentation tasks: nuclei segmentation in microscopy images, polyp segmentation in colonoscopy videos, and liver segmentation in abdominal CT scans. △ Less

Submitted 22 March, 2021; originally announced March 2021.

arXiv:2103.08877 [pdf, other]

Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling

Authors: Đorđe Miladinović, Aleksandar Stanić, Stefan Bauer, Jürgen Schmidhuber, Joachim M. Buhmann

Abstract: How to improve generative modeling by better exploiting spatial regularities and coherence in images? We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs). In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way, using a sequential gating-based mechani… ▽ More How to improve generative modeling by better exploiting spatial regularities and coherence in images? We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs). In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way, using a sequential gating-based mechanism that distributes contextual information across 2-D space. We show that augmenting the decoder of a hierarchical VAE by spatial dependency layers considerably improves density estimation over baseline convolutional architectures and the state-of-the-art among the models within the same class. Furthermore, we demonstrate that SDN can be applied to large images by synthesizing samples of high quality and coherence. In a vanilla VAE setting, we find that a powerful SDN decoder also improves learning disentangled representations, indicating that neural architectures play an important role in this task. Our results suggest favoring spatial dependency over convolutional layers in various VAE settings. The accompanying source code is given at https://github.com/djordjemila/sdn. △ Less

Submitted 16 March, 2021; originally announced March 2021.

Journal ref: International Conference on Learning Representations (2021);

arXiv:2101.09994 [pdf, ps, other]

On maximum-likelihood estimation in the all-or-nothing regime

Authors: Luca Corinzia, Paolo Penna, Wojciech Szpankowski, Joachim M. Buhmann

Abstract: We study the problem of estimating a rank-1 additive deformation of a Gaussian tensor according to the \emph{maximum-likelihood estimator} (MLE). The analysis is carried out in the sparse setting, where the underlying signal has a support that scales sublinearly with the total number of dimensions. We show that for Bernoulli distributed signals, the MLE undergoes an \emph{all-or-nothing} (AoN) pha… ▽ More We study the problem of estimating a rank-1 additive deformation of a Gaussian tensor according to the \emph{maximum-likelihood estimator} (MLE). The analysis is carried out in the sparse setting, where the underlying signal has a support that scales sublinearly with the total number of dimensions. We show that for Bernoulli distributed signals, the MLE undergoes an \emph{all-or-nothing} (AoN) phase transition, already established for the minimum mean-square-error estimator (MMSE) in the same problem. The result follows from two main technical points: (i) the connection established between the MLE and the MMSE, using the first and second-moment methods in the constrained signal space, (ii) a recovery regime for the MMSE stricter than the simple error vanishing characterization given in the standard AoN, that is here proved as a general result. △ Less

Submitted 25 January, 2021; originally announced January 2021.

arXiv:2011.11500 [pdf, other]

Statistical and computational thresholds for the planted $k$-densest sub-hypergraph problem

Authors: Luca Corinzia, Paolo Penna, Wojciech Szpankowski, Joachim M. Buhmann

Abstract: In this work, we consider the problem of recovery a planted $k$-densest sub-hypergraph on $d$-uniform hypergraphs. This fundamental problem appears in different contexts, e.g., community detection, average-case complexity, and neuroscience applications as a structural variant of tensor-PCA problem. We provide tight \emph{information-theoretic} upper and lower bounds for the exact recovery threshol… ▽ More In this work, we consider the problem of recovery a planted $k$-densest sub-hypergraph on $d$-uniform hypergraphs. This fundamental problem appears in different contexts, e.g., community detection, average-case complexity, and neuroscience applications as a structural variant of tensor-PCA problem. We provide tight \emph{information-theoretic} upper and lower bounds for the exact recovery threshold by the maximum-likelihood estimator, as well as \emph{algorithmic} bounds based on approximate message passing algorithms. The problem exhibits a typical statistical-to-computational gap observed in analogous sparse settings that widen with increasing sparsity of the problem. The bounds show that the signal structure impacts the location of the statistical and computational phase transition that the known existing bounds for the tensor-PCA model do not capture. This effect is due to the generic planted signal prior that this latter model addresses. △ Less

Submitted 28 January, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

arXiv:2008.05867 [pdf, other]

Neural collaborative filtering for unsupervised mitral valve segmentation in echocardiography

Authors: Luca Corinzia, Fabian Laumer, Alessandro Candreva, Maurizio Taramasso, Francesco Maisano, Joachim M. Buhmann

Abstract: The segmentation of the mitral valve annulus and leaflets specifies a crucial first step to establish a machine learning pipeline that can support physicians in performing multiple tasks, e.g.\ diagnosis of mitral valve diseases, surgical planning, and intraoperative procedures. Current methods for mitral valve segmentation on 2D echocardiography videos require extensive interaction with annotator… ▽ More The segmentation of the mitral valve annulus and leaflets specifies a crucial first step to establish a machine learning pipeline that can support physicians in performing multiple tasks, e.g.\ diagnosis of mitral valve diseases, surgical planning, and intraoperative procedures. Current methods for mitral valve segmentation on 2D echocardiography videos require extensive interaction with annotators and perform poorly on low-quality and noisy videos. We propose an automated and unsupervised method for the mitral valve segmentation based on a low dimensional embedding of the echocardiography videos using neural network collaborative filtering. The method is evaluated in a collection of echocardiography videos of patients with a variety of mitral valve diseases, and additionally on an independent test cohort. It outperforms state-of-the-art \emph{unsupervised} and \emph{supervised} methods on low-quality videos or in the case of sparse annotation. △ Less

Submitted 13 August, 2020; originally announced August 2020.

arXiv:2006.13474 [pdf, other]

Continuous Submodular Function Maximization

Authors: Yatao Bian, Joachim M. Buhmann, Andreas Krause

Abstract: Continuous submodular functions are a category of generally non-convex/non-concave functions with a wide spectrum of applications. The celebrated property of this class of functions - continuous submodularity - enables both exact minimization and approximate maximization in poly. time. Continuous submodularity is obtained by generalizing the notion of submodularity from discrete domains to continu… ▽ More Continuous submodular functions are a category of generally non-convex/non-concave functions with a wide spectrum of applications. The celebrated property of this class of functions - continuous submodularity - enables both exact minimization and approximate maximization in poly. time. Continuous submodularity is obtained by generalizing the notion of submodularity from discrete domains to continuous domains. It intuitively captures a repulsive effect amongst different dimensions of the defined multivariate function. In this paper, we systematically study continuous submodularity and a class of non-convex optimization problems: continuous submodular function maximization. We start by a thorough characterization of the class of continuous submodular functions, and show that continuous submodularity is equivalent to a weak version of the diminishing returns (DR) property. Thus we also derive a subclass of continuous submodular functions, termed continuous DR-submodular functions, which enjoys the full DR property. Then we present operations that preserve continuous (DR-)submodularity, thus yielding general rules for composing new submodular functions. We establish intriguing properties for the problem of constrained DR-submodular maximization, such as the local-global relation. We identify several applications of continuous submodular optimization, ranging from influence maximization, MAP inference for DPPs to provable mean field inference. For these applications, continuous submodularity formalizes valuable domain knowledge relevant for optimizing this class of objectives. We present inapproximability results and provable algorithms for two problem settings: constrained monotone DR-submodular maximization and constrained non-monotone DR-submodular maximization. Finally, we extensively evaluate the effectiveness of the proposed algorithms. △ Less

Submitted 24 June, 2020; originally announced June 2020.

Comments: 64 pages

arXiv:2006.01293 [pdf, other]

From Sets to Multisets: Provable Variational Inference for Probabilistic Integer Submodular Models

Authors: Aytunc Sahin, Yatao Bian, Joachim M. Buhmann, Andreas Krause

Abstract: Submodular functions have been studied extensively in machine learning and data mining. In particular, the optimization of submodular functions over the integer lattice (integer submodular functions) has recently attracted much interest, because this domain relates naturally to many practical problem settings, such as multilabel graph cut, budget allocation and revenue maximization with discrete a… ▽ More Submodular functions have been studied extensively in machine learning and data mining. In particular, the optimization of submodular functions over the integer lattice (integer submodular functions) has recently attracted much interest, because this domain relates naturally to many practical problem settings, such as multilabel graph cut, budget allocation and revenue maximization with discrete assignments. In contrast, the use of these functions for probabilistic modeling has received surprisingly little attention so far. In this work, we firstly propose the Generalized Multilinear Extension, a continuous DR-submodular extension for integer submodular functions. We study central properties of this extension and formulate a new probabilistic model which is defined through integer submodular functions. Then, we introduce a block-coordinate ascent algorithm to perform approximate inference for those class of models. Finally, we demonstrate its effectiveness and viability on several real-world social connection graph datasets with integer submodular objectives. △ Less

Submitted 1 June, 2020; originally announced June 2020.

arXiv:1906.06268 [pdf, other]

Variational Federated Multi-Task Learning

Authors: Luca Corinzia, Ami Beuret, Joachim M. Buhmann

Abstract: In federated learning, a central server coordinates the training of a single model on a massively distributed network of devices. This setting can be naturally extended to a multi-task learning framework, to handle real-world federated datasets that typically show strong statistical heterogeneity among devices. Despite federated multi-task learning being shown to be an effective paradigm for real-… ▽ More In federated learning, a central server coordinates the training of a single model on a massively distributed network of devices. This setting can be naturally extended to a multi-task learning framework, to handle real-world federated datasets that typically show strong statistical heterogeneity among devices. Despite federated multi-task learning being shown to be an effective paradigm for real-world datasets, it has been applied only on convex models. In this work, we introduce VIRTUAL, an algorithm for federated multi-task learning for general non-convex models. In VIRTUAL the federated network of the server and the clients is treated as a star-shaped Bayesian network, and learning is performed on the network using approximated variational inference. We show that this method is effective on real-world federated datasets, outperforming the current state-of-the-art for federated learning, and concurrently allowing sparser gradient updates. △ Less

Submitted 4 February, 2021; v1 submitted 14 June, 2019; originally announced June 2019.

arXiv:1906.03255 [pdf, other]

Disentangled State Space Representations

Authors: Đorđe Miladinović, Muhammad Waleed Gondal, Bernhard Schölkopf, Joachim M. Buhmann, Stefan Bauer

Abstract: Sequential data often originates from diverse domains across which statistical regularities and domain specifics exist. To specifically learn cross-domain sequence representations, we introduce disentangled state space models (DSSM) -- a class of SSM in which domain-invariant state dynamics is explicitly disentangled from domain-specific information governing that dynamics. We analyze how such sep… ▽ More Sequential data often originates from diverse domains across which statistical regularities and domain specifics exist. To specifically learn cross-domain sequence representations, we introduce disentangled state space models (DSSM) -- a class of SSM in which domain-invariant state dynamics is explicitly disentangled from domain-specific information governing that dynamics. We analyze how such separation can improve knowledge transfer to new domains, and enable robust prediction, sequence manipulation and domain characterization. We furthermore propose an unsupervised VAE-based training procedure to implement DSSM in form of Bayesian filters. In our experiments, we applied VAE-DSSM framework to achieve competitive performance in online ODE system identification and regression across experimental settings, and controlled generation and prediction of bouncing ball video sequences across varying gravitational influences. △ Less

Submitted 7 June, 2019; originally announced June 2019.

arXiv:1902.00981 [pdf, other]

doi 10.1609/aaai.v34i04.6014

Learning Counterfactual Representations for Estimating Individual Dose-Response Curves

Authors: Patrick Schwab, Lorenz Linhardt, Stefan Bauer, Joachim M. Buhmann, Walter Karlen

Abstract: Estimating what would be an individual's potential response to varying levels of exposure to a treatment is of high practical relevance for several important fields, such as healthcare, economics and public policy. However, existing methods for learning to estimate counterfactual outcomes from observational data are either focused on estimating average dose-response curves, or limited to settings… ▽ More Estimating what would be an individual's potential response to varying levels of exposure to a treatment is of high practical relevance for several important fields, such as healthcare, economics and public policy. However, existing methods for learning to estimate counterfactual outcomes from observational data are either focused on estimating average dose-response curves, or limited to settings with only two treatments that do not have an associated dosage parameter. Here, we present a novel machine-learning approach towards learning counterfactual representations for estimating individual dose-response curves for any number of treatments with continuous dosage parameters with neural networks. Building on the established potential outcomes framework, we introduce performance metrics, model selection criteria, model architectures, and open benchmarks for estimating individual dose-response curves. Our experiments show that the methods developed in this work set a new state-of-the-art in estimating individual dose-response. △ Less

Submitted 10 December, 2020; v1 submitted 3 February, 2019; originally announced February 2019.

Comments: published at AAAI 2020

arXiv:1901.06799 [pdf, other]

Exact Recovery for a Family of Community-Detection Generative Models

Authors: Luca Corinzia, Paolo Penna, Luca Mondada, Joachim M. Buhmann

Abstract: Generative models for networks with communities have been studied extensively for being a fertile ground to establish information-theoretic and computational thresholds. In this paper we propose a new toy model for planted generative models called planted Random Energy Model (REM), inspired by Derrida's REM. For this model we provide the asymptotic behaviour of the probability of error for the max… ▽ More Generative models for networks with communities have been studied extensively for being a fertile ground to establish information-theoretic and computational thresholds. In this paper we propose a new toy model for planted generative models called planted Random Energy Model (REM), inspired by Derrida's REM. For this model we provide the asymptotic behaviour of the probability of error for the maximum likelihood estimator and hence the exact recovery threshold. As an application, we further consider the 2 non-equally sized community Weighted Stochastic Block Model (2-WSBM) on $h$-uniform hypergraphs, that is equivalent to the P-REM on both sides of the spectrum, for high and low edge cardinality $h$. We provide upper and lower bounds for the exact recoverability for any $h$, map** these problems to the aforementioned P-REM. To the best of our knowledge these are the first consistency results for the 2-WSBM on graphs and on hypergraphs with non-equally sized community. △ Less

Submitted 30 April, 2019; v1 submitted 21 January, 2019; originally announced January 2019.

arXiv:1805.07482 [pdf, other]

Optimal DR-Submodular Maximization and Applications to Provable Mean Field Inference

Authors: An Bian, Joachim M. Buhmann, Andreas Krause

Abstract: Mean field inference in probabilistic models is generally a highly nonconvex problem. Existing optimization methods, e.g., coordinate ascent algorithms, can only generate local optima. In this work we propose provable mean filed methods for probabilistic log-submodular models and its posterior agreement (PA) with strong approximation guarantees. The main algorithmic technique is a new Double Gre… ▽ More Mean field inference in probabilistic models is generally a highly nonconvex problem. Existing optimization methods, e.g., coordinate ascent algorithms, can only generate local optima. In this work we propose provable mean filed methods for probabilistic log-submodular models and its posterior agreement (PA) with strong approximation guarantees. The main algorithmic technique is a new Double Greedy scheme, termed DR-DoubleGreedy, for continuous DR-submodular maximization with box-constraints. It is a one-pass algorithm with linear time complexity, reaching the optimal 1/2 approximation ratio, which may be of independent interest. We validate the superior performance of our algorithms against baseline algorithms on both synthetic and real-world datasets. △ Less

Submitted 29 November, 2018; v1 submitted 18 May, 2018; originally announced May 2018.

Comments: 28 pages

arXiv:1804.04378 [pdf, other]

Fast Gaussian Process Based Gradient Matching for Parameter Identification in Systems of Nonlinear ODEs

Authors: Philippe Wenk, Alkis Gotovos, Stefan Bauer, Nico Gorbach, Andreas Krause, Joachim M. Buhmann

Abstract: Parameter identification and comparison of dynamical systems is a challenging task in many fields. Bayesian approaches based on Gaussian process regression over time-series data have been successfully applied to infer the parameters of a dynamical system without explicitly solving it. While the benefits in computational cost are well established, a rigorous mathematical framework has been missing.… ▽ More Parameter identification and comparison of dynamical systems is a challenging task in many fields. Bayesian approaches based on Gaussian process regression over time-series data have been successfully applied to infer the parameters of a dynamical system without explicitly solving it. While the benefits in computational cost are well established, a rigorous mathematical framework has been missing. We offer a novel interpretation which leads to a better understanding and improvements in state-of-the-art performance in terms of accuracy for nonlinear dynamical systems. △ Less

Submitted 1 March, 2019; v1 submitted 12 April, 2018; originally announced April 2018.

Comments: accepted at AISTATS 2019

arXiv:1711.02515 [pdf, other]

Continuous DR-submodular Maximization: Structure and Algorithms

Authors: An Bian, Kfir Y. Levy, Andreas Krause, Joachim M. Buhmann

Abstract: DR-submodular continuous functions are important objectives with wide real-world applications spanning MAP inference in determinantal point processes (DPPs), and mean-field inference for probabilistic submodular models, amongst others. DR-submodularity captures a subclass of non-convex functions that enables both exact minimization and approximate maximization in polynomial time. In this work we… ▽ More DR-submodular continuous functions are important objectives with wide real-world applications spanning MAP inference in determinantal point processes (DPPs), and mean-field inference for probabilistic submodular models, amongst others. DR-submodularity captures a subclass of non-convex functions that enables both exact minimization and approximate maximization in polynomial time. In this work we study the problem of maximizing non-monotone DR-submodular continuous functions under general down-closed convex constraints. We start by investigating geometric properties that underlie such objectives, e.g., a strong relation between (approximately) stationary points and global optimum is proved. These properties are then used to devise two optimization algorithms with provable guarantees. Concretely, we first devise a "two-phase" algorithm with $1/4$ approximation guarantee. This algorithm allows the use of existing methods for finding (approximately) stationary points as a subroutine, thus, harnessing recent progress in non-convex optimization. Then we present a non-monotone Frank-Wolfe variant with $1/e$ approximation guarantee and sublinear convergence rate. Finally, we extend our approach to a broader class of generalized DR-submodular continuous functions, which captures a wider spectrum of applications. Our theoretical findings are validated on synthetic and real-world problem instances. △ Less

Submitted 24 May, 2019; v1 submitted 3 November, 2017; originally announced November 2017.

Comments: Published in NIPS 2017

arXiv:1703.02100 [pdf, other]

Guarantees for Greedy Maximization of Non-submodular Functions with Applications

Authors: Andrew An Bian, Joachim M. Buhmann, Andreas Krause, Sebastian Tschiatschek

Abstract: We investigate the performance of the standard Greedy algorithm for cardinality constrained maximization of non-submodular nondecreasing set functions. While there are strong theoretical guarantees on the performance of Greedy for maximizing submodular functions, there are few guarantees for non-submodular ones. However, Greedy enjoys strong empirical performance for many important non-submodular… ▽ More We investigate the performance of the standard Greedy algorithm for cardinality constrained maximization of non-submodular nondecreasing set functions. While there are strong theoretical guarantees on the performance of Greedy for maximizing submodular functions, there are few guarantees for non-submodular ones. However, Greedy enjoys strong empirical performance for many important non-submodular functions, e.g., the Bayesian A-optimality objective in experimental design. We prove theoretical guarantees supporting the empirical performance. Our guarantees are characterized by a combination of the (generalized) curvature $α$ and the submodularity ratio $γ$. In particular, we prove that Greedy enjoys a tight approximation guarantee of $\frac{1}α(1- e^{-γα})$ for cardinality constrained maximization. In addition, we bound the submodularity ratio and curvature for several important real-world objectives, including the Bayesian A-optimality objective, the determinantal function of a square submatrix and certain linear programs with combinatorial constraints. We experimentally validate our theoretical findings for both synthetic and real-world applications. △ Less

Submitted 14 May, 2019; v1 submitted 6 March, 2017; originally announced March 2017.

Comments: published at ICML 2017. First author is now known as Yatao Bian <[email protected]>. ORCID: https://orcid.org/0000-0002-2368-4084

arXiv:1611.06652 [pdf, other]

Scalable Adaptive Stochastic Optimization Using Random Projections

Authors: Gabriel Krummenacher, Brian McWilliams, Yannic Kilcher, Joachim M. Buhmann, Nicolai Meinshausen

Abstract: Adaptive stochastic gradient methods such as AdaGrad have gained popularity in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second order information by accumulating past gradients which are used to tune the step size adaptively. In certain situations the full-matrix variant of AdaGrad is expected to attain bet… ▽ More Adaptive stochastic gradient methods such as AdaGrad have gained popularity in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second order information by accumulating past gradients which are used to tune the step size adaptively. In certain situations the full-matrix variant of AdaGrad is expected to attain better performance, however in high dimensions it is computationally impractical. We present Ada-LR and RadaGrad two computationally efficient approximations to full-matrix AdaGrad based on randomized dimensionality reduction. They are able to capture dependencies between features and achieve similar performance to full-matrix AdaGrad but at a much smaller computational cost. We show that the regret of Ada-LR is close to the regret of full-matrix AdaGrad which can have an up-to exponentially smaller dependence on the dimension than the diagonal variant. Empirically, we show that Ada-LR and RadaGrad perform similarly to full-matrix AdaGrad. On the task of training convolutional neural networks as well as recurrent neural networks, RadaGrad achieves faster convergence than diagonal AdaGrad. △ Less

Submitted 21 November, 2016; originally announced November 2016.

Comments: To appear in Advances in Neural Information Processing Systems 29 (NIPS 2016)

arXiv:1609.00810 [pdf, other]

doi 10.1109/ITW.2015.7133122

Greedy MAXCUT Algorithms and their Information Content

Authors: Yatao Bian, Alexey Gronskiy, Joachim M. Buhmann

Abstract: MAXCUT defines a classical NP-hard problem for graph partitioning and it serves as a typical case of the symmetric non-monotone Unconstrained Submodular Maximization (USM) problem. Applications of MAXCUT are abundant in machine learning, computer vision and statistical physics. Greedy algorithms to approximately solve MAXCUT rely on greedy vertex labelling or on an edge contraction strategy. These… ▽ More MAXCUT defines a classical NP-hard problem for graph partitioning and it serves as a typical case of the symmetric non-monotone Unconstrained Submodular Maximization (USM) problem. Applications of MAXCUT are abundant in machine learning, computer vision and statistical physics. Greedy algorithms to approximately solve MAXCUT rely on greedy vertex labelling or on an edge contraction strategy. These algorithms have been studied by measuring their approximation ratios in the worst case setting but very little is known to characterize their robustness to noise contaminations of the input data in the average case. Adapting the framework of Approximation Set Coding, we present a method to exactly measure the cardinality of the algorithmic approximation sets of five greedy MAXCUT algorithms. Their information contents are explored for graph instances generated by two different noise models: the edge reversal model and Gaussian edge weights model. The results provide insights into the robustness of different greedy heuristics and techniques for MAXCUT, which can be used for algorithm design of general USM problems. △ Less

Submitted 3 September, 2016; originally announced September 2016.

Comments: This is a longer version of the paper published in 2015 IEEE Information Theory Workshop (ITW)

arXiv:1606.05615 [pdf, other]

Guaranteed Non-convex Optimization: Submodular Maximization over Continuous Domains

Authors: Andrew An Bian, Baharan Mirzasoleiman, Joachim M. Buhmann, Andreas Krause

Abstract: Submodular continuous functions are a category of (generally) non-convex/non-concave functions with a wide spectrum of applications. We characterize these functions and demonstrate that they can be maximized efficiently with approximation guarantees. Specifically, i) We introduce the weak DR property that gives a unified characterization of submodularity for all set, integer-lattice and continuous… ▽ More Submodular continuous functions are a category of (generally) non-convex/non-concave functions with a wide spectrum of applications. We characterize these functions and demonstrate that they can be maximized efficiently with approximation guarantees. Specifically, i) We introduce the weak DR property that gives a unified characterization of submodularity for all set, integer-lattice and continuous functions; ii) for maximizing monotone DR-submodular continuous functions under general down-closed convex constraints, we propose a Frank-Wolfe variant with $(1-1/e)$ approximation guarantee, and sub-linear convergence rate; iii) for maximizing general non-monotone submodular continuous functions subject to box constraints, we propose a DoubleGreedy algorithm with $1/3$ approximation guarantee. Submodular continuous functions naturally find applications in various real-world settings, including influence and revenue maximization with continuous assignments, sensor energy management, multi-resolution data summarization, facility location, etc. Experimental results show that the proposed algorithms efficiently generate superior solutions compared to baseline algorithms. △ Less

Submitted 6 May, 2019; v1 submitted 17 June, 2016; originally announced June 2016.

Comments: Appears in the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) 2017

arXiv:1606.00897 [pdf, other]

Multi-Organ Cancer Classification and Survival Analysis

Authors: Stefan Bauer, Nicolas Carion, Peter Schüffler, Thomas Fuchs, Peter Wild, Joachim M. Buhmann

Abstract: Accurate and robust cell nuclei classification is the cornerstone for a wider range of tasks in digital and Computational Pathology. However, most machine learning systems require extensive labeling from expert pathologists for each individual problem at hand, with no or limited abilities for knowledge transfer between datasets and organ sites. In this paper we implement and evaluate a variety of… ▽ More Accurate and robust cell nuclei classification is the cornerstone for a wider range of tasks in digital and Computational Pathology. However, most machine learning systems require extensive labeling from expert pathologists for each individual problem at hand, with no or limited abilities for knowledge transfer between datasets and organ sites. In this paper we implement and evaluate a variety of deep neural network models and model ensembles for nuclei classification in renal cell cancer (RCC) and prostate cancer (PCa). We propose a convolutional neural network system based on residual learning which significantly improves over the state-of-the-art in cell nuclei classification. Finally, we show that the combination of tissue types during training increases not only classification accuracy but also overall survival analysis. △ Less

Submitted 2 December, 2016; v1 submitted 2 June, 2016; originally announced June 2016.

arXiv:1604.06318 [pdf, other]

TI-POOLING: transformation-invariant pooling for feature learning in Convolutional Neural Networks

Authors: Dmitry Laptev, Nikolay Savinov, Joachim M. Buhmann, Marc Pollefeys

Abstract: In this paper we present a deep neural network topology that incorporates a simple to implement transformation invariant pooling operator (TI-POOLING). This operator is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes. Most current methods usually make use of dataset augmentation to address this issue, but this requires larger number… ▽ More In this paper we present a deep neural network topology that incorporates a simple to implement transformation invariant pooling operator (TI-POOLING). This operator is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes. Most current methods usually make use of dataset augmentation to address this issue, but this requires larger number of model parameters and more training data, and results in significantly increased training time and larger chance of under- or overfitting. The main reason for these drawbacks is that the learned model needs to capture adequate features for all the possible transformations of the input. On the other hand, we formulate features in convolutional neural networks to be transformation-invariant. We achieve that using parallel siamese architectures for the considered transformation set and applying the TI-POOLING operator on their outputs before the fully-connected layers. We show that this topology internally finds the most optimal "canonical" instance of the input image for training and therefore limits the redundancy in learned features. This more efficient use of training data results in better performance on popular benchmark datasets with smaller number of parameters when comparing to standard convolutional neural networks with dataset augmentation and to other baselines. △ Less

Submitted 22 September, 2016; v1 submitted 21 April, 2016; originally announced April 2016.

Comments: Accepted at CVPR 2016. The first two authors assert equal contribution and joint first authorship

arXiv:1601.00027 [pdf, other]

doi 10.1016/j.compmedimag.2011.02.006

Computational Pathology: Challenges and Promises for Tissue Analysis

Authors: Thomas J. Fuchs, Joachim M. Buhmann

Abstract: The histological assessment of human tissue has emerged as the key challenge for detection and treatment of cancer. A plethora of different data sources ranging from tissue microarray data to gene expression, proteomics or metabolomics data provide a detailed overview of the health status of a patient. Medical doctors need to assess these information sources and they rely on data driven automatic… ▽ More The histological assessment of human tissue has emerged as the key challenge for detection and treatment of cancer. A plethora of different data sources ranging from tissue microarray data to gene expression, proteomics or metabolomics data provide a detailed overview of the health status of a patient. Medical doctors need to assess these information sources and they rely on data driven automatic analysis tools. Methods for classification, grou** and segmentation of heterogeneous data sources as well as regression of noisy dependencies and estimation of survival probabilities enter the processing workflow of a pathology diagnosis system at various stages. This paper reports on state-of-the-art of the design and effectiveness of computational pathology workflows and it discusses future research directions in this emergent field of medical informatics and diagnostic machine learning. △ Less

Submitted 31 December, 2015; originally announced January 2016.

Journal ref: Computerized Medical Imaging and Graphics, vol. 35, 7-8, p. 515-530, 2011

arXiv:1306.5554 [pdf, ps, other]

Correlated random features for fast semi-supervised learning

Authors: Brian McWilliams, David Balduzzi, Joachim M. Buhmann

Abstract: This paper presents Correlated Nystrom Views (XNV), a fast semi-supervised algorithm for regression and classification. The algorithm draws on two main ideas. First, it generates two views consisting of computationally inexpensive random features. Second, XNV applies multiview regression using Canonical Correlation Analysis (CCA) on unlabeled data to bias the regression towards useful features. It… ▽ More This paper presents Correlated Nystrom Views (XNV), a fast semi-supervised algorithm for regression and classification. The algorithm draws on two main ideas. First, it generates two views consisting of computationally inexpensive random features. Second, XNV applies multiview regression using Canonical Correlation Analysis (CCA) on unlabeled data to bias the regression towards useful features. It has been shown that, if the views contains accurate estimators, CCA regression can substantially reduce variance with a minimal increase in bias. Random views are justified by recent theoretical and empirical work showing that regression with random features closely approximates kernel regression, implying that random views can be expected to contain accurate estimators. We show that XNV consistently outperforms a state-of-the-art algorithm for semi-supervised learning: substantially improving predictive performance and reducing the variability of performance on a wide variety of real-world datasets, whilst also reducing runtime by orders of magnitude. △ Less

Submitted 5 November, 2013; v1 submitted 24 June, 2013; originally announced June 2013.

Comments: 15 pages, 3 figures, 6 tables

arXiv:1212.4775 [pdf, other]

Role Mining with Probabilistic Models

Authors: Mario Frank, Joachim M. Buhmann, David Basin

Abstract: Role mining tackles the problem of finding a role-based access control (RBAC) configuration, given an access-control matrix assigning users to access permissions as input. Most role mining approaches work by constructing a large set of candidate roles and use a greedy selection strategy to iteratively pick a small subset such that the differences between the resulting RBAC configuration and the ac… ▽ More Role mining tackles the problem of finding a role-based access control (RBAC) configuration, given an access-control matrix assigning users to access permissions as input. Most role mining approaches work by constructing a large set of candidate roles and use a greedy selection strategy to iteratively pick a small subset such that the differences between the resulting RBAC configuration and the access control matrix are minimized. In this paper, we advocate an alternative approach that recasts role mining as an inference problem rather than a lossy compression problem. Instead of using combinatorial algorithms to minimize the number of roles needed to represent the access-control matrix, we derive probabilistic models to learn the RBAC configuration that most likely underlies the given matrix. Our models are generative in that they reflect the way that permissions are assigned to users in a given RBAC configuration. We additionally model how user-permission assignments that conflict with an RBAC configuration emerge and we investigate the influence of constraints on role hierarchies and on the number of assignments. In experiments with access-control matrices from real-world enterprises, we compare our proposed models with other role mining methods. Our results show that our probabilistic models infer roles that generalize well to new system users for a wide variety of data, while other models' generalization abilities depend on the dataset given. △ Less

Submitted 4 January, 2013; v1 submitted 19 December, 2012; originally announced December 2012.

Comments: accepted for publication at ACM Transactions on Information and System Security (TISSEC)

arXiv:1205.6210 [pdf, ps, other]

doi 10.1109/LSP.2012.2223757

Learning Dictionaries with Bounded Self-Coherence

Authors: Christian D. Sigg, Tomas Dikk, Joachim M. Buhmann

Abstract: Sparse coding in learned dictionaries has been established as a successful approach for signal denoising, source separation and solving inverse problems in general. A dictionary learning method adapts an initial dictionary to a particular signal class by iteratively computing an approximate factorization of a training data matrix into a dictionary and a sparse coding matrix. The learned dictionary… ▽ More Sparse coding in learned dictionaries has been established as a successful approach for signal denoising, source separation and solving inverse problems in general. A dictionary learning method adapts an initial dictionary to a particular signal class by iteratively computing an approximate factorization of a training data matrix into a dictionary and a sparse coding matrix. The learned dictionary is characterized by two properties: the coherence of the dictionary to observations of the signal class, and the self-coherence of the dictionary atoms. A high coherence to the signal class enables the sparse coding of signal observations with a small approximation error, while a low self-coherence of the atoms guarantees atom recovery and a more rapid residual error decay rate for the sparse coding algorithm. The two goals of high signal coherence and low self-coherence are typically in conflict, therefore one seeks a trade-off between them, depending on the application. We present a dictionary learning method with an effective control over the self-coherence of the trained dictionary, enabling a trade-off between maximizing the sparsity of codings and approximating an equiangular tight frame. △ Less

Submitted 17 October, 2012; v1 submitted 28 May, 2012; originally announced May 2012.

Comments: 4 pages, 2 figures; IEEE Signal Processing Letters, vol. 19, no. 12, 2012

arXiv:1102.3176 [pdf, other]

doi 10.1109/ISIT.2011.6033687

Selecting the rank of truncated SVD by Maximum Approximation Capacity

Authors: Mario Frank, Joachim M. Buhmann

Abstract: Truncated Singular Value Decomposition (SVD) calculates the closest rank-$k$ approximation of a given input matrix. Selecting the appropriate rank $k$ defines a critical model order choice in most applications of SVD. To obtain a principled cut-off criterion for the spectrum, we convert the underlying optimization problem into a noisy channel coding problem. The optimal approximation capacity of t… ▽ More Truncated Singular Value Decomposition (SVD) calculates the closest rank-$k$ approximation of a given input matrix. Selecting the appropriate rank $k$ defines a critical model order choice in most applications of SVD. To obtain a principled cut-off criterion for the spectrum, we convert the underlying optimization problem into a noisy channel coding problem. The optimal approximation capacity of this channel controls the appropriate strength of regularization to suppress noise. In simulation experiments, this information theoretic method to determine the optimal rank competes with state-of-the art model selection techniques. △ Less

Submitted 8 June, 2011; v1 submitted 15 February, 2011; originally announced February 2011.

Comments: 7 pages, 5 figures; Will be presented at the IEEE International Symposium on Information Theory (ISIT) 2011. The conference version has only 5 pages. This version has an extended appendix

Journal ref: Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, 2011, pages 1036-1040

arXiv:1006.0375 [pdf, other]

Information theoretic model validation for clustering

Authors: Joachim M. Buhmann

Abstract: Model selection in clustering requires (i) to specify a suitable clustering principle and (ii) to control the model order complexity by choosing an appropriate number of clusters depending on the noise level in the data. We advocate an information theoretic perspective where the uncertainty in the measurements quantizes the set of data partitionings and, thereby, induces uncertainty in the solutio… ▽ More Model selection in clustering requires (i) to specify a suitable clustering principle and (ii) to control the model order complexity by choosing an appropriate number of clusters depending on the noise level in the data. We advocate an information theoretic perspective where the uncertainty in the measurements quantizes the set of data partitionings and, thereby, induces uncertainty in the solution space of clusterings. A clustering model, which can tolerate a higher level of fluctuations in the measurements than alternative models, is considered to be superior provided that the clustering solution is equally informative. This tradeoff between \emph{informativeness} and \emph{robustness} is used as a model selection criterion. The requirement that data partitionings should generalize from one data set to an equally probable second data set gives rise to a new notion of structure induced information. △ Less

Submitted 2 June, 2010; originally announced June 2010.

Comments: 9 pages, 2 figures, International Symposium on Information Theory 2010 (ISIT10 E-Mo-4.2), June 13-18 in Austin, TX}

Showing 1–38 of 38 results for author: Buhmann, J M