Search | arXiv e-print repository

Breeding Programs Optimization with Reinforcement Learning

Authors: Omar G. Younis, Luca Corinzia, Ioannis N. Athanasiadis, Andreas Krause, Joachim M. Buhmann, Matteo Turchetta

Abstract: Crop breeding is crucial in improving agricultural productivity while potentially decreasing land usage, greenhouse gas emissions, and water consumption. However, breeding programs are challenging due to long turnover times, high-dimensional decision spaces, long-term objectives, and the need to adapt to rapid climate change. This paper introduces the use of Reinforcement Learning (RL) to optimize… ▽ More Crop breeding is crucial in improving agricultural productivity while potentially decreasing land usage, greenhouse gas emissions, and water consumption. However, breeding programs are challenging due to long turnover times, high-dimensional decision spaces, long-term objectives, and the need to adapt to rapid climate change. This paper introduces the use of Reinforcement Learning (RL) to optimize simulated crop breeding programs. RL agents are trained to make optimal crop selection and cross-breeding decisions based on genetic information. To benchmark RL-based breeding algorithms, we introduce a suite of Gym environments. The study demonstrates the superiority of RL techniques over standard practices in terms of genetic gain when simulated in silico using real-world genomic maize data. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning

arXiv:2406.03394 [pdf, other]

Gaussian Representation for Deformable Image Registration

Authors: Jihe Li, Fabian Zhang, Xia Li, Tianhao Zhang, Ye Zhang, Joachim Buhmann

Abstract: Deformable image registration (DIR) is a fundamental task in radiotherapy, with existing methods often struggling to balance computational efficiency, registration accuracy, and speed effectively. We introduce a novel DIR approach employing parametric 3D Gaussian control points achieving a better tradeoff. It provides an explicit and flexible representation for spatial deformation fields between 3… ▽ More Deformable image registration (DIR) is a fundamental task in radiotherapy, with existing methods often struggling to balance computational efficiency, registration accuracy, and speed effectively. We introduce a novel DIR approach employing parametric 3D Gaussian control points achieving a better tradeoff. It provides an explicit and flexible representation for spatial deformation fields between 3D volumetric medical images, producing a displacement vector field (DVF) across all volumetric positions. The movement of individual voxels is derived using linear blend skinning (LBS) through localized interpolation of transformations associated with neighboring Gaussians. This interpolation strategy not only simplifies the determination of voxel motions but also acts as an effective regularization technique. Our approach incorporates a unified optimization process through backpropagation, enabling iterative learning of both the parameters of the 3D Gaussians and their transformations. Additionally, the density of Gaussians is adjusted adaptively during the learning phase to accommodate varying degrees of motion complexity. We validated our approach on the 4D-CT lung DIR-Lab and cardiac ACDC datasets, achieving an average target registration error (TRE) of 1.06 mm within a much-improved processing time of 2.43 seconds for the DIR-Lab dataset over existing methods, demonstrating significant advancements in both accuracy and efficiency. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.19614 [pdf, other]

TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM

Authors: Peifeng Jiang, Hong Liu, Xia Li, Ti Wang, Fabian Zhang, Joachim M. Buhmann

Abstract: The limited robustness of 3D Gaussian Splatting (3DGS) to motion blur and camera noise, along with its poor real-time performance, restricts its application in robotic SLAM tasks. Upon analysis, the primary causes of these issues are the density of views with motion blur and the cumulative errors in dense pose estimation from calculating losses based on noisy original images and rendering results,… ▽ More The limited robustness of 3D Gaussian Splatting (3DGS) to motion blur and camera noise, along with its poor real-time performance, restricts its application in robotic SLAM tasks. Upon analysis, the primary causes of these issues are the density of views with motion blur and the cumulative errors in dense pose estimation from calculating losses based on noisy original images and rendering results, which increase the difficulty of 3DGS rendering convergence. Thus, a cutting-edge 3DGS-based SLAM system is introduced, leveraging the efficiency and flexibility of 3DGS to achieve real-time performance while remaining robust against sensor noise, motion blur, and the challenges posed by long-session SLAM. Central to this approach is the Fusion Bridge module, which seamlessly integrates tracking-centered ORB Visual Odometry with map**-centered online 3DGS. Precise pose initialization is enabled by this module through joint optimization of re-projection and rendering loss, as well as strategic view selection, enhancing rendering convergence in large-scale scenes. Extensive experiments demonstrate state-of-the-art rendering quality and localization accuracy, positioning this system as a promising solution for real-world robotics applications that require stable, near-real-time performance. Our project is available at https://ZeldaFromHeaven.github.io/TAMBRIDGE/ △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.15385 [pdf, other]

CPT-Interp: Continuous sPatial and Temporal Motion Modeling for 4D Medical Image Interpolation

Authors: Xia Li, Runzhao Yang, Xiangtai Li, Antony Lomax, Ye Zhang, Joachim Buhmann

Abstract: Motion information from 4D medical imaging offers critical insights into dynamic changes in patient anatomy for clinical assessments and radiotherapy planning and, thereby, enhances the capabilities of 3D image analysis. However, inherent physical and technical constraints of imaging hardware often necessitate a compromise between temporal resolution and image quality. Frame interpolation emerges… ▽ More Motion information from 4D medical imaging offers critical insights into dynamic changes in patient anatomy for clinical assessments and radiotherapy planning and, thereby, enhances the capabilities of 3D image analysis. However, inherent physical and technical constraints of imaging hardware often necessitate a compromise between temporal resolution and image quality. Frame interpolation emerges as a pivotal solution to this challenge. Previous methods often suffer from discretion when they estimate the intermediate motion and execute the forward war**. In this study, we draw inspiration from fluid mechanics to propose a novel approach for continuously modeling patient anatomic motion using implicit neural representation. It ensures both spatial and temporal continuity, effectively bridging Eulerian and Lagrangian specifications together to naturally facilitate continuous frame interpolation. Our experiments across multiple datasets underscore the method's superior accuracy and speed. Furthermore, as a case-specific optimization (training-free) approach, it circumvents the need for extensive datasets and addresses model generalization issues. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14477 [pdf, other]

LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models

Authors: Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

Abstract: Advances in latent diffusion models (LDMs) have revolutionized high-resolution image generation, but the design space of the autoencoder that is central to these systems remains underexplored. In this paper, we introduce LiteVAE, a family of autoencoders for LDMs that leverage the 2D discrete wavelet transform to enhance scalability and computational efficiency over standard variational autoencode… ▽ More Advances in latent diffusion models (LDMs) have revolutionized high-resolution image generation, but the design space of the autoencoder that is central to these systems remains underexplored. In this paper, we introduce LiteVAE, a family of autoencoders for LDMs that leverage the 2D discrete wavelet transform to enhance scalability and computational efficiency over standard variational autoencoders (VAEs) with no sacrifice in output quality. We also investigate the training methodologies and the decoder architecture of LiteVAE and propose several enhancements that improve the training dynamics and reconstruction quality. Our base LiteVAE model matches the quality of the established VAEs in current LDMs with a six-fold reduction in encoder parameters, leading to faster training and lower GPU memory requirements, while our larger model outperforms VAEs of comparable complexity across all evaluated metrics (rFID, LPIPS, PSNR, and SSIM). △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.00430 [pdf]

Continuous sPatial-Temporal Deformable Image Registration (CPT-DIR) for motion modelling in radiotherapy: beyond classic voxel-based methods

Authors: Xia Li, Muheng Li, Antony Lomax, Joachim Buhmann, Ye Zhang

Abstract: Background and purpose: Deformable image registration (DIR) is a crucial tool in radiotherapy for extracting and modelling organ motion. However, when significant changes and sliding boundaries are present, it faces compromised accuracy and uncertainty, determining the subsequential contour propagation and dose accumulation procedures. Materials and methods: We propose an implicit neural represent… ▽ More Background and purpose: Deformable image registration (DIR) is a crucial tool in radiotherapy for extracting and modelling organ motion. However, when significant changes and sliding boundaries are present, it faces compromised accuracy and uncertainty, determining the subsequential contour propagation and dose accumulation procedures. Materials and methods: We propose an implicit neural representation (INR)-based approach modelling motion continuously in both space and time, named Continues-sPatial-Temporal DIR (CPT-DIR). This method uses a multilayer perception (MLP) network to map 3D coordinate (x,y,z) to its corresponding velocity vector (vx,vy,vz). The displacement vectors (dx,dy,dz) are then calculated by integrating velocity vectors over time. The MLP's parameters can rapidly adapt to new cases without pre-training, enhancing optimisation. The DIR's performance was tested on the DIR-Lab dataset of 10 lung 4DCT cases, using metrics of landmark accuracy (TRE), contour conformity (Dice) and image similarity (MAE). Results: The proposed CPT-DIR can reduce landmark TRE from 2.79mm to 0.99mm, outperforming B-splines' results for all cases. The MAE of the whole-body region improves from 35.46HU to 28.99HU. Furthermore, CPT-DIR surpasses B-splines for accuracy in the sliding boundary region, lowering MAE and increasing Dice coefficients for the ribcage from 65.65HU and 90.41% to 42.04HU and 90.56%, versus 75.40HU and 89.30% without registration. Meanwhile, CPT-DIR offers significant speed advantages, completing in under 15 seconds compared to a few minutes with the conventional B-splines method. Conclusion: Leveraging the continuous representations, the CPT-DIR method significantly enhances registration accuracy, automation and speed, outperforming traditional B-splines in landmark and contour precision, particularly in the challenging areas. △ Less

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.12352 [pdf, other]

Point-In-Context: Understanding Point Cloud via In-Context Learning

Authors: Mengyuan Liu, Zhongbin Fang, Xia Li, Joachim M. Buhmann, Xiangtai Li, Chen Change Loy

Abstract: With the emergence of large-scale models trained on diverse datasets, in-context learning has emerged as a promising paradigm for multitasking, notably in natural language processing and image processing. However, its application in 3D point cloud tasks remains largely unexplored. In this work, we introduce Point-In-Context (PIC), a novel framework for 3D point cloud understanding via in-context l… ▽ More With the emergence of large-scale models trained on diverse datasets, in-context learning has emerged as a promising paradigm for multitasking, notably in natural language processing and image processing. However, its application in 3D point cloud tasks remains largely unexplored. In this work, we introduce Point-In-Context (PIC), a novel framework for 3D point cloud understanding via in-context learning. We address the technical challenge of effectively extending masked point modeling to 3D point clouds by introducing a Joint Sampling module and proposing a vanilla version of PIC called Point-In-Context-Generalist (PIC-G). PIC-G is designed as a generalist model for various 3D point cloud tasks, with inputs and outputs modeled as coordinates. In this paradigm, the challenging segmentation task is achieved by assigning label points with XYZ coordinates for each category; the final prediction is then chosen based on the label point closest to the predictions. To break the limitation by the fixed label-coordinate assignment, which has poor generalization upon novel classes, we propose two novel training strategies, In-Context Labeling and In-Context Enhancing, forming an extended version of PIC named Point-In-Context-Segmenter (PIC-S), targeting improving dynamic context labeling and model training. By utilizing dynamic in-context labels and extra in-context pairs, PIC-S achieves enhanced performance and generalization capability in and across part segmentation datasets. PIC is a general framework so that other tasks or datasets can be seamlessly introduced into our PIC through a unified data format. We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks and segmenting multi-datasets. Our PIC-S is capable of generalizing unseen datasets and performing novel part segmentation by customizing prompts. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: Project page: https://fanglaosi.github.io/Point-In-Context_Pages. arXiv admin note: text overlap with arXiv:2306.08659

arXiv:2402.06974 [pdf, other]

Hypernetwork-Driven Model Fusion for Federated Domain Generalization

Authors: Marc Bartholet, Taehyeon Kim, Ami Beuret, Se-Young Yun, Joachim M. Buhmann

Abstract: Federated Learning (FL) faces significant challenges with domain shifts in heterogeneous data, degrading performance. Traditional domain generalization aims to learn domain-invariant features, but the federated nature of model averaging often limits this due to its linear aggregation of local learning. To address this, we propose a robust framework, coined as hypernetwork-based Federated Fusion (h… ▽ More Federated Learning (FL) faces significant challenges with domain shifts in heterogeneous data, degrading performance. Traditional domain generalization aims to learn domain-invariant features, but the federated nature of model averaging often limits this due to its linear aggregation of local learning. To address this, we propose a robust framework, coined as hypernetwork-based Federated Fusion (hFedF), using hypernetworks for non-linear aggregation, facilitating generalization to unseen domains. Our method employs client-specific embeddings and gradient alignment techniques to manage domain generalization effectively. Evaluated in both zero-shot and few-shot settings, hFedF demonstrates superior performance in handling domain shifts. Comprehensive comparisons on PACS, Office-Home, and VLCS datasets show that hFedF consistently achieves the highest in-domain and out-of-domain accuracy with reliable predictions. Our study contributes significantly to the under-explored field of Federated Domain Generalization (FDG), setting a new benchmark for performance in this area. △ Less

Submitted 28 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

arXiv:2402.05568 [pdf, other]

Neural Graphics Primitives-based Deformable Image Registration for On-the-fly Motion Extraction

Authors: Xia Li, Fabian Zhang, Muheng Li, Damien Weber, Antony Lomax, Joachim Buhmann, Ye Zhang

Abstract: Intra-fraction motion in radiotherapy is commonly modeled using deformable image registration (DIR). However, existing methods often struggle to balance speed and accuracy, limiting their applicability in clinical scenarios. This study introduces a novel approach that harnesses Neural Graphics Primitives (NGP) to optimize the displacement vector field (DVF). Our method leverages learned primitives… ▽ More Intra-fraction motion in radiotherapy is commonly modeled using deformable image registration (DIR). However, existing methods often struggle to balance speed and accuracy, limiting their applicability in clinical scenarios. This study introduces a novel approach that harnesses Neural Graphics Primitives (NGP) to optimize the displacement vector field (DVF). Our method leverages learned primitives, processed as splats, and interpolates within space using a shallow neural network. Uniquely, it enables self-supervised optimization at an ultra-fast speed, negating the need for pre-training on extensive datasets and allowing seamless adaptation to new cases. We validated this approach on the 4D-CT lung dataset DIR-lab, achieving a target registration error (TRE) of 1.15\pm1.15 mm within a remarkable time of 1.77 seconds. Notably, our method also addresses the sliding boundary problem, a common challenge in conventional DIR methods. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2401.07363 [pdf, other]

PersonalityChat: Conversation Distillation for Personalized Dialog Modeling with Facts and Traits

Authors: Ehsan Lotfi, Maxime De Bruyn, Jeska Buhmann, Walter Daelemans

Abstract: The new wave of Large Language Models (LLM) has offered an efficient tool to curate sizeable conversational datasets. So far studies have mainly focused on task-oriented or generic open-domain dialogs, and have not fully explored the ability of LLMs in following complicated prompts. In this work, we focus on personalization, and employ LLMs to curate a dataset which is difficult and costly to crow… ▽ More The new wave of Large Language Models (LLM) has offered an efficient tool to curate sizeable conversational datasets. So far studies have mainly focused on task-oriented or generic open-domain dialogs, and have not fully explored the ability of LLMs in following complicated prompts. In this work, we focus on personalization, and employ LLMs to curate a dataset which is difficult and costly to crowd-source: PersonalityChat is a synthetic conversational dataset based upon the popular PersonaChat dataset, but conditioned on both personas and (Big-5) personality traits. Evaluating models fine-tuned on this dataset, we show that the personality trait labels can be used for trait-based personalization of generative dialogue models. We also perform a head-to-head comparison between PersonalityChat and PersonaChat, and show that training on the distilled dataset results in more fluent and coherent dialog agents in the small-model regime. △ Less

Submitted 14 January, 2024; originally announced January 2024.

Comments: GEM workshop @ EMNLP23

arXiv:2312.14329 [pdf, other]

Invariant Anomaly Detection under Distribution Shifts: A Causal Perspective

Authors: João B. S. Carvalho, Mengtao Zhang, Robin Geyer, Carlos Cotrini, Joachim M. Buhmann

Abstract: Anomaly detection (AD) is the machine learning task of identifying highly discrepant abnormal samples by solely relying on the consistency of the normal training samples. Under the constraints of a distribution shift, the assumption that training samples and test samples are drawn from the same distribution breaks down. In this work, by leveraging tools from causal inference we attempt to increase… ▽ More Anomaly detection (AD) is the machine learning task of identifying highly discrepant abnormal samples by solely relying on the consistency of the normal training samples. Under the constraints of a distribution shift, the assumption that training samples and test samples are drawn from the same distribution breaks down. In this work, by leveraging tools from causal inference we attempt to increase the resilience of anomaly detection models to different kinds of distribution shifts. We begin by elucidating a simple yet necessary statistical property that ensures invariant representations, which is critical for robust AD under both domain and covariate shifts. From this property, we derive a regularization term which, when minimized, leads to partial distribution invariance across environments. Through extensive experimental evaluation on both synthetic and real-world tasks, covering a range of six different AD methods, we demonstrated significant improvements in out-of-distribution performance. Under both covariate and domain shift, models regularized with our proposed term showed marked increased robustness. Code is available at: https://github.com/JoaoCarv/invariant-anomaly-detection. △ Less

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2310.17347 [pdf, other]

CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling

Authors: Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

Abstract: While conditional diffusion models are known to have good coverage of the data distribution, they still face limitations in output diversity, particularly when sampled with a high classifier-free guidance scale for optimal image quality or when trained on small datasets. We attribute this problem to the role of the conditioning signal in inference and offer an improved sampling strategy for diffus… ▽ More While conditional diffusion models are known to have good coverage of the data distribution, they still face limitations in output diversity, particularly when sampled with a high classifier-free guidance scale for optimal image quality or when trained on small datasets. We attribute this problem to the role of the conditioning signal in inference and offer an improved sampling strategy for diffusion models that can increase generation diversity, especially at high guidance scales, with minimal loss of sample quality. Our sampling strategy anneals the conditioning signal by adding scheduled, monotonically decreasing Gaussian noise to the conditioning vector during inference to balance diversity and condition alignment. Our Condition-Annealed Diffusion Sampler (CADS) can be used with any pretrained model and sampling algorithm, and we show that it boosts the diversity of diffusion models in various conditional generation tasks. Further, using an existing pretrained diffusion model, CADS achieves a new state-of-the-art FID of 1.70 and 2.31 for class-conditional ImageNet generation at 256$\times$256 and 512$\times$512 respectively. △ Less

Submitted 13 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Published as a conference paper at ICLR 2024

Journal ref: The Twelfth International Conference on Learning Representations (ICLR 2024)

arXiv:2308.09189 [pdf, ps, other]

Regularizing Adversarial Imitation Learning Using Causal Invariance

Authors: Ivan Ovinnikov, Joachim M. Buhmann

Abstract: Imitation learning methods are used to infer a policy in a Markov decision process from a dataset of expert demonstrations by minimizing a divergence measure between the empirical state occupancy measures of the expert and the policy. The guiding signal to the policy is provided by the discriminator used as part of an versarial optimization procedure. We observe that this model is prone to absorbi… ▽ More Imitation learning methods are used to infer a policy in a Markov decision process from a dataset of expert demonstrations by minimizing a divergence measure between the empirical state occupancy measures of the expert and the policy. The guiding signal to the policy is provided by the discriminator used as part of an versarial optimization procedure. We observe that this model is prone to absorbing spurious correlations present in the expert data. To alleviate this issue, we propose to use causal invariance as a regularization principle for adversarial training of these models. The regularization objective is applicable in a straightforward manner to existing adversarial imitation frameworks. We demonstrate the efficacy of the regularized formulation in an illustrative two-dimensional setting as well as a number of high-dimensional robot locomotion benchmark tasks. △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: Published at the ICML 2023 Workshop on Spurious Correlations, Invariance, and Stability

arXiv:2306.09035 [pdf, other]

Improving Explainability of Disentangled Representations using Multipath-Attribution Map**s

Authors: Lukas Klein, João B. S. Carvalho, Mennatallah El-Assady, Paolo Penna, Joachim M. Buhmann, Paul F. Jaeger

Abstract: Explainable AI aims to render model behavior understandable by humans, which can be seen as an intermediate step in extracting causal relations from correlative patterns. Due to the high risk of possible fatal decisions in image-based clinical diagnostics, it is necessary to integrate explainable AI into these safety-critical systems. Current explanatory methods typically assign attribution scores… ▽ More Explainable AI aims to render model behavior understandable by humans, which can be seen as an intermediate step in extracting causal relations from correlative patterns. Due to the high risk of possible fatal decisions in image-based clinical diagnostics, it is necessary to integrate explainable AI into these safety-critical systems. Current explanatory methods typically assign attribution scores to pixel regions in the input image, indicating their importance for a model's decision. However, they fall short when explaining why a visual feature is used. We propose a framework that utilizes interpretable disentangled representations for downstream-task prediction. Through visualizing the disentangled representations, we enable experts to investigate possible causation effects by leveraging their domain knowledge. Additionally, we deploy a multi-path attribution map** for enriching and validating explanations. We demonstrate the effectiveness of our approach on a synthetic benchmark suite and two medical datasets. We show that the framework not only acts as a catalyst for causal relation extraction but also enhances model robustness by enabling shortcut detection without the need for testing under distribution shifts. △ Less

Submitted 15 June, 2023; originally announced June 2023.

Journal ref: Proceedings of The 5th International Conference on Medical Imaging with Deep Learning, PMLR 172:689-712, 2022

arXiv:2306.08659 [pdf, other]

Explore In-Context Learning for 3D Point Cloud Understanding

Authors: Zhongbin Fang, Xiangtai Li, Xia Li, Joachim M. Buhmann, Chen Change Loy, Mengyuan Liu

Abstract: With the rise of large-scale models trained on broad data, in-context learning has become a new learning paradigm that has demonstrated significant potential in natural language processing and computer vision tasks. Meanwhile, in-context learning is still largely unexplored in the 3D point cloud domain. Although masked modeling has been successfully applied for in-context learning in 2D vision, di… ▽ More With the rise of large-scale models trained on broad data, in-context learning has become a new learning paradigm that has demonstrated significant potential in natural language processing and computer vision tasks. Meanwhile, in-context learning is still largely unexplored in the 3D point cloud domain. Although masked modeling has been successfully applied for in-context learning in 2D vision, directly extending it to 3D point clouds remains a formidable challenge. In the case of point clouds, the tokens themselves are the point cloud positions (coordinates) that are masked during inference. Moreover, position embedding in previous works may inadvertently introduce information leakage. To address these challenges, we introduce a novel framework, named Point-In-Context, designed especially for in-context learning in 3D point clouds, where both inputs and outputs are modeled as coordinates for each task. Additionally, we propose the Joint Sampling module, carefully designed to work in tandem with the general point sampling operator, effectively resolving the aforementioned technical issues. We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks. △ Less

Submitted 27 December, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: Project page: https://github.com/fanglaosi/Point-In-Context

arXiv:2210.03649 [pdf, other]

How to Enable Uncertainty Estimation in Proximal Policy Optimization

Authors: Eugene Bykovets, Yannick Metz, Mennatallah El-Assady, Daniel A. Keim, Joachim M. Buhmann

Abstract: While deep reinforcement learning (RL) agents have showcased strong results across many domains, a major concern is their inherent opaqueness and the safety of such systems in real-world use cases. To overcome these issues, we need agents that can quantify their uncertainty and detect out-of-distribution (OOD) states. Existing uncertainty estimation techniques, like Monte-Carlo Dropout or Deep Ens… ▽ More While deep reinforcement learning (RL) agents have showcased strong results across many domains, a major concern is their inherent opaqueness and the safety of such systems in real-world use cases. To overcome these issues, we need agents that can quantify their uncertainty and detect out-of-distribution (OOD) states. Existing uncertainty estimation techniques, like Monte-Carlo Dropout or Deep Ensembles, have not seen widespread adoption in on-policy deep RL. We posit that this is due to two reasons: concepts like uncertainty and OOD states are not well defined compared to supervised learning, especially for on-policy RL methods. Secondly, available implementations and comparative studies for uncertainty estimation methods in RL have been limited. To overcome the first gap, we propose definitions of uncertainty and OOD for Actor-Critic RL algorithms, namely, proximal policy optimization (PPO), and present possible applicable measures. In particular, we discuss the concepts of value and policy uncertainty. The second point is addressed by implementing different uncertainty estimation methods and comparing them across a number of environments. The OOD detection performance is evaluated via a custom evaluation benchmark of in-distribution (ID) and OOD states for various RL environments. We identify a trade-off between reward and OOD detection performance. To overcome this, we formulate a Pareto optimization problem in which we simultaneously optimize for reward and OOD detection performance. We show experimentally that the recently proposed method of Masksembles strikes a favourable balance among the survey methods, enabling high-quality uncertainty estimation and OOD detection while matching the performance of original RL agents. △ Less

Submitted 7 October, 2022; originally announced October 2022.

ACM Class: I.2; I.2.6; I.2.8; I.2.9; I.2.10

arXiv:2209.12590 [pdf, other]

Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs

Authors: Đorđe Miladinović, Kumar Shridhar, Kushal Jain, Max B. Paulus, Joachim M. Buhmann, Mrinmaya Sachan, Carl Allen

Abstract: In principle, applying variational autoencoders (VAEs) to sequential data offers a method for controlled sequence generation, manipulation, and structured representation learning. However, training sequence VAEs is challenging: autoregressive decoders can often explain the data without utilizing the latent space, known as posterior collapse. To mitigate this, state-of-the-art models weaken the pow… ▽ More In principle, applying variational autoencoders (VAEs) to sequential data offers a method for controlled sequence generation, manipulation, and structured representation learning. However, training sequence VAEs is challenging: autoregressive decoders can often explain the data without utilizing the latent space, known as posterior collapse. To mitigate this, state-of-the-art models weaken the powerful decoder by applying uniformly random dropout to the decoder input. We show theoretically that this removes pointwise mutual information provided by the decoder input, which is compensated for by utilizing the latent space. We then propose an adversarial training strategy to achieve information-based stochastic dropout. Compared to uniform dropout on standard text benchmark datasets, our targeted approach increases both sequence modeling performance and the information captured in the latent space. △ Less

Submitted 16 December, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2209.05185 [pdf, other]

Open-Domain Dialog Evaluation using Follow-Ups Likelihood

Authors: Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, Walter Daelemans

Abstract: Automatic evaluation of open-domain dialogs remains an unsolved problem. Moreover, existing methods do not correlate strongly with human annotations. This paper presents a new automated evaluation method using follow-ups: we measure the probability that a language model will continue the conversation with a fixed set of follow-ups (e.g., not really relevant here, what are you trying to say). When… ▽ More Automatic evaluation of open-domain dialogs remains an unsolved problem. Moreover, existing methods do not correlate strongly with human annotations. This paper presents a new automated evaluation method using follow-ups: we measure the probability that a language model will continue the conversation with a fixed set of follow-ups (e.g., not really relevant here, what are you trying to say). When compared against twelve existing methods, our new evaluation achieves the highest correlation with human evaluations. △ Less

Submitted 12 September, 2022; originally announced September 2022.

Comments: Accepted at COLING 2022

arXiv:2208.10481 [pdf, other]

BARReL: Bottleneck Attention for Adversarial Robustness in Vision-Based Reinforcement Learning

Authors: Eugene Bykovets, Yannick Metz, Mennatallah El-Assady, Daniel A. Keim, Joachim M. Buhmann

Abstract: Robustness to adversarial perturbations has been explored in many areas of computer vision. This robustness is particularly relevant in vision-based reinforcement learning, as the actions of autonomous agents might be safety-critic or impactful in the real world. We investigate the susceptibility of vision-based reinforcement learning agents to gradient-based adversarial attacks and evaluate a pot… ▽ More Robustness to adversarial perturbations has been explored in many areas of computer vision. This robustness is particularly relevant in vision-based reinforcement learning, as the actions of autonomous agents might be safety-critic or impactful in the real world. We investigate the susceptibility of vision-based reinforcement learning agents to gradient-based adversarial attacks and evaluate a potential defense. We observe that Bottleneck Attention Modules (BAM) included in CNN architectures can act as potential tools to increase robustness against adversarial attacks. We show how learned attention maps can be used to recover activations of a convolutional layer by restricting the spatial activations to salient regions. Across a number of RL environments, BAM-enhanced architectures show increased robustness during inference. Finally, we discuss potential future research directions. △ Less

Submitted 22 August, 2022; originally announced August 2022.

Comments: 5 pages, 2 figures, 3 tables

ACM Class: I.2.6; I.2.8; I.2.9; I.2.10; I.5.4

arXiv:2206.12444 [pdf, other]

Gated Domain Units for Multi-source Domain Generalization

Authors: Simon Föll, Alina Dubatovka, Eugen Ernst, Siu Lun Chau, Martin Maritsch, Patrik Okanovic, Gudrun Thäter, Joachim M. Buhmann, Felix Wortmann, Krikamol Muandet

Abstract: The phenomenon of distribution shift (DS) occurs when a dataset at test time differs from the dataset at training time, which can significantly impair the performance of a machine learning model in practical settings due to a lack of knowledge about the data's distribution at test time. To address this problem, we postulate that real-world distributions are composed of latent Invariant Elementary… ▽ More The phenomenon of distribution shift (DS) occurs when a dataset at test time differs from the dataset at training time, which can significantly impair the performance of a machine learning model in practical settings due to a lack of knowledge about the data's distribution at test time. To address this problem, we postulate that real-world distributions are composed of latent Invariant Elementary Distributions (I.E.D) across different domains. This assumption implies an invariant structure in the solution space that enables knowledge transfer to unseen domains. To exploit this property for domain generalization, we introduce a modular neural network layer consisting of Gated Domain Units (GDUs) that learn a representation for each latent elementary distribution. During inference, a weighted ensemble of learning machines can be created by comparing new observations with the representations of each elementary distribution. Our flexible framework also accommodates scenarios where explicit domain information is not present. Extensive experiments on image, text, and graph data show consistent performance improvement on out-of-training target domains. These findings support the practicality of the I.E.D assumption and the effectiveness of GDUs for domain generalisation. △ Less

Submitted 16 May, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

arXiv:2110.02067 [pdf, other]

Teach Me What to Say and I Will Learn What to Pick: Unsupervised Knowledge Selection Through Response Generation with Pretrained Generative Models

Authors: Ehsan Lotfi, Maxime De Bruyn, Jeska Buhmann, Walter Daelemans

Abstract: Knowledge Grounded Conversation Models (KGCM) are usually based on a selection/retrieval module and a generation module, trained separately or simultaneously, with or without having access to a gold knowledge option. With the introduction of large pre-trained generative models, the selection and generation part have become more and more entangled, shifting the focus towards enhancing knowledge inc… ▽ More Knowledge Grounded Conversation Models (KGCM) are usually based on a selection/retrieval module and a generation module, trained separately or simultaneously, with or without having access to a gold knowledge option. With the introduction of large pre-trained generative models, the selection and generation part have become more and more entangled, shifting the focus towards enhancing knowledge incorporation (from multiple sources) instead of trying to pick the best knowledge option. These approaches however depend on knowledge labels and/or a separate dense retriever for their best performance. In this work we study the unsupervised selection abilities of pre-trained generative models (e.g. BART) and show that by adding a score-and-aggregate module between encoder and decoder, they are capable of learning to pick the proper knowledge through minimising the language modelling loss (i.e. without having access to knowledge labels). Trained as such, our model - K-Mine - shows competitive selection and generation performance against models that benefit from knowledge labels and/or separate dense retriever. △ Less

Submitted 5 October, 2021; originally announced October 2021.

Comments: Accepted at ConvAI workshop (EMNLP 2021)

arXiv:2109.12870 [pdf, other]

MFAQ: a Multilingual FAQ Dataset

Authors: Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, Walter Daelemans

Abstract: In this paper, we present the first multilingual FAQ dataset publicly available. We collected around 6M FAQ pairs from the web, in 21 different languages. Although this is significantly larger than existing FAQ retrieval datasets, it comes with its own challenges: duplication of content and uneven distribution of topics. We adopt a similar setup as Dense Passage Retrieval (DPR) and test various bi… ▽ More In this paper, we present the first multilingual FAQ dataset publicly available. We collected around 6M FAQ pairs from the web, in 21 different languages. Although this is significantly larger than existing FAQ retrieval datasets, it comes with its own challenges: duplication of content and uneven distribution of topics. We adopt a similar setup as Dense Passage Retrieval (DPR) and test various bi-encoders on this dataset. Our experiments reveal that a multilingual model based on XLM-RoBERTa achieves the best results, except for English. Lower resources languages seem to learn from one another as a multilingual model achieves a higher MRR than language-specific ones. Our qualitative analysis reveals the brittleness of the model on simple word changes. We publicly release our dataset, model and training script. △ Less

Submitted 5 October, 2021; v1 submitted 27 September, 2021; originally announced September 2021.

Comments: Accepted at MRQA workshop (EMNLP 2021)

arXiv:2108.00719 [pdf, other]

ConveRT for FAQ Answering

Authors: Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, Walter Daelemans

Abstract: Knowledgeable FAQ chatbots are a valuable resource to any organization. While powerful and efficient retrieval-based models exist for English, it is rarely the case for other languages for which the same amount of training data is not available. In this paper, we propose a novel pre-training procedure to adapt ConveRT, an English conversational retriever model, to other languages with less trainin… ▽ More Knowledgeable FAQ chatbots are a valuable resource to any organization. While powerful and efficient retrieval-based models exist for English, it is rarely the case for other languages for which the same amount of training data is not available. In this paper, we propose a novel pre-training procedure to adapt ConveRT, an English conversational retriever model, to other languages with less training data available. We apply it for the first time to the task of Dutch FAQ answering related to the COVID-19 vaccine. We show it performs better than an open-source alternative in both a low-data regime and a high-data regime. △ Less

Submitted 14 October, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

Comments: Accepted at bnaicbenelearn2021

arXiv:2103.11713 [pdf, other]

Spatially Dependent U-Nets: Highly Accurate Architectures for Medical Imaging Segmentation

Authors: João B. S. Carvalho, João A. Santinha, Đorđe Miladinović, Joachim M. Buhmann

Abstract: In clinical practice, regions of interest in medical imaging often need to be identified through a process of precise image segmentation. The quality of this image segmentation step critically affects the subsequent clinical assessment of the patient status. To enable high accuracy, automatic image segmentation, we introduce a novel deep neural network architecture that exploits the inherent spati… ▽ More In clinical practice, regions of interest in medical imaging often need to be identified through a process of precise image segmentation. The quality of this image segmentation step critically affects the subsequent clinical assessment of the patient status. To enable high accuracy, automatic image segmentation, we introduce a novel deep neural network architecture that exploits the inherent spatial coherence of anatomical structures and is well equipped to capture long-range spatial dependencies in the segmented pixel/voxel space. In contrast to the state-of-the-art solutions based on convolutional layers, our approach leverages on recently introduced spatial dependency layers that have an unbounded receptive field and explicitly model the inductive bias of spatial coherence. Our method performs favourably to commonly used U-Net and U-Net++ architectures as demonstrated by improved Dice and Jaccardscore in three different medical segmentation tasks: nuclei segmentation in microscopy images, polyp segmentation in colonoscopy videos, and liver segmentation in abdominal CT scans. △ Less

Submitted 22 March, 2021; originally announced March 2021.

arXiv:2103.08877 [pdf, other]

Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling

Authors: Đorđe Miladinović, Aleksandar Stanić, Stefan Bauer, Jürgen Schmidhuber, Joachim M. Buhmann

Abstract: How to improve generative modeling by better exploiting spatial regularities and coherence in images? We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs). In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way, using a sequential gating-based mechani… ▽ More How to improve generative modeling by better exploiting spatial regularities and coherence in images? We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs). In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way, using a sequential gating-based mechanism that distributes contextual information across 2-D space. We show that augmenting the decoder of a hierarchical VAE by spatial dependency layers considerably improves density estimation over baseline convolutional architectures and the state-of-the-art among the models within the same class. Furthermore, we demonstrate that SDN can be applied to large images by synthesizing samples of high quality and coherence. In a vanilla VAE setting, we find that a powerful SDN decoder also improves learning disentangled representations, indicating that neural architectures play an important role in this task. Our results suggest favoring spatial dependency over convolutional layers in various VAE settings. The accompanying source code is given at https://github.com/djordjemila/sdn. △ Less

Submitted 16 March, 2021; originally announced March 2021.

Journal ref: International Conference on Learning Representations (2021);

arXiv:2101.09994 [pdf, ps, other]

On maximum-likelihood estimation in the all-or-nothing regime

Authors: Luca Corinzia, Paolo Penna, Wojciech Szpankowski, Joachim M. Buhmann

Abstract: We study the problem of estimating a rank-1 additive deformation of a Gaussian tensor according to the \emph{maximum-likelihood estimator} (MLE). The analysis is carried out in the sparse setting, where the underlying signal has a support that scales sublinearly with the total number of dimensions. We show that for Bernoulli distributed signals, the MLE undergoes an \emph{all-or-nothing} (AoN) pha… ▽ More We study the problem of estimating a rank-1 additive deformation of a Gaussian tensor according to the \emph{maximum-likelihood estimator} (MLE). The analysis is carried out in the sparse setting, where the underlying signal has a support that scales sublinearly with the total number of dimensions. We show that for Bernoulli distributed signals, the MLE undergoes an \emph{all-or-nothing} (AoN) phase transition, already established for the minimum mean-square-error estimator (MMSE) in the same problem. The result follows from two main technical points: (i) the connection established between the MLE and the MMSE, using the first and second-moment methods in the constrained signal space, (ii) a recovery regime for the MMSE stricter than the simple error vanishing characterization given in the standard AoN, that is here proved as a general result. △ Less

Submitted 25 January, 2021; originally announced January 2021.

arXiv:2011.11500 [pdf, other]

Statistical and computational thresholds for the planted $k$-densest sub-hypergraph problem

Authors: Luca Corinzia, Paolo Penna, Wojciech Szpankowski, Joachim M. Buhmann

Abstract: In this work, we consider the problem of recovery a planted $k$-densest sub-hypergraph on $d$-uniform hypergraphs. This fundamental problem appears in different contexts, e.g., community detection, average-case complexity, and neuroscience applications as a structural variant of tensor-PCA problem. We provide tight \emph{information-theoretic} upper and lower bounds for the exact recovery threshol… ▽ More In this work, we consider the problem of recovery a planted $k$-densest sub-hypergraph on $d$-uniform hypergraphs. This fundamental problem appears in different contexts, e.g., community detection, average-case complexity, and neuroscience applications as a structural variant of tensor-PCA problem. We provide tight \emph{information-theoretic} upper and lower bounds for the exact recovery threshold by the maximum-likelihood estimator, as well as \emph{algorithmic} bounds based on approximate message passing algorithms. The problem exhibits a typical statistical-to-computational gap observed in analogous sparse settings that widen with increasing sparsity of the problem. The bounds show that the signal structure impacts the location of the statistical and computational phase transition that the known existing bounds for the tensor-PCA model do not capture. This effect is due to the generic planted signal prior that this latter model addresses. △ Less

Submitted 28 January, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

arXiv:2009.08371 [pdf, other]

Microtubule Tracking in Electron Microscopy Volumes

Authors: Nils Eckstein, Julia Buhmann, Matthew Cook, Jan Funke

Abstract: We present a method for microtubule tracking in electron microscopy volumes. Our method first identifies a sparse set of voxels that likely belong to microtubules. Similar to prior work, we then enumerate potential edges between these voxels, which we represent in a candidate graph. Tracks of microtubules are found by selecting nodes and edges in the candidate graph by solving a constrained optimi… ▽ More We present a method for microtubule tracking in electron microscopy volumes. Our method first identifies a sparse set of voxels that likely belong to microtubules. Similar to prior work, we then enumerate potential edges between these voxels, which we represent in a candidate graph. Tracks of microtubules are found by selecting nodes and edges in the candidate graph by solving a constrained optimization problem incorporating biological priors on microtubule structure. For this, we present a novel integer linear programming formulation, which results in speed-ups of three orders of magnitude and an increase of 53% in accuracy compared to prior art (evaluated on three 1.2 x 4 x 4$μ$m volumes of Drosophila neural tissue). We also propose a scheme to solve the optimization problem in a block-wise fashion, which allows distributed tracking and is necessary to process very large electron microscopy volumes. Finally, we release a benchmark dataset for microtubule tracking, here used for training, testing and validation, consisting of eight 30 x 1000 x 1000 voxel blocks (1.2 x 4 x 4$μ$m) of densely annotated microtubules in the CREMI data set (https://github.com/nilsec/micron). △ Less

Submitted 17 September, 2020; originally announced September 2020.

Comments: Accepted at MICCAI 2020

arXiv:2008.05867 [pdf, other]

Neural collaborative filtering for unsupervised mitral valve segmentation in echocardiography

Authors: Luca Corinzia, Fabian Laumer, Alessandro Candreva, Maurizio Taramasso, Francesco Maisano, Joachim M. Buhmann

Abstract: The segmentation of the mitral valve annulus and leaflets specifies a crucial first step to establish a machine learning pipeline that can support physicians in performing multiple tasks, e.g.\ diagnosis of mitral valve diseases, surgical planning, and intraoperative procedures. Current methods for mitral valve segmentation on 2D echocardiography videos require extensive interaction with annotator… ▽ More The segmentation of the mitral valve annulus and leaflets specifies a crucial first step to establish a machine learning pipeline that can support physicians in performing multiple tasks, e.g.\ diagnosis of mitral valve diseases, surgical planning, and intraoperative procedures. Current methods for mitral valve segmentation on 2D echocardiography videos require extensive interaction with annotators and perform poorly on low-quality and noisy videos. We propose an automated and unsupervised method for the mitral valve segmentation based on a low dimensional embedding of the echocardiography videos using neural network collaborative filtering. The method is evaluated in a collection of echocardiography videos of patients with a variety of mitral valve diseases, and additionally on an independent test cohort. It outperforms state-of-the-art \emph{unsupervised} and \emph{supervised} methods on low-quality videos or in the case of sparse annotation. △ Less

Submitted 13 August, 2020; originally announced August 2020.

arXiv:2006.13474 [pdf, other]

Continuous Submodular Function Maximization

Authors: Yatao Bian, Joachim M. Buhmann, Andreas Krause

Abstract: Continuous submodular functions are a category of generally non-convex/non-concave functions with a wide spectrum of applications. The celebrated property of this class of functions - continuous submodularity - enables both exact minimization and approximate maximization in poly. time. Continuous submodularity is obtained by generalizing the notion of submodularity from discrete domains to continu… ▽ More Continuous submodular functions are a category of generally non-convex/non-concave functions with a wide spectrum of applications. The celebrated property of this class of functions - continuous submodularity - enables both exact minimization and approximate maximization in poly. time. Continuous submodularity is obtained by generalizing the notion of submodularity from discrete domains to continuous domains. It intuitively captures a repulsive effect amongst different dimensions of the defined multivariate function. In this paper, we systematically study continuous submodularity and a class of non-convex optimization problems: continuous submodular function maximization. We start by a thorough characterization of the class of continuous submodular functions, and show that continuous submodularity is equivalent to a weak version of the diminishing returns (DR) property. Thus we also derive a subclass of continuous submodular functions, termed continuous DR-submodular functions, which enjoys the full DR property. Then we present operations that preserve continuous (DR-)submodularity, thus yielding general rules for composing new submodular functions. We establish intriguing properties for the problem of constrained DR-submodular maximization, such as the local-global relation. We identify several applications of continuous submodular optimization, ranging from influence maximization, MAP inference for DPPs to provable mean field inference. For these applications, continuous submodularity formalizes valuable domain knowledge relevant for optimizing this class of objectives. We present inapproximability results and provable algorithms for two problem settings: constrained monotone DR-submodular maximization and constrained non-monotone DR-submodular maximization. Finally, we extensively evaluate the effectiveness of the proposed algorithms. △ Less

Submitted 24 June, 2020; originally announced June 2020.

Comments: 64 pages

arXiv:2006.01293 [pdf, other]

From Sets to Multisets: Provable Variational Inference for Probabilistic Integer Submodular Models

Authors: Aytunc Sahin, Yatao Bian, Joachim M. Buhmann, Andreas Krause

Abstract: Submodular functions have been studied extensively in machine learning and data mining. In particular, the optimization of submodular functions over the integer lattice (integer submodular functions) has recently attracted much interest, because this domain relates naturally to many practical problem settings, such as multilabel graph cut, budget allocation and revenue maximization with discrete a… ▽ More Submodular functions have been studied extensively in machine learning and data mining. In particular, the optimization of submodular functions over the integer lattice (integer submodular functions) has recently attracted much interest, because this domain relates naturally to many practical problem settings, such as multilabel graph cut, budget allocation and revenue maximization with discrete assignments. In contrast, the use of these functions for probabilistic modeling has received surprisingly little attention so far. In this work, we firstly propose the Generalized Multilinear Extension, a continuous DR-submodular extension for integer submodular functions. We study central properties of this extension and formulate a new probabilistic model which is defined through integer submodular functions. Then, we introduce a block-coordinate ascent algorithm to perform approximate inference for those class of models. Finally, we demonstrate its effectiveness and viability on several real-world social connection graph datasets with integer submodular objectives. △ Less

Submitted 1 June, 2020; originally announced June 2020.

arXiv:2003.09820 [pdf, other]

Rig-space Neural Rendering

Authors: Dominik Borer, Lu Yuhang, Laura Wuelfroth, Jakob Buhmann, Martin Guay

Abstract: Movie productions use high resolution 3d characters with complex proprietary rigs to create the highest quality images possible for large displays. Unfortunately, these 3d assets are typically not compatible with real-time graphics engines used for games, mixed reality and real-time pre-visualization. Consequently, the 3d characters need to be re-modeled and re-rigged for these new applications, r… ▽ More Movie productions use high resolution 3d characters with complex proprietary rigs to create the highest quality images possible for large displays. Unfortunately, these 3d assets are typically not compatible with real-time graphics engines used for games, mixed reality and real-time pre-visualization. Consequently, the 3d characters need to be re-modeled and re-rigged for these new applications, requiring weeks of work and artistic approval. Our solution to this problem is to learn a compact image-based rendering of the original 3d character, conditioned directly on the rig parameters. Our idea is to render the character in many different poses and views, and to train a deep neural network to render high resolution images, from the rig parameters directly. Many neural rendering techniques have been proposed to render from 2d skeletons, or geometry and UV maps. However these require manual work, and to do not remain compatible with the animator workflow of manipulating rig widgets, as well as the real-time game engine pipeline of interpolating rig parameters. We extend our architecture to support dynamic re-lighting and composition with other 3d objects in the scene. We designed a network that efficiently generates multiple scene feature maps such as normals, depth, albedo and mask, which are composed with other scene objects to form the final image. △ Less

Submitted 22 March, 2020; originally announced March 2020.

arXiv:1906.06268 [pdf, other]

Variational Federated Multi-Task Learning

Authors: Luca Corinzia, Ami Beuret, Joachim M. Buhmann

Abstract: In federated learning, a central server coordinates the training of a single model on a massively distributed network of devices. This setting can be naturally extended to a multi-task learning framework, to handle real-world federated datasets that typically show strong statistical heterogeneity among devices. Despite federated multi-task learning being shown to be an effective paradigm for real-… ▽ More In federated learning, a central server coordinates the training of a single model on a massively distributed network of devices. This setting can be naturally extended to a multi-task learning framework, to handle real-world federated datasets that typically show strong statistical heterogeneity among devices. Despite federated multi-task learning being shown to be an effective paradigm for real-world datasets, it has been applied only on convex models. In this work, we introduce VIRTUAL, an algorithm for federated multi-task learning for general non-convex models. In VIRTUAL the federated network of the server and the clients is treated as a star-shaped Bayesian network, and learning is performed on the network using approximated variational inference. We show that this method is effective on real-world federated datasets, outperforming the current state-of-the-art for federated learning, and concurrently allowing sparser gradient updates. △ Less

Submitted 4 February, 2021; v1 submitted 14 June, 2019; originally announced June 2019.

arXiv:1906.03255 [pdf, other]

Disentangled State Space Representations

Authors: Đorđe Miladinović, Muhammad Waleed Gondal, Bernhard Schölkopf, Joachim M. Buhmann, Stefan Bauer

Abstract: Sequential data often originates from diverse domains across which statistical regularities and domain specifics exist. To specifically learn cross-domain sequence representations, we introduce disentangled state space models (DSSM) -- a class of SSM in which domain-invariant state dynamics is explicitly disentangled from domain-specific information governing that dynamics. We analyze how such sep… ▽ More Sequential data often originates from diverse domains across which statistical regularities and domain specifics exist. To specifically learn cross-domain sequence representations, we introduce disentangled state space models (DSSM) -- a class of SSM in which domain-invariant state dynamics is explicitly disentangled from domain-specific information governing that dynamics. We analyze how such separation can improve knowledge transfer to new domains, and enable robust prediction, sequence manipulation and domain characterization. We furthermore propose an unsupervised VAE-based training procedure to implement DSSM in form of Bayesian filters. In our experiments, we applied VAE-DSSM framework to achieve competitive performance in online ODE system identification and regression across experimental settings, and controlled generation and prediction of bouncing ball video sequences across varying gravitational influences. △ Less

Submitted 7 June, 2019; originally announced June 2019.

arXiv:1904.03266 [pdf, other]

Domain Authoring Assistant for Intelligent Virtual Agents

Authors: Sepehr Janghorbani, Ashutosh Modi, Jakob Buhmann, Mubbasir Kapadia

Abstract: Develo** intelligent virtual characters has attracted a lot of attention in the recent years. The process of creating such characters often involves a team of creative authors who describe different aspects of the characters in natural language, and planning experts that translate this description into a planning domain. This can be quite challenging as the team of creative authors should dilige… ▽ More Develo** intelligent virtual characters has attracted a lot of attention in the recent years. The process of creating such characters often involves a team of creative authors who describe different aspects of the characters in natural language, and planning experts that translate this description into a planning domain. This can be quite challenging as the team of creative authors should diligently define every aspect of the character especially if it contains complex human-like behavior. Also a team of engineers has to manually translate the natural language description of a character's personality into the planning domain knowledge. This can be extremely time and resource demanding and can be an obstacle to author's creativity. The goal of this paper is to introduce an authoring assistant tool to automate the process of domain generation from natural language description of virtual characters, thus bridging between the creative authoring team and the planning domain experts. Moreover, the proposed tool also identifies possible missing information in the domain description and iteratively makes suggestions to the author. △ Less

Submitted 5 April, 2019; originally announced April 2019.

Comments: 8+1 pages, Accepted at 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019)

arXiv:1902.00981 [pdf, other]

doi 10.1609/aaai.v34i04.6014

Learning Counterfactual Representations for Estimating Individual Dose-Response Curves

Authors: Patrick Schwab, Lorenz Linhardt, Stefan Bauer, Joachim M. Buhmann, Walter Karlen

Abstract: Estimating what would be an individual's potential response to varying levels of exposure to a treatment is of high practical relevance for several important fields, such as healthcare, economics and public policy. However, existing methods for learning to estimate counterfactual outcomes from observational data are either focused on estimating average dose-response curves, or limited to settings… ▽ More Estimating what would be an individual's potential response to varying levels of exposure to a treatment is of high practical relevance for several important fields, such as healthcare, economics and public policy. However, existing methods for learning to estimate counterfactual outcomes from observational data are either focused on estimating average dose-response curves, or limited to settings with only two treatments that do not have an associated dosage parameter. Here, we present a novel machine-learning approach towards learning counterfactual representations for estimating individual dose-response curves for any number of treatments with continuous dosage parameters with neural networks. Building on the established potential outcomes framework, we introduce performance metrics, model selection criteria, model architectures, and open benchmarks for estimating individual dose-response curves. Our experiments show that the methods developed in this work set a new state-of-the-art in estimating individual dose-response. △ Less

Submitted 10 December, 2020; v1 submitted 3 February, 2019; originally announced February 2019.

Comments: published at AAAI 2020

arXiv:1901.06799 [pdf, other]

Exact Recovery for a Family of Community-Detection Generative Models

Authors: Luca Corinzia, Paolo Penna, Luca Mondada, Joachim M. Buhmann

Abstract: Generative models for networks with communities have been studied extensively for being a fertile ground to establish information-theoretic and computational thresholds. In this paper we propose a new toy model for planted generative models called planted Random Energy Model (REM), inspired by Derrida's REM. For this model we provide the asymptotic behaviour of the probability of error for the max… ▽ More Generative models for networks with communities have been studied extensively for being a fertile ground to establish information-theoretic and computational thresholds. In this paper we propose a new toy model for planted generative models called planted Random Energy Model (REM), inspired by Derrida's REM. For this model we provide the asymptotic behaviour of the probability of error for the maximum likelihood estimator and hence the exact recovery threshold. As an application, we further consider the 2 non-equally sized community Weighted Stochastic Block Model (2-WSBM) on $h$-uniform hypergraphs, that is equivalent to the P-REM on both sides of the spectrum, for high and low edge cardinality $h$. We provide upper and lower bounds for the exact recoverability for any $h$, map** these problems to the aforementioned P-REM. To the best of our knowledge these are the first consistency results for the 2-WSBM on graphs and on hypergraphs with non-equally sized community. △ Less

Submitted 30 April, 2019; v1 submitted 21 January, 2019; originally announced January 2019.

arXiv:1806.08205 [pdf, other]

Synaptic partner prediction from point annotations in insect brains

Authors: Julia Buhmann, Renate Krause, Rodrigo Ceballos Lentini, Nils Eckstein, Matthew Cook, Srinivas Turaga, Jan Funke

Abstract: High-throughput electron microscopy allows recording of lar- ge stacks of neural tissue with sufficient resolution to extract the wiring diagram of the underlying neural network. Current efforts to automate this process focus mainly on the segmentation of neurons. However, in order to recover a wiring diagram, synaptic partners need to be identi- fied as well. This is especially challenging in ins… ▽ More High-throughput electron microscopy allows recording of lar- ge stacks of neural tissue with sufficient resolution to extract the wiring diagram of the underlying neural network. Current efforts to automate this process focus mainly on the segmentation of neurons. However, in order to recover a wiring diagram, synaptic partners need to be identi- fied as well. This is especially challenging in insect brains like Drosophila melanogaster, where one presynaptic site is associated with multiple post- synaptic elements. Here we propose a 3D U-Net architecture to directly identify pairs of voxels that are pre- and postsynaptic to each other. To that end, we formulate the problem of synaptic partner identification as a classification problem on long-range edges between voxels to encode both the presence of a synaptic pair and its direction. This formulation allows us to directly learn from synaptic point annotations instead of more ex- pensive voxel-based synaptic cleft or vesicle annotations. We evaluate our method on the MICCAI 2016 CREMI challenge and improve over the current state of the art, producing 3% fewer errors than the next best method. △ Less

Submitted 16 July, 2018; v1 submitted 21 June, 2018; originally announced June 2018.

arXiv:1805.07482 [pdf, other]

Optimal DR-Submodular Maximization and Applications to Provable Mean Field Inference

Authors: An Bian, Joachim M. Buhmann, Andreas Krause

Abstract: Mean field inference in probabilistic models is generally a highly nonconvex problem. Existing optimization methods, e.g., coordinate ascent algorithms, can only generate local optima. In this work we propose provable mean filed methods for probabilistic log-submodular models and its posterior agreement (PA) with strong approximation guarantees. The main algorithmic technique is a new Double Gre… ▽ More Mean field inference in probabilistic models is generally a highly nonconvex problem. Existing optimization methods, e.g., coordinate ascent algorithms, can only generate local optima. In this work we propose provable mean filed methods for probabilistic log-submodular models and its posterior agreement (PA) with strong approximation guarantees. The main algorithmic technique is a new Double Greedy scheme, termed DR-DoubleGreedy, for continuous DR-submodular maximization with box-constraints. It is a one-pass algorithm with linear time complexity, reaching the optimal 1/2 approximation ratio, which may be of independent interest. We validate the superior performance of our algorithms against baseline algorithms on both synthetic and real-world datasets. △ Less

Submitted 29 November, 2018; v1 submitted 18 May, 2018; originally announced May 2018.

Comments: 28 pages

arXiv:1804.04378 [pdf, other]

Fast Gaussian Process Based Gradient Matching for Parameter Identification in Systems of Nonlinear ODEs

Authors: Philippe Wenk, Alkis Gotovos, Stefan Bauer, Nico Gorbach, Andreas Krause, Joachim M. Buhmann

Abstract: Parameter identification and comparison of dynamical systems is a challenging task in many fields. Bayesian approaches based on Gaussian process regression over time-series data have been successfully applied to infer the parameters of a dynamical system without explicitly solving it. While the benefits in computational cost are well established, a rigorous mathematical framework has been missing.… ▽ More Parameter identification and comparison of dynamical systems is a challenging task in many fields. Bayesian approaches based on Gaussian process regression over time-series data have been successfully applied to infer the parameters of a dynamical system without explicitly solving it. While the benefits in computational cost are well established, a rigorous mathematical framework has been missing. We offer a novel interpretation which leads to a better understanding and improvements in state-of-the-art performance in terms of accuracy for nonlinear dynamical systems. △ Less

Submitted 1 March, 2019; v1 submitted 12 April, 2018; originally announced April 2018.

Comments: accepted at AISTATS 2019

arXiv:1711.02515 [pdf, other]

Continuous DR-submodular Maximization: Structure and Algorithms

Authors: An Bian, Kfir Y. Levy, Andreas Krause, Joachim M. Buhmann

Abstract: DR-submodular continuous functions are important objectives with wide real-world applications spanning MAP inference in determinantal point processes (DPPs), and mean-field inference for probabilistic submodular models, amongst others. DR-submodularity captures a subclass of non-convex functions that enables both exact minimization and approximate maximization in polynomial time. In this work we… ▽ More DR-submodular continuous functions are important objectives with wide real-world applications spanning MAP inference in determinantal point processes (DPPs), and mean-field inference for probabilistic submodular models, amongst others. DR-submodularity captures a subclass of non-convex functions that enables both exact minimization and approximate maximization in polynomial time. In this work we study the problem of maximizing non-monotone DR-submodular continuous functions under general down-closed convex constraints. We start by investigating geometric properties that underlie such objectives, e.g., a strong relation between (approximately) stationary points and global optimum is proved. These properties are then used to devise two optimization algorithms with provable guarantees. Concretely, we first devise a "two-phase" algorithm with $1/4$ approximation guarantee. This algorithm allows the use of existing methods for finding (approximately) stationary points as a subroutine, thus, harnessing recent progress in non-convex optimization. Then we present a non-monotone Frank-Wolfe variant with $1/e$ approximation guarantee and sublinear convergence rate. Finally, we extend our approach to a broader class of generalized DR-submodular continuous functions, which captures a wider spectrum of applications. Our theoretical findings are validated on synthetic and real-world problem instances. △ Less

Submitted 24 May, 2019; v1 submitted 3 November, 2017; originally announced November 2017.

Comments: Published in NIPS 2017

arXiv:1703.02100 [pdf, other]

Guarantees for Greedy Maximization of Non-submodular Functions with Applications

Authors: Andrew An Bian, Joachim M. Buhmann, Andreas Krause, Sebastian Tschiatschek

Abstract: We investigate the performance of the standard Greedy algorithm for cardinality constrained maximization of non-submodular nondecreasing set functions. While there are strong theoretical guarantees on the performance of Greedy for maximizing submodular functions, there are few guarantees for non-submodular ones. However, Greedy enjoys strong empirical performance for many important non-submodular… ▽ More We investigate the performance of the standard Greedy algorithm for cardinality constrained maximization of non-submodular nondecreasing set functions. While there are strong theoretical guarantees on the performance of Greedy for maximizing submodular functions, there are few guarantees for non-submodular ones. However, Greedy enjoys strong empirical performance for many important non-submodular functions, e.g., the Bayesian A-optimality objective in experimental design. We prove theoretical guarantees supporting the empirical performance. Our guarantees are characterized by a combination of the (generalized) curvature $α$ and the submodularity ratio $γ$. In particular, we prove that Greedy enjoys a tight approximation guarantee of $\frac{1}α(1- e^{-γα})$ for cardinality constrained maximization. In addition, we bound the submodularity ratio and curvature for several important real-world objectives, including the Bayesian A-optimality objective, the determinantal function of a square submatrix and certain linear programs with combinatorial constraints. We experimentally validate our theoretical findings for both synthetic and real-world applications. △ Less

Submitted 14 May, 2019; v1 submitted 6 March, 2017; originally announced March 2017.

Comments: published at ICML 2017. First author is now known as Yatao Bian <[email protected]>. ORCID: https://orcid.org/0000-0002-2368-4084

arXiv:1611.06652 [pdf, other]

Scalable Adaptive Stochastic Optimization Using Random Projections

Authors: Gabriel Krummenacher, Brian McWilliams, Yannic Kilcher, Joachim M. Buhmann, Nicolai Meinshausen

Abstract: Adaptive stochastic gradient methods such as AdaGrad have gained popularity in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second order information by accumulating past gradients which are used to tune the step size adaptively. In certain situations the full-matrix variant of AdaGrad is expected to attain bet… ▽ More Adaptive stochastic gradient methods such as AdaGrad have gained popularity in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second order information by accumulating past gradients which are used to tune the step size adaptively. In certain situations the full-matrix variant of AdaGrad is expected to attain better performance, however in high dimensions it is computationally impractical. We present Ada-LR and RadaGrad two computationally efficient approximations to full-matrix AdaGrad based on randomized dimensionality reduction. They are able to capture dependencies between features and achieve similar performance to full-matrix AdaGrad but at a much smaller computational cost. We show that the regret of Ada-LR is close to the regret of full-matrix AdaGrad which can have an up-to exponentially smaller dependence on the dimension than the diagonal variant. Empirically, we show that Ada-LR and RadaGrad perform similarly to full-matrix AdaGrad. On the task of training convolutional neural networks as well as recurrent neural networks, RadaGrad achieves faster convergence than diagonal AdaGrad. △ Less

Submitted 21 November, 2016; originally announced November 2016.

Comments: To appear in Advances in Neural Information Processing Systems 29 (NIPS 2016)

arXiv:1609.00810 [pdf, other]

doi 10.1109/ITW.2015.7133122

Greedy MAXCUT Algorithms and their Information Content

Authors: Yatao Bian, Alexey Gronskiy, Joachim M. Buhmann

Abstract: MAXCUT defines a classical NP-hard problem for graph partitioning and it serves as a typical case of the symmetric non-monotone Unconstrained Submodular Maximization (USM) problem. Applications of MAXCUT are abundant in machine learning, computer vision and statistical physics. Greedy algorithms to approximately solve MAXCUT rely on greedy vertex labelling or on an edge contraction strategy. These… ▽ More MAXCUT defines a classical NP-hard problem for graph partitioning and it serves as a typical case of the symmetric non-monotone Unconstrained Submodular Maximization (USM) problem. Applications of MAXCUT are abundant in machine learning, computer vision and statistical physics. Greedy algorithms to approximately solve MAXCUT rely on greedy vertex labelling or on an edge contraction strategy. These algorithms have been studied by measuring their approximation ratios in the worst case setting but very little is known to characterize their robustness to noise contaminations of the input data in the average case. Adapting the framework of Approximation Set Coding, we present a method to exactly measure the cardinality of the algorithmic approximation sets of five greedy MAXCUT algorithms. Their information contents are explored for graph instances generated by two different noise models: the edge reversal model and Gaussian edge weights model. The results provide insights into the robustness of different greedy heuristics and techniques for MAXCUT, which can be used for algorithm design of general USM problems. △ Less

Submitted 3 September, 2016; originally announced September 2016.

Comments: This is a longer version of the paper published in 2015 IEEE Information Theory Workshop (ITW)

arXiv:1606.05615 [pdf, other]

Guaranteed Non-convex Optimization: Submodular Maximization over Continuous Domains

Authors: Andrew An Bian, Baharan Mirzasoleiman, Joachim M. Buhmann, Andreas Krause

Abstract: Submodular continuous functions are a category of (generally) non-convex/non-concave functions with a wide spectrum of applications. We characterize these functions and demonstrate that they can be maximized efficiently with approximation guarantees. Specifically, i) We introduce the weak DR property that gives a unified characterization of submodularity for all set, integer-lattice and continuous… ▽ More Submodular continuous functions are a category of (generally) non-convex/non-concave functions with a wide spectrum of applications. We characterize these functions and demonstrate that they can be maximized efficiently with approximation guarantees. Specifically, i) We introduce the weak DR property that gives a unified characterization of submodularity for all set, integer-lattice and continuous functions; ii) for maximizing monotone DR-submodular continuous functions under general down-closed convex constraints, we propose a Frank-Wolfe variant with $(1-1/e)$ approximation guarantee, and sub-linear convergence rate; iii) for maximizing general non-monotone submodular continuous functions subject to box constraints, we propose a DoubleGreedy algorithm with $1/3$ approximation guarantee. Submodular continuous functions naturally find applications in various real-world settings, including influence and revenue maximization with continuous assignments, sensor energy management, multi-resolution data summarization, facility location, etc. Experimental results show that the proposed algorithms efficiently generate superior solutions compared to baseline algorithms. △ Less

Submitted 6 May, 2019; v1 submitted 17 June, 2016; originally announced June 2016.

Comments: Appears in the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) 2017

arXiv:1606.00897 [pdf, other]

Multi-Organ Cancer Classification and Survival Analysis

Authors: Stefan Bauer, Nicolas Carion, Peter Schüffler, Thomas Fuchs, Peter Wild, Joachim M. Buhmann

Abstract: Accurate and robust cell nuclei classification is the cornerstone for a wider range of tasks in digital and Computational Pathology. However, most machine learning systems require extensive labeling from expert pathologists for each individual problem at hand, with no or limited abilities for knowledge transfer between datasets and organ sites. In this paper we implement and evaluate a variety of… ▽ More Accurate and robust cell nuclei classification is the cornerstone for a wider range of tasks in digital and Computational Pathology. However, most machine learning systems require extensive labeling from expert pathologists for each individual problem at hand, with no or limited abilities for knowledge transfer between datasets and organ sites. In this paper we implement and evaluate a variety of deep neural network models and model ensembles for nuclei classification in renal cell cancer (RCC) and prostate cancer (PCa). We propose a convolutional neural network system based on residual learning which significantly improves over the state-of-the-art in cell nuclei classification. Finally, we show that the combination of tissue types during training increases not only classification accuracy but also overall survival analysis. △ Less

Submitted 2 December, 2016; v1 submitted 2 June, 2016; originally announced June 2016.

arXiv:1604.06318 [pdf, other]

TI-POOLING: transformation-invariant pooling for feature learning in Convolutional Neural Networks

Authors: Dmitry Laptev, Nikolay Savinov, Joachim M. Buhmann, Marc Pollefeys

Abstract: In this paper we present a deep neural network topology that incorporates a simple to implement transformation invariant pooling operator (TI-POOLING). This operator is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes. Most current methods usually make use of dataset augmentation to address this issue, but this requires larger number… ▽ More In this paper we present a deep neural network topology that incorporates a simple to implement transformation invariant pooling operator (TI-POOLING). This operator is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes. Most current methods usually make use of dataset augmentation to address this issue, but this requires larger number of model parameters and more training data, and results in significantly increased training time and larger chance of under- or overfitting. The main reason for these drawbacks is that the learned model needs to capture adequate features for all the possible transformations of the input. On the other hand, we formulate features in convolutional neural networks to be transformation-invariant. We achieve that using parallel siamese architectures for the considered transformation set and applying the TI-POOLING operator on their outputs before the fully-connected layers. We show that this topology internally finds the most optimal "canonical" instance of the input image for training and therefore limits the redundancy in learned features. This more efficient use of training data results in better performance on popular benchmark datasets with smaller number of parameters when comparing to standard convolutional neural networks with dataset augmentation and to other baselines. △ Less

Submitted 22 September, 2016; v1 submitted 21 April, 2016; originally announced April 2016.

Comments: Accepted at CVPR 2016. The first two authors assert equal contribution and joint first authorship

arXiv:1601.00027 [pdf, other]

doi 10.1016/j.compmedimag.2011.02.006

Computational Pathology: Challenges and Promises for Tissue Analysis

Authors: Thomas J. Fuchs, Joachim M. Buhmann

Abstract: The histological assessment of human tissue has emerged as the key challenge for detection and treatment of cancer. A plethora of different data sources ranging from tissue microarray data to gene expression, proteomics or metabolomics data provide a detailed overview of the health status of a patient. Medical doctors need to assess these information sources and they rely on data driven automatic… ▽ More The histological assessment of human tissue has emerged as the key challenge for detection and treatment of cancer. A plethora of different data sources ranging from tissue microarray data to gene expression, proteomics or metabolomics data provide a detailed overview of the health status of a patient. Medical doctors need to assess these information sources and they rely on data driven automatic analysis tools. Methods for classification, grou** and segmentation of heterogeneous data sources as well as regression of noisy dependencies and estimation of survival probabilities enter the processing workflow of a pathology diagnosis system at various stages. This paper reports on state-of-the-art of the design and effectiveness of computational pathology workflows and it discusses future research directions in this emergent field of medical informatics and diagnostic machine learning. △ Less

Submitted 31 December, 2015; originally announced January 2016.

Journal ref: Computerized Medical Imaging and Graphics, vol. 35, 7-8, p. 515-530, 2011

arXiv:1411.6191 [pdf, other]

Kickback cuts Backprop's red-tape: Biologically plausible credit assignment in neural networks

Authors: David Balduzzi, Hastagiri Vanchinathan, Joachim Buhmann

Abstract: Error backpropagation is an extremely effective algorithm for assigning credit in artificial neural networks. However, weight updates under Backprop depend on lengthy recursive computations and require separate output and error messages -- features not shared by biological neurons, that are perhaps unnecessary. In this paper, we revisit Backprop and the credit assignment problem. We first decompos… ▽ More Error backpropagation is an extremely effective algorithm for assigning credit in artificial neural networks. However, weight updates under Backprop depend on lengthy recursive computations and require separate output and error messages -- features not shared by biological neurons, that are perhaps unnecessary. In this paper, we revisit Backprop and the credit assignment problem. We first decompose Backprop into a collection of interacting learning algorithms; provide regret bounds on the performance of these sub-algorithms; and factorize Backprop's error signals. Using these results, we derive a new credit assignment algorithm for nonparametric regression, Kickback, that is significantly simpler than Backprop. Finally, we provide a sufficient condition for Kickback to follow error gradients, and show that Kickback matches Backprop's performance on real-world regression benchmarks. △ Less

Submitted 22 November, 2014; originally announced November 2014.

Comments: 7 pages. To appear, AAAI-15

arXiv:1306.5554 [pdf, ps, other]

Correlated random features for fast semi-supervised learning

Authors: Brian McWilliams, David Balduzzi, Joachim M. Buhmann

Abstract: This paper presents Correlated Nystrom Views (XNV), a fast semi-supervised algorithm for regression and classification. The algorithm draws on two main ideas. First, it generates two views consisting of computationally inexpensive random features. Second, XNV applies multiview regression using Canonical Correlation Analysis (CCA) on unlabeled data to bias the regression towards useful features. It… ▽ More This paper presents Correlated Nystrom Views (XNV), a fast semi-supervised algorithm for regression and classification. The algorithm draws on two main ideas. First, it generates two views consisting of computationally inexpensive random features. Second, XNV applies multiview regression using Canonical Correlation Analysis (CCA) on unlabeled data to bias the regression towards useful features. It has been shown that, if the views contains accurate estimators, CCA regression can substantially reduce variance with a minimal increase in bias. Random views are justified by recent theoretical and empirical work showing that regression with random features closely approximates kernel regression, implying that random views can be expected to contain accurate estimators. We show that XNV consistently outperforms a state-of-the-art algorithm for semi-supervised learning: substantially improving predictive performance and reducing the variability of performance on a wide variety of real-world datasets, whilst also reducing runtime by orders of magnitude. △ Less

Submitted 5 November, 2013; v1 submitted 24 June, 2013; originally announced June 2013.

Comments: 15 pages, 3 figures, 6 tables

Showing 1–50 of 55 results for author: Buhmann, J