-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…
▽ More
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
△ Less
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
ConeQuest: A Benchmark for Cone Segmentation on Mars
Authors:
Mirali Purohit,
Jacob Adler,
Hannah Kerner
Abstract:
Over the years, space scientists have collected terabytes of Mars data from satellites and rovers. One important set of features identified in Mars orbital images is pitted cones, which are interpreted to be mud volcanoes believed to form in regions that were once saturated in water (i.e., a lake or ocean). Identifying pitted cones globally on Mars would be of great importance, but expert geologis…
▽ More
Over the years, space scientists have collected terabytes of Mars data from satellites and rovers. One important set of features identified in Mars orbital images is pitted cones, which are interpreted to be mud volcanoes believed to form in regions that were once saturated in water (i.e., a lake or ocean). Identifying pitted cones globally on Mars would be of great importance, but expert geologists are unable to sort through the massive orbital image archives to identify all examples. However, this task is well suited for computer vision. Although several computer vision datasets exist for various Mars-related tasks, there is currently no open-source dataset available for cone detection/segmentation. Furthermore, previous studies trained models using data from a single region, which limits their applicability for global detection and map**. Motivated by this, we introduce ConeQuest, the first expert-annotated public dataset to identify cones on Mars. ConeQuest consists of >13k samples from 3 different regions of Mars. We propose two benchmark tasks using ConeQuest: (i) Spatial Generalization and (ii) Cone-size Generalization. We finetune and evaluate widely-used segmentation models on both benchmark tasks. Results indicate that cone segmentation is a challenging open problem not solved by existing segmentation models, which achieve an average IoU of 52.52% and 42.55% on in-distribution data for tasks (i) and (ii), respectively. We believe this new benchmark dataset will facilitate the development of more accurate and robust models for cone segmentation. Data and code are available at https://github.com/kerner-lab/ConeQuest.
△ Less
Submitted 14 November, 2023;
originally announced November 2023.
-
Application-driven Validation of Posteriors in Inverse Problems
Authors:
Tim J. Adler,
Jan-Hinrich Nölke,
Annika Reinke,
Minu Dietlinde Tizabi,
Sebastian Gruber,
Dasha Trofimova,
Lynton Ardizzone,
Paul F. Jaeger,
Florian Buettner,
Ullrich Köthe,
Lena Maier-Hein
Abstract:
Current deep learning-based solutions for image analysis tasks are commonly incapable of handling problems to which multiple different plausible solutions exist. In response, posterior-based methods such as conditional Diffusion Models and Invertible Neural Networks have emerged; however, their translation is hampered by a lack of research on adequate validation. In other words, the way progress i…
▽ More
Current deep learning-based solutions for image analysis tasks are commonly incapable of handling problems to which multiple different plausible solutions exist. In response, posterior-based methods such as conditional Diffusion Models and Invertible Neural Networks have emerged; however, their translation is hampered by a lack of research on adequate validation. In other words, the way progress is measured often does not reflect the needs of the driving practical application. Closing this gap in the literature, we present the first systematic framework for the application-driven validation of posterior-based methods in inverse problems. As a methodological novelty, it adopts key principles from the field of object detection validation, which has a long history of addressing the question of how to locate and match multiple object instances in an image. Treating modes as instances enables us to perform mode-centric validation, using well-interpretable metrics from the application perspective. We demonstrate the value of our framework through instantiations for a synthetic toy example and two medical vision use cases: pose estimation in surgery and imaging-based quantification of functional tissue parameters for diagnostics. Our framework offers key advantages over common approaches to posterior validation in all three examples and could thus revolutionize performance assessment in inverse problems.
△ Less
Submitted 18 September, 2023;
originally announced September 2023.
-
Why is the winner the best?
Authors:
Matthias Eisenmann,
Annika Reinke,
Vivienn Weru,
Minu Dietlinde Tizabi,
Fabian Isensee,
Tim J. Adler,
Sharib Ali,
Vincent Andrearczyk,
Marc Aubreville,
Ujjwal Baid,
Spyridon Bakas,
Niranjan Balu,
Sophia Bano,
Jorge Bernal,
Sebastian Bodenstedt,
Alessandro Casella,
Veronika Cheplygina,
Marie Daum,
Marleen de Bruijne,
Adrien Depeursinge,
Reuben Dorent,
Jan Egger,
David G. Ellis,
Sandy Engelhardt,
Melanie Ganz
, et al. (100 additional authors not shown)
Abstract:
International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To addre…
▽ More
International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multi-center study with all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. Statistical analyses performed based on comprehensive descriptions of the submitted algorithms linked to their rank as well as the underlying participation strategies revealed common characteristics of winning solutions. These typically include the use of multi-task learning (63%) and/or multi-stage pipelines (61%), and a focus on augmentation (100%), image preprocessing (97%), data curation (79%), and postprocessing (66%). The "typical" lead of a winning team is a computer scientist with a doctoral degree, five years of experience in biomedical image analysis, and four years of experience in deep learning. Two core general development strategies stood out for highly-ranked teams: the reflection of the metrics in the method design and the focus on analyzing and handling failure cases. According to the organizers, 43% of the winning algorithms exceeded the state of the art but only 11% completely solved the respective domain problem. The insights of our study could help researchers (1) improve algorithm development strategies when approaching new problems, and (2) focus on open research questions revealed by this work.
△ Less
Submitted 30 March, 2023;
originally announced March 2023.
-
Unsupervised Domain Transfer with Conditional Invertible Neural Networks
Authors:
Kris K. Dreher,
Leonardo Ayala,
Melanie Schellenberg,
Marco Hübner,
Jan-Hinrich Nölke,
Tim J. Adler,
Silvia Seidlitz,
Jan Sellner,
Alexander Studier-Fischer,
Janek Gröhl,
Felix Nickel,
Ullrich Köthe,
Alexander Seitel,
Lena Maier-Hein
Abstract:
Synthetic medical image generation has evolved as a key technique for neural network training and validation. A core challenge, however, remains in the domain gap between simulations and real data. While deep learning-based domain transfer using Cycle Generative Adversarial Networks and similar architectures has led to substantial progress in the field, there are use cases in which state-of-the-ar…
▽ More
Synthetic medical image generation has evolved as a key technique for neural network training and validation. A core challenge, however, remains in the domain gap between simulations and real data. While deep learning-based domain transfer using Cycle Generative Adversarial Networks and similar architectures has led to substantial progress in the field, there are use cases in which state-of-the-art approaches still fail to generate training images that produce convincing results on relevant downstream tasks. Here, we address this issue with a domain transfer approach based on conditional invertible neural networks (cINNs). As a particular advantage, our method inherently guarantees cycle consistency through its invertible architecture, and network training can efficiently be conducted with maximum likelihood training. To showcase our method's generic applicability, we apply it to two spectral imaging modalities at different scales, namely hyperspectral imaging (pixel-level) and photoacoustic tomography (image-level). According to comprehensive experiments, our method enables the generation of realistic spectral data and outperforms the state of the art on two downstream classification tasks (binary and multi-class). cINN-based domain transfer could thus evolve as an important method for realistic synthetic data generation in the field of spectral imaging and beyond.
△ Less
Submitted 17 March, 2023;
originally announced March 2023.
-
Biomedical image analysis competitions: The state of current participation practice
Authors:
Matthias Eisenmann,
Annika Reinke,
Vivienn Weru,
Minu Dietlinde Tizabi,
Fabian Isensee,
Tim J. Adler,
Patrick Godau,
Veronika Cheplygina,
Michal Kozubek,
Sharib Ali,
Anubha Gupta,
Jan Kybic,
Alison Noble,
Carlos Ortiz de Solórzano,
Samiksha Pachade,
Caroline Petitjean,
Daniel Sage,
Donglai Wei,
Elizabeth Wilden,
Deepak Alapatt,
Vincent Andrearczyk,
Ujjwal Baid,
Spyridon Bakas,
Niranjan Balu,
Sophia Bano
, et al. (331 additional authors not shown)
Abstract:
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis,…
▽ More
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
△ Less
Submitted 12 September, 2023; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Continuous diffusion for categorical data
Authors:
Sander Dieleman,
Laurent Sartran,
Arman Roshannai,
Nikolay Savinov,
Yaroslav Ganin,
Pierre H. Richemond,
Arnaud Doucet,
Robin Strudel,
Chris Dyer,
Conor Durkan,
Curtis Hawthorne,
Rémi Leblond,
Will Grathwohl,
Jonas Adler
Abstract:
Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous natur…
▽ More
Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous nature of diffusion models conveys many benefits, and in this work we endeavour to preserve it. We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space. We demonstrate its efficacy on several language modelling tasks.
△ Less
Submitted 15 December, 2022; v1 submitted 28 November, 2022;
originally announced November 2022.
-
Accountable Safety for Rollups
Authors:
Ertem Nusret Tas,
John Adler,
Mustafa Al-Bassam,
Ismail Khoffi,
David Tse,
Nima Vaziri
Abstract:
Accountability, the ability to provably identify protocol violators, gained prominence as the main economic argument for the security of proof-of-stake (PoS) protocols. Rollups, the most popular scaling solution for blockchains, typically use PoS protocols as their parent chain. We define accountability for rollups, and present an attack that shows the absence of accountability on existing designs…
▽ More
Accountability, the ability to provably identify protocol violators, gained prominence as the main economic argument for the security of proof-of-stake (PoS) protocols. Rollups, the most popular scaling solution for blockchains, typically use PoS protocols as their parent chain. We define accountability for rollups, and present an attack that shows the absence of accountability on existing designs. We provide an accountable rollup design and prove its security, both for the traditional `enshrined' rollups and for sovereign rollups, an emergent alternative built on lazy blockchains, tasked only with ordering and availability of the rollup data.
△ Less
Submitted 4 November, 2022; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Robust deep learning-based semantic organ segmentation in hyperspectral images
Authors:
Silvia Seidlitz,
Jan Sellner,
Jan Odenthal,
Berkin Özdemir,
Alexander Studier-Fischer,
Samuel Knödler,
Leonardo Ayala,
Tim J. Adler,
Hannes G. Kenngott,
Minu Tizabi,
Martin Wagner,
Felix Nickel,
Beat P. Müller-Stich,
Lena Maier-Hein
Abstract:
Semantic image segmentation is an important prerequisite for context-awareness and autonomous robotics in surgery. The state of the art has focused on conventional RGB video data acquired during minimally invasive surgery, but full-scene semantic segmentation based on spectral imaging data and obtained during open surgery has received almost no attention to date. To address this gap in the literat…
▽ More
Semantic image segmentation is an important prerequisite for context-awareness and autonomous robotics in surgery. The state of the art has focused on conventional RGB video data acquired during minimally invasive surgery, but full-scene semantic segmentation based on spectral imaging data and obtained during open surgery has received almost no attention to date. To address this gap in the literature, we are investigating the following research questions based on hyperspectral imaging (HSI) data of pigs acquired in an open surgery setting: (1) What is an adequate representation of HSI data for neural network-based fully automated organ segmentation, especially with respect to the spatial granularity of the data (pixels vs. superpixels vs. patches vs. full images)? (2) Is there a benefit of using HSI data compared to other modalities, namely RGB data and processed HSI data (e.g. tissue parameters like oxygenation), when performing semantic organ segmentation? According to a comprehensive validation study based on 506 HSI images from 20 pigs, annotated with a total of 19 classes, deep learning-based segmentation performance increases, consistently across modalities, with the spatial context of the input data. Unprocessed HSI data offers an advantage over RGB data or processed data from the camera provider, with the advantage increasing with decreasing size of the input to the neural network. Maximum performance (HSI applied to whole images) yielded a mean DSC of 0.90 ((standard deviation (SD)) 0.04), which is in the range of the inter-rater variability (DSC of 0.89 ((standard deviation (SD)) 0.07)). We conclude that HSI could become a powerful image modality for fully-automatic surgical scene understanding with many advantages over traditional imaging, including the ability to recover additional functional tissue information. Code and pre-trained models: https://github.com/IMSY-DKFZ/htc.
△ Less
Submitted 10 July, 2022; v1 submitted 9 November, 2021;
originally announced November 2021.
-
Inferring a Continuous Distribution of Atom Coordinates from Cryo-EM Images using VAEs
Authors:
Dan Rosenbaum,
Marta Garnelo,
Michal Zielinski,
Charlie Beattie,
Ellen Clancy,
Andrea Huber,
Pushmeet Kohli,
Andrew W. Senior,
John Jumper,
Carl Doersch,
S. M. Ali Eslami,
Olaf Ronneberger,
Jonas Adler
Abstract:
Cryo-electron microscopy (cryo-EM) has revolutionized experimental protein structure determination. Despite advances in high resolution reconstruction, a majority of cryo-EM experiments provide either a single state of the studied macromolecule, or a relatively small number of its conformations. This reduces the effectiveness of the technique for proteins with flexible regions, which are known to…
▽ More
Cryo-electron microscopy (cryo-EM) has revolutionized experimental protein structure determination. Despite advances in high resolution reconstruction, a majority of cryo-EM experiments provide either a single state of the studied macromolecule, or a relatively small number of its conformations. This reduces the effectiveness of the technique for proteins with flexible regions, which are known to play a key role in protein function. Recent methods for capturing conformational heterogeneity in cryo-EM data model it in volume space, making recovery of continuous atomic structures challenging. Here we present a fully deep-learning-based approach using variational auto-encoders (VAEs) to recover a continuous distribution of atomic protein structures and poses directly from picked particle images and demonstrate its efficacy on realistic simulated data. We hope that methods built on this work will allow incorporation of stronger prior information about protein structure and enable better understanding of non-rigid protein structures.
△ Less
Submitted 26 June, 2021;
originally announced June 2021.
-
Invertible Neural Networks for Uncertainty Quantification in Photoacoustic Imaging
Authors:
Jan-Hinrich Nölke,
Tim Adler,
Janek Gröhl,
Thomas Kirchner,
Lynton Ardizzone,
Carsten Rother,
Ullrich Köthe,
Lena Maier-Hein
Abstract:
Multispectral photoacoustic imaging (PAI) is an emerging imaging modality which enables the recovery of functional tissue parameters such as blood oxygenation. However, the underlying inverse problems are potentially ill-posed, meaning that radically different tissue properties may - in theory - yield comparable measurements. In this work, we present a new approach for handling this specific type…
▽ More
Multispectral photoacoustic imaging (PAI) is an emerging imaging modality which enables the recovery of functional tissue parameters such as blood oxygenation. However, the underlying inverse problems are potentially ill-posed, meaning that radically different tissue properties may - in theory - yield comparable measurements. In this work, we present a new approach for handling this specific type of uncertainty by leveraging the concept of conditional invertible neural networks (cINNs). Specifically, we propose going beyond commonly used point estimates for tissue oxygenation and converting single-pixel initial pressure spectra to the full posterior probability density. This way, the inherent ambiguity of a problem can be encoded with multiple modes in the output. Based on the presented architecture, we demonstrate two use cases which leverage this information to not only detect and quantify but also to compensate for uncertainties: (1) photoacoustic device design and (2) optimization of photoacoustic image acquisition. Our in silico studies demonstrate the potential of the proposed methodology to become an important building block for uncertainty-aware reconstruction of physiological parameters with PAI.
△ Less
Submitted 23 November, 2020; v1 submitted 10 November, 2020;
originally announced November 2020.
-
On the unreasonable effectiveness of CNNs
Authors:
Andreas Hauptmann,
Jonas Adler
Abstract:
Deep learning methods using convolutional neural networks (CNN) have been successfully applied to virtually all imaging problems, and particularly in image reconstruction tasks with ill-posed and complicated imaging models. In an attempt to put upper bounds on the capability of baseline CNNs for solving image-to-image problems we applied a widely used standard off-the-shelf network architecture (U…
▽ More
Deep learning methods using convolutional neural networks (CNN) have been successfully applied to virtually all imaging problems, and particularly in image reconstruction tasks with ill-posed and complicated imaging models. In an attempt to put upper bounds on the capability of baseline CNNs for solving image-to-image problems we applied a widely used standard off-the-shelf network architecture (U-Net) to the "inverse problem" of XOR decryption from noisy data and show acceptable results.
△ Less
Submitted 29 July, 2020;
originally announced July 2020.
-
Heidelberg Colorectal Data Set for Surgical Data Science in the Sensor Operating Room
Authors:
Lena Maier-Hein,
Martin Wagner,
Tobias Ross,
Annika Reinke,
Sebastian Bodenstedt,
Peter M. Full,
Hellena Hempe,
Diana Mindroc-Filimon,
Patrick Scholz,
Thuy Nuong Tran,
Pierangela Bruno,
Anna Kisilenko,
Benjamin Müller,
Tornike Davitashvili,
Manuela Capek,
Minu Tizabi,
Matthias Eisenmann,
Tim J. Adler,
Janek Gröhl,
Melanie Schellenberg,
Silvia Seidlitz,
T. Y. Emmy Lai,
Bünyamin Pekdemir,
Veith Roethlingshoefer,
Fabian Both
, et al. (8 additional authors not shown)
Abstract:
Image-based tracking of medical instruments is an integral part of surgical data science applications. Previous research has addressed the tasks of detecting, segmenting and tracking medical instruments based on laparoscopic video data. However, the proposed methods still tend to fail when applied to challenging images and do not generalize well to data they have not been trained on. This paper in…
▽ More
Image-based tracking of medical instruments is an integral part of surgical data science applications. Previous research has addressed the tasks of detecting, segmenting and tracking medical instruments based on laparoscopic video data. However, the proposed methods still tend to fail when applied to challenging images and do not generalize well to data they have not been trained on. This paper introduces the Heidelberg Colorectal (HeiCo) data set - the first publicly available data set enabling comprehensive benchmarking of medical instrument detection and segmentation algorithms with a specific emphasis on method robustness and generalization capabilities. Our data set comprises 30 laparoscopic videos and corresponding sensor data from medical devices in the operating room for three different types of laparoscopic surgery. Annotations include surgical phase labels for all video frames as well as information on instrument presence and corresponding instance-wise segmentation masks for surgical instruments (if any) in more than 10,000 individual frames. The data has successfully been used to organize international competitions within the Endoscopic Vision Challenges 2017 and 2019.
△ Less
Submitted 23 February, 2021; v1 submitted 7 May, 2020;
originally announced May 2020.
-
Kernel of CycleGAN as a Principle homogeneous space
Authors:
Nikita Moriakov,
Jonas Adler,
Jonas Teuwen
Abstract:
Unpaired image-to-image translation has attracted significant interest due to the invention of CycleGAN, a method which utilizes a combination of adversarial and cycle consistency losses to avoid the need for paired data. It is known that the CycleGAN problem might admit multiple solutions, and our goal in this paper is to analyze the space of exact solutions and to give perturbation bounds for ap…
▽ More
Unpaired image-to-image translation has attracted significant interest due to the invention of CycleGAN, a method which utilizes a combination of adversarial and cycle consistency losses to avoid the need for paired data. It is known that the CycleGAN problem might admit multiple solutions, and our goal in this paper is to analyze the space of exact solutions and to give perturbation bounds for approximate solutions. We show theoretically that the exact solution space is invariant with respect to automorphisms of the underlying probability spaces, and, furthermore, that the group of automorphisms acts freely and transitively on the space of exact solutions. We examine the case of zero `pure' CycleGAN loss first in its generality, and, subsequently, expand our analysis to approximate solutions for `extended' CycleGAN loss where identity loss term is included. In order to demonstrate that these results are applicable, we show that under mild conditions nontrivial smooth automorphisms exist. Furthermore, we provide empirical evidence that neural networks can learn these automorphisms with unexpected and unwanted results. We conclude that finding optimal solutions to the CycleGAN loss does not necessarily lead to the envisioned result in image-to-image translation tasks and that underlying hidden symmetries can render the result utterly useless.
△ Less
Submitted 24 January, 2020;
originally announced January 2020.
-
Out of distribution detection for intra-operative functional imaging
Authors:
Tim J. Adler,
Leonardo Ayala,
Lynton Ardizzone,
Hannes G. Kenngott,
Anant Vemuri,
Beat P. Müller-Stich,
Carsten Rother,
Ullrich Köthe,
Lena Maier-Hein
Abstract:
Multispectral optical imaging is becoming a key tool in the operating room. Recent research has shown that machine learning algorithms can be used to convert pixel-wise reflectance measurements to tissue parameters, such as oxygenation. However, the accuracy of these algorithms can only be guaranteed if the spectra acquired during surgery match the ones seen during training. It is therefore of gre…
▽ More
Multispectral optical imaging is becoming a key tool in the operating room. Recent research has shown that machine learning algorithms can be used to convert pixel-wise reflectance measurements to tissue parameters, such as oxygenation. However, the accuracy of these algorithms can only be guaranteed if the spectra acquired during surgery match the ones seen during training. It is therefore of great interest to detect so-called out of distribution (OoD) spectra to prevent the algorithm from presenting spurious results. In this paper we present an information theory based approach to OoD detection based on the widely applicable information criterion (WAIC). Our work builds upon recent methodology related to invertible neural networks (INN). Specifically, we make use of an ensemble of INNs as we need their tractable Jacobians in order to compute the WAIC. Comprehensive experiments with in silico, and in vivo multispectral imaging data indicate that our approach is well-suited for OoD detection. Our method could thus be an important step towards reliable functional imaging in the operating room.
△ Less
Submitted 5 November, 2019;
originally announced November 2019.
-
A unified representation network for segmentation with missing modalities
Authors:
Kenneth Lau,
Jonas Adler,
Jens Sjölund
Abstract:
Over the last few years machine learning has demonstrated groundbreaking results in many areas of medical image analysis, including segmentation. A key assumption, however, is that the train- and test distributions match. We study a realistic scenario where this assumption is clearly violated, namely segmentation with missing input modalities. We describe two neural network approaches that can han…
▽ More
Over the last few years machine learning has demonstrated groundbreaking results in many areas of medical image analysis, including segmentation. A key assumption, however, is that the train- and test distributions match. We study a realistic scenario where this assumption is clearly violated, namely segmentation with missing input modalities. We describe two neural network approaches that can handle a variable number of input modalities. The first is modality dropout: a simple but surprisingly effective modification of the training. The second is the unified representation network: a network architecture that maps a variable number of input modalities into a unified representation that can be used for downstream tasks such as segmentation. We demonstrate that modality dropout makes a standard segmentation network reasonably robust to missing modalities, but that the same network works even better if trained on the unified representation.
△ Less
Submitted 19 August, 2019;
originally announced August 2019.
-
Multi-Scale Learned Iterative Reconstruction
Authors:
Andreas Hauptmann,
Jonas Adler,
Simon Arridge,
Ozan Öktem
Abstract:
Model-based learned iterative reconstruction methods have recently been shown to outperform classical reconstruction algorithms. Applicability of these methods to large scale inverse problems is however limited by the available memory for training and extensive training times, the latter due to computationally expensive forward models. As a possible solution to these restrictions we propose a mult…
▽ More
Model-based learned iterative reconstruction methods have recently been shown to outperform classical reconstruction algorithms. Applicability of these methods to large scale inverse problems is however limited by the available memory for training and extensive training times, the latter due to computationally expensive forward models. As a possible solution to these restrictions we propose a multi-scale learned iterative reconstruction scheme that computes iterates on discretisations of increasing resolution. This procedure does not only reduce memory requirements, it also considerably speeds up reconstruction and training times, but most importantly is scalable to large scale inverse problems with non-trivial forward operators, such as those that arise in many 3D tomographic applications. In particular, we propose a hybrid network that combines the multi-scale iterative approach with a particularly expressive network architecture which in combination exhibits excellent scalability in 3D.
Applicability of the algorithm is demonstrated for 3D cone beam computed tomography from real measurement data of an organic phantom. Additionally, we examine scalability and reconstruction quality in comparison to established learned reconstruction methods in two dimensions for low dose computed tomography on human phantoms.
△ Less
Submitted 20 April, 2020; v1 submitted 1 August, 2019;
originally announced August 2019.
-
Building Scalable Decentralized Payment Systems
Authors:
John Adler,
Mikerah Quintyne-Collins
Abstract:
Increasing the transactional throughput of decentralized blockchains in a secure manner has been the holy grail of blockchain research for most of the past decade. This paper introduces a scheme for scaling blockchains while retaining virtually identical security and decentralization, colloquially known as optimistic rollup. We propose a layer-2 scaling technique using a permissionless side chain…
▽ More
Increasing the transactional throughput of decentralized blockchains in a secure manner has been the holy grail of blockchain research for most of the past decade. This paper introduces a scheme for scaling blockchains while retaining virtually identical security and decentralization, colloquially known as optimistic rollup. We propose a layer-2 scaling technique using a permissionless side chain with merged consensus. The side chain only supports functionality to transact UTXOs and transfer funds to and from a parent chain in a trust-minimized manner. Optimized implementation and engineering of client code, along with improvements to block propagation efficiency versus currently deployed systems, allow use of this side chain to scale well beyond the capacities exhibited by contemporary blockchains without undue resource demands on full nodes.
△ Less
Submitted 23 July, 2020; v1 submitted 12 April, 2019;
originally announced April 2019.
-
Uncertainty-aware performance assessment of optical imaging modalities with invertible neural networks
Authors:
Tim J. Adler,
Lynton Ardizzone,
Anant Vemuri,
Leonardo Ayala,
Janek Gröhl,
Thomas Kirchner,
Sebastian Wirkert,
Jakob Kruse,
Carsten Rother,
Ullrich Köthe,
Lena Maier-Hein
Abstract:
Purpose: Optical imaging is evolving as a key technique for advanced sensing in the operating room. Recent research has shown that machine learning algorithms can be used to address the inverse problem of converting pixel-wise multispectral reflectance measurements to underlying tissue parameters, such as oxygenation. Assessment of the specific hardware used in conjunction with such algorithms, ho…
▽ More
Purpose: Optical imaging is evolving as a key technique for advanced sensing in the operating room. Recent research has shown that machine learning algorithms can be used to address the inverse problem of converting pixel-wise multispectral reflectance measurements to underlying tissue parameters, such as oxygenation. Assessment of the specific hardware used in conjunction with such algorithms, however, has not properly addressed the possibility that the problem may be ill-posed.
Methods: We present a novel approach to the assessment of optical imaging modalities, which is sensitive to the different types of uncertainties that may occur when inferring tissue parameters. Based on the concept of invertible neural networks, our framework goes beyond point estimates and maps each multispectral measurement to a full posterior probability distribution which is capable of representing ambiguity in the solution via multiple modes. Performance metrics for a hardware setup can then be computed from the characteristics of the posteriors.
Results: Application of the assessment framework to the specific use case of camera selection for physiological parameter estimation yields the following insights: (1) Estimation of tissue oxygenation from multispectral images is a well-posed problem, while (2) blood volume fraction may not be recovered without ambiguity. (3) In general, ambiguity may be reduced by increasing the number of spectral bands in the camera.
Conclusion: Our method could help to optimize optical camera design in an application-specific manner.
△ Less
Submitted 8 March, 2019;
originally announced March 2019.
-
Deep Bayesian Inversion
Authors:
Jonas Adler,
Ozan Öktem
Abstract:
Characterizing statistical properties of solutions of inverse problems is essential for decision making. Bayesian inversion offers a tractable framework for this purpose, but current approaches are computationally unfeasible for most realistic imaging applications in the clinic. We introduce two novel deep learning based methods for solving large-scale inverse problems using Bayesian inversion: a…
▽ More
Characterizing statistical properties of solutions of inverse problems is essential for decision making. Bayesian inversion offers a tractable framework for this purpose, but current approaches are computationally unfeasible for most realistic imaging applications in the clinic. We introduce two novel deep learning based methods for solving large-scale inverse problems using Bayesian inversion: a sampling based method using a WGAN with a novel mini-discriminator and a direct approach that trains a neural network using a novel loss function. The performance of both methods is demonstrated on image reconstruction in ultra low dose 3D helical CT. We compute the posterior mean and standard deviation of the 3D images followed by a hypothesis test to assess whether a "dark spot" in the liver of a cancer stricken patient is present. Both methods are computationally efficient and our evaluation shows very promising performance that clearly supports the claim that Bayesian inversion is usable for 3D imaging in time critical applications.
△ Less
Submitted 14 November, 2018;
originally announced November 2018.
-
Task adapted reconstruction for inverse problems
Authors:
Jonas Adler,
Sebastian Lunz,
Olivier Verdier,
Carola-Bibiane Schönlieb,
Ozan Öktem
Abstract:
The paper considers the problem of performing a task defined on a model parameter that is only observed indirectly through noisy data in an ill-posed inverse problem. A key aspect is to formalize the steps of reconstruction and task as appropriate estimators (non-randomized decision rules) in statistical estimation problems. The implementation makes use of (deep) neural networks to provide a diffe…
▽ More
The paper considers the problem of performing a task defined on a model parameter that is only observed indirectly through noisy data in an ill-posed inverse problem. A key aspect is to formalize the steps of reconstruction and task as appropriate estimators (non-randomized decision rules) in statistical estimation problems. The implementation makes use of (deep) neural networks to provide a differentiable parametrization of the family of estimators for both steps. These networks are combined and jointly trained against suitable supervised training data in order to minimize a joint differentiable loss function, resulting in an end-to-end task adapted reconstruction method. The suggested framework is generic, yet adaptable, with a plug-and-play structure for adjusting both the inverse problem and the task at hand. More precisely, the data model (forward operator and statistical model of the noise) associated with the inverse problem is exchangeable, e.g., by using neural network architecture given by a learned iterative method. Furthermore, any task that is encodable as a trainable neural network can be used. The approach is demonstrated on joint tomographic image reconstruction, classification and joint tomographic image reconstruction segmentation.
△ Less
Submitted 27 August, 2018;
originally announced September 2018.
-
Deep Learning Framework for Digital Breast Tomosynthesis Reconstruction
Authors:
Nikita Moriakov,
Koen Michielsen,
Jonas Adler,
Ritse Mann,
Ioannis Sechopoulos,
Jonas Teuwen
Abstract:
Digital breast tomosynthesis is rapidly replacing digital mammography as the basic x-ray technique for evaluation of the breasts. However, the sparse sampling and limited angular range gives rise to different artifacts, which manufacturers try to solve in several ways. In this study we propose an extension of the Learned Primal-Dual algorithm for digital breast tomosynthesis. The Learned Primal-Du…
▽ More
Digital breast tomosynthesis is rapidly replacing digital mammography as the basic x-ray technique for evaluation of the breasts. However, the sparse sampling and limited angular range gives rise to different artifacts, which manufacturers try to solve in several ways. In this study we propose an extension of the Learned Primal-Dual algorithm for digital breast tomosynthesis. The Learned Primal-Dual algorithm is a deep neural network consisting of several `reconstruction blocks', which take in raw sinogram data as the initial input, perform a forward and a backward pass by taking projections and back-projections, and use a convolutional neural network to produce an intermediate reconstruction result which is then improved further by the successive reconstruction block. We extend the architecture by providing breast thickness measurements as a mask to the neural network and allow it to learn how to use this thickness mask. We have trained the algorithm on digital phantoms and the corresponding noise-free/noisy projections, and then tested the algorithm on digital phantoms for varying level of noise. Reconstruction performance of the algorithms was compared visually, using MSE loss and Structural Similarity Index. Results indicate that the proposed algorithm outperforms the baseline iterative reconstruction algorithm in terms of reconstruction quality for both breast edges and internal structures and is robust to noise.
△ Less
Submitted 14 August, 2018;
originally announced August 2018.
-
Astraea: A Decentralized Blockchain Oracle
Authors:
John Adler,
Ryan Berryhill,
Andreas Veneris,
Zissis Poulos,
Neil Veira,
Anastasia Kastania
Abstract:
The public blockchain was originally conceived to process monetary transactions in a peer-to-peer network while preventing double-spending. It has since been extended to numerous other applications including execution of programs that exist on the blockchain called "smart contracts." Smart contracts have a major limitation, namely they only operate on data that is on the blockchain. Trusted entiti…
▽ More
The public blockchain was originally conceived to process monetary transactions in a peer-to-peer network while preventing double-spending. It has since been extended to numerous other applications including execution of programs that exist on the blockchain called "smart contracts." Smart contracts have a major limitation, namely they only operate on data that is on the blockchain. Trusted entities called oracles attest to external data in order to bring it onto the blockchain but they do so without the robust security guarantees that blockchains generally provide. This has the potential to turn oracles into centralized points-of-failure. To address this concern, this paper introduces Astraea, a decentralized oracle based on a voting game that decides the truth or falsity of propositions. Players fall into two roles: voters and certifiers. Voters play a low-risk/low-reward role that is resistant to adversarial manipulation while certifiers play a high-risk/high-reward role so they are required to play with a high degree of accuracy. This paper also presents a formal analysis of the parameters behind the system to measure the probability of an adversary with bounded funds being able to successfully manipulate the oracle's decision, that shows that the same parameters can be set to make manipulation arbitrarily difficult---a desirable feature for the system. Further, this analysis demonstrates that under those conditions a Nash equilibrium exists where all rational players are forced to behave honestly.
△ Less
Submitted 1 August, 2018;
originally announced August 2018.
-
Banach Wasserstein GAN
Authors:
Jonas Adler,
Sebastian Lunz
Abstract:
Wasserstein Generative Adversarial Networks (WGANs) can be used to generate realistic samples from complicated image distributions. The Wasserstein metric used in WGANs is based on a notion of distance between individual images, which induces a notion of distance between probability distributions of images. So far the community has considered $\ell^2$ as the underlying distance. We generalize the…
▽ More
Wasserstein Generative Adversarial Networks (WGANs) can be used to generate realistic samples from complicated image distributions. The Wasserstein metric used in WGANs is based on a notion of distance between individual images, which induces a notion of distance between probability distributions of images. So far the community has considered $\ell^2$ as the underlying distance. We generalize the theory of WGAN with gradient penalty to Banach spaces, allowing practitioners to select the features to emphasize in the generator. We further discuss the effect of some particular choices of underlying norms, focusing on Sobolev norms. Finally, we demonstrate a boost in performance for an appropriate choice of norm on CIFAR-10 and CelebA.
△ Less
Submitted 11 January, 2019; v1 submitted 18 June, 2018;
originally announced June 2018.
-
A modified fuzzy C means algorithm for shading correction in craniofacial CBCT images
Authors:
Awais Ashfaq,
Jonas Adler
Abstract:
CBCT images suffer from acute shading artifacts primarily due to scatter. Numerous image-domain correction algorithms have been proposed in the literature that use patient-specific planning CT images to estimate shading contributions in CBCT images. However, in the context of radiosurgery applications such as gamma knife, planning images are often acquired through MRI which impedes the use of poly…
▽ More
CBCT images suffer from acute shading artifacts primarily due to scatter. Numerous image-domain correction algorithms have been proposed in the literature that use patient-specific planning CT images to estimate shading contributions in CBCT images. However, in the context of radiosurgery applications such as gamma knife, planning images are often acquired through MRI which impedes the use of polynomial fitting approaches for shading correction. We present a new shading correction approach that is independent of planning CT images. Our algorithm is based on the assumption that true CBCT images follow a uniform volumetric intensity distribution per material, and scatter perturbs this uniform texture by contributing cup** and shading artifacts in the image domain. The framework is a combination of fuzzy C-means coupled with a neighborhood regularization term and Otsu's method. Experimental results on artificially simulated craniofacial CBCT images are provided to demonstrate the effectiveness of our algorithm. Spatial non-uniformity is reduced from 16% to 7% in soft tissue and from 44% to 8% in bone regions. With shading-correction, thresholding based segmentation accuracy for bone pixels is improved from 85% to 91% when compared to thresholding without shading-correction. The proposed algorithm is thus practical and qualifies as a plug and play extension into any CBCT reconstruction software for shading correction.
△ Less
Submitted 17 January, 2018;
originally announced January 2018.
-
Learning to solve inverse problems using Wasserstein loss
Authors:
Jonas Adler,
Axel Ringh,
Ozan Öktem,
Johan Karlsson
Abstract:
We propose using the Wasserstein loss for training in inverse problems. In particular, we consider a learned primal-dual reconstruction scheme for ill-posed inverse problems using the Wasserstein distance as loss function in the learning. This is motivated by miss-alignments in training data, which when using standard mean squared error loss could severely degrade reconstruction quality. We prove…
▽ More
We propose using the Wasserstein loss for training in inverse problems. In particular, we consider a learned primal-dual reconstruction scheme for ill-posed inverse problems using the Wasserstein distance as loss function in the learning. This is motivated by miss-alignments in training data, which when using standard mean squared error loss could severely degrade reconstruction quality. We prove that training with the Wasserstein loss gives a reconstruction operator that correctly compensates for miss-alignments in certain cases, whereas training with the mean squared error gives a smeared reconstruction. Moreover, we demonstrate these effects by training a reconstruction algorithm using both mean squared error and optimal transport loss for a problem in computerized tomography.
△ Less
Submitted 30 October, 2017;
originally announced October 2017.
-
Model based learning for accelerated, limited-view 3D photoacoustic tomography
Authors:
Andreas Hauptmann,
Felix Lucka,
Marta Betcke,
Nam Huynh,
Jonas Adler,
Ben Cox,
Paul Beard,
Sebastien Ourselin,
Simon Arridge
Abstract:
Recent advances in deep learning for tomographic reconstructions have shown great potential to create accurate and high quality images with a considerable speed-up. In this work we present a deep neural network that is specifically designed to provide high resolution 3D images from restricted photoacoustic measurements. The network is designed to represent an iterative scheme and incorporates grad…
▽ More
Recent advances in deep learning for tomographic reconstructions have shown great potential to create accurate and high quality images with a considerable speed-up. In this work we present a deep neural network that is specifically designed to provide high resolution 3D images from restricted photoacoustic measurements. The network is designed to represent an iterative scheme and incorporates gradient information of the data fit to compensate for limited view artefacts. Due to the high complexity of the photoacoustic forward operator, we separate training and computation of the gradient information. A suitable prior for the desired image structures is learned as part of the training. The resulting network is trained and tested on a set of segmented vessels from lung CT scans and then applied to in-vivo photoacoustic measurement data.
△ Less
Submitted 26 March, 2018; v1 submitted 31 August, 2017;
originally announced August 2017.
-
Learned Primal-dual Reconstruction
Authors:
Jonas Adler,
Ozan Öktem
Abstract:
We propose the Learned Primal-Dual algorithm for tomographic reconstruction. The algorithm accounts for a (possibly non-linear) forward operator in a deep neural network by unrolling a proximal primal-dual optimization method, but where the proximal operators have been replaced with convolutional neural networks. The algorithm is trained end-to-end, working directly from raw measured data and it d…
▽ More
We propose the Learned Primal-Dual algorithm for tomographic reconstruction. The algorithm accounts for a (possibly non-linear) forward operator in a deep neural network by unrolling a proximal primal-dual optimization method, but where the proximal operators have been replaced with convolutional neural networks. The algorithm is trained end-to-end, working directly from raw measured data and it does not depend on any initial reconstruction such as FBP.
We compare performance of the proposed method on low dose CT reconstruction against FBP, TV, and deep learning based post-processing of FBP. For the Shepp-Logan phantom we obtain >6dB PSNR improvement against all compared methods. For human phantoms the corresponding improvement is 6.6dB over TV and 2.2dB over learned post-processing along with a substantial improvement in the SSIM. Finally, our algorithm involves only ten forward-back-projection computations, making the method feasible for time critical clinical applications.
△ Less
Submitted 5 July, 2018; v1 submitted 20 July, 2017;
originally announced July 2017.
-
Solving ill-posed inverse problems using iterative deep neural networks
Authors:
Jonas Adler,
Ozan Öktem
Abstract:
We propose a partially learned approach for the solution of ill posed inverse problems with not necessarily linear forward operators. The method builds on ideas from classical regularization theory and recent advances in deep learning to perform learning while making use of prior information about the inverse problem encoded in the forward operator, noise model and a regularizing functional. The m…
▽ More
We propose a partially learned approach for the solution of ill posed inverse problems with not necessarily linear forward operators. The method builds on ideas from classical regularization theory and recent advances in deep learning to perform learning while making use of prior information about the inverse problem encoded in the forward operator, noise model and a regularizing functional. The method results in a gradient-like iterative scheme, where the "gradient" component is learned using a convolutional network that includes the gradients of the data discrepancy and regularizer as input in each iteration. We present results of such a partially learned gradient scheme on a non-linear tomographic inversion problem with simulated data from both the Sheep-Logan phantom as well as a head CT. The outcome is compared against FBP and TV reconstruction and the proposed method provides a 5.4 dB PSNR improvement over the TV reconstruction while being significantly faster, giving reconstructions of 512 x 512 volumes in about 0.4 seconds using a single GPU.
△ Less
Submitted 22 May, 2017; v1 submitted 13 April, 2017;
originally announced April 2017.