Search | arXiv e-print repository

Sorting of Smartphone Components for Recycling Through Convolutional Neural Networks

Authors: Álvaro G. Becker, Marcelo P. Cenci, Thiago L. T. da Silveira, Hugo M. Veit

Abstract: The recycling of waste electrical and electronic equipment is an essential tool in allowing for a circular economy, presenting the potential for significant environmental and economic gain. However, traditional material separation techniques, based on physical and chemical processes, require substantial investment and do not apply to all cases. In this work, we investigate using an image classific… ▽ More The recycling of waste electrical and electronic equipment is an essential tool in allowing for a circular economy, presenting the potential for significant environmental and economic gain. However, traditional material separation techniques, based on physical and chemical processes, require substantial investment and do not apply to all cases. In this work, we investigate using an image classification neural network as a potential means to control an automated material separation process in treating smartphone waste, acting as a more efficient, less costly, and more widely applicable alternative to existing tools. We produced a dataset with 1,127 images of pyrolyzed smartphone components, which was then used to train and assess a VGG-16 image classification model. The model achieved 83.33% accuracy, lending credence to the viability of using such a neural network in material separation. △ Less

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.12599 [pdf, other]

Unsupervised Segmentation of Colonoscopy Images

Authors: Heming Yao, Jérôme Lüscher, Benjamin Gutierrez Becker, Josep Arús-Pous, Tommaso Biancalani, Amelie Bigorgne, David Richmond

Abstract: Colonoscopy plays a crucial role in the diagnosis and prognosis of various gastrointestinal diseases. Due to the challenges of collecting large-scale high-quality ground truth annotations for colonoscopy images, and more generally medical images, we explore using self-supervised features from vision transformers in three challenging tasks for colonoscopy images. Our results indicate that image-lev… ▽ More Colonoscopy plays a crucial role in the diagnosis and prognosis of various gastrointestinal diseases. Due to the challenges of collecting large-scale high-quality ground truth annotations for colonoscopy images, and more generally medical images, we explore using self-supervised features from vision transformers in three challenging tasks for colonoscopy images. Our results indicate that image-level features learned from DINO models achieve image classification performance comparable to fully supervised models, and patch-level features contain rich semantic information for object detection. Furthermore, we demonstrate that self-supervised features combined with unsupervised segmentation can be used to discover multiple clinically relevant structures in a fully unsupervised manner, demonstrating the tremendous potential of applying these methods in medical image analysis. △ Less

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2210.13269 [pdf, other]

IQUAFLOW: A new framework to measure image quality

Authors: P. Gallés, K. Takats, M. Hernández-Cabronero, D. Berga, L. Pega, L. Riordan-Chen, C. Garcia, G. Becker, A. Garriga, A. Bukva, J. Serra-Sagristà, D. Vilaseca, J. Marín

Abstract: IQUAFLOW is a new image quality framework that provides a set of tools to assess image quality. The user can add custom metrics that can be easily integrated. Furthermore, iquaflow allows to measure quality by using the performance of AI models trained on the images as a proxy. This also helps to easily make studies of performance degradation of several modifications of the original dataset, for i… ▽ More IQUAFLOW is a new image quality framework that provides a set of tools to assess image quality. The user can add custom metrics that can be easily integrated. Furthermore, iquaflow allows to measure quality by using the performance of AI models trained on the images as a proxy. This also helps to easily make studies of performance degradation of several modifications of the original dataset, for instance, with images reconstructed after different levels of lossy compression; satellite images would be a use case example, since they are commonly compressed before downloading to the ground. In this situation, the optimization problem consists in finding the smallest images that provide yet sufficient quality to meet the required performance of the deep learning algorithms. Thus, a study with iquaflow is suitable for such case. All this development is wrapped in Mlflow: an interactive tool used to visualize and summarize the results. This document describes different use cases and provides links to their respective repositories. To ease the creation of new studies, we include a cookie-cutter repository. The source code, issue tracker and aforementioned repositories are all hosted on GitHub https://github.com/satellogic/iquaflow. △ Less

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2210.08404 [pdf, other]

Using Answer Set Programming for HPC Dependency Solving

Authors: Todd Gamblin, Massimiliano Culpo, Gregory Becker, Sergei Shudler

Abstract: Modern scientific software stacks have become extremely complex, using many programming models and libraries to exploit a growing variety of GPUs and accelerators. Package managers can mitigate this complexity using dependency solvers, but they are reaching their limits. Finding compatible dependency versions is NP-complete, and modeling the semantics of package compatibility modulo build-time opt… ▽ More Modern scientific software stacks have become extremely complex, using many programming models and libraries to exploit a growing variety of GPUs and accelerators. Package managers can mitigate this complexity using dependency solvers, but they are reaching their limits. Finding compatible dependency versions is NP-complete, and modeling the semantics of package compatibility modulo build-time options, GPU runtimes, flags, and other parameters is extremely difficult. Within this enormous configuration space, defining a "good" configuration is daunting. We tackle this problem using Answer Set Programming (ASP), a declarative model for combinatorial search problems. We show, using the Spack package manager, that ASP programs can concisely express the compatibility rules of HPC software stacks and provide strong quality-of-solution guarantees. Using ASP, we can mix new builds with preinstalled binaries, and solver performance is acceptable even when considering tens of thousands of packages. △ Less

Submitted 15 October, 2022; originally announced October 2022.

arXiv:2010.00820 [pdf, other]

Discriminative and Generative Models for Anatomical Shape Analysison Point Clouds with Deep Neural Networks

Authors: Benjamin Gutierrez Becker, Ignacio Sarasua, Christian Wachinger

Abstract: We introduce deep neural networks for the analysis of anatomical shapes that learn a low-dimensional shape representation from the given task, instead of relying on hand-engineered representations. Our framework is modular and consists of several computing blocks that perform fundamental shape processing tasks. The networks operate on unordered point clouds and provide invariance to similarity tra… ▽ More We introduce deep neural networks for the analysis of anatomical shapes that learn a low-dimensional shape representation from the given task, instead of relying on hand-engineered representations. Our framework is modular and consists of several computing blocks that perform fundamental shape processing tasks. The networks operate on unordered point clouds and provide invariance to similarity transformations, avoiding the need to identify point correspondences between shapes. Based on the framework, we assemble a discriminative model for disease classification and age regression, as well as a generative model for the accruate reconstruction of shapes. In particular, we propose a conditional generative model, where the condition vector provides a mechanism to control the generative process. instance, it enables to assess shape variations specific to a particular diagnosis, when passing it as side information. Next to working on single shapes, we introduce an extension for the joint analysis of multiple anatomical structures, where the simultaneous modeling of multiple structures can lead to a more compact encoding and a better understanding of disorders. We demonstrate the advantages of our framework in comprehensive experiments on real and synthetic data. The key insights are that (i) learning a shape representation specific to the given task yields higher performance than alternative shape descriptors, (ii) multi-structure analysis is both more efficient and more accurate than single-structure analysis, and (iii) point clouds generated by our model capture morphological differences associated to Alzheimers disease, to the point that they can be used to train a discriminative model for disease classification. Our framework naturally scales to the analysis of large datasets, giving it the potential to learn characteristic variations in large populations. △ Less

Submitted 2 October, 2020; originally announced October 2020.

Comments: Accepted for publication in the Medical Image Analysis journal

MSC Class: 68T07(Primary) ACM Class: I.2.1

arXiv:1907.04102 [pdf, other]

Quantifying Confounding Bias in Neuroimaging Datasets with Causal Inference

Authors: Christian Wachinger, Benjamin Gutierrez Becker, Anna Rieckmann, Sebastian Pölsterl

Abstract: Neuroimaging datasets keep growing in size to address increasingly complex medical questions. However, even the largest datasets today alone are too small for training complex machine learning models. A potential solution is to increase sample size by pooling scans from several datasets. In this work, we combine 12,207 MRI scans from 15 studies and show that simple pooling is often ill-advised due… ▽ More Neuroimaging datasets keep growing in size to address increasingly complex medical questions. However, even the largest datasets today alone are too small for training complex machine learning models. A potential solution is to increase sample size by pooling scans from several datasets. In this work, we combine 12,207 MRI scans from 15 studies and show that simple pooling is often ill-advised due to introducing various types of biases in the training data. First, we systematically define these biases. Second, we detect bias by experimentally showing that scans can be correctly assigned to their respective dataset with 73.3% accuracy. Finally, we propose to tell causal from confounding factors by quantifying the extent of confounding and causality in a single dataset using causal inference. We achieve this by finding the simplest graphical model in terms of Kolmogorov complexity. As Kolmogorov complexity is not directly computable, we employ the minimum description length to approximate it. We empirically show that our approach is able to estimate plausible causal relationships from real neuroimaging data. △ Less

Submitted 9 July, 2019; originally announced July 2019.

Comments: MICCAI 2019

arXiv:1804.10764 [pdf, other]

Detect, Quantify, and Incorporate Dataset Bias: A Neuroimaging Analysis on 12,207 Individuals

Authors: Christian Wachinger, Benjamin Gutierrez Becker, Anna Rieckmann

Abstract: Neuroimaging datasets keep growing in size to address increasingly complex medical questions. However, even the largest datasets today alone are too small for training complex models or for finding genome wide associations. A solution is to grow the sample size by merging data across several datasets. However, bias in datasets complicates this approach and includes additional sources of variation… ▽ More Neuroimaging datasets keep growing in size to address increasingly complex medical questions. However, even the largest datasets today alone are too small for training complex models or for finding genome wide associations. A solution is to grow the sample size by merging data across several datasets. However, bias in datasets complicates this approach and includes additional sources of variation in the data instead. In this work, we combine 15 large neuroimaging datasets to study bias. First, we detect bias by demonstrating that scans can be correctly assigned to a dataset with 73.3% accuracy. Next, we introduce metrics to quantify the compatibility across datasets and to create embeddings of neuroimaging sites. Finally, we incorporate the presence of bias for the selection of a training set for predicting autism. For the quantification of the dataset bias, we introduce two metrics: the Bhattacharyya distance between datasets and the age prediction error. The presented embedding of neuroimaging sites provides an interesting new visualization about the similarity of different sites. This could be used to guide the merging of data sources, while limiting the introduction of unwanted variation. Finally, we demonstrate a clear performance increase when incorporating dataset bias for training set selection in autism prediction. Overall, we believe that the growing amount of neuroimaging data necessitates to incorporate data-driven methods for quantifying dataset bias in future analyses. △ Less

Submitted 28 April, 2018; originally announced April 2018.

arXiv:1804.01296 [pdf, other]

Gaussian Process Uncertainty in Age Estimation as a Measure of Brain Abnormality

Authors: Benjamin Gutierrez Becker, Tassilo Klein, Christian Wachinger

Abstract: Multivariate regression models for age estimation are a powerful tool for assessing abnormal brain morphology associated to neuropathology. Age prediction models are built on cohorts of healthy subjects and are built to reflect normal aging patterns. The application of these multivariate models to diseased subjects usually results in high prediction errors, under the hypothesis that neuropathology… ▽ More Multivariate regression models for age estimation are a powerful tool for assessing abnormal brain morphology associated to neuropathology. Age prediction models are built on cohorts of healthy subjects and are built to reflect normal aging patterns. The application of these multivariate models to diseased subjects usually results in high prediction errors, under the hypothesis that neuropathology presents a similar degenerative pattern as that of accelerated aging. In this work, we propose an alternative to the idea that pathology follows a similar trajectory than normal aging. Instead, we propose the use of metrics which measure deviations from the mean aging trajectory. We propose to measure these deviations using two different metrics: uncertainty in a Gaussian process regression model and a newly proposed age weighted uncertainty measure. Consequently, our approach assumes that pathologic brain patterns are different to those of normal aging. We present results for subjects with autism, mild cognitive impairment and Alzheimer's disease to highlight the versatility of the approach to different diseases and age ranges. We evaluate volume, thickness, and VBM features for quantifying brain morphology. Our evaluations are performed on a large number of images obtained from a variety of publicly available neuroimaging databases. Across all features, our uncertainty based measurements yield a better separation between diseased subjects and healthy individuals than the prediction error. Finally, we illustrate differences in the disease pattern to normal aging, supporting the application of uncertainty as a measure of neuropathology. △ Less

Submitted 4 April, 2018; originally announced April 2018.

Comments: Paper accepted in Neuroimage

arXiv:1110.1212 [pdf]

doi 10.1086/665581

Chandra Publication Statistics

Authors: Arnold H. Rots, Sherry L. Winkelman, Glenn Becker

Abstract: In this study we develop and propose publication metrics, based on an analysis of data from the Chandra bibliographic database, that are more meaningful and less sensitive to observatory-specific characteristics than the traditional metrics. They fall in three main categories: speed of publication; fraction of observing time published; and archival usage. Citation of results is a fourth category,… ▽ More In this study we develop and propose publication metrics, based on an analysis of data from the Chandra bibliographic database, that are more meaningful and less sensitive to observatory-specific characteristics than the traditional metrics. They fall in three main categories: speed of publication; fraction of observing time published; and archival usage. Citation of results is a fourth category, but lends itself less well to definite statements. For Chandra, the median time from observation to publication is 2.36 years; after about 7 years 90% of the observing time is published; after 10 years 70% of the observing time is published more than twice; and the total annual publication output of the mission is 60-70% of the cumulative observing time available, assuming a two year lag between data retrieval and publication. △ Less

Submitted 15 February, 2012; v1 submitted 6 October, 2011; originally announced October 2011.

Comments: 22 pages, 8 figures, 3 tables; revised manuscript submitted to PASP

Showing 1–9 of 9 results for author: Becker, G