Search | arXiv e-print repository

arXiv:2212.02028 [pdf, other]

Double U-Net for Super-Resolution and Segmentation of Live Cell Images

Authors: Mayur Bhandary, J. Patricio Reyes, Eylul Ertay, Aman Panda

Abstract: Accurate segmentation of live cell images has broad applications in clinical and research contexts. Deep learning methods have been able to perform cell segmentations with high accuracy; however, developing machine learning models to do this requires access to high fidelity images of live cells. This is often not available due to resource constraints like limited accessibility to high performance microscopes or due to the nature of the studied organisms. Segmentation on low resolution images of live cells is a difficult task. This paper proposes a method to perform live cell segmentation with low resolution images by performing super-resolution as a pre-processing step in the segmentation pipeline.

Submitted 4 December, 2022; originally announced December 2022.

Comments: 11 pages, 10 figures, Cornell Tech Deep learning, Cornell Tech CS 5787

arXiv:2110.02273 [pdf, other]

Bilevel Imaging Learning Problems as Mathematical Programs with Complementarity Constraints: Reformulation and Theory

Authors: Juan Carlos De los Reyes

Abstract: We investigate a family of bilevel imaging learning problems where the lower-level instance corresponds to a convex variational model involving first- and second-order nonsmooth sparsity-based regularizers. By using geometric properties of the primal-dual reformulation of the lower-level problem and introducing suitable auxiliary variables, we are able to reformulate the original bilevel problems as Mathematical Programs with Complementarity Constraints (MPCC). For the latter, we prove tight constraint qualification conditions (MPCC-RCPLD and partial MPCC-LICQ) and derive Mordukhovich (M-) and Strong (S-) stationarity conditions. The stationarity systems for the MPCC also turn into stationarity conditions for the original formulation. Second-order sufficient optimality conditions are derived as well, together with a local uniqueness result for stationary points. The proposed reformulation may be extended to problems in function spaces, leading to MPCCs with constraints on the gradient of the state. The MPCC reformulation also leads to the efficient use of available large-scale nonlinear programming solvers, as shown in a companion paper, where different imaging applications are studied.

Submitted 20 March, 2023; v1 submitted 5 October, 2021; originally announced October 2021.

MSC Class: 49K99; 90C33; 68U10; 68T99; 65K10

arXiv:2107.09627 [pdf, other]

Precision-Weighted Federated Learning

Authors: Jonatan Reyes, Lisa Di Jorio, Cecile Low-Kam, Marta Kersten-Oertel

Abstract: Federated Learning using the Federated Averaging algorithm has shown great advantages for large-scale applications that rely on collaborative learning, especially when the training data is either unbalanced or inaccessible due to privacy constraints. We hypothesize that Federated Averaging underestimates the full extent of heterogeneity of data when the aggregation is performed. We propose Precision-weighted Federated Learning, a novel algorithm that takes into account the variance of the stochastic gradients when computing the weighted average of the parameters of models trained in a Federated Learning setting. With Precision-weighted Federated Learning, we provide an alternate averaging scheme that leverages the heterogeneity of the data when it has a large diversity of features in its composition. Our method was evaluated using standard image classification datasets with two different data partitioning strategies (IID/non-IID) to measure the performance and speed of our method in resource-constrained environments, such as mobile and IoT devices. We obtained a good balance between computational efficiency and convergence rates with Precision-weighted Federated Learning. Our performance evaluations show 9% better predictions with MNIST, 18% with Fashion-MNIST, and 5% with CIFAR-10 in the non-IID setting. Further reliability evaluations ratify the stability in our method by reaching a 99% reliability index with IID partitions and 96% with non-IID partitions. In addition, we obtained a 20x speedup on Fashion-MNIST with only 10 clients and up to 37x with 100 clients participating in the aggregation concurrently per communication round. The results indicate that Precision-weighted Federated Learning is an effective and faster alternative approach for aggregating private data, especially in domains where data is highly heterogeneous.

Submitted 20 July, 2021; originally announced July 2021.

Comments: 10 pages, 11 figures

ACM Class: I.2.m; I.2.11
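
The abstract above describes weighting each client's parameters by the variance of its stochastic gradients. The following is only a minimal sketch of generic inverse-variance (precision) weighting, not the authors' implementation: the function name `precision_weighted_average`, the per-parameter variance inputs, and the `eps` guard are all assumptions made for illustration.

```python
def precision_weighted_average(client_params, client_variances, eps=1e-12):
    """Combine per-client parameter vectors using inverse-variance weights.

    client_params: list of parameter vectors (lists of floats), one per client.
    client_variances: same shape; an estimated variance for each parameter.
    eps: hypothetical guard against division by zero (an assumption).
    """
    n_params = len(client_params[0])
    aggregated = []
    for i in range(n_params):
        # Precision is the reciprocal of the variance.
        precisions = [1.0 / (var[i] + eps) for var in client_variances]
        total = sum(precisions)
        # Weighted sum of the i-th parameter across clients.
        weighted = sum(p * params[i] for p, params in zip(precisions, client_params))
        aggregated.append(weighted / total)
    return aggregated


# Two hypothetical clients with two parameters each.
params = [[1.0, 4.0], [3.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 3.0]]
result = precision_weighted_average(params, variances)
```

With equal variances the scheme reduces to a plain mean; a client whose estimate is noisier (larger variance) contributes less to the aggregate.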

arXiv:2006.13163 [pdf, other]

doi 10.3847/1538-4365/aba267

MANTRA: A Machine Learning reference lightcurve dataset for astronomical transient event recognition

Authors: Mauricio Neira, Catalina Gómez, John F. Suárez-Pérez, Diego A. Gómez, Juan Pablo Reyes, Marcela Hernández Hoyos, Pablo Arbeláez, Jaime E. Forero-Romero

Abstract: We introduce MANTRA, an annotated dataset of 4869 transient and 71207 non-transient object lightcurves built from the Catalina Real Time Transient Survey. We provide public access to this dataset as a plain text file to facilitate standardized quantitative comparison of astronomical transient event recognition algorithms. Some of the classes included in the dataset are: supernovae, cataclysmic variables, active galactic nuclei, high proper motion stars, blazars and flares. As an example of the tasks that can be performed on the dataset we experiment with multiple data pre-processing methods, feature selection techniques and popular machine learning algorithms (Support Vector Machines, Random Forests and Neural Networks). We assess quantitative performance in two classification tasks: binary (transient/non-transient) and eight-class classification. The best performing algorithm in both tasks is the Random Forest Classifier. It achieves an F1-score of 96.25% in the binary classification and 52.79% in the eight-class classification. For the eight-class classification, non-transients (96.83%) is the class with the highest F1-score, while the lowest corresponds to high-proper-motion stars (16.79%); for supernovae it achieves a value of 54.57%, close to the average across classes. The next release of MANTRA includes images and benchmarks with deep learning models.

Submitted 30 June, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: ApJS accepted, 17 pages, 14 figures

arXiv:2005.00042 [pdf]

Method for Customizable Automated Tagging: Addressing the Problem of Over-tagging and Under-tagging Text Documents

Authors: Maharshi R. Pandya, Jessica Reyes, Bob Vanderheyden

Abstract: Using author-provided tags to predict tags for a new document often results in the overgeneration of tags. In the case where the author doesn't provide any tags, our documents face a severe under-tagging issue. In this paper, we present a method to generate a universal set of tags that can be applied widely to a large document corpus. Using IBM Watson's NLU service, first, we collect keywords/phrases that we call "complex document tags" from 8,854 popular reports in the corpus. We apply an LDA model over these complex document tags to generate a set of 765 unique "simple tags". In applying the tags to a corpus of documents, we run each document through the IBM Watson NLU and apply appropriate simple tags. Using only 765 simple tags, our method allows us to tag 87,397 out of 88,583 total documents in the corpus with at least one tag. About 92.1% of the total 87,397 documents are also determined to be sufficiently tagged. In the end, we discuss the performance of our method and its limitations.

Submitted 30 April, 2020; originally announced May 2020.

Comments: Work done by Maharshi R. Pandya and Jessica Reyes as IBM interns under leadership of Bob Vanderheyden. Article to be published

ACM Class: I.2.7

arXiv:2004.11430 [pdf, other]

doi 10.1001/jamanetworkopen.2020.20485

Mobile phone location data reveal the effect and geographic variation of social distancing on the spread of the COVID-19 epidemic

Authors: Song Gao, Jinmeng Rao, Yuhao Kang, Yunlei Liang, Jake Kruse, Doerte Doepfer, Ajay K. Sethi, Juan Francisco Mandujano Reyes, Jonathan Patz, Brian S. Yandell

Abstract: The emergence of SARS-CoV-2 and the coronavirus infectious disease (COVID-19) has become a pandemic. Social (physical) distancing is a key non-pharmacologic control measure to reduce the transmission rate of SARS-CoV-2, but high-level adherence is needed. Using daily travel distance and stay-at-home time derived from large-scale anonymous mobile phone location data provided by Descartes Labs and SafeGraph, we quantify the degree to which social distancing mandates have been followed in the U.S. and its effect on growth of COVID-19 cases. The correlation between the COVID-19 growth rate and travel distance decay rate and dwell time at home change rate was -0.586 (95% CI: -0.742 ~ -0.370) and 0.526 (95% CI: 0.293 ~ 0.700), respectively. Increases in state-specific doubling time of total cases ranged from 1.04 ~ 6.86 days to 3.66 ~ 30.29 days after social distancing orders were put in place, consistent with mechanistic epidemic prediction models. Social distancing mandates reduce the spread of COVID-19 when they are followed.

Submitted 23 April, 2020; originally announced April 2020.

Comments: 17 pages, 4 figures, 1 table

MSC Class: 65D10

ACM Class: H.4; G.3; J.2

Journal ref: JAMA Network Open. 2020;3(9):e2020485

arXiv:2001.08286 [pdf, other]

A Multi-Scale Tensor Network Architecture for Classification and Regression

Authors: Justin Reyes, Miles Stoudenmire

Abstract: We present an algorithm for supervised learning using tensor networks, employing a step of preprocessing the data by coarse-graining through a sequence of wavelet transformations. We represent these transformations as a set of tensor network layers identical to those in a multi-scale entanglement renormalization ansatz (MERA) tensor network, and perform supervised learning and regression tasks through a model based on a matrix product state (MPS) tensor network acting on the coarse-grained data. Because the entire model consists of tensor contractions (apart from the initial non-linear feature map), we can adaptively fine-grain the optimized MPS model backwards through the layers with essentially no loss in performance. The MPS itself is trained using an adaptive algorithm based on the density matrix renormalization group (DMRG) algorithm. We test our methods by performing a classification task on audio data and a regression task on temperature time-series data, studying the dependence of training accuracy on the number of coarse-graining layers and showing how fine-graining through the network may be used to initialize models with access to finer-scale features.

Submitted 22 January, 2020; originally announced January 2020.

arXiv:1908.08553 [pdf, other]

doi 10.1016/j.cpc.2020.107750

Simulation of Quantum Many-Body Systems on Amazon Cloud

Authors: Justin A. Reyes, Eduardo R. Mucciolo, Dan Marinescu

Abstract: Quantum many-body systems (QMBs) are some of the most challenging physical systems to simulate numerically. Methods involving approximations for tensor network (TN) contractions have proven to be viable alternatives to algorithms such as quantum Monte Carlo or simulated annealing. However, these methods are cumbersome, difficult to implement, and often have significant limitations in their accuracy and efficiency when considering systems in more than one dimension. In this paper, we explore the exact computation of TN contractions on two-dimensional geometries and present a heuristic improvement of TN contraction that reduces the computing time, the amount of memory, and the communication time. We run our algorithm for the Ising model using memory-optimized x1.32xlarge instances on Amazon Web Services (AWS) Elastic Compute Cloud (EC2). Our results show that cloud computing is a viable alternative to supercomputers for this class of scientific applications.

Submitted 13 January, 2021; v1 submitted 22 August, 2019; originally announced August 2019.

Comments: 25 pages, 11 figures

Journal ref: Computer Physics Communications 261 (2021) 107750

arXiv:1906.08754 [pdf, other]

Learning the Sampling Pattern for MRI

Authors: Ferdia Sherry, Martin Benning, Juan Carlos De los Reyes, Martin J. Graves, Georg Maierhofer, Guy Williams, Carola-Bibiane Schönlieb, Matthias J. Ehrhardt

Abstract: The discovery of the theory of compressed sensing brought the realisation that many inverse problems can be solved even when measurements are "incomplete". This is particularly interesting in magnetic resonance imaging (MRI), where long acquisition times can limit its use. In this work, we consider the problem of learning a sparse sampling pattern that can be used to optimally balance acquisition time versus quality of the reconstructed image. We use a supervised learning approach, making the assumption that our training data is representative enough of new data acquisitions. We demonstrate that this is indeed the case, even if the training data consists of just 7 training pairs of measurements and ground-truth images; with a training set of brain images of size 192 by 192, for instance, one of the learned patterns samples only 35% of k-space, yet results in reconstructions with mean SSIM 0.914 on a test set of similar images. The proposed framework is general enough to learn arbitrary sampling patterns, including common patterns such as Cartesian, spiral and radial sampling.

Submitted 21 June, 2020; v1 submitted 20 June, 2019; originally announced June 2019.

Comments: The main document is 12 pages, the supporting document is 2 pages and attached at the end of the main document

arXiv:1906.03144 [pdf, ps, other]

Identifying Operational Data-paths in Software Defined Networking Driven Data-planes

Authors: José Reyes, Jorge López, Djamal Zeghlache

Abstract: In this paper, we propose an approach that relies on distributed traffic generation and monitoring to identify the operational data-paths in a given Software Defined Networking (SDN) driven data-plane. We show that under certain assumptions, there exist necessary and sufficient conditions for formally guaranteeing that all operational data-paths are discovered using our approach. In order to provide reliable communication within the SDN driven data-planes, assuring that the implemented data-paths are the requested (and expected) ones is necessary. This requires discovering the actual operational (running) data-paths in the data-plane. In SDN, different applications may configure different coexisting data-paths, so the data-paths that a specific network flow traverses may not be the intended ones. Furthermore, the SDN components may be defective or compromised. We focus on discovering the operational data-paths on SDN driven data-planes. However, the proposed approach is applicable to any data-plane where the operational data-paths must be verified and/or certified. A data-path discovery toolkit has been implemented. We describe the corresponding set of tools, and showcase the obtained experimental results that reveal inconsistencies in well-known SDN applications.

Submitted 7 June, 2019; originally announced June 2019.

Comments: Submitted preprint under revision

arXiv:1903.01322 [pdf]

Automatic Handgun Detection in X-ray Images using Bag of Words Model with Selective Search

Authors: David Castro Piñol, Enrique Juan Marañón Reyes

Abstract: Baggage inspection systems using X-ray screening are crucial for security. Only 90% of threat objects are recognized by X-ray systems that rely on human inspection. Manual detection requires high concentration because of the complexity of the images and the challenging viewpoints of the objects. An algorithm based on Bag of Visual Words (BoVW) with Selective Search is proposed in this paper for handgun detection in single-energy X-ray images from the public GDXray database. This approach is an adaptation of BoVW to the context of X-ray baggage images. To evaluate the proposed method, the recognition effectiveness of the algorithm was tested on all bounding boxes returned by the Selective Search algorithm in 200 images. The most relevant results are the precision and true positive rate (PPV = 80%, TPR = 92%). This approach achieves good performance for handgun recognition. In addition, it is the first time the Selective Search localization algorithm has been tested on baggage X-ray images, and it showed potential in combination with Bag of Visual Words.

Submitted 4 March, 2019; originally announced March 2019.

Comments: in Spanish

arXiv:1508.07243 [pdf, other]

doi 10.1007/s10851-016-0662-8

Bilevel parameter learning for higher-order total variation regularisation models

Authors: J. C. De los Reyes, C. -B. Schönlieb, T. Valkonen

Abstract: We consider a bilevel optimisation approach for parameter learning in higher-order total variation image reconstruction models. Apart from the least squares cost functional, naturally used in bilevel learning, we propose and analyse an alternative cost, based on a Huber regularised TV-seminorm. Differentiability properties of the solution operator are verified and a first-order optimality system is derived. Based on the adjoint information, a quasi-Newton algorithm is proposed for the numerical solution of the bilevel problems. Numerical experiments are carried out to show the suitability of our approach and the improved performance of the new cost functional. Thanks to the bilevel optimisation framework, a detailed comparison between TGV$^2$ and ICTV is also carried out, showing the advantages and shortcomings of both regularisers, depending on the structure of the processed images and their noise level.

Submitted 28 August, 2015; originally announced August 2015.

arXiv:1505.02120 [pdf, other]

Bilevel approaches for learning of variational imaging models

Authors: Luca Calatroni, Cao Chung, Juan Carlos De Los Reyes, Carola-Bibiane Schönlieb, Tuomo Valkonen

Abstract: We review some recent learning approaches in variational imaging, based on bilevel optimisation, and emphasize the importance of their treatment in function space. The paper covers both analytical and numerical techniques. Analytically, we include results on the existence and structure of minimisers, as well as optimality conditions for their characterisation. Based on this information, Newton-type methods are studied for the solution of the problems at hand, combining them with sampling techniques in case of large databases. The computational verification of the developed techniques is extensively documented, covering instances with different types of regularisers, several noise models, spatially dependent weights and large image databases.

Submitted 8 May, 2015; originally announced May 2015.

arXiv:1505.01953 [pdf, other]

doi 10.1016/j.jmaa.2015.09.023

The structure of optimal parameters for image restoration problems

Authors: Juan Carlos De Los Reyes, Carola-Bibiane Schönlieb, Tuomo Valkonen

Abstract: We study the qualitative properties of optimal regularisation parameters in variational models for image restoration. The parameters are solutions of bilevel optimisation problems with the image restoration problem as constraint. A general type of regulariser is considered, which encompasses total variation (TV), total generalized variation (TGV) and infimal-convolution total variation (ICTV). We prove that under certain conditions on the given data optimal parameters derived by bilevel optimisation problems exist. A crucial point in the existence proof turns out to be the boundedness of the optimal parameters away from $0$, which we prove in this paper. The analysis is done on the original variational problem, which in image restoration is typically non-smooth, as well as on a smoothed approximation set in Hilbert space, which is the one considered in numerical computations. For the smoothed bilevel problem we also prove that it $\Gamma$-converges to the original problem as the smoothing vanishes. All analysis is done in function spaces rather than on the discretised learning problem.

Submitted 8 May, 2015; originally announced May 2015.

Showing 1–14 of 14 results for author: Reyes, J