-
Spatial Coherence Loss for Salient and Camouflaged Object Detection and Beyond
Authors:
Ziyun Yang,
Kevin Choy,
Sina Farsiu
Abstract:
Generic object detection is a category-independent task that relies on accurate modeling of objectness. Most relevant CNN-based models of objectness utilize loss functions (e.g., binary cross entropy) that focus on the single-response, i.e., the loss response of a single pixel. Inspired by the human visual system, which first discerns the boundaries of ambiguous regions (i.e., hard regions) before delving into the semantic meaning, we propose a novel loss function, Spatial Coherence Loss (SCLoss), that uses the mutual response between adjacent pixels to suppress or emphasize the single-response of pixels. We demonstrate that the proposed SCLoss can gradually learn the hard regions by detecting and emphasizing their boundaries. Through comprehensive experiments, we demonstrate that replacing popular loss functions with SCLoss can improve the performance of current state-of-the-art (SOTA) salient or camouflaged object detection (SOD or COD) models. We also demonstrate that combining SCLoss with other loss functions can further improve performance and result in the SOTA outcomes for different applications. Finally, as a demonstrative example of the potential uses for other related tasks, we show an application of SCLoss for semantic segmentation.
Submitted 28 February, 2024;
originally announced February 2024.
-
Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model
Authors:
Chunming He,
Chengyu Fang,
Yulun Zhang,
Tian Ye,
Kai Li,
Longxiang Tang,
Zhenhua Guo,
Xiu Li,
Sina Farsiu
Abstract:
Illumination degradation image restoration (IDIR) techniques aim to improve the visibility of degraded images and mitigate the adverse effects of deteriorated illumination. Among these algorithms, diffusion model (DM)-based methods have shown promising performance but are often burdened by heavy computational demands and pixel misalignment issues when predicting the image-level distribution. To tackle these problems, we propose to leverage DM within a compact latent space to generate concise guidance priors and introduce a novel solution called Reti-Diff for the IDIR task. Reti-Diff comprises two key components: the Retinex-based latent DM (RLDM) and the Retinex-guided transformer (RGformer). To ensure detailed reconstruction and illumination correction, RLDM is empowered to acquire Retinex knowledge and extract reflectance and illumination priors. These priors are subsequently utilized by RGformer to guide the decomposition of image features into their respective reflectance and illumination components. Following this, RGformer further enhances and consolidates the decomposed features, resulting in the production of refined images with consistent content and robustness to handle complex degradation scenarios. Extensive experiments show that Reti-Diff outperforms existing methods on three IDIR tasks, as well as downstream applications. Code will be available at https://github.com/ChunmingHe/Reti-Diff.
Submitted 9 March, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Directional Connectivity-based Segmentation of Medical Images
Authors:
Ziyun Yang,
Sina Farsiu
Abstract:
Anatomical consistency in biomarker segmentation is crucial for many medical image analysis tasks. A promising paradigm for achieving anatomically consistent segmentation via deep networks is incorporating pixel connectivity, a basic concept in digital topology, to model inter-pixel relationships. However, previous works on connectivity modeling have ignored the rich channel-wise directional information in the latent space. In this work, we demonstrate that effective disentanglement of directional sub-space from the shared latent space can significantly enhance the feature representation in the connectivity-based network. To this end, we propose a directional connectivity modeling scheme for segmentation that decouples, tracks, and utilizes the directional information across the network. Experiments on various public medical image segmentation benchmarks show the effectiveness of our model as compared to the state-of-the-art methods. Code is available at https://github.com/Zyun-Y/DconnNet.
Submitted 31 March, 2023;
originally announced April 2023.
-
RetiFluidNet: A Self-Adaptive and Multi-Attention Deep Convolutional Network for Retinal OCT Fluid Segmentation
Authors:
Reza Rasti,
Armin Biglari,
Mohammad Rezapourian,
Ziyun Yang,
Sina Farsiu
Abstract:
Optical coherence tomography (OCT) helps ophthalmologists assess macular edema, accumulation of fluids, and lesions at microscopic resolution. Quantification of retinal fluids is necessary for OCT-guided treatment management, which relies on a precise image segmentation step. As manual analysis of retinal fluids is a time-consuming, subjective, and error-prone task, there is increasing demand for fast and robust automatic solutions. In this study, a new convolutional neural architecture named RetiFluidNet is proposed for multi-class retinal fluid segmentation. The model benefits from hierarchical representation learning of textural, contextual, and edge features using a new self-adaptive dual-attention (SDA) module, multiple self-adaptive attention-based skip connections (SASC), and a novel multi-scale deep self supervision learning (DSL) scheme. The attention mechanism in the proposed SDA module enables the model to automatically extract deformation-aware representations at different levels, and the introduced SASC paths further consider spatial-channel interdependencies for concatenation of counterpart encoder and decoder units, which improve representational capability. RetiFluidNet is also optimized using a joint loss function comprising a weighted version of dice overlap and edge-preserved connectivity-based losses, where several hierarchical stages of multi-scale local losses are integrated into the optimization process. The model is validated based on three publicly available datasets: RETOUCH, OPTIMA, and DUKE, with comparisons against several baselines. Experimental results on the datasets prove the effectiveness of the proposed model in retinal OCT fluid segmentation and reveal that the suggested method is more effective than existing state-of-the-art fluid segmentation algorithms in adapting to retinal OCT scans recorded by various image scanning instruments.
Submitted 14 December, 2022; v1 submitted 26 September, 2022;
originally announced September 2022.
-
Computational 3D microscopy with optical coherence refraction tomography
Authors:
Kevin C. Zhou,
Ryan P. McNabb,
Ruobing Qian,
Simone Degan,
Al-Hafeez Dhalla,
Sina Farsiu,
Joseph A. Izatt
Abstract:
Optical coherence tomography (OCT) has seen widespread success as an in vivo clinical diagnostic 3D imaging modality, impacting areas including ophthalmology, cardiology, and gastroenterology. Despite its many advantages, such as high sensitivity, speed, and depth penetration, OCT suffers from several shortcomings that ultimately limit its utility as a 3D microscopy tool, such as its pervasive coherent speckle noise and poor lateral resolution required to maintain millimeter-scale imaging depths. Here, we present 3D optical coherence refraction tomography (OCRT), a computational extension of OCT which synthesizes an incoherent contrast mechanism by combining multiple OCT volumes, acquired across two rotation axes, to form a resolution-enhanced, speckle-reduced, refraction-corrected 3D reconstruction. Our label-free computational 3D microscope features a novel optical design incorporating a parabolic mirror to enable the capture of 5D plenoptic datasets, consisting of millimetric 3D fields of view over up to $\pm75^\circ$ without moving the sample. We demonstrate that 3D OCRT reveals 3D features unobserved by conventional OCT in fruit fly, zebrafish, and mouse samples.
Submitted 23 February, 2022;
originally announced February 2022.
-
High-speed multiview imaging approaching 4pi steradians using conic section mirrors: theoretical and practical considerations
Authors:
Kevin C. Zhou,
Al-Hafeez Dhalla,
Ryan P. McNabb,
Ruobing Qian,
Sina Farsiu,
Joseph A. Izatt
Abstract:
Illuminating or imaging samples from a broad angular range is essential in a wide variety of computational 3D imaging and resolution-enhancement techniques, such as optical projection tomography (OPT), optical diffraction tomography (ODT), synthetic aperture microscopy, Fourier ptychographic microscopy (FPM), structured illumination microscopy (SIM), photogrammetry, and optical coherence refraction tomography (OCRT). The wider the angular coverage, the better the resolution enhancement or 3D resolving capabilities. However, achieving such angular ranges is a practical challenge, especially when approaching plus-or-minus 90 degrees or beyond. Often, researchers resort to expensive, proprietary high numerical aperture (NA) objectives, or to rotating the sample or source-detector pair, which sacrifices temporal resolution or perturbs the sample. Here, we propose several new strategies for multi-angle imaging approaching 4pi steradians using concave parabolic or ellipsoidal mirrors and fast, low rotational inertia scanners, such as galvanometers. We derive theoretically and empirically relations between a variety of system parameters (e.g., NA, wavelength, focal length, telecentricity) and achievable fields of view (FOVs) and importantly show that intrinsic tilt aberrations do not restrict FOV for many multi-view imaging applications, contrary to conventional wisdom. Finally, we present strategies for avoiding spherical aberrations at obliquely illuminated flat boundaries. Our simple designs allow for high-speed multi-angle imaging for microscopic, mesoscopic, and macroscopic applications.
Submitted 17 November, 2021; v1 submitted 30 June, 2021;
originally announced June 2021.
-
BiconNet: An Edge-preserved Connectivity-based Approach for Salient Object Detection
Authors:
Ziyun Yang,
Somayyeh Soltanian-Zadeh,
Sina Farsiu
Abstract:
Salient object detection (SOD) is viewed as a pixel-wise saliency modeling task by traditional deep learning-based methods. A limitation of current SOD models is insufficient utilization of inter-pixel information, which usually results in imperfect segmentation near edge regions and low spatial coherence. As we demonstrate, using a saliency mask as the only label is suboptimal. To address this limitation, we propose a connectivity-based approach called bilateral connectivity network (BiconNet), which uses connectivity masks together with saliency masks as labels for effective modeling of inter-pixel relationships and object saliency. Moreover, we propose a bilateral voting module to enhance the output connectivity map, and a novel edge feature enhancement method that efficiently utilizes edge-specific features. Through comprehensive experiments on five benchmark datasets, we demonstrate that our proposed method can be plugged into any existing state-of-the-art saliency-based SOD framework to improve its performance with negligible parameter increase.
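The idea of deriving a connectivity mask from a saliency mask can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes 8-neighbour connectivity, where channel k marks pixels whose k-th neighbour is also foreground. The function name `connectivity_mask`, the `OFFSETS` table, and the channel order are hypothetical choices made here.

```python
import numpy as np

# Offsets (dy, dx) of the 8 neighbours. The channel order is an arbitrary
# choice for this sketch, not something taken from the abstract.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def connectivity_mask(mask):
    """Return an (8, H, W) array from a binary (H, W) mask.

    Channel k is 1 where a pixel and its k-th neighbour are both foreground.
    Neighbours outside the image are treated as background.
    """
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    conn = np.zeros((8, h, w), dtype=np.uint8)
    for k, (dy, dx) in enumerate(OFFSETS):
        # neighbour[i, j] equals mask[i + dy, j + dx], or False off-image.
        neighbour = padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
        conn[k] = mask & neighbour
    return conn
```

Under this assumption, a network would be trained against both the original mask and the 8-channel array, so that each output channel is supervised on one pairwise pixel relationship.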
Submitted 8 August, 2021; v1 submitted 27 February, 2021;
originally announced March 2021.
-
Modeling Extremes with d-max-decreasing Neural Networks
Authors:
Ali Hasan,
Khalil Elkhalil,
Yuting Ng,
Joao M. Pereira,
Sina Farsiu,
Jose H. Blanchet,
Vahid Tarokh
Abstract:
We propose a novel neural network architecture that enables non-parametric calibration and generation of multivariate extreme value distributions (MEVs). MEVs arise from Extreme Value Theory (EVT) as the necessary class of models when extrapolating a distributional fit over large spatial and temporal scales based on data observed in intermediate scales. In turn, EVT dictates that $d$-max-decreasing, a stronger form of convexity, is an essential shape constraint in the characterization of MEVs. As far as we know, our proposed architecture provides the first class of non-parametric estimators for MEVs that preserve these essential shape constraints. We show that our architecture approximates the dependence structure encoded by MEVs at parametric rate. Moreover, we present a new method for sampling high-dimensional MEVs using a generative model. We demonstrate our methodology on a wide range of experimental settings, ranging from environmental sciences to financial mathematics and verify that the structural properties of MEVs are retained compared to existing methods.
Submitted 1 March, 2022; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Mesoscopic photogrammetry with an unstabilized phone camera
Authors:
Kevin C. Zhou,
Colin Cooke,
Jaehee Park,
Ruobing Qian,
Roarke Horstmeyer,
Joseph A. Izatt,
Sina Farsiu
Abstract:
We present a feature-free photogrammetric technique that enables quantitative 3D mesoscopic (mm-scale height variation) imaging with tens-of-micron accuracy from sequences of images acquired by a smartphone at close range (several cm) under freehand motion without additional hardware. Our end-to-end, pixel-intensity-based approach jointly registers and stitches all the images by estimating a coaligned height map, which acts as a pixel-wise radial deformation field that orthorectifies each camera image to allow homographic registration. The height maps themselves are reparameterized as the output of an untrained encoder-decoder convolutional neural network (CNN) with the raw camera images as the input, which effectively removes many reconstruction artifacts. Our method also jointly estimates both the camera's dynamic 6D pose and its distortion using a nonparametric model, the latter of which is especially important in mesoscopic applications when using cameras not designed for imaging at short working distances, such as smartphone cameras. We also propose strategies for reducing computation time and memory, applicable to other multi-frame registration problems. Finally, we demonstrate our method using sequences of multi-megapixel images captured by an unstabilized smartphone on a variety of samples (e.g., painting brushstrokes, circuit board, seeds).
Submitted 10 December, 2020;
originally announced December 2020.
-
Unified k-space theory of optical coherence tomography
Authors:
Kevin C. Zhou,
Ruobing Qian,
Al-Hafeez Dhalla,
Sina Farsiu,
Joseph A. Izatt
Abstract:
We present a general theory of optical coherence tomography (OCT), which synthesizes the fundamental concepts and implementations of OCT under a common 3D k-space framework. At the heart of this analysis is the Fourier diffraction theorem, which relates the coherent interaction between a sample and plane wave to the Ewald sphere in the 3D k-space representation of the sample. While only the axial dimension of OCT is typically analyzed in k-space, we show that embracing a fully 3D k-space formalism allows explanation of nearly every fundamental physical phenomenon or property of OCT, including contrast mechanism, resolution, dispersion, aberration, limited depth of focus, and speckle. The theory also unifies diffraction tomography, confocal microscopy, point-scanning OCT, line-field OCT, full-field OCT, Bessel-beam OCT, transillumination OCT, interferometric synthetic aperture microscopy (ISAM), and optical coherence refraction tomography (OCRT), among others. Our unified theory not only enables clear understanding of existing techniques, but also suggests new research directions to continue advancing the field of OCT.
Submitted 9 December, 2020;
originally announced December 2020.
-
Fisher Auto-Encoders
Authors:
Khalil Elkhalil,
Ali Hasan,
Jie Ding,
Sina Farsiu,
Vahid Tarokh
Abstract:
It has been conjectured that the Fisher divergence is more robust to model uncertainty than the conventional Kullback-Leibler (KL) divergence. This motivates the design of a new class of robust generative auto-encoders (AE) referred to as Fisher auto-encoders. Our approach is to design Fisher AEs by minimizing the Fisher divergence between the intractable joint distribution of observed data and latent variables, with that of the postulated/modeled joint distribution. In contrast to KL-based variational AEs (VAEs), the Fisher AE can exactly quantify the distance between the true and the model-based posterior distributions. Qualitative and quantitative results are provided on both MNIST and celebA datasets demonstrating the competitive performance of Fisher AEs in terms of robustness compared to other AEs such as VAEs and Wasserstein AEs.
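For reference, the Fisher divergence between two densities is commonly written as below. The symbols $p$, $q$, and $x$ are notation chosen here rather than taken from the abstract, and some authors include an extra factor of $1/2$:

$$
D_F(p \,\|\, q) = \mathbb{E}_{x \sim p}\left[ \left\| \nabla_x \log p(x) - \nabla_x \log q(x) \right\|_2^2 \right]
$$

Because only gradients of the log-densities appear, any normalizing constant of $q$ that does not depend on $x$ drops out of the expression.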
Submitted 23 October, 2020; v1 submitted 12 July, 2020;
originally announced July 2020.
-
Identifying Latent Stochastic Differential Equations
Authors:
Ali Hasan,
João M. Pereira,
Sina Farsiu,
Vahid Tarokh
Abstract:
We present a method for learning latent stochastic differential equations (SDEs) from high-dimensional time series data. Given a high-dimensional time series generated from a lower dimensional latent unknown Itô process, the proposed method learns the mapping from ambient to latent space, and the underlying SDE coefficients, through a self-supervised learning approach. Using the framework of variational autoencoders, we consider a conditional generative model for the data based on the Euler-Maruyama approximation of SDE solutions. Furthermore, we use recent results on identifiability of latent variable models to show that the proposed model can recover not only the underlying SDE coefficients, but also the original latent variables, up to an isometry, in the limit of infinite data. We validate the method through several simulated video processing tasks, where the underlying SDE is known, and through real world datasets.
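The Euler-Maruyama approximation mentioned in the abstract can be sketched as follows. This is a generic textbook discretization, not the paper's model: the function name `euler_maruyama`, its arguments, and the Ornstein-Uhlenbeck usage are illustrative assumptions, with time-independent drift and diffusion functions and elementwise (diagonal) noise.

```python
import numpy as np


def euler_maruyama(drift, diffusion, x0, dt, n_steps, rng):
    """Simulate dX = drift(X) dt + diffusion(X) dW with a fixed step dt.

    Returns an array of n_steps + 1 states, starting with x0.
    """
    shape = np.shape(x0)
    x = np.empty((n_steps + 1,) + shape)
    x[0] = x0
    for i in range(n_steps):
        # A Brownian increment over dt is Gaussian with variance dt.
        noise = rng.standard_normal(shape)
        x[i + 1] = x[i] + drift(x[i]) * dt + diffusion(x[i]) * np.sqrt(dt) * noise
    return x


# Usage: an Ornstein-Uhlenbeck process, chosen only as a familiar example.
ou_path = euler_maruyama(
    drift=lambda x: -x,
    diffusion=lambda x: 0.5,
    x0=1.0,
    dt=0.01,
    n_steps=1000,
    rng=np.random.default_rng(0),
)
```

Each step is a Gaussian transition whose mean and variance are set by the drift and diffusion at the current state, which is what makes the scheme convenient as a conditional generative model.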
Submitted 26 November, 2021; v1 submitted 12 July, 2020;
originally announced July 2020.
-
Learning Partial Differential Equations from Data Using Neural Networks
Authors:
Ali Hasan,
João M. Pereira,
Robert Ravier,
Sina Farsiu,
Vahid Tarokh
Abstract:
We develop a framework for estimating unknown partial differential equations from noisy data, using a deep learning approach. Given noisy samples of a solution to an unknown PDE, our method interpolates the samples using a neural network, and extracts the PDE by equating derivatives of the neural network approximation. Our method applies to PDEs which are linear combinations of user-defined dictionary functions, and generalizes previous methods that only consider parabolic PDEs. We introduce a regularization scheme that prevents the function approximation from overfitting the data and forces it to be a solution of the underlying PDE. We validate the model on simulated data generated by the known PDEs and added Gaussian noise, and we study our method under different levels of noise. We also compare the error of our method with a Cramer-Rao lower bound for an ordinary differential equation. Our results indicate that our method outperforms other methods in estimating PDEs, especially in the low signal-to-noise regime.
Submitted 22 October, 2019;
originally announced October 2019.
-
MimickNet, Matching Clinical Post-Processing Under Realistic Black-Box Constraints
Authors:
Ouwen Huang,
Will Long,
Nick Bottenus,
Gregg E. Trahey,
Sina Farsiu,
Mark L. Palmeri
Abstract:
Image post-processing is used in clinical-grade ultrasound scanners to improve image quality (e.g., reduce speckle noise and enhance contrast). These post-processing techniques vary across manufacturers and are generally kept proprietary, which presents a challenge for researchers looking to match current clinical-grade workflows. We introduce a deep learning framework, MimickNet, that transforms raw conventional delay-and-summed (DAS) beams into the approximate post-processed images found on clinical-grade scanners. Training MimickNet only requires post-processed image samples from a scanner of interest without the need for explicit pairing to raw DAS data. This flexibility allows it to hypothetically approximate any manufacturer's post-processing without access to the pre-processed data. MimickNet generates images with an average similarity index measurement (SSIM) of 0.930$\pm$0.0892 on a 300 cineloop test set, and it generalizes to cardiac cineloops outside of our train-test distribution achieving an SSIM of 0.967$\pm$0.002. We also explore the theoretical SSIM achievable by evaluating MimickNet performance when trained under gray-box constraints (i.e., when both pre-processed and post-processed images are available). To our knowledge, this is the first work to establish deep learning models that closely approximate current clinical-grade ultrasound post-processing under realistic black-box constraints where before and after post-processing data is unavailable. MimickNet serves as a clinical post-processing baseline for future works in ultrasound image formation to compare against. To this end, we have made the MimickNet software open source.
Submitted 15 August, 2019;
originally announced August 2019.