Search | arXiv e-print repository

Spatial Coherence Loss for Salient and Camouflaged Object Detection and Beyond

Authors: Ziyun Yang, Kevin Choy, Sina Farsiu

Abstract: Generic object detection is a category-independent task that relies on accurate modeling of objectness. Most relevant CNN-based models of objectness utilize loss functions (e.g., binary cross entropy) that focus on the single-response, i.e., the loss response of a single pixel. Inspired by the human visual system, which first discerns the boundaries of ambiguous regions (i.e., hard regions) before delving into the semantic meaning, we propose a novel loss function, Spatial Coherence Loss (SCLoss), that uses the mutual response between adjacent pixels to suppress or emphasize the single-response of pixels. We demonstrate that the proposed SCLoss can gradually learn the hard regions by detecting and emphasizing their boundaries. Through comprehensive experiments, we demonstrate that replacing popular loss functions with SCLoss can improve the performance of current state-of-the-art (SOTA) salient or camouflaged object detection (SOD or COD) models. We also demonstrate that combining SCLoss with other loss functions can further improve performance and result in the SOTA outcomes for different applications. Finally, as a demonstrative example of the potential uses for other related tasks, we show an application of SCLoss for semantic segmentation.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2311.11638 [pdf, other]

Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model

Authors: Chunming He, Chengyu Fang, Yulun Zhang, Tian Ye, Kai Li, Longxiang Tang, Zhenhua Guo, Xiu Li, Sina Farsiu

Abstract: Illumination degradation image restoration (IDIR) techniques aim to improve the visibility of degraded images and mitigate the adverse effects of deteriorated illumination. Among these algorithms, diffusion model (DM)-based methods have shown promising performance but are often burdened by heavy computational demands and pixel misalignment issues when predicting the image-level distribution. To tackle these problems, we propose to leverage DM within a compact latent space to generate concise guidance priors and introduce a novel solution called Reti-Diff for the IDIR task. Reti-Diff comprises two key components: the Retinex-based latent DM (RLDM) and the Retinex-guided transformer (RGformer). To ensure detailed reconstruction and illumination correction, RLDM is empowered to acquire Retinex knowledge and extract reflectance and illumination priors. These priors are subsequently utilized by RGformer to guide the decomposition of image features into their respective reflectance and illumination components. Following this, RGformer further enhances and consolidates the decomposed features, resulting in the production of refined images with consistent content and robustness to handle complex degradation scenarios. Extensive experiments show that Reti-Diff outperforms existing methods on three IDIR tasks, as well as downstream applications. Code will be available at https://github.com/ChunmingHe/Reti-Diff.

Submitted 9 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: 20 pages, 11 figures, 11 tables
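
As a reading aid for the Reti-Diff abstract above: the classical Retinex model is commonly written as the element-wise product of a reflectance map and an illumination map. The symbols below are assumptions chosen for illustration, and the paper's own formulation may differ in detail.

```latex
% Classical Retinex image model (symbols are illustrative assumptions):
%   I : observed image,  R : reflectance,  L : illumination,  \odot : element-wise product
I = R \odot L
```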

arXiv:2304.00145 [pdf]

Directional Connectivity-based Segmentation of Medical Images

Authors: Ziyun Yang, Sina Farsiu

Abstract: Anatomical consistency in biomarker segmentation is crucial for many medical image analysis tasks. A promising paradigm for achieving anatomically consistent segmentation via deep networks is incorporating pixel connectivity, a basic concept in digital topology, to model inter-pixel relationships. However, previous works on connectivity modeling have ignored the rich channel-wise directional information in the latent space. In this work, we demonstrate that effective disentanglement of directional sub-space from the shared latent space can significantly enhance the feature representation in the connectivity-based network. To this end, we propose a directional connectivity modeling scheme for segmentation that decouples, tracks, and utilizes the directional information across the network. Experiments on various public medical image segmentation benchmarks show the effectiveness of our model as compared to the state-of-the-art methods. Code is available at https://github.com/Zyun-Y/DconnNet.

Submitted 31 March, 2023; originally announced April 2023.

Comments: Accepted by CVPR 2023

arXiv:2209.12468 [pdf]

doi 10.1109/TMI.2022.3228285

RetiFluidNet: A Self-Adaptive and Multi-Attention Deep Convolutional Network for Retinal OCT Fluid Segmentation

Authors: Reza Rasti, Armin Biglari, Mohammad Rezapourian, Ziyun Yang, Sina Farsiu

Abstract: Optical coherence tomography (OCT) helps ophthalmologists assess macular edema, accumulation of fluids, and lesions at microscopic resolution. Quantification of retinal fluids is necessary for OCT-guided treatment management, which relies on a precise image segmentation step. As manual analysis of retinal fluids is a time-consuming, subjective, and error-prone task, there is increasing demand for fast and robust automatic solutions. In this study, a new convolutional neural architecture named RetiFluidNet is proposed for multi-class retinal fluid segmentation. The model benefits from hierarchical representation learning of textural, contextual, and edge features using a new self-adaptive dual-attention (SDA) module, multiple self-adaptive attention-based skip connections (SASC), and a novel multi-scale deep self supervision learning (DSL) scheme. The attention mechanism in the proposed SDA module enables the model to automatically extract deformation-aware representations at different levels, and the introduced SASC paths further consider spatial-channel interdependencies for concatenation of counterpart encoder and decoder units, which improve representational capability. RetiFluidNet is also optimized using a joint loss function comprising a weighted version of dice overlap and edge-preserved connectivity-based losses, where several hierarchical stages of multi-scale local losses are integrated into the optimization process.
The model is validated based on three publicly available datasets: RETOUCH, OPTIMA, and DUKE, with comparisons against several baselines. Experimental results on the datasets prove the effectiveness of the proposed model in retinal OCT fluid segmentation and reveal that the suggested method is more effective than existing state-of-the-art fluid segmentation algorithms in adapting to retinal OCT scans recorded by various image scanning instruments.

Submitted 14 December, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: 11 pages, Early Access Version, IEEE Transactions on Medical Imaging

arXiv:2202.11837 [pdf, other]

doi 10.1364/OPTICA.454860

Computational 3D microscopy with optical coherence refraction tomography

Authors: Kevin C. Zhou, Ryan P. McNabb, Ruobing Qian, Simone Degan, Al-Hafeez Dhalla, Sina Farsiu, Joseph A. Izatt

Abstract: Optical coherence tomography (OCT) has seen widespread success as an in vivo clinical diagnostic 3D imaging modality, impacting areas including ophthalmology, cardiology, and gastroenterology. Despite its many advantages, such as high sensitivity, speed, and depth penetration, OCT suffers from several shortcomings that ultimately limit its utility as a 3D microscopy tool, such as its pervasive coherent speckle noise and poor lateral resolution required to maintain millimeter-scale imaging depths. Here, we present 3D optical coherence refraction tomography (OCRT), a computational extension of OCT which synthesizes an incoherent contrast mechanism by combining multiple OCT volumes, acquired across two rotation axes, to form a resolution-enhanced, speckle-reduced, refraction-corrected 3D reconstruction. Our label-free computational 3D microscope features a novel optical design incorporating a parabolic mirror to enable the capture of 5D plenoptic datasets, consisting of millimetric 3D fields of view over up to $\pm75^\circ$ without moving the sample. We demonstrate that 3D OCRT reveals 3D features unobserved by conventional OCT in fruit fly, zebrafish, and mouse samples.

Submitted 23 February, 2022; originally announced February 2022.

Journal ref: Optica 9(6), 593-601 (2022)

arXiv:2106.15826 [pdf, other]

doi 10.1364/JOSAA.440592

High-speed multiview imaging approaching 4pi steradians using conic section mirrors: theoretical and practical considerations

Authors: Kevin C. Zhou, Al-Hafeez Dhalla, Ryan P. McNabb, Ruobing Qian, Sina Farsiu, Joseph A. Izatt

Abstract: Illuminating or imaging samples from a broad angular range is essential in a wide variety of computational 3D imaging and resolution-enhancement techniques, such as optical projection tomography (OPT), optical diffraction tomography (ODT), synthetic aperture microscopy, Fourier ptychographic microscopy (FPM), structured illumination microscopy (SIM), photogrammetry, and optical coherence refraction tomography (OCRT). The wider the angular coverage, the better the resolution enhancement or 3D resolving capabilities. However, achieving such angular ranges is a practical challenge, especially when approaching plus-or-minus 90 degrees or beyond. Often, researchers resort to expensive, proprietary high numerical aperture (NA) objectives, or to rotating the sample or source-detector pair, which sacrifices temporal resolution or perturbs the sample. Here, we propose several new strategies for multi-angle imaging approaching 4pi steradians using concave parabolic or ellipsoidal mirrors and fast, low rotational inertia scanners, such as galvanometers. We derive theoretically and empirically relations between a variety of system parameters (e.g., NA, wavelength, focal length, telecentricity) and achievable fields of view (FOVs) and importantly show that intrinsic tilt aberrations do not restrict FOV for many multi-view imaging applications, contrary to conventional wisdom. Finally, we present strategies for avoiding spherical aberrations at obliquely illuminated flat boundaries.
Our simple designs allow for high-speed multi-angle imaging for microscopic, mesoscopic, and macroscopic applications.

Submitted 17 November, 2021; v1 submitted 30 June, 2021; originally announced June 2021.

Journal ref: JOSA A 38(12), 1810-1822 (2021)

arXiv:2103.00334 [pdf, other]

BiconNet: An Edge-preserved Connectivity-based Approach for Salient Object Detection

Authors: Ziyun Yang, Somayyeh Soltanian-Zadeh, Sina Farsiu

Abstract: Salient object detection (SOD) is viewed as a pixel-wise saliency modeling task by traditional deep learning-based methods. A limitation of current SOD models is insufficient utilization of inter-pixel information, which usually results in imperfect segmentation near edge regions and low spatial coherence. As we demonstrate, using a saliency mask as the only label is suboptimal. To address this limitation, we propose a connectivity-based approach called bilateral connectivity network (BiconNet), which uses connectivity masks together with saliency masks as labels for effective modeling of inter-pixel relationships and object saliency. Moreover, we propose a bilateral voting module to enhance the output connectivity map, and a novel edge feature enhancement method that efficiently utilizes edge-specific features. Through comprehensive experiments on five benchmark datasets, we demonstrate that our proposed method can be plugged into any existing state-of-the-art saliency-based SOD framework to improve its performance with negligible parameter increase.

Submitted 8 August, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

Comments: This paper is accepted by Pattern Recognition. Please cite as following: "Z. Yang, S. Soltanian-Zadeh, and S. Farsiu, "BiconNet: An Edge-preserved Connectivity-based Approach for Salient Object Detection", Pattern Recognition, (In Press) 2021"
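
The BiconNet abstract above mentions connectivity masks used alongside saliency masks as labels. The sketch below shows one way such a mask could be derived from a binary saliency mask. It assumes 8-neighbor connectivity and a "both pixels are foreground" rule, neither of which is stated in the abstract, and the function name and channel order are hypothetical.

```python
# Offsets (row, col) of the 8 neighbors; the channel order is an arbitrary choice.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def connectivity_mask(mask):
    """Return 8 binary channels derived from a binary mask (list of lists).

    Channel k is 1 at (r, c) when both mask[r][c] and its k-th neighbor are
    foreground. Out-of-bounds neighbors are treated as background (assumption).
    """
    h, w = len(mask), len(mask[0])
    channels = []
    for dr, dc in NEIGHBORS:
        ch = [[0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                nr, nc = r + dr, c + dc
                in_bounds = 0 <= nr < h and 0 <= nc < w
                if in_bounds and mask[r][c] and mask[nr][nc]:
                    ch[r][c] = 1
        channels.append(ch)
    return channels


# Toy saliency mask with two horizontally adjacent foreground pixels.
mask = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
]
conn = connectivity_mask(mask)
```

In this toy mask only the "right neighbor" channel at (1, 1) and the "left neighbor" channel at (1, 2) are set, which illustrates how the extra channels encode inter-pixel relationships that a single saliency mask does not.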

arXiv:2102.09042 [pdf, other]

Modeling Extremes with d-max-decreasing Neural Networks

Authors: Ali Hasan, Khalil Elkhalil, Yuting Ng, Joao M. Pereira, Sina Farsiu, Jose H. Blanchet, Vahid Tarokh

Abstract: We propose a novel neural network architecture that enables non-parametric calibration and generation of multivariate extreme value distributions (MEVs). MEVs arise from Extreme Value Theory (EVT) as the necessary class of models when extrapolating a distributional fit over large spatial and temporal scales based on data observed in intermediate scales. In turn, EVT dictates that $d$-max-decreasing, a stronger form of convexity, is an essential shape constraint in the characterization of MEVs. As far as we know, our proposed architecture provides the first class of non-parametric estimators for MEVs that preserve these essential shape constraints. We show that our architecture approximates the dependence structure encoded by MEVs at parametric rate. Moreover, we present a new method for sampling high-dimensional MEVs using a generative model. We demonstrate our methodology on a wide range of experimental settings, ranging from environmental sciences to financial mathematics and verify that the structural properties of MEVs are retained compared to existing methods.

Submitted 1 March, 2022; v1 submitted 17 February, 2021; originally announced February 2021.

arXiv:2012.06044 [pdf, other]

Mesoscopic photogrammetry with an unstabilized phone camera

Authors: Kevin C. Zhou, Colin Cooke, Jaehee Park, Ruobing Qian, Roarke Horstmeyer, Joseph A. Izatt, Sina Farsiu

Abstract: We present a feature-free photogrammetric technique that enables quantitative 3D mesoscopic (mm-scale height variation) imaging with tens-of-micron accuracy from sequences of images acquired by a smartphone at close range (several cm) under freehand motion without additional hardware. Our end-to-end, pixel-intensity-based approach jointly registers and stitches all the images by estimating a coaligned height map, which acts as a pixel-wise radial deformation field that orthorectifies each camera image to allow homographic registration. The height maps themselves are reparameterized as the output of an untrained encoder-decoder convolutional neural network (CNN) with the raw camera images as the input, which effectively removes many reconstruction artifacts. Our method also jointly estimates both the camera's dynamic 6D pose and its distortion using a nonparametric model, the latter of which is especially important in mesoscopic applications when using cameras not designed for imaging at short working distances, such as smartphone cameras. We also propose strategies for reducing computation time and memory, applicable to other multi-frame registration problems. Finally, we demonstrate our method using sequences of multi-megapixel images captured by an unstabilized smartphone on a variety of samples (e.g., painting brushstrokes, circuit board, seeds).

Submitted 10 December, 2020; originally announced December 2020.

Journal ref: CVPR 2021

arXiv:2012.04875 [pdf, other]

doi 10.1364/AOP.417102

Unified k-space theory of optical coherence tomography

Authors: Kevin C. Zhou, Ruobing Qian, Al-Hafeez Dhalla, Sina Farsiu, Joseph A. Izatt

Abstract: We present a general theory of optical coherence tomography (OCT), which synthesizes the fundamental concepts and implementations of OCT under a common 3D k-space framework. At the heart of this analysis is the Fourier diffraction theorem, which relates the coherent interaction between a sample and plane wave to the Ewald sphere in the 3D k-space representation of the sample. While only the axial dimension of OCT is typically analyzed in k-space, we show that embracing a fully 3D k-space formalism allows explanation of nearly every fundamental physical phenomenon or property of OCT, including contrast mechanism, resolution, dispersion, aberration, limited depth of focus, and speckle. The theory also unifies diffraction tomography, confocal microscopy, point-scanning OCT, line-field OCT, full-field OCT, Bessel-beam OCT, transillumination OCT, interferometric synthetic aperture microscopy (ISAM), and optical coherence refraction tomography (OCRT), among others. Our unified theory not only enables clear understanding of existing techniques, but also suggests new research directions to continue advancing the field of OCT.

Submitted 9 December, 2020; originally announced December 2020.

Journal ref: Advances in Optics and Photonics 13, 462-514 (2021)

arXiv:2007.06120 [pdf, other]

Fisher Auto-Encoders

Authors: Khalil Elkhalil, Ali Hasan, Jie Ding, Sina Farsiu, Vahid Tarokh

Abstract: It has been conjectured that the Fisher divergence is more robust to model uncertainty than the conventional Kullback-Leibler (KL) divergence. This motivates the design of a new class of robust generative auto-encoders (AE) referred to as Fisher auto-encoders. Our approach is to design Fisher AEs by minimizing the Fisher divergence between the intractable joint distribution of observed data and latent variables, with that of the postulated/modeled joint distribution. In contrast to KL-based variational AEs (VAEs), the Fisher AE can exactly quantify the distance between the true and the model-based posterior distributions. Qualitative and quantitative results are provided on both MNIST and celebA datasets demonstrating the competitive performance of Fisher AEs in terms of robustness compared to other AEs such as VAEs and Wasserstein AEs.

Submitted 23 October, 2020; v1 submitted 12 July, 2020; originally announced July 2020.
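
For readers comparing the two divergences named in the Fisher Auto-Encoders abstract above, the standard definitions can be written side by side. This is a sketch under stated assumptions: some authors include a factor of 1/2 in the Fisher divergence, the paper's exact convention is not given in the abstract, and the Gaussian worked case is an illustration added here.

```latex
% Standard definitions (the Fisher divergence is sometimes scaled by 1/2):
D_F(p \,\|\, q) = \mathbb{E}_{x \sim p}\!\left[ \left\| \nabla_x \log p(x) - \nabla_x \log q(x) \right\|^2 \right],
\qquad
D_{\mathrm{KL}}(p \,\|\, q) = \mathbb{E}_{x \sim p}\!\left[ \log \frac{p(x)}{q(x)} \right].

% Worked case (assumed for illustration): p = N(\mu_1, \sigma^2), q = N(\mu_2, \sigma^2)
\nabla_x \log p(x) - \nabla_x \log q(x) = \frac{\mu_1 - \mu_2}{\sigma^2}
\;\Rightarrow\;
D_F = \frac{(\mu_1 - \mu_2)^2}{\sigma^4},
\qquad
D_{\mathrm{KL}} = \frac{(\mu_1 - \mu_2)^2}{2\sigma^2}.
```

The Fisher divergence depends only on the score functions (gradients of the log densities), so normalizing constants cancel.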

arXiv:2007.06075 [pdf, other]

Identifying Latent Stochastic Differential Equations

Authors: Ali Hasan, João M. Pereira, Sina Farsiu, Vahid Tarokh

Abstract: We present a method for learning latent stochastic differential equations (SDEs) from high-dimensional time series data. Given a high-dimensional time series generated from a lower dimensional latent unknown Itô process, the proposed method learns the mapping from ambient to latent space, and the underlying SDE coefficients, through a self-supervised learning approach. Using the framework of variational autoencoders, we consider a conditional generative model for the data based on the Euler-Maruyama approximation of SDE solutions. Furthermore, we use recent results on identifiability of latent variable models to show that the proposed model can recover not only the underlying SDE coefficients, but also the original latent variables, up to an isometry, in the limit of infinite data. We validate the method through several simulated video processing tasks, where the underlying SDE is known, and through real world datasets.

Submitted 26 November, 2021; v1 submitted 12 July, 2020; originally announced July 2020.

Comments: 20 pages, 8 figures, to be published in IEEE Transactions on Signal Processing
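
The abstract above builds its generative model on the Euler-Maruyama approximation of SDE solutions. The following is a minimal sketch of that discretization for a scalar SDE dX = f(X) dt + g(X) dW, not the authors' learning method. The function name, the Ornstein-Uhlenbeck example, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, dt, n_steps, rng):
    """Simulate one path of dX = drift(X) dt + diffusion(X) dW.

    Returns a list of n_steps + 1 values, starting at x0.
    """
    path = [x0]
    x = x0
    for _ in range(n_steps):
        # Brownian increment over a step of length dt: dW ~ N(0, dt).
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


# Hypothetical example: Ornstein-Uhlenbeck process dX = -X dt + 0.5 dW.
rng = random.Random(0)
path = euler_maruyama(lambda x: -x, lambda x: 0.5, x0=1.0, dt=0.01, n_steps=1000, rng=rng)

# With zero diffusion the scheme reduces to the forward Euler method for an ODE.
deterministic = euler_maruyama(lambda x: -x, lambda x: 0.0, 1.0, 0.1, 2, rng)
```

Each step is conditionally Gaussian given the previous state, with mean x + f(x) dt and standard deviation |g(x)| sqrt(dt).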

arXiv:1910.10262 [pdf, other]

Learning Partial Differential Equations from Data Using Neural Networks

Authors: Ali Hasan, João M. Pereira, Robert Ravier, Sina Farsiu, Vahid Tarokh

Abstract: We develop a framework for estimating unknown partial differential equations from noisy data, using a deep learning approach. Given noisy samples of a solution to an unknown PDE, our method interpolates the samples using a neural network, and extracts the PDE by equating derivatives of the neural network approximation. Our method applies to PDEs which are linear combinations of user-defined dictionary functions, and generalizes previous methods that only consider parabolic PDEs. We introduce a regularization scheme that prevents the function approximation from overfitting the data and forces it to be a solution of the underlying PDE. We validate the model on simulated data generated by the known PDEs and added Gaussian noise, and we study our method under different levels of noise. We also compare the error of our method with a Cramer-Rao lower bound for an ordinary differential equation. Our results indicate that our method outperforms other methods in estimating PDEs, especially in the low signal-to-noise regime.

Submitted 22 October, 2019; originally announced October 2019.

arXiv:1908.05782 [pdf, other]

doi 10.1109/TMI.2020.2970867

MimickNet, Matching Clinical Post-Processing Under Realistic Black-Box Constraints

Authors: Ouwen Huang, Will Long, Nick Bottenus, Gregg E. Trahey, Sina Farsiu, Mark L. Palmeri

Abstract: Image post-processing is used in clinical-grade ultrasound scanners to improve image quality (e.g., reduce speckle noise and enhance contrast). These post-processing techniques vary across manufacturers and are generally kept proprietary, which presents a challenge for researchers looking to match current clinical-grade workflows. We introduce a deep learning framework, MimickNet, that transforms raw conventional delay-and-summed (DAS) beams into the approximate post-processed images found on clinical-grade scanners. Training MimickNet only requires post-processed image samples from a scanner of interest without the need for explicit pairing to raw DAS data. This flexibility allows it to hypothetically approximate any manufacturer's post-processing without access to the pre-processed data. MimickNet generates images with an average similarity index measurement (SSIM) of 0.930$\pm$0.0892 on a 300 cineloop test set, and it generalizes to cardiac cineloops outside of our train-test distribution achieving an SSIM of 0.967$\pm$0.002. We also explore the theoretical SSIM achievable by evaluating MimickNet performance when trained under gray-box constraints (i.e., when both pre-processed and post-processed images are available). To our knowledge, this is the first work to establish deep learning models that closely approximate current clinical-grade ultrasound post-processing under realistic black-box constraints where before and after post-processing data is unavailable.
MimickNet serves as a clinical post-processing baseline for future works in ultrasound image formation to compare against. To this end, we have made the MimickNet software open source.

Submitted 15 August, 2019; originally announced August 2019.

Comments: This work has been submitted to the IEEE Transactions on Medical Imaging on July 1st, 2019 for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
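
The MimickNet abstract above reports results as SSIM scores. The sketch below computes a single-window ("global") SSIM over two flattened images to show what the index measures. It is a simplification: standard SSIM averages the same expression over local windows, and the abstract does not state the windowing or constants used in the paper. The function name, the defaults `k1=0.01` and `k2=0.03` (the commonly used constants), and the sample data are assumptions.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length sequences of pixel values."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variances and covariance over the whole image.
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants that stabilize the ratio when means or variances are near zero.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical flattened "images" with values in [0, 1].
a = [0.1, 0.5, 0.9, 0.3]
b = [0.9, 0.5, 0.1, 0.7]
same = global_ssim(a, a)
different = global_ssim(a, b)
```

Identical inputs give a score of 1, and any mismatch in mean, contrast, or structure lowers the score.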

Showing 1–14 of 14 results for author: Farsiu, S