-
Parametric Modeling and Estimation of Photon Registrations for 3D Imaging
Authors:
Weijian Zhang,
Hashan K. Weerasooriya,
Prateek Chennuri,
Stanley H. Chan
Abstract:
In single-photon light detection and ranging (SP-LiDAR) systems, the histogram distortion due to hardware dead time fundamentally limits the precision of depth estimation. To compensate for the dead time effects, the photon registration distribution is typically modeled based on the Markov chain self-excitation process. However, this is a discrete process and it is computationally expensive, thus hindering potential neural network applications and fast simulations. In this paper, we overcome the modeling challenge by proposing a continuous parametric model. We introduce a Gaussian-uniform mixture model (GUMM) and periodic padding to address high noise floors and noise slopes respectively. By deriving and implementing a customized expectation maximization (EM) algorithm, we achieve accurate histogram matching in scenarios that were deemed difficult in the literature.
Submitted 2 July, 2024;
originally announced July 2024.
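As a rough illustration of the Gaussian-uniform mixture idea in the abstract above, the following sketch runs textbook EM on simulated timestamps. It is not the authors' customized EM algorithm: it ignores dead time and periodic padding, and the function name `fit_gumm`, the initialisation, and the simulated data are all assumptions made for this example.

```python
import math
import random

def fit_gumm(samples, period, iters=50):
    """Textbook EM for w * N(mu, sigma^2) + (1 - w) * Uniform(0, period)."""
    n = len(samples)
    # Hypothetical initialisation: median as centre, broad width, equal weights.
    mu = sorted(samples)[n // 2]
    sigma = period / 10.0
    w = 0.5
    uniform_pdf = 1.0 / period
    for _ in range(iters):
        # E-step: responsibility of the Gaussian (signal) component per sample.
        resp = []
        for t in samples:
            g = w * math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
            u = (1.0 - w) * uniform_pdf
            resp.append(g / (g + u))
        # M-step: closed-form updates; the uniform component has no free parameters.
        total = sum(resp)
        w = total / n
        mu = sum(r * t for r, t in zip(resp, samples)) / total
        var = sum(r * (t - mu) ** 2 for r, t in zip(resp, samples)) / total
        sigma = max(math.sqrt(var), 1e-6)  # guard against a collapsing variance
    return w, mu, sigma

# Simulated histogram data (assumed values): a pulse at t = 40 on a flat noise floor.
rng = random.Random(0)
period = 100.0
data = [rng.gauss(40.0, 2.0) for _ in range(600)]
data += [rng.uniform(0.0, period) for _ in range(400)]
w_hat, mu_hat, sigma_hat = fit_gumm(data, period)
print(w_hat, mu_hat, sigma_hat)
```

The uniform component absorbs background counts, so the Gaussian mean is pulled far less toward the noise floor than a plain sample mean would be.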
-
Improving Rehabilitative Assessment with Statistical and Shape Preserving Surrogate Data and Singular Spectrum Analysis
Authors:
T. K. M. Lee,
H. W. Chan,
K. H. Leo,
E. Chew,
Ling Zhao,
S. Sanei
Abstract:
Time series data are collected in temporal order and are widely used to train systems for prediction, modeling and classification, to name a few. These systems require large amounts of data to improve generalization and prevent over-fitting. However, there is a comparative lack of time series data due to operational constraints. This situation is alleviated by synthesizing data which have a suitable spread of features yet retain the distinctive features of the original data. These would be its basic statistical properties and overall shape, which are important for short time series such as in rehabilitative applications or in quickly changing portions of lengthy data. In our earlier work, synthesized surrogate time series were used to augment rehabilitative data. This gave good results in classification but the resulting waveforms did not preserve the original signal shape. To remedy this, we use singular spectrum analysis (SSA) to separate a signal into trends and cycles, which describe the shape of the signal, and low level components. In a novel way, we subject the low level component to randomizing processes, then recombine this with the original trend and cycle components to form a synthetic time series. We compare our approach with other methods, using statistical and shape measures, and demonstrate its effectiveness in classification.
Submitted 22 June, 2024;
originally announced June 2024.
-
Frequency stabilization of self-sustained oscillations in a sideband-driven electromechanical resonator
Authors:
B. Zhang,
Yingming Yan,
X. Dong,
M. I. Dykman,
H. B. Chan
Abstract:
We present a method to stabilize the frequency of self-sustained vibrations in micro- and nanomechanical resonators. The method refers to a two-mode system with the vibrations at significantly different frequencies. The signal from one mode is used to control the other mode. In the experiment, self-sustained oscillations of micromechanical modes are excited by pumping at the blue-detuned sideband of the higher-frequency mode. Phase fluctuations of the two modes show near perfect anti-correlation. They can be compensated in either one of the modes by a stepwise change of the pump phase. The phase change of the controlled mode is proportional to the pump phase change, with the proportionality constant independent of the pump amplitude and frequency. This finding allows us to stabilize the phase of one mode against phase diffusion using the measured phase of the other mode. We demonstrate that phase fluctuations of either the high or low frequency mode can be significantly reduced. The results open new opportunities in generating stable vibrations in a broad frequency range via parametric downconversion in nonlinear resonators.
Submitted 13 May, 2024;
originally announced May 2024.
-
Discovering robust biomarkers of neurological disorders from functional MRI using graph neural networks: A Review
Authors:
Yi Hao Chan,
Deepank Girish,
Sukrit Gupta,
Jing Xia,
Chockalingam Kasi,
Yinan He,
Conghao Wang,
Jagath C. Rajapakse
Abstract:
Graph neural networks (GNN) have emerged as a popular tool for modelling functional magnetic resonance imaging (fMRI) datasets. Many recent studies have reported significant improvements in disorder classification performance via more sophisticated GNN designs and highlighted salient features that could be potential biomarkers of the disorder. In this review, we provide an overview of how GNN and model explainability techniques have been applied on fMRI datasets for disorder prediction tasks, with a particular emphasis on the robustness of biomarkers produced for neurodegenerative diseases and neuropsychiatric disorders. We found that while most studies have performant models, salient features highlighted in these studies vary greatly across studies on the same disorder and little has been done to evaluate their robustness. To address these issues, we suggest establishing new standards that are based on objective evaluation metrics to determine the robustness of these potential biomarkers. We further highlight gaps in the existing literature and put together a prediction-attribution-evaluation framework that could set the foundations for future research on improving the robustness of potential biomarkers discovered via GNNs.
Submitted 1 May, 2024;
originally announced May 2024.
-
Fidelitous Augmentation of Human Accelerometric Data for Deep Learning
Authors:
Tracey K. M. Lee,
H. W. Chan,
K. H. Leo,
Effie Chew,
L. Zhao,
Saeid Sanei
Abstract:
Time series (TS) data have consistently been in short supply, yet their demand remains high for training systems in prediction, modeling, classification, and various other applications. Synthesis can serve to expand the sample population, yet it is crucial to maintain the statistical characteristics between the synthesized and the original TS: this ensures consistent sampling of data for both training and testing purposes. However, the time domain features of the data may not be maintained. This motivates our work, whose objective is to preserve the following features in a synthesized TS: its fundamental statistical characteristics and important time domain features like its general shape and prominent transients. In a novel way, we first isolate important TS features into various components using a spectrogram and singular spectrum analysis. The residual signal is then randomized in a way that preserves its statistical properties. These components are then recombined for the synthetic time series. Using accelerometer data in a clinical setting, we use statistical and shape measures to compare our method to others. We show it has higher fidelity to the original signal features, has good diversity and performs better data classification in a deep learning application.
Submitted 22 April, 2024;
originally announced April 2024.
-
Resolution Limit of Single-Photon LiDAR
Authors:
Stanley H. Chan,
Hashan K. Weerasooriya,
Weijian Zhang,
Pamela Abshire,
Istvan Gyongy,
Robert K. Henderson
Abstract:
Single-photon Light Detection and Ranging (LiDAR) systems are often equipped with an array of detectors for improved spatial resolution and sensing speed. However, given a fixed amount of flux produced by the laser transmitter across the scene, the per-pixel Signal-to-Noise Ratio (SNR) will decrease when more pixels are packed in a unit space. This presents a fundamental trade-off between the spatial resolution of the sensor array and the SNR received at each pixel. Theoretical characterization of this fundamental limit is explored. By deriving the photon arrival statistics and introducing a series of new approximation techniques, the Mean Squared Error (MSE) of the maximum-likelihood estimator of the time delay is derived. The theoretical predictions align well with simulations and real data.
Submitted 30 March, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Spatio-Temporal Turbulence Mitigation: A Translational Perspective
Authors:
Xingguang Zhang,
Nicholas Chimitt,
Yiheng Chi,
Zhiyuan Mao,
Stanley H. Chan
Abstract:
Recovering images distorted by atmospheric turbulence is a challenging inverse problem due to the stochastic nature of turbulence. Although numerous turbulence mitigation (TM) algorithms have been proposed, their efficiency and generalization to real-world dynamic scenarios remain severely limited. Building upon the intuitions of classical TM algorithms, we present the Deep Atmospheric TUrbulence Mitigation network (DATUM). DATUM aims to overcome major challenges when transitioning from classical to deep learning approaches. By carefully integrating the merits of classical multi-frame TM methods into a deep network structure, we demonstrate that DATUM can efficiently perform long-range temporal aggregation using a recurrent fashion, while deformable attention and temporal-channel attention seamlessly facilitate pixel registration and lucky imaging. With additional supervision, tilt and blur degradation can be jointly mitigated. These inductive biases empower DATUM to significantly outperform existing methods while delivering a tenfold increase in processing speed. A large-scale training dataset, ATSyn, is presented as a co-invention to enable generalization in real turbulence. Our code and datasets are available at https://xg416.github.io/DATUM.
Submitted 7 April, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Kernel Diffusion: An Alternate Approach to Blind Deconvolution
Authors:
Yash Sanghvi,
Yiheng Chi,
Stanley H. Chan
Abstract:
Blind deconvolution problems are severely ill-posed because neither the underlying signal nor the forward operator is known exactly. Conventionally, these problems are solved by alternating between estimation of the image and kernel while keeping the other fixed. In this paper, we show that this framework is flawed because of its tendency to get trapped in local minima and, instead, suggest the use of a kernel estimation strategy with a non-blind solver. This framework is employed by a diffusion method which is trained to sample the blur kernel from the conditional distribution with guidance from a pre-trained non-blind solver. The proposed diffusion method leads to state-of-the-art results on both synthetic and real blur datasets.
Submitted 4 December, 2023;
originally announced December 2023.
-
Single-Shot Plug-and-Play Methods for Inverse Problems
Authors:
Yanqi Cheng,
Lipei Zhang,
Zhenda Shen,
Shujun Wang,
Lequan Yu,
Raymond H. Chan,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
The utilisation of Plug-and-Play (PnP) priors in inverse problems has become increasingly prominent in recent years. This preference is based on the mathematical equivalence between the general proximal operator and the regularised denoiser, facilitating the adaptation of various off-the-shelf denoiser priors to a wide range of inverse problems. However, existing PnP models predominantly rely on pre-trained denoisers using large datasets. In this work, we introduce Single-Shot PnP methods (SS-PnP), shifting the focus to solving inverse problems with minimal data. First, we integrate Single-Shot proximal denoisers into iterative methods, enabling training with single instances. Second, we propose implicit neural priors based on a novel function that preserves relevant frequencies to capture fine details while avoiding the issue of vanishing gradients. We demonstrate, through extensive numerical and visual experiments, that our method leads to better approximations.
Submitted 22 November, 2023;
originally announced November 2023.
-
TRIDENT: The Nonlinear Trilogy for Implicit Neural Representations
Authors:
Zhenda Shen,
Yanqi Cheng,
Raymond H. Chan,
Pietro Liò,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Implicit neural representations (INRs) have garnered significant interest recently for their ability to model complex, high-dimensional data without explicit parameterisation. In this work, we introduce TRIDENT, a novel function for implicit neural representations characterised by a trilogy of nonlinearities. Firstly, it is designed to represent high-order features through order compactness. Secondly, TRIDENT efficiently captures frequency information, a feature called frequency compactness. Thirdly, it has the capability to represent signals or images such that most of its energy is concentrated in a limited spatial region, denoting spatial compactness. We demonstrated through extensive experiments on various inverse problems that our proposed function outperforms existing implicit neural representation functions.
Submitted 21 November, 2023;
originally announced November 2023.
-
The Secrets of Non-Blind Poisson Deconvolution
Authors:
Abhiram Gnanasambandam,
Yash Sanghvi,
Stanley H. Chan
Abstract:
Non-blind image deconvolution has been studied for several decades but most of the existing work focuses on blur instead of noise. In photon-limited conditions, however, the excessive amount of shot noise makes traditional deconvolution algorithms fail. In searching for reasons why these methods fail, we present a systematic analysis of the Poisson non-blind deconvolution algorithms reported in the literature, covering both classical and deep learning methods. We compile a list of five "secrets" highlighting the do's and don'ts when designing algorithms. Based on this analysis, we build a proof-of-concept method by combining the five secrets. We find that the new method performs on par with some of the latest methods while outperforming some older ones.
Submitted 6 September, 2023;
originally announced September 2023.
-
Multi-modal Graph Neural Network for Early Diagnosis of Alzheimer's Disease from sMRI and PET Scans
Authors:
Yanteng Zhanga,
Xiaohai He,
Yi Hao Chan,
Qizhi Teng,
Jagath C. Rajapakse
Abstract:
In recent years, deep learning models have been applied to neuroimaging data for early diagnosis of Alzheimer's disease (AD). Structural magnetic resonance imaging (sMRI) and positron emission tomography (PET) images provide structural and functional information about the brain, respectively. Combining these features leads to better performance than using a single modality alone in building predictive models for AD diagnosis. However, current multi-modal approaches in deep learning, based on sMRI and PET, are mostly limited to convolutional neural networks, which do not facilitate integration of both image and phenotypic information of subjects. We propose to use graph neural networks (GNN) that are designed to deal with problems in non-Euclidean domains. In this study, we demonstrate how brain networks can be created from sMRI or PET images and be used in a population graph framework that can combine phenotypic information with imaging features of these brain networks. Then, we present a multi-modal GNN framework where each modality has its own branch of GNN and a technique is proposed to combine the multi-modal data at both the level of node vectors and adjacency matrices. Finally, we perform late fusion to combine the preliminary decisions made in each branch and produce a final prediction. As multi-modality data become available, multi-source and multi-modal approaches are becoming the trend in AD diagnosis. We conducted explorative experiments based on multi-modal imaging data combined with non-imaging phenotypic information for AD diagnosis and analyzed the impact of phenotypic information on diagnostic performance. Results from experiments demonstrated that our proposed multi-modal approach improves performance for AD diagnosis, and this study also provides a technical reference and supports the need for multivariate multi-modal diagnosis methods.
Submitted 30 July, 2023;
originally announced July 2023.
-
Computational Image Formation: Simulators in the Deep Learning Era
Authors:
Stanley H. Chan
Abstract:
At the pinnacle of computational imaging is the co-optimization of camera and algorithm. This, however, is not the only form of computational imaging. In problems such as imaging through adverse weather, the bigger challenge is how to accurately simulate the forward degradation process so that we can synthesize data to train reconstruction models and/or integrate the forward model as part of the reconstruction algorithm. This article introduces the concept of computational image formation (CIF). Compared to the standard inverse problems where the goal is to recover the latent image $x$ from the observation $y = G(x)$, CIF shifts the focus to designing an approximate mapping $H$ such that $H \approx G$ while giving a good image reconstruction result. The word "computational" highlights the fact that the image formation is now replaced by a numerical simulator. While matching mother nature remains an important goal, CIF pays even greater attention to strategically choosing an $H$ so that the reconstruction performance is maximized.
The goal of this article is to conceptualize the idea of CIF by elaborating on its meaning and implications. The first part of the article is a discussion on the four attributes of a CIF simulator: accurate enough to mimic $G$, fast enough to be integrated as part of the reconstruction, provides a well-posed inverse problem when plugged into the reconstruction, and differentiable to allow backpropagation. The second part of the article is a detailed case study based on imaging through atmospheric turbulence. A plethora of simulators, old and new ones, are discussed. The third part of the article is a collection of other examples that fall into the category of CIF, including imaging through bad weather, dynamic vision sensors, and differentiable optics. Finally, thoughts about the future direction and recommendations to the community are shared.
Submitted 26 October, 2023; v1 submitted 21 July, 2023;
originally announced July 2023.
-
Physics-Driven Turbulence Image Restoration with Stochastic Refinement
Authors:
Ajay Jaiswal,
Xingguang Zhang,
Stanley H. Chan,
Zhangyang Wang
Abstract:
Image distortion by atmospheric turbulence is a stochastic degradation, which is a critical problem in long-range optical imaging systems. A considerable amount of research has been conducted over the past decades, including model-based and emerging deep-learning solutions with the help of synthetic data. Although fast and physics-grounded simulation tools have recently been introduced to help deep-learning models adapt to real-world turbulence conditions, the training of such models relies only on synthetic data and ground truth pairs. This paper proposes the Physics-integrated Restoration Network (PiRN) to bring the physics-based simulator directly into the training process to help the network disentangle the stochasticity from the degradation and the underlying image. Furthermore, to overcome the "average effect" introduced by deterministic models and the domain gap between the synthetic and real-world degradation, we introduce PiRN with Stochastic Refinement (PiRN-SR) to boost its perceptual quality. Overall, our PiRN and PiRN-SR improve the generalization to real-world unknown turbulence conditions and provide a state-of-the-art restoration in both pixel-wise accuracy and perceptual quality. Our code is available at https://github.com/VITA-Group/PiRN.
Submitted 20 July, 2023;
originally announced July 2023.
-
Spatially Varying Exposure with 2-by-2 Multiplexing: Optimality and Universality
Authors:
Xiangyu Qu,
Yiheng Chi,
Stanley H. Chan
Abstract:
The advancement of new digital image sensors has enabled the design of exposure multiplexing schemes where a single image capture can have multiple exposures and conversion gains in an interlaced format, similar to that of a Bayer color filter array. In this paper, we ask the question of how to design such multiplexing schemes for adaptive high-dynamic range (HDR) imaging where the multiplexing scheme can be updated according to the scenes. We present two new findings.
(i) We address the problem of design optimality. We show that given a multiplex pattern, the conventional optimality criteria based on the input/output-referred signal-to-noise ratio (SNR) of the independently measured pixels can lead to flawed decisions because it cannot encapsulate the location of the saturated pixels. We overcome the issue by proposing a new concept known as the spatially varying exposure risk (SVE-Risk) which is a pseudo-idealistic quantification of the amount of recoverable pixels. We present an efficient enumeration algorithm to select the optimal multiplex patterns.
(ii) We report a design universality observation that the design of the multiplex pattern can be decoupled from the image reconstruction algorithm. This is a significant departure from the recent literature that the multiplex pattern should be jointly optimized with the reconstruction algorithm. Our finding suggests that in the context of exposure multiplexing, an end-to-end training may not be necessary.
Submitted 29 June, 2023;
originally announced June 2023.
-
On Propagation Characteristics of Reconfigurable Surface Wave Platform: Simulation and Experimental Verification
Authors:
Z. Chu,
K. F. Tong,
K. K. Wong,
C. B. Chae,
C. H. Chan
Abstract:
Reconfigurable intelligent surface (RIS) as a smart reflector is revolutionizing research for next-generation wireless communications. Complementing this is a concept of using RIS as an efficient propagation medium for potentially superior path loss characteristics. Motivated by a recent porous surface architecture that facilitates reconfigurable pathways with cavities filled with fluid metal, this paper studies the propagation characteristics of different pathway configurations in different lossy materials on the reconfigurable surface wave platform by using a commercial full electromagnetic simulation software and S-parameters experiments. This paper also looks into the best scheme to switch between a straight pathway and a $90^\circ$-bend and attempts to quantify the additional path loss when making a turn. Our experimental results verify the simulation results, showing the effectiveness of the proposed reconfigurable surface wave platform for a wide-band, low path loss and highly programmable communications.
Submitted 2 August, 2023; v1 submitted 26 April, 2023;
originally announced April 2023.
-
HDR Imaging with Spatially Varying Signal-to-Noise Ratios
Authors:
Yiheng Chi,
Xingguang Zhang,
Stanley H. Chan
Abstract:
While today's high dynamic range (HDR) image fusion algorithms are capable of blending multiple exposures, the acquisition is often controlled so that the dynamic range within one exposure is narrow. For HDR imaging in photon-limited situations, the dynamic range can be enormous and the noise within one exposure is spatially varying. Existing image denoising algorithms and HDR fusion algorithms both fail to handle this situation, leading to severe limitations in low-light HDR imaging. This paper presents two contributions. Firstly, we identify the source of the problem. We find that the issue is associated with the co-existence of (1) spatially varying signal-to-noise ratio, especially the excessive noise due to very dark regions, and (2) a wide luminance range within each exposure. We show that while the issue can be handled by a bank of denoisers, the complexity is high. Secondly, we propose a new method called the spatially varying high dynamic range (SV-HDR) fusion network to simultaneously denoise and fuse images. We introduce a new exposure-shared block within our custom-designed multi-scale transformer framework. In a variety of testing conditions, the performance of the proposed SV-HDR is better than the existing methods.
Submitted 15 April, 2023; v1 submitted 30 March, 2023;
originally announced March 2023.
-
Scattering and Gathering for Spatially Varying Blurs
Authors:
Nicholas Chimitt,
Xingguang Zhang,
Yiheng Chi,
Stanley H. Chan
Abstract:
A spatially varying blur kernel $h(\mathbf{x},\mathbf{u})$ is specified by an input coordinate $\mathbf{u} \in \mathbb{R}^2$ and an output coordinate $\mathbf{x} \in \mathbb{R}^2$. For computational efficiency, we sometimes write $h(\mathbf{x},\mathbf{u})$ as a linear combination of spatially invariant basis functions. The associated pixelwise coefficients, however, can be indexed by either the input coordinate or the output coordinate. While appearing subtle, the two indexing schemes will lead to two different forms of convolutions known as scattering and gathering, respectively. We discuss the origin of the operations. We discuss conditions under which the two operations are identical. We show that scattering is more suitable for simulating how light propagates and gathering is more suitable for image filtering such as denoising.
Submitted 9 March, 2024; v1 submitted 9 March, 2023;
originally announced March 2023.
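The two indexing schemes described in this abstract can be illustrated with a minimal 1D sketch. It is not the authors' implementation: the function names (`conv_same`, `gather`, `scatter`), the two basis kernels, and the coefficient maps are all assumptions chosen for illustration. Gathering applies the pixelwise coefficient at the output coordinate (filter, then weight); scattering applies it at the input coordinate (weight, then filter).

```python
def conv_same(signal, kernel):
    """Zero-padded 1D convolution with an odd-length kernel; output has len(signal)."""
    n, r = len(signal), len(kernel) // 2
    out = [0.0] * n
    for x in range(n):
        for j, w in enumerate(kernel):
            u = x - (j - r)          # input coordinate contributing to output x
            if 0 <= u < n:
                out[x] += w * signal[u]
    return out


def gather(image, bases, coeffs):
    """Coefficients indexed by the OUTPUT coordinate: filter first, then weight."""
    out = [0.0] * len(image)
    for phi, a in zip(bases, coeffs):
        filtered = conv_same(image, phi)
        for x in range(len(image)):
            out[x] += a[x] * filtered[x]
    return out


def scatter(image, bases, coeffs):
    """Coefficients indexed by the INPUT coordinate: weight first, then filter."""
    out = [0.0] * len(image)
    for phi, a in zip(bases, coeffs):
        weighted = [a[u] * image[u] for u in range(len(image))]
        spread = conv_same(weighted, phi)
        for x in range(len(image)):
            out[x] += spread[x]
    return out


# Hypothetical setup: a point source, an identity basis, and a box-blur basis.
image = [0.0, 0.0, 1.0, 0.0, 0.0]
bases = [[0.0, 1.0, 0.0], [1 / 3, 1 / 3, 1 / 3]]
varying = [[1.0, 1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 1.0]]
constant = [[0.5] * 5, [0.5] * 5]

scattered = scatter(image, bases, varying)
gathered = gather(image, bases, varying)
```

With spatially varying coefficients the two outputs differ: the scattered point source stays sharp because its own (input) pixel selects the identity basis, while the gathered result picks up blur at the output pixel that selects the box basis. With constant coefficients the two operations coincide, which is one simple case of the equivalence conditions the abstract mentions.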
-
Structured Kernel Estimation for Photon-Limited Deconvolution
Authors:
Yash Sanghvi,
Zhiyuan Mao,
Stanley H. Chan
Abstract:
Images taken in a low light condition with the presence of camera shake suffer from motion blur and photon shot noise. While state-of-the-art image restoration networks show promising results, they are largely limited to well-illuminated scenes and their performance drops significantly when photon shot noise is strong.
In this paper, we propose a new blur estimation technique customized for photon-limited conditions. The proposed method employs a gradient-based backpropagation method to estimate the blur kernel. By modeling the blur kernel using a low-dimensional representation with the key points on the motion trajectory, we significantly reduce the search space and improve the regularity of the kernel estimation problem. When plugged into an iterative framework, our novel low-dimensional representation provides improved kernel estimates and hence significantly better deconvolution performance when compared to end-to-end trained neural networks. The source code and pretrained models are available at https://github.com/sanghviyashiitb/structured-kernel-cvpr23
Submitted 6 March, 2023;
originally announced March 2023.
-
A Global and Patch-wise Contrastive Loss for Accurate Automated Exudate Detection
Authors:
Wei Tang,
Kangning Cui,
Raymond H. Chan
Abstract:
Diabetic retinopathy (DR) is a leading global cause of blindness. Early detection of hard exudates plays a crucial role in identifying DR, which aids in treating diabetes and preventing vision loss. However, the unique characteristics of hard exudates, ranging from their inconsistent shapes to indistinct boundaries, pose significant challenges to existing segmentation techniques. To address these issues, we present a novel supervised contrastive learning framework to optimize hard exudate segmentation. Specifically, we introduce a patch-wise density contrasting scheme to distinguish between areas with varying lesion concentrations, and therefore improve the model's proficiency in segmenting small lesions. To handle the ambiguous boundaries, we develop a discriminative edge inspection module to dynamically analyze the pixels that lie around the boundaries and accurately delineate the exudates. Upon evaluation using the IDRiD dataset and comparison with state-of-the-art frameworks, our method exhibits its effectiveness and shows potential for computer-assisted hard exudate detection. The code to replicate experiments is available at github.com/wetang7/HECL/.
Submitted 2 March, 2024; v1 submitted 22 February, 2023;
originally announced February 2023.
-
Continuous U-Net: Faster, Greater and Noiseless
Authors:
Chun-Wun Cheng,
Christina Runkel,
Lihao Liu,
Raymond H Chan,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Image segmentation is a fundamental task in image analysis and clinical practice. The current state-of-the-art techniques are based on U-shape type encoder-decoder networks with skip connections, called U-Net. Despite the powerful performance reported by existing U-Net type networks, they suffer from several major limitations. Issues include the hard coding of the receptive field size, compromising the performance and computational cost, as well as the fact that they do not account for inherent noise in the data. They have problems associated with discrete layers, and do not offer any theoretical underpinning. In this work we introduce continuous U-Net, a novel family of networks for image segmentation. Firstly, continuous U-Net is a continuous deep neural network that introduces new dynamic blocks modelled by second order ordinary differential equations. Secondly, we provide theoretical guarantees for our network demonstrating faster convergence, higher robustness and less sensitivity to noise. Thirdly, we derive qualitative measures to tailor-made segmentation tasks. We demonstrate, through extensive numerical and visual results, that our model outperforms existing U-Net blocks for several medical image segmentation benchmarking datasets.
Submitted 1 February, 2023;
originally announced February 2023.
-
Real-Time Dense Field Phase-to-Space Simulation of Imaging through Atmospheric Turbulence
Authors:
Nicholas Chimitt,
Xingguang Zhang,
Zhiyuan Mao,
Stanley H. Chan
Abstract:
Numerical simulation of atmospheric turbulence is one of the biggest bottlenecks in developing computational techniques for solving the inverse problem in long-range imaging. The classical split-step method is based upon numerical wave propagation which splits the propagation path into many segments and propagates every pixel in each segment individually via the Fresnel integral. This repeated evaluation becomes increasingly time-consuming for larger images. As a result, the split-step simulation is often done only on a sparse grid of points followed by an interpolation to the other pixels. Even so, the computation is expensive for real-time applications. In this paper, we present a new simulation method that enables \emph{real-time} processing over a \emph{dense} grid of points. Building upon the recently developed multi-aperture model and the phase-to-space transform, we overcome the memory bottleneck in drawing random samples from the Zernike correlation tensor. We show that the cross-correlation of the Zernike modes has an insignificant contribution to the statistics of the random samples. By approximating these cross-correlation blocks in the Zernike tensor, we restore the homogeneity of the tensor which then enables Fourier-based random sampling. On a $512\times512$ image, the new simulator achieves 0.025 seconds per frame over a dense field. On a $3840 \times 2160$ image which would have taken 13 hours to simulate using the split-step method, the new simulator can run at approximately 60 seconds per frame.
Submitted 13 October, 2022;
originally announced October 2022.
-
What Does a One-Bit Quanta Image Sensor Offer?
Authors:
Stanley H. Chan
Abstract:
The one-bit quanta image sensor (QIS) is a photon-counting device that captures image intensities using binary bits. Assuming that the analog voltage generated at the floating diffusion of the photodiode follows a Poisson-Gaussian distribution, the sensor produces either a ``1'' if the voltage is above a certain threshold or ``0'' if it is below the threshold. The concept of this binary sensor has been proposed for more than a decade, and physical devices have been built to realize the concept. However, what benefits does a one-bit QIS offer compared to a conventional multi-bit CMOS image sensor? Besides the known empirical results, are there theoretical proofs to support these findings?
The goal of this paper is to provide new theoretical support from a signal processing perspective. In particular, it is theoretically found that the sensor can offer three benefits: (1) Low-light: One-bit QIS performs better in low light because it has a low read noise, and its one-bit quantization can produce an error-free measurement. However, this requires the exposure time to be appropriately configured. (2) Frame rate: One-bit sensors can operate at a much higher speed because a response is generated as soon as a photon is detected. However, in the presence of read noise, there exists an optimal frame rate beyond which the performance will degrade. A closed-form expression of the optimal frame rate is derived. (3) Dynamic range: One-bit QIS offers a higher dynamic range. The benefit is brought by two complementary characteristics of the sensor: nonlinearity and exposure bracketing. The decoupling of the two factors is theoretically proved, and closed-form expressions are derived.
Submitted 18 August, 2022;
originally announced August 2022.
-
Photon-Limited Blind Deconvolution using Unsupervised Iterative Kernel Estimation
Authors:
Yash Sanghvi,
Abhiram Gnanasambandam,
Zhiyuan Mao,
Stanley H. Chan
Abstract:
Blind deconvolution is a challenging problem, but in low light it is even more difficult. Existing algorithms, both classical and deep-learning based, are not designed for this condition. When the photon shot noise is strong, conventional deconvolution methods fail because (1) the image does not have enough signal-to-noise ratio to perform the blur estimation; (2) while deep neural networks are powerful, many of them do not consider the forward process, so when the noise is strong, these networks fail to simultaneously deblur and denoise; (3) while iterative schemes are known to be robust in the classical frameworks, they are seldom considered in deep neural networks because they require a differentiable non-blind solver.
This paper addresses the above challenges by presenting an \emph{unsupervised} blind deconvolution method. At the core of this method is a reformulation of the general blind deconvolution framework from the conventional image-kernel alternating minimization to a purely kernel-based minimization. This kernel-based minimization leads to a new iterative scheme that backpropagates an unsupervised loss through a pre-trained non-blind solver to update the blur kernel. Experimental results show that the proposed framework achieves better results than state-of-the-art blind deconvolution algorithms in low-light conditions.
Submitted 17 November, 2022; v1 submitted 31 July, 2022;
originally announced August 2022.
-
Single Frame Atmospheric Turbulence Mitigation: A Benchmark Study and A New Physics-Inspired Transformer Model
Authors:
Zhiyuan Mao,
Ajay Jaiswal,
Zhangyang Wang,
Stanley H. Chan
Abstract:
Image restoration algorithms for atmospheric turbulence are known to be much more challenging to design than traditional ones such as blur or noise because the distortion caused by the turbulence is an entanglement of spatially varying blur, geometric distortion, and sensor noise. Existing CNN-based restoration methods built upon convolutional kernels with static weights are insufficient to handle the spatially dynamical atmospheric turbulence effect. To address this problem, in this paper, we propose a physics-inspired transformer model for imaging through atmospheric turbulence. The proposed network utilizes the power of transformer blocks to jointly extract a dynamical turbulence distortion map and restore a turbulence-free image. In addition, recognizing the lack of a comprehensive dataset, we collect and present two new real-world turbulence datasets that allow for evaluation with both classical objective metrics (e.g., PSNR and SSIM) and a new task-driven metric using text recognition accuracy. Both real testing sets and all related code will be made publicly available.
Submitted 24 July, 2022; v1 submitted 20 July, 2022;
originally announced July 2022.
-
Imaging through the Atmosphere using Turbulence Mitigation Transformer
Authors:
Xingguang Zhang,
Zhiyuan Mao,
Nicholas Chimitt,
Stanley H. Chan
Abstract:
Restoring images distorted by atmospheric turbulence is a ubiquitous problem in long-range imaging applications. While existing deep-learning-based methods have demonstrated promising results in specific testing conditions, they suffer from three limitations: (1) lack of generalization capability from synthetic training data to real turbulence data; (2) failure to scale, hence causing memory and speed challenges when extending the idea to a large number of frames; (3) lack of a fast and accurate simulator to generate data for training neural networks. In this paper, we introduce the turbulence mitigation transformer (TMT) that explicitly addresses these issues. TMT brings three contributions: Firstly, TMT explicitly uses turbulence physics by decoupling the turbulence degradation and introducing a multi-scale loss for removing distortion, thus improving effectiveness. Secondly, TMT presents a new attention module along the temporal axis to extract extra features efficiently, thus improving memory and speed. Thirdly, TMT introduces a new simulator based on the Fourier sampler, temporal correlation, and flexible kernel size, thus improving our capability to synthesize better training data. TMT outperforms state-of-the-art video restoration models, especially in generalizing from synthetic to real turbulence data. Code, videos, and datasets are available at https://xg416.github.io/TMT.
Submitted 11 December, 2023; v1 submitted 13 July, 2022;
originally announced July 2022.
-
Tilt-then-Blur or Blur-then-Tilt? Clarifying the Atmospheric Turbulence Model
Authors:
Stanley H. Chan
Abstract:
Imaging at a long distance often requires advanced image restoration algorithms to compensate for the distortions caused by atmospheric turbulence. However, unlike many standard restoration problems such as deconvolution, the forward image formation model of the atmospheric turbulence does not have a simple expression. Thanks to the Zernike representation of the phase, one can show that the forward model is a combination of tilt (pixel shifting due to the linear phase terms) and blur (image smoothing due to the high order aberrations).
Confusion then arises over the ordering of the two operators. Should the model be tilt-then-blur, or blur-then-tilt? Some papers in the literature say that the model is tilt-then-blur, whereas more papers say that it is blur-then-tilt. This paper clarifies the differences between the two and discusses why the tilt-then-blur is the correct model. Recommendations are given to the research community.
Submitted 18 August, 2022; v1 submitted 13 July, 2022;
originally announced July 2022.
-
Facial Image Reconstruction from Functional Magnetic Resonance Imaging via GAN Inversion with Improved Attribute Consistency
Authors:
Pei-Chun Chang,
Yan-Yu Tien,
Chia-Lin Chen,
Li-Fen Chen,
Yong-Sheng Chen,
Hui-Ling Chan
Abstract:
Neuroscience studies have revealed that the brain encodes visual content and embeds information in neural activity. Recently, deep learning techniques have facilitated attempts to address visual reconstructions by mapping brain activity to image stimuli using generative adversarial networks (GANs). However, none of these studies have considered the semantic meaning of latent code in image space. Omitting semantic information could potentially limit the performance. In this study, we propose a new framework to reconstruct facial images from functional Magnetic Resonance Imaging (fMRI) data. With this framework, the GAN inversion is first applied to train an image encoder to extract latent codes in image space, which are then bridged to fMRI data using linear transformation. Following the attributes identified from fMRI data using an attribute classifier, the direction in which to manipulate attributes is decided and the attribute manipulator adjusts the latent code to improve the consistency between the seen image and the reconstructed image. Our experimental results suggest that the proposed framework accomplishes two goals: (1) reconstructing clear facial images from fMRI data and (2) maintaining the consistency of semantic characteristics.
Submitted 3 July, 2022;
originally announced July 2022.
-
Ultra-sensitive Flexible Sponge-Sensor Array for Muscle Activities Detection and Human Limb Motion Recognition
Authors:
Jiao Suo,
Yifan Liu,
Clio Cheng,
Keer Wang,
Meng Chen,
Ho-yin Chan,
Roy Vellaisamy,
Ning Xi,
Vivian W. Q. Lou,
Wen Jung Li
Abstract:
Human limb motion tracking and recognition plays an important role in medical rehabilitation training, lower limb assistance, prosthetics design for amputees, feedback control for assistive robots, etc. Lightweight wearable sensors, including inertial sensors, surface electromyography sensors, and flexible strain/pressure sensors, are promising candidates to become the next-generation human motion capture devices. Herein, we present a wireless wearable device consisting of a sixteen-channel flexible sponge-based pressure sensor array to recognize various human lower limb motions by detecting contours on the human skin caused by calf gastrocnemius muscle actions. Each sensing element is a round porous structure of thin carbon nanotube/polydimethylsiloxane nanocomposites with a diameter of 4 mm and thickness of about 400 μm. Ten human subjects were recruited to perform ten different lower limb motions while wearing the developed device. The motion classification result with the support vector machine method shows a macro-recall of about 97.3% for all ten motions tested. This work demonstrates a portable wearable muscle activity detection device with a lower limb motion recognition application, which can be potentially used in assistive robot control, healthcare, sports monitoring, etc.
Submitted 29 June, 2022; v1 submitted 30 April, 2022;
originally announced May 2022.
-
SpeechSplit 2.0: Unsupervised speech disentanglement for voice conversion Without tuning autoencoder Bottlenecks
Authors:
Chak Ho Chan,
Kaizhi Qian,
Yang Zhang,
Mark Hasegawa-Johnson
Abstract:
SpeechSplit can perform aspect-specific voice conversion by disentangling speech into content, rhythm, pitch, and timbre using multiple autoencoders in an unsupervised manner. However, SpeechSplit requires careful tuning of the autoencoder bottlenecks, which can be time-consuming and less robust. This paper proposes SpeechSplit 2.0, which constrains the information flow of the speech component to be disentangled on the autoencoder input using efficient signal processing methods instead of bottleneck tuning. Evaluation results show that SpeechSplit 2.0 achieves comparable performance to SpeechSplit in speech disentanglement and superior robustness to the bottleneck size variations. Our code is available at https://github.com/biggytruck/SpeechSplit2.
Submitted 26 March, 2022;
originally announced March 2022.
-
On the Insensitivity of Bit Density to Read Noise in One-bit Quanta Image Sensors
Authors:
Stanley H. Chan
Abstract:
The one-bit quanta image sensor is a photon-counting device that produces binary measurements where each bit represents the presence or absence of a photon. In the presence of read noise, the sensor quantizes the analog voltage into binary bits using a threshold value $q$. The average number of ones in the bitstream is known as the bit density and is often the sufficient statistic for signal estimation. An intriguing phenomenon is observed when the quanta exposure is at unity and the threshold is $q = 0.5$. The bit density demonstrates a complete insensitivity as long as the read noise level does not exceed a certain limit. In other words, the bit density stays constant, independent of the amount of read noise. This paper provides a mathematical explanation of the phenomenon by deriving conditions under which the phenomenon happens. It was found that the insensitivity holds when some forms of symmetry of the underlying Poisson-Gaussian distribution hold.
Submitted 30 January, 2023; v1 submitted 11 March, 2022;
originally announced March 2022.
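The insensitivity described in this abstract can be checked numerically with a small sketch. It assumes the simplest Poisson-Gaussian model, in which the voltage is a Poisson photon count with mean equal to the quanta exposure plus zero-mean Gaussian read noise, and the bit is 1 when the voltage is at or above the threshold. The function names and the truncation limit `k_max` are illustrative assumptions, not taken from the paper.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bit_density(theta, q, sigma, k_max=60):
    """P(K + N >= q) for K ~ Poisson(theta), N ~ Normal(0, sigma^2).

    The sum over photon counts is truncated at k_max (an assumption that is
    harmless for small theta, where the Poisson tail is negligible).
    """
    total = 0.0
    pmf = math.exp(-theta)                  # P(K = 0)
    for k in range(k_max + 1):
        # P(k + N >= q) = Phi((k - q) / sigma)
        total += pmf * normal_cdf((k - q) / sigma)
        pmf *= theta / (k + 1)              # recurrence to P(K = k + 1)
    return total


noise_free = 1.0 - math.exp(-1.0)           # P(K >= 1) at theta = 1, about 0.632
for sigma in (0.05, 0.1, 0.2, 1.0):
    d = bit_density(theta=1.0, q=0.5, sigma=sigma)
    print(f"sigma={sigma:<4}  bit density={d:.6f}  deviation={d - noise_free:+.2e}")
```

Under these assumptions the bit density stays at approximately $1 - e^{-1}$ for small read noise. The reason is the symmetry at unit exposure: the Poisson probabilities of 0 and 1 photons are equal, so the noise-induced errors around $q = 0.5$ cancel. The deviation becomes visible once the noise is large enough for counts of 2 or more to fall below the threshold.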
-
Exposure-Referred Signal-to-Noise Ratio for Digital Image Sensors
Authors:
Abhiram Gnanasambandam,
Stanley H. Chan
Abstract:
The signal-to-noise ratio (SNR) is a fundamental tool to measure the performance of an image sensor. However, confusions sometimes arise between the two types of SNRs. The first one is the output-referred SNR which measures the ratio between the signal and the noise seen at the sensor's output. This SNR is easy to compute, and it is linear in the log-log scale for most image sensors. The second SNR is the exposure-referred SNR, also known as the input-referred SNR. This SNR considers the noise at the input by including a derivative term to the output-referred SNR. The two SNRs have similar behaviors for sensors with a large full-well capacity. However, for sensors with a small full-well capacity, the exposure-referred SNR can capture some behaviors that the output-referred SNR cannot.
While the exposure-referred SNR has been known and used by the industry for a long time, a theoretically rigorous derivation from a signal processing perspective is lacking. In particular, while various equations can be found in different sources of the literature, there is currently no paper that attempts to assemble, derive, and organize these equations in one place. This paper aims to fill the gap by answering four questions: (1) How is the exposure-referred SNR derived from first principles? (2) Is the output-referred SNR a special case of the exposure-referred SNR, or are they completely different? (3) How to compute the SNR efficiently? (4) What utilities can the SNR bring to solving imaging tasks? New theoretical results are derived for image sensors of any bit-depth and full-well capacity.
Submitted 12 June, 2022; v1 submitted 10 December, 2021;
originally announced December 2021.
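As a rough illustration of the "derivative term" mentioned in this abstract, the two SNRs are commonly written as follows. The notation is an assumption and is not taken from the abstract: $\theta$ denotes the exposure, and $\mu(\theta)$ and $\sigma(\theta)$ denote the mean and standard deviation of the sensor output.

$$\mathrm{SNR}_{\mathrm{out}}(\theta) = \frac{\mu(\theta)}{\sigma(\theta)}, \qquad \mathrm{SNR}_{\mathrm{exp}}(\theta) = \frac{\theta}{\sigma(\theta)} \cdot \frac{d\mu(\theta)}{d\theta}$$

For a sensor whose response is linear, $\mu(\theta) = \theta$, the derivative equals 1 and the two expressions coincide, which is consistent with the statement that the two SNRs behave similarly when the full-well capacity is large. Near saturation $d\mu/d\theta$ falls toward zero, so the exposure-referred SNR drops even if the output-referred SNR does not.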
-
Graph-Based Depth Denoising & Dequantization for Point Cloud Enhancement
Authors:
Xue Zhang,
Gene Cheung,
Jiahao Pang,
Yash Sanghvi,
Abhiram Gnanasambandam,
Stanley H. Chan
Abstract:
A 3D point cloud is typically constructed from depth measurements acquired by sensors at one or more viewpoints. The measurements suffer from both quantization and noise corruption. To improve quality, previous works denoise a point cloud \textit{a posteriori} after projecting the imperfect depth data onto 3D space. Instead, we enhance depth measurements directly on the sensed images \textit{a priori}, before synthesizing a 3D point cloud. By enhancing near the physical sensing process, we tailor our optimization to our depth formation model before subsequent processing steps that obscure measurement errors.
Specifically, we model depth formation as a combined process of signal-dependent noise addition and non-uniform log-based quantization. The designed model is validated (with parameters fitted) using collected empirical data from a representative depth sensor. To enhance each pixel row in a depth image, we first encode intra-view similarities between available row pixels as edge weights via feature graph learning. We next establish inter-view similarities with another rectified depth image via viewpoint mapping and sparse linear interpolation. This leads to a maximum a posteriori (MAP) graph filtering objective that is convex and differentiable. We minimize the objective efficiently using accelerated gradient descent (AGD), where the optimal step size is approximated via Gershgorin circle theorem (GCT). Experiments show that our method significantly outperformed recent point cloud denoising schemes and state-of-the-art image denoising schemes in two established point cloud quality metrics.
Submitted 6 October, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Photon Limited Non-Blind Deblurring Using Algorithm Unrolling
Authors:
Yash Sanghvi,
Abhiram Gnanasambandam,
Stanley H. Chan
Abstract:
Image deblurring in photon-limited conditions is ubiquitous in a variety of low-light applications such as photography, microscopy, and astronomy. However, the presence of photon shot noise due to low illumination and/or short exposure makes the deblurring task substantially more challenging than the conventional deblurring problems. In this paper, we present an algorithm unrolling approach for the photon-limited deblurring problem by unrolling a Plug-and-Play algorithm for a fixed number of iterations. By introducing a three-operator splitting formation of the Plug-and-Play framework, we obtain a series of differentiable steps which allows the fixed iteration unrolled network to be trained end-to-end. The proposed algorithm demonstrates significantly better image recovery compared to existing state-of-the-art deblurring approaches. We also present a new photon-limited deblurring dataset for evaluating the performance of algorithms.
Submitted 26 October, 2022; v1 submitted 28 October, 2021;
originally announced October 2021.
-
Identifying Autism Spectrum Disorder Based on Individual-Aware Down-Sampling and Multi-Modal Learning
Authors:
Li Pan,
Jundong Liu,
Mingqin Shi,
Chi Wah Wong,
Kei Hang Katie Chan
Abstract:
Autism Spectrum Disorder (ASD) is a set of neurodevelopmental conditions that affect patients' social abilities. In recent years, many studies have employed deep learning to diagnose this brain dysfunction through functional MRI (fMRI). However, existing approaches solely focused on the abnormal brain functional connections but ignored the impact of regional activities. Due to this biased prior knowledge, previous diagnosis models suffered from inter-site measurement heterogeneity and inter-individual phenotypic differences. To address this issue, we propose a novel feature extraction method for fMRI that can learn a personalized lower-resolution representation of the entire brain networking regarding both the functional connections and regional activities. Specifically, we abstract the brain imaging as a graph structure and straightforwardly downsample it to substructures by hierarchical graph pooling. To further recalibrate the distribution of the extracted features under phenotypic information, we subsequently embed the sparse feature vectors into a population graph, where the hidden inter-subject heterogeneity and homogeneity are explicitly expressed as inter- and intra-community connectivity differences, and utilize Graph Convolutional Networks to learn the node embeddings. By these means, our framework can extract features directly and efficiently from the entire fMRI and be aware of implicit inter-individual variance. We have evaluated our framework on the ABIDE-I dataset with 10-fold cross-validation. The present model has achieved a mean classification accuracy of 87.62% and a mean AUC of 0.92, better than the state-of-the-art methods.
△ Less
Submitted 25 October, 2021; v1 submitted 19 September, 2021;
originally announced September 2021.
-
The CORSMAL benchmark for the prediction of the properties of containers
Authors:
Alessio Xompero,
Santiago Donaher,
Vladimir Iashin,
Francesca Palermo,
Gökhan Solak,
Claudio Coppola,
Reina Ishikawa,
Yuichi Nagao,
Ryo Hachiuma,
Qi Liu,
Fan Feng,
Chuanlin Lan,
Rosa H. M. Chan,
Guilherme Christmann,
Jyun-Ting Song,
Gonuguntla Neeharika,
Chinnakotla Krishna Teja Reddy,
Dinesh Jain,
Bakhtawar Ur Rehman,
Andrea Cavallaro
Abstract:
The contactless estimation of the weight of a container and the amount of its content manipulated by a person are key pre-requisites for safe human-to-robot handovers. However, opaqueness and transparencies of the container and the content, and variability of materials, shapes, and sizes, make this estimation difficult. In this paper, we present a range of methods and an open framework to benchmark acoustic and visual perception for the estimation of the capacity of a container, and the type, mass, and amount of its content. The framework includes a dataset, specific tasks and performance measures. We conduct an in-depth comparative analysis of methods that used this framework and audio-only or vision-only baselines designed from related works. Based on this analysis, we can conclude that audio-only and audio-visual classifiers are suitable for the estimation of the type and amount of the content using different types of convolutional neural networks, combined with either recurrent neural networks or a majority voting strategy, whereas computer vision methods are suitable to determine the capacity of the container using regression and geometric approaches. Classifying the content type and level using only audio achieves a weighted average F1-score up to 81% and 97%, respectively. Estimating the container capacity with vision-only approaches and estimating the filling mass with audio-visual multi-stage approaches reach up to 65% weighted average capacity and mass scores. These results show that there is still room for improvement on the design of new methods. These new methods can be ranked and compared on the individual leaderboards provided by our open framework.
Submitted 21 April, 2022; v1 submitted 27 July, 2021;
originally announced July 2021.
-
Accelerating Atmospheric Turbulence Simulation via Learned Phase-to-Space Transform
Authors:
Zhiyuan Mao,
Nicholas Chimitt,
Stanley H. Chan
Abstract:
Fast and accurate simulation of imaging through atmospheric turbulence is essential for developing turbulence mitigation algorithms. Recognizing the limitations of previous approaches, we introduce a new concept known as the phase-to-space (P2S) transform to significantly speed up the simulation. P2S is built upon three ideas: (1) reformulating the spatially varying convolution as a set of invariant convolutions with basis functions, (2) learning the basis function via the known turbulence statistics models, (3) implementing the P2S transform via a light-weight network that directly converts the phase representation to spatial representation. The new simulator offers a 300x to 1000x speed-up compared to the mainstream split-step simulators while preserving the essential turbulence statistics.
Submitted 20 August, 2021; v1 submitted 24 July, 2021;
originally announced July 2021.
-
Graphene-based Distributed 3D Sensing Electrodes for Mapping Spatiotemporal Auricular Physiological Signals
Authors:
Q. Huang,
C. Wu,
S. Hou,
H. Sun,
K. Yao,
J. Law,
M. Yang,
A. L. R. Vellaisamy,
X. Yu,
H. Y. Chan,
L. Lao,
Y. Sun,
W. J. Li
Abstract:
Underneath the ear skin there are richly branching vascular and neural networks that ultimately connect to our heart and brain. Hence, the three-dimensional (3D) mapping of auricular electrophysiological signals could provide a new perspective for biomedical studies such as diagnosis of cardiovascular diseases and neurological disorders. However, it is still extremely challenging for current sensing techniques to cover the entire ultra-curved auricle. Here, we report a graphene-based ear-conformable sensing device with embedded and distributed 3D electrodes which enable full-auricle physiological monitoring. The sensing device, which incorporates a programmable 3D electrode thread array and a personalized auricular mold, has 3D-conformable sensing interfaces with curved auricular skin, and was developed using a one-step multi-material 3D-printing process. As a proof-of-concept, spatiotemporal auricular electrical skin resistance (AESR) mapping was demonstrated. For the first time, 3D AESR contours were generated and human subject-specific AESR distributions among a population were observed. From the data of 17 volunteers, the auricular region-specific AESR changes after cycling exercise were observed in 98% of the tests and were validated via machine learning techniques. Correlations of AESR with heart rate and blood pressure were also studied using statistical analysis. This 3D electronic platform and AESR-based new biometrical findings show promising biomedical applications.
Submitted 10 July, 2021;
originally announced July 2021.
-
Graph Signal Restoration Using Nested Deep Algorithm Unrolling
Authors:
Masatoshi Nagahama,
Koki Yamada,
Yuichi Tanaka,
Stanley H. Chan,
Yonina C. Eldar
Abstract:
Graph signal processing is a ubiquitous task in many applications such as sensor, social, transportation and brain networks, point cloud processing, and graph neural networks. Often, graph signals are corrupted in the sensing process, thus requiring restoration. In this paper, we propose two graph signal restoration methods based on deep algorithm unrolling (DAU). First, we present a graph signal denoiser by unrolling iterations of the alternating direction method of multipliers (ADMM). We then suggest a general restoration method for linear degradation by unrolling iterations of Plug-and-Play ADMM (PnP-ADMM). In the second approach, the unrolled ADMM-based denoiser is incorporated as a submodule, leading to a nested DAU structure. The parameters in the proposed denoising/restoration methods are trainable in an end-to-end manner. Our approach is interpretable and keeps the number of parameters small since we only tune graph-independent regularization parameters. We overcome two main challenges in existing graph signal restoration methods: 1) the limited performance of convex optimization algorithms due to fixed parameters, which are often determined manually, and 2) the large number of parameters of graph neural networks, which makes training difficult. Several experiments for graph signal denoising and interpolation are performed on synthetic and real-world data. The proposed methods show performance improvements over several existing techniques in terms of root mean squared error in both tasks.
Submitted 1 June, 2022; v1 submitted 30 June, 2021;
originally announced June 2021.
-
HDR Imaging with Quanta Image Sensors: Theoretical Limits and Optimal Reconstruction
Authors:
Abhiram Gnanasambandam,
Stanley H. Chan
Abstract:
High dynamic range (HDR) imaging is one of the biggest achievements in modern photography. Traditional solutions to HDR imaging are designed for and applied to CMOS image sensors (CIS). However, the mainstream one-micron CIS cameras today generally have a high read noise and low frame rate. These, in turn, limit the acquisition speed and quality, making the cameras slow in the HDR mode. In this paper, we propose a new computational photography technique for HDR imaging. Recognizing the limitations of CIS, we use the Quanta Image Sensor (QIS) to trade the spatial-temporal resolution with bit-depth. QIS is a single-photon image sensor that has comparable pixel pitch to CIS but substantially lower dark current and read noise. We provide a complete theoretical characterization of the sensor in the context of HDR imaging, by proving the fundamental limits in the dynamic range that QIS can offer and the trade-offs with noise and speed. In addition, we derive an optimal reconstruction algorithm for single-bit and multi-bit QIS. Our algorithm is theoretically optimal for all linear reconstruction schemes based on exposure bracketing. Experimental results confirm the validity of the theory and algorithm, based on synthetic and real QIS data.
Submitted 2 December, 2020; v1 submitted 6 November, 2020;
originally announced November 2020.
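To make the single-bit measurement model mentioned in the abstract above concrete, here is a minimal sketch. It is not the paper's reconstruction algorithm. It assumes Poisson photon arrivals and a threshold of one photon, and it uses the textbook maximum-likelihood estimate for a Bernoulli observation. The function names, the flux value, and the frame count are hypothetical choices made for illustration.

```python
import math
import random

def simulate_one_bit_frames(flux, num_frames, rng):
    """Simulate single-bit readings for one pixel (illustrative assumption).

    Photon arrivals are assumed Poisson with mean `flux` per frame, so a
    frame reads 1 (at least one photon) with probability 1 - exp(-flux).
    """
    p_one = 1.0 - math.exp(-flux)
    return [1 if rng.random() < p_one else 0 for _ in range(num_frames)]

def estimate_flux(bits):
    """Maximum-likelihood flux estimate from one-bit frames.

    The fraction of ones estimates 1 - exp(-flux); inverting gives the flux.
    """
    ones_fraction = sum(bits) / len(bits)
    # Clip so a fully saturated pixel does not produce log(0).
    ones_fraction = min(ones_fraction, 1.0 - 1.0 / (2 * len(bits)))
    return -math.log(1.0 - ones_fraction)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
bits = simulate_one_bit_frames(flux=0.5, num_frames=20000, rng=rng)
flux_hat = estimate_flux(bits)
print(f"estimated flux: {flux_hat:.3f}")
```

The logarithm in `estimate_flux` undoes the saturation of the one-bit response, which is why averaging many binary frames can still recover a flux value.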
-
Dynamic Low-light Imaging with Quanta Image Sensors
Authors:
Yiheng Chi,
Abhiram Gnanasambandam,
Vladlen Koltun,
Stanley H. Chan
Abstract:
Imaging in low light is difficult because the number of photons arriving at the sensor is low. Imaging dynamic scenes in low-light environments is even more difficult because as the scene moves, pixels in adjacent frames need to be aligned before they can be denoised. Conventional CMOS image sensors (CIS) are at a particular disadvantage in dynamic low-light settings because the exposure cannot be too short lest the read noise overwhelms the signal. We propose a solution using Quanta Image Sensors (QIS) and present a new image reconstruction algorithm. QIS are single-photon image sensors with photon counting capabilities. Studies over the past decade have confirmed the effectiveness of QIS for low-light imaging but reconstruction algorithms for dynamic scenes in low light remain an open problem. We fill the gap by proposing a student-teacher training protocol that transfers knowledge from a motion teacher and a denoising teacher to a student network. We show that dynamic scenes can be reconstructed from a burst of frames at a photon level of 1 photon per pixel per frame. Experimental results confirm the advantages of the proposed method compared to existing methods.
Submitted 16 July, 2020;
originally announced July 2020.
-
Real-time 3D Nanoscale Coherent Imaging via Physics-aware Deep Learning
Authors:
Henry Chan,
Youssef S. G. Nashed,
Saugat Kandel,
Stephan Hruszkewycz,
Subramanian Sankaranarayanan,
Ross J. Harder,
Mathew J. Cherukara
Abstract:
Phase retrieval, the problem of recovering lost phase information from measured intensity alone, is an inverse problem that is widely faced in various imaging modalities ranging from astronomy to nanoscale imaging. The current process of phase recovery is iterative in nature. As a result, the image formation is time-consuming and computationally expensive, precluding real-time imaging. Here, we use 3D nanoscale X-ray imaging as a representative example to develop a deep learning model to address this phase retrieval problem. We introduce 3D-CDI-NN, a deep convolutional neural network and differential programming framework trained to predict 3D structure and strain solely from input 3D X-ray coherent scattering data. Our networks are designed to be "physics-aware" in multiple aspects, in that the physics of the X-ray scattering process is explicitly enforced in the training of the network, and the training data are drawn from atomistic simulations that are representative of the physics of the material. We further refine the neural network prediction through a physics-based optimization procedure to enable maximum accuracy at lowest computational cost. 3D-CDI-NN can invert a 3D coherent diffraction pattern to real-space structure and strain hundreds of times faster than traditional iterative phase retrieval methods, with negligible loss in accuracy. Our integrated machine learning and differential programming solution to the phase retrieval problem is broadly applicable across inverse problems in other application areas.
Submitted 16 June, 2020;
originally announced June 2020.
-
Point Spread Function Engineering for 3D Imaging of Space Debris using a Continuous Exact l0 Penalty (CEL0) Based Algorithm
Authors:
Chao Wang,
Raymond H. Chan,
Robert J. Plemmons,
Sudhakar Prasad
Abstract:
We consider three-dimensional (3D) localization and imaging of space debris from only one two-dimensional (2D) snapshot image. The technique involves an optical imager that exploits off-center image rotation to encode both the lateral and depth coordinates of point sources, with the latter being encoded in the angle of rotation of the PSF. We formulate 3D localization into a large-scale sparse 3D inverse problem in the discretized form. A recently developed penalty called continuous exact l0 (CEL0) is applied in this problem for the Gaussian noise model. Numerical experiments and comparisons illustrate the efficiency of the algorithm.
Submitted 2 June, 2020;
originally announced June 2020.
-
Image Classification in the Dark using Quanta Image Sensors
Authors:
Abhiram Gnanasambandam,
Stanley H. Chan
Abstract:
State-of-the-art image classifiers are trained and tested using well-illuminated images. These images are typically captured by CMOS image sensors with at least tens of photons per pixel. However, in dark environments when the photon flux is low, image classification becomes difficult because the measured signal is suppressed by noise. In this paper, we present a new low-light image classification solution using Quanta Image Sensors (QIS). QIS are a new type of image sensor that possesses photon counting ability without compromising on pixel size and spatial resolution. Numerous studies over the past decade have demonstrated the feasibility of QIS for low-light imaging, but their usage for image classification has not been studied. This paper fills the gap by presenting a student-teacher learning scheme which allows us to classify the noisy QIS raw data. We show that with student-teacher learning, we are able to achieve image classification at a photon level of one photon per pixel or lower. Experimental results verify the effectiveness of the proposed method compared to existing solutions.
Submitted 16 July, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
-
IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report
Authors:
Qi She,
Fan Feng,
Qi Liu,
Rosa H. M. Chan,
Xinyue Hao,
Chuanlin Lan,
Qihan Yang,
Vincenzo Lomonaco,
German I. Parisi,
Heechul Bae,
Eoin Brophy,
Baoquan Chen,
Gabriele Graffieti,
Vidit Goel,
Hyonyoung Han,
Sathursan Kanagarajah,
Somesh Kumar,
Siew-Kei Lam,
Tin Lun Lam,
Liang Ma,
Davide Maltoni,
Lorenzo Pellegrini,
Duvindu Piyasena,
Shiliang Pu,
Debdoot Sheet
, et al. (11 additional authors not shown)
Abstract:
This report summarizes the IROS 2019 Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top 8 finalists (out of over 150 teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in the robotic vision domain, with everyday objects in home, office, campus, and mall scenarios. The dataset explicitly quantifies the variants of illumination, object occlusion, object size, camera-object distance/angles, and clutter information. Rules are designed to quantify the learning capability of the robotic vision system when faced with the objects appearing in the dynamic environments in the contest. Individual reports, dataset information, rules, and released source code can be found at the project homepage: "https://lifelong-robotic-vision.github.io/competition/".
Submitted 26 April, 2020;
originally announced April 2020.
-
Simulating Anisoplanatic Turbulence by Sampling Inter-modal and Spatially Correlated Zernike Coefficients
Authors:
Nicholas Chimitt,
Stanley H. Chan
Abstract:
Simulating atmospheric turbulence is an essential task for evaluating turbulence mitigation algorithms and training learning-based methods. Advanced numerical simulators for atmospheric turbulence are available, but they require evaluating wave propagation which is computationally expensive. In this paper, we present a propagation-free method for simulating imaging through turbulence. The key idea behind our work is a new method to draw inter-modal and spatially correlated Zernike coefficients. By establishing the equivalence between the angle-of-arrival correlation by Basu, McCrae and Fiorino (2015) and the multi-aperture correlation by Chanan (1992), we show that the Zernike coefficients can be drawn according to a covariance matrix defining the correlations. We propose fast and scalable sampling strategies to draw these samples. The new method allows us to compress the wave propagation problem into a sampling problem, hence making the new simulator significantly faster than existing ones. Experimental results show that the simulator has an excellent match with the theory and real turbulence data.
Submitted 22 June, 2020; v1 submitted 23 April, 2020;
originally announced April 2020.
-
QC-SPHRAM: Quasi-conformal Spherical Harmonics Based Geometric Distortions on Hippocampal Surfaces for Early Detection of the Alzheimer's Disease
Authors:
Anthony Hei-Long Chan,
Yishan Luo,
Lin Shi,
Ronald Lok-Ming Lui
Abstract:
We propose a disease classification model, called the QC-SPHARM, for the early detection of Alzheimer's Disease (AD). The proposed QC-SPHARM can distinguish between normal control (NC) subjects and AD patients, as well as between amnestic mild cognitive impairment (aMCI) patients with a high probability of progressing to AD and those who do not. Using the spherical harmonics (SPHARM) based registration, hippocampal surfaces segmented from the ADNI data are individually registered to a template surface constructed from the NC subjects using SPHARM. Local geometric distortions of the deformation from the template surface to each subject are quantified in terms of conformality distortions and curvature distortions. The measurements are combined with the spherical harmonics coefficients and the total volume change of the subject from the template. Afterwards, a t-test based feature selection method incorporating the bagging strategy is applied to extract those local regions having high discriminating power between the two classes. The disease diagnosis machine can therefore be built using the data under the Support Vector Machine (SVM) setting. Using 110 NC subjects and 110 AD patients from the ADNI database, the proposed algorithm achieves 85.2% testing accuracy on 80 random samples as testing subjects, with the incorporation of surface geometry in the classification machine. Using 20 aMCI patients who have advanced to AD during a two-year period and another 20 aMCI patients who remain non-AD for the next two years, the algorithm achieves 81.2% accuracy using 10 randomly picked subjects as testing data. Our proposed method is 6%-15% better than other classification models without the incorporation of surface geometry. The results demonstrate the advantages of using local geometric distortions as the discriminating criterion for early AD diagnosis.
Submitted 19 March, 2020;
originally announced March 2020.
-
Decoding of visual-related information from the human EEG using an end-to-end deep learning approach
Authors:
Lingling Yang,
Leanne Lai Hang Chan,
Yao Lu
Abstract:
There is increasing interest in using deep learning approaches for EEG analysis, as there is still room for improving its accuracy. Convolutional long short-term memory (CNNLSTM) has been successfully applied to time series data with spatial structure through end-to-end learning. Here, we proposed a CNNLSTM based neural network architecture termed EEG_CNNLSTMNet for the classification of EEG signals in response to grating stimuli with different spatial frequencies. EEG_CNNLSTMNet comprises two convolutional layers and one bidirectional long short-term memory (LSTM) layer. The convolutional layers capture local temporal characteristics of the EEG signal at each channel as well as global spatial characteristics across channels, while the LSTM layer extracts long-term temporal dependency of EEG signals. Our experiment showed that EEG_CNNLSTMNet performed much better at EEG classification than a traditional machine learning approach, i.e. a support vector machine (SVM) with features. Additionally, EEG_CNNLSTMNet outperformed EEGNet, a state-of-the-art neural network architecture, in the intra-subject case. We infer that the underperformance when using an LSTM layer in the inter-subject case is due to long-term dependency characteristics in the EEG signal that vary greatly across subjects. Moreover, the inter-subject fine-tuned classification model using very little data of the new subject achieved much higher accuracy than that trained only on the data from the other subjects. Our study suggests that the fine-tuned inter-subject model can be a potential end-to-end EEG analysis method considering both the accuracy and the required training data of the new subject.
Submitted 19 December, 2019; v1 submitted 1 November, 2019;
originally announced November 2019.
-
Rethinking Atmospheric Turbulence Mitigation
Authors:
Nicholas Chimitt,
Zhiyuan Mao,
Guanzhe Hong,
Stanley H. Chan
Abstract:
State-of-the-art atmospheric turbulence image restoration methods utilize standard image processing tools such as optical flow, lucky region and blind deconvolution to restore the images. While promising results have been reported over the past decade, many of the methods are agnostic to the physical model that generates the distortion. In this paper, we revisit the turbulence restoration problem by analyzing the reference frame generation and the blind deconvolution steps in a typical restoration pipeline. By leveraging tools in large deviation theory, we rigorously prove the minimum number of frames required to generate a reliable reference for both static and dynamic scenes. We discuss how a turbulence agnostic model can lead to potential flaws, and how to configure a simple spatial-temporal non-local weighted averaging method to generate references. For blind deconvolution, we present a new data-driven prior by analyzing the distributions of the point spread functions. We demonstrate how a simple prior can outperform state-of-the-art blind deconvolution methods.
Submitted 17 May, 2019;
originally announced May 2019.
-
Performance Analysis of Plug-and-Play ADMM: A Graph Signal Processing Perspective
Authors:
Stanley H. Chan
Abstract:
The Plug-and-Play (PnP) ADMM algorithm is a powerful image restoration framework that allows advanced image denoising priors to be integrated into physical forward models to generate high quality image restoration results. However, despite the enormous number of applications and several theoretical studies trying to prove the convergence by leveraging tools in convex analysis, very little is known about why the algorithm is doing so well. The goal of this paper is to fill the gap by discussing the performance of PnP ADMM. By restricting the denoisers to the class of graph filters under a linearity assumption, or more specifically the symmetric smoothing filters, we offer three contributions: (1) We show conditions under which an equivalent maximum-a-posteriori (MAP) optimization exists, (2) we present a geometric interpretation and show that the performance gain is due to an intrinsic pre-denoising characteristic of the PnP prior, (3) we introduce a new analysis technique via the concept of consensus equilibrium, and provide interpretations to problems involving multiple priors.
Submitted 17 May, 2019; v1 submitted 31 August, 2018;
originally announced September 2018.
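As a rough illustration of the Plug-and-Play ADMM iteration discussed in the abstract above, the sketch below runs the three scaled-form ADMM updates on a 1D denoising problem, with a symmetric smoothing filter plugged in where a proximal operator would normally go. The setup is an assumption made for illustration, not taken from the paper: the forward model is the identity, the denoiser is `W = I - alpha * L` on a path graph, and the names `smooth` and `pnp_admm_denoise` are hypothetical.

```python
def smooth(z, alpha=0.25):
    """Symmetric smoothing filter W = I - alpha * L on a path graph.

    Each sample moves toward its existing neighbours; the matrix form is
    symmetric because every edge contributes equally in both directions.
    """
    n = len(z)
    out = []
    for i in range(n):
        acc = z[i]
        if i > 0:
            acc += alpha * (z[i - 1] - z[i])
        if i < n - 1:
            acc += alpha * (z[i + 1] - z[i])
        out.append(acc)
    return out

def pnp_admm_denoise(y, rho=1.0, num_iters=20):
    """Scaled-form PnP ADMM for the data term 0.5 * ||x - y||^2."""
    x = list(y)
    v = list(y)
    u = [0.0] * len(y)
    for _ in range(num_iters):
        # x-update: closed-form minimiser of the quadratic data term
        # plus the penalty (rho / 2) * ||x - (v - u)||^2.
        x = [(yi + rho * (vi - ui)) / (1.0 + rho) for yi, vi, ui in zip(y, v, u)]
        # v-update: the denoiser replaces the proximal map of the prior.
        v = smooth([xi + ui for xi, ui in zip(x, u)])
        # u-update: accumulate the residual between the two copies.
        u = [ui + xi - vi for ui, xi, vi in zip(u, x, v)]
    return x

noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
restored = pnp_admm_denoise(noisy)
print([round(r, 3) for r in restored])
```

The only place the prior enters is the `v` update, so swapping `smooth` for any other denoiser leaves the rest of the loop unchanged.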