Estimation of motion blur kernel parameters using regression convolutional neural networks¹,²

Luis G. Varela, Laura E. Boucheron, Steven Sandoval, David Voelz, and Abu Bucker Siddik
Klipsch School of Electrical and Computer Engineering
New Mexico State University
Las Cruces, NM 88001, USA
{varelal,lboucher,spsandov,davvoelz,siddik}@nmsu.edu
Abstract

Many deblurring and blur kernel estimation methods use a maximum a posteriori (MAP) approach or deep learning-based classification techniques to sharpen an image and/or predict the blur kernel. We propose a regression approach using convolutional neural networks (CNNs) to predict parameters of linear motion blur kernels, the length and orientation of the blur. We analyze the relationship between length and angle of linear motion blur that can be represented as digital filter kernels. A large dataset of blurred images is generated using a suite of blur kernels and used to train a regression CNN for prediction of length and angle of the motion blur. The coefficients of determination for estimation of length and angle are found to be greater than or equal to 0.89, even under the presence of significant additive Gaussian noise, up to a variance of 10% (SNR of 10 dB). Using our estimated kernel in a non-blind image deblurring method, the sum of squared differences error ratio demonstrates higher cumulative histogram values than comparison methods, with most test images yielding an error ratio of less than or equal to 1.25.

¹ The official version of this paper appears as Luis G. Varela, Laura E. Boucheron, Steven Sandoval, David Voelz, Abu Bucker Siddik, “Estimation of motion blur kernel parameters using regression convolutional neural networks,” J. Electron. Imaging 33(2), 023062 (2024); doi: 10.1117/1.JEI.33.2.023062.

² Copyright 2024 Society of Photo-Optical Instrumentation Engineers. One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.

1 Introduction

Linear motion blur has been studied as a model for camera shake, camera platform movement, and moving objects during imaging [1, 2]. The studies presented within were designed as an initial exploration of a regression convolutional neural network (CNN) to estimate linear motion blur parameters from blurry images. The effects of atmospheric turbulence in an image are generally represented by a spatially-varying blur [3, 4], while recent work has demonstrated that spatially-varying blurred images can be modeled as a superposition of locally linear motion blurs [5]. One way to parametrize linear motion blur kernels is by length and orientation. In this work, we study the ability to accurately estimate the length and angle parameters of a uniform linear motion blurred image. The results may then be used as a foundation upon which to build methods to estimate spatially-varying blur parameters and to serve as a baseline in subsequent studies.

A uniform blurred image can be described by

$$I_{\text{b}} = K * I_{\text{s}} + N, \tag{1}$$

where $I_{\text{b}}$ is the blurred image, $K$ is the blur kernel, $I_{\text{s}}$ is the sharp latent image, $N$ is additive noise, and $*$ is the convolution operator. This formulation assumes a uniform blur because the same kernel $K$ is applied across the entire image.
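
The blur model of Eq. (1) can be illustrated with a minimal, dependency-free sketch. The helper names (`convolve2d_valid`, `blur_with_noise`), the choice of a "valid" boundary treatment, and the normalization of the kernel so that its entries sum to one are assumptions made for illustration and are not taken from the paper.

```python
import random

def convolve2d_valid(image, kernel):
    """2D convolution over the region where the kernel fully overlaps the image."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    # Convolution flips the kernel along both axes (correlation would not).
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for i in range(ih - kh + 1):
        out_row = []
        for j in range(iw - kw + 1):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += flipped[a][b] * image[i + a][j + b]
            out_row.append(acc)
        out.append(out_row)
    return out

def blur_with_noise(sharp, kernel, noise_variance=0.0, seed=0):
    """Sketch of Eq. (1): I_b = K * I_s + N, with N zero-mean Gaussian noise."""
    rng = random.Random(seed)
    sigma = noise_variance ** 0.5
    blurred = convolve2d_valid(sharp, kernel)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in blurred]

# Hypothetical 1x3 horizontal motion blur kernel, normalized to sum to one.
kernel = [[1 / 3, 1 / 3, 1 / 3]]
# A bright vertical stripe is smeared horizontally by the kernel.
sharp = [[0.0, 0.0, 3.0, 0.0, 0.0],
         [0.0, 0.0, 3.0, 0.0, 0.0]]
blurred = blur_with_noise(sharp, kernel, noise_variance=0.0)
```

Because the same `kernel` is applied at every pixel, the sketch corresponds to the uniform blur case; a spatially-varying blur would require a different kernel per region.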

The main contributions of this paper are a thorough exploration of linear motion blur kernels and development of a regression CNN for blind prediction of motion blur parameters from blurry images. We study the space of motion blur parametrized by length and angle, specifically the subset of linear motion blur that can be described as 2D discrete motion blur kernels. Furthermore, we include motion blur kernels that are not described as square odd-sized kernels and study complications that arise in defining 2D discrete motion blur kernels. To that end we create a dataset of blurred images spanning a large range of possible blur kernels for use in training and testing a regression CNN based on the VGG16 [6] network to predict linear motion blur kernel parameters. The network is analyzed using the coefficient of determination $R^{2}$ score, which is a common metric to quantify performance of regression analysis, as well as a deconvolved error ratio (the ratio of the error when deconvolving using the predicted blur kernel to the error when deconvolving using the actual blur kernel) [7].

The organization of this paper is as follows. Section 2 gives a discussion on previous work including classical deblurring and deep learning deblurring methods. Section 3 contains a detailed description of the exploration of the linear blur kernel parameter space as well as the creation of a blurred dataset for deep learning training. Section 4 presents the proposed regression CNN for blur parameter prediction and Sec. 5 presents results of our regression prediction method along with comparison to previous work. Finally, in Sec. 6 we provide a conclusion and briefly discuss our future work.

2 Related Work

Deblurring and kernel blur estimation have been extensively studied in computer vision with applications of recovering the sharp image from blurry images caused by camera shake, fast moving objects in frame, or atmospheric turbulence. Prior to deep learning methods, many researchers used a maximum a posteriori (MAP) approach for both blur kernel estimation and deblurring. Two commonly used varieties of MAP are implicit regularization as seen in [8, 9] and explicit edge prediction based approaches like those in [10, 11, 12, 13, 1]. While many approaches have used MAP to estimate both the latent image and the blur kernel, Levin et al. [14, 7] prove that this approach tends to favor the no-blur explanation (i.e., that the kernel is an impulse and the “deblurred” image is the blurry image) instead of converging to the true blur kernel and latent sharp image. Moreover, it is advocated in [14, 7] that estimating only the blur kernel is a better approach since there are fewer parameters to determine than if one were to also estimate the latent sharp image. Levin et al. [14, 7] use an expectation-maximization (EM) framework to optimize kernel prediction with either a Gaussian or sparse prior. More recently methods have considered deep learning approaches to estimate the blur kernel [15], estimate the latent sharp image [16], or both [17].

In the first variety of MAP-based methods, edge-based approaches in a MAP framework use extracted edges as an image prior in the MAP optimization. Cho et al. [10] introduced a gradient computation using derivatives of the image to compute edges. Money & Kang [12] used shock filters for edge detection. Cho et al. [11] used the Radon transform for edge based analysis; image deblurring may use a MAP algorithm or the inverse Radon transform informed by the detected edges. Jia [13] used object boundary transparency as an estimation of edge location. Fergus et al. [1] introduced natural image statistics defined by distributions of image gradients as a prior.

In the second variety of MAP-based methods, implicit regularization MAP approaches use different regularization terms to enforce desired image priors. Xu et al. [8] incorporated a regularization term to approximate the $L_{0}$ cost which improves computational speeds over alternative implicit sparse regularizations. The framework in [8] alternated between estimating the latent sharp image and the blur kernel in each iteration. Krishnan et al. [9] used a ratio of $L_{1}/L_{2}$ norms as a regularization to estimate the kernel. This helps with the attenuation of high frequencies that blur introduces in an image.

Although many of the previous works discussed above involve the estimation of generalized blur kernels, some work specializes in motion blur kernel prediction. Whyte et al. [18] proposed a new method to estimate motion blur by parametrizing a geometric model in terms of rotational velocity of the camera during exposure time. They modified the Fergus et al. [1] algorithm to implement non-uniform deblurring. Hirsch et al. [19] combined projective motion path blur models with efficient filter flow [20] to estimate motion blur at a patch level and deblur by modifying the Krishnan & Fergus [21] algorithm.

Recent methods using deep learning have used various architectures including convolutional neural networks (CNNs) [22, 23, 17, 15, 24], encoder-decoder networks [25], generative adversarial networks (GANs) [16], and fully convolutional networks (FCNs) [26] to tackle the problems of blur kernel prediction and deblurring. Sun et al. [22] used a CNN to classify the best-fit motion blur kernel for each image patch. They used 73 different motion blur kernels and were able to expand to 361 kernels by rotation of their input image. Xu et al. [23] used a CNN to recover sharp edges from blurred images and then estimated the blur kernel from the blurry and recovered edges via a MAP optimization. Li et al. [17] used a mixture of deep learning and MAP estimation to deblur an image, using a binary classification CNN trained to predict the probability of whether the input is blurry or sharp. The classifier is used as a regularization term of the latent image prior in a MAP framework to deblur. Yan & Shao [15] used a two step process to predict blur parameters: first, a deep neural network classified the type of blur and second, a general regression neural network predicted the parameters of the blur. Nasonov and Nasonova [24] used a CNN to predict length and angle for linear motion blur with several formulations of the output values, although they noted difficulty in the simultaneous estimation of length and angle. Carbajal et al. [25] used an encoder and two decoders to predict motion kernel bases for the image and mixing coefficients for each pixel. Zhang et al. [16] used a conditional GAN to deblur images, combining an adversarial loss and a perceptually-motivated content loss. Gong et al. [26] used an FCN to predict a motion flow map where each pixel has its own motion vector estimate. 
While many of these approaches are able to handle spatially varying blur, we focus on a thorough exploration of motion kernel parameters in uniform blur; this will serve as a framework for spatially varying blur in future work.

There are three main contributions to this work. First, we study complications that arise in defining a motion blur kernel and, in particular, we analyze which motion blur kernels exist within different kernel shapes rather than using square odd-sized kernels as often implicitly assumed in the implementation—likely for ease of coding. Second, by exploring all motion blur kernel possibilities for a suite of length and angle combinations, we train a CNN which uses regression prediction instead of classification. By formulating the prediction as a regression, we train two output nodes (one for length and one for angle) which can span the full range of possible parameter values and can be trained on any granularity of parameters. Use of a classification network would necessitate 13,034 output nodes for the granularity considered here (99 lengths and 180 angles) and modification of the network architecture to train on any other granularity or post-hoc computation (e.g., as in [22]) to define predictions at another granularity. Use of regression also may alleviate some of the issues in simultaneous prediction of length and angle noted in [24]. Third, we expand the range of additive noise beyond that studied for other methods [14, 11, 23, 17, 9, 18] to analyze Gaussian noise up to a variance of 10% (10 dB SNR).

3 Blur Kernels and Blurred Dataset

A linear blur kernel can be parametrized by the length and orientation angle of a line. To formulate a discrete blur kernel for application to a digital image using Eq. (1), we specifically distinguish between a continuous line and a discrete pixel line. Next, we explore angle and length combinations that exist within linear motion blur kernels. Finally, we describe how the 2014 COCO dataset [27] is used to create a new blurred dataset for deep learning training and testing.

3.1 Continuous and Pixel Lines

Here, the term “line” refers to a line segment, $r$ denotes a Euclidean distance (of a continuous or discrete line), $r_{\infty}$ is a Chebyshev distance for a discrete line, $\theta \in (-90, 90]$ denotes the orientation (angle) of a continuous line with respect to the horizontal axis in units of degrees, and $\phi \in (-90, 90]$ is the angle of a discrete pixel line with the same conventions. For a given length $r$, the set of possible continuous line angles is denoted $\Theta \subseteq (-90, 90]$ and the set of possible discrete pixel line angles is $\Phi \subseteq \Theta$. A line in a 2D continuous domain $\mathbb{R}^{2}$ is a 1D object with zero width and length $r$. On the other hand, a pixel line in a 2D discrete domain $\mathbb{Z}^{2}$ is defined on a discrete pixel grid which gives the line a non-zero width and length $r$. Figure 1 shows a continuous line with parameters $(r, \theta) = (3, 90)$ and a pixel line with parameters $(r, \phi) = (3, 90)$.

Figure 1: (a) A continuous line with parameters $(r, \theta) = (3, 90^{\circ})$ and (b) pixel line with parameters $(r, \phi) = (3, 90^{\circ})$. Note that the pixel line has a width of 1 due to its definition on a 2D discrete grid.

Figure 2: Multiple continuous lines can be represented by the same pixel line of Chebyshev length $r_{\infty} = 2$. Example continuous lines are shown in red, overlaid on the pixel line kernel (the squares) that describes those continuous lines, where white squares (pixels) correspond to the line and black squares (pixels) correspond to the absence of a line. (a) Two continuous lines (with $\theta_{1} = 30^{\circ}$ and $\theta_{2} = \phi = 0^{\circ}$) described by the same $\phi = 0^{\circ}$ pixel line. (b) Two continuous lines (with $\theta_{1} = 60^{\circ}$ and $\theta_{2} = \phi = 90^{\circ}$) described by the same $\phi = 90^{\circ}$ pixel line. (c) Three continuous lines (with $\theta_{1} = 31^{\circ}$, $\theta_{3} = \phi = 45^{\circ}$, and $\theta_{2} = 59^{\circ}$) described by the same $\phi = 45^{\circ}$ pixel line. (d) Three continuous lines (with $\theta_{1} = -31^{\circ}$, $\theta_{3} = \phi = -45^{\circ}$, and $\theta_{2} = -59^{\circ}$) described by the same $\phi = -45^{\circ}$ pixel line.

One limitation in defining pixel lines is the interrelation between length and angle. In essence, a shorter pixel line will have a limited number of unambiguous angles. Figure 2(a) illustrates how multiple continuous lines with angle $\theta \in [0, 30]$ can be interpreted by the same horizontal pixel line of length $r_{\infty} = 2$. Similarly, a pixel line of length $r_{\infty} = 2$ and angle $\phi = 45$ [Fig. 2(c)] can describe many intermediate lines with $\theta \in (30, 60)$. The four pixel lines in Fig. 2 show the only pixel lines that can be described for length $r_{\infty} = 2$. This means that, for shorter length motion blurs, there will be gaps in the angles that can be represented. For example, a length $r_{\infty} = 2$ pixel line can only have angles $\phi \in \{-45, 0, 45, 90\}$ as shown in Fig. 2.

Conventional convolutional kernels are typically assumed to be square and odd in dimension. The preference for square kernels is related to the horizontal and vertical symmetry of features. The preference for odd-sized kernels is to have an unambiguous center point, as the center point is commonly assumed to be associated with the pixel being processed. The assumption of square or odd-sized kernels in the blur model of Eq. (1) is not mathematically necessary, however, and we consider blur kernels that may be non-square and/or even in one or more dimensions.

3.2 Blur Kernel Creation

Figure 3: (a) Unique length and angle combinations $(r, \phi)_{\text{u}}$ for all possible discrete blur kernels for $r \in [2, 100]$ and $\phi \in (-90, 90]$. Shorter length kernels have many fewer possible angles that they can represent, including the four possible angles of $\phi \in \{-45, 0, 45, 90\}$ for $r = 2$. (b) Number of angles that can be represented for a given kernel length. Shorter length kernels have many fewer possible angles that they can represent.

To explore the gaps in angles resulting from representation of continuous lines as pixel lines, we assume a range of Euclidean lengths $r = 2, 3, \ldots, 100$ and angles $\theta = -89, -88, \ldots, 90$, and gather the resulting unique $(r, \phi)$ discrete line parameter pairs, illustrated in Fig. 3. We assume the $h \times w$ kernel corresponding to an $(r, \theta)$ continuous line has $h$ and $w$ defined as

$$h = \begin{cases} \lceil r\cos(\theta) \rceil, & \cos(\theta) \neq 0 \\ 1, & \cos(\theta) = 0 \end{cases}, \tag{2}$$
$$w = \begin{cases} \lceil r\sin(\theta) \rceil, & \sin(\theta) \neq 0 \\ 1, & \sin(\theta) = 0 \end{cases}, \tag{3}$$

where $\lceil \cdot \rceil$ is the ceiling operator. The conditions $\cos(\theta) = 0$ and $\sin(\theta) = 0$ explicitly define a height or width of one for the horizontal and vertical cases, respectively. We use $r$ as the Euclidean distance labels and draw an $(r, \theta)$ continuous line (e.g., the red lines in Fig. 2) through an $h \times w$ 2D discrete pixel grid (e.g., the $1 \times 2$, $2 \times 1$, and $2 \times 2$ pixel grids in Fig. 2), resulting in a discrete pixel line (e.g., the white pixels in Fig. 2). The ceiling operator introduces a quantization error; since $r$ may result in a line ending mid-pixel, the ceiling operator corresponds to the choice to use the pixel where $r$ ends as part of the motion blur kernel. The worst case for the quantization error is for angles $\theta = \pm 45$ with the Euclidean and Chebyshev distances differing by a factor of $\sqrt{2}$.
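
As a hedged illustration of Eqs. (2) and (3), the following sketch evaluates the two kernel dimensions for a given length and angle. The function names are hypothetical, the returned tuple follows the equations exactly as printed (which dimension maps to image rows depends on the axis convention, which is an assumption here), and the rounding step is added only to suppress floating-point residue that would otherwise distort the ceiling.

```python
import math

def _ceil_or_one(x):
    """Ceiling of x, with the zero case mapped to 1 as in Eqs. (2) and (3)."""
    x = round(x, 9)  # suppress floating-point residue such as cos(90 deg) ~ 6e-17
    return 1 if x == 0 else math.ceil(x)

def kernel_shape(r, theta_deg):
    """Return (h, w) as Eqs. (2) and (3) are printed, for theta in [0, 90] degrees."""
    t = math.radians(theta_deg)
    return _ceil_or_one(r * math.cos(t)), _ceil_or_one(r * math.sin(t))

# Group the integer angles 0..90 by the kernel shape they produce for r = 2.
groups = {}
for theta in range(0, 91):
    groups.setdefault(kernel_shape(2, theta), []).append(theta)
for shape, thetas in groups.items():
    print(shape, "covers theta from", thetas[0], "to", thetas[-1])

# Chebyshev length (the longer kernel side) for a Euclidean length of 10 at 45 degrees.
h, w = kernel_shape(10, 45)
print("r = 10, theta = 45 -> Chebyshev length", max(h, w))
```

For $r = 2$ the angles fall into three groups with boundaries at 30 and 60 degrees, in line with the angle ranges described for Fig. 2.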

We use the line function from the skimage.draw library in Python to draw a continuous line through the pixel grid, resulting in a discrete pixel line consistent with the Bresenham digital line algorithm [28]. If $\theta$ is positive, the line is drawn from the lower-left corner $(h, 0)$ to the upper-right corner $(0, w)$, e.g., Fig. 2(c), and if $\theta$ is negative, the line is drawn from the upper-left corner $(0, 0)$ to the lower-right corner $(h, w)$, e.g., Fig. 2(d).
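
The corner-to-corner drawing step can be sketched without external dependencies. The code below is a plain integer Bresenham routine used as a stand-in for the skimage.draw line function, so individual pixel choices may differ from that library in tie cases. The zero-based corner indices $(h-1, 0)$, $(0, w-1)$, and $(h-1, w-1)$ and the function names are assumptions made for illustration.

```python
def bresenham(r0, c0, r1, c1):
    """Integer Bresenham line from (r0, c0) to (r1, c1), inclusive of both ends."""
    points = []
    dr, dc = abs(r1 - r0), abs(c1 - c0)
    step_r = 1 if r1 >= r0 else -1
    step_c = 1 if c1 >= c0 else -1
    err = dc - dr
    r, c = r0, c0
    while True:
        points.append((r, c))
        if (r, c) == (r1, c1):
            break
        e2 = 2 * err
        if e2 > -dr:      # step along the column axis
            err -= dr
            c += step_c
        if e2 < dc:       # step along the row axis
            err += dc
            r += step_r
    return points

def line_kernel(h, w, positive_angle=True):
    """Binary h x w kernel containing a pixel line drawn corner to corner."""
    if positive_angle:
        start, end = (h - 1, 0), (0, w - 1)    # lower-left to upper-right
    else:
        start, end = (0, 0), (h - 1, w - 1)    # upper-left to lower-right
    kernel = [[0] * w for _ in range(h)]
    for r, c in bresenham(*start, *end):
        kernel[r][c] = 1
    return kernel

for row in line_kernel(2, 2, positive_angle=True):
    print(row)
```

The result is a binary mask; any normalization applied before convolution (for example, dividing by the number of line pixels) is left out of this sketch.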

For each length $r = 2, 3, \ldots, 100$, we generate a discrete blur kernel associated with a continuous line of length $r$ and angle $\theta = 0, 1, \ldots, 90$. Noting that multiple angles $\theta$ may result in the same discrete blur kernel (see Fig. 2), a single discrete angle $\phi$ must be defined for each unique discrete blur kernel for use as a training label. We thus quantize the continuous angle $\theta$ for a given blur kernel as

$$\phi = \begin{cases} 0, & 0 \in \Theta \\ 90, & 90 \in \Theta \\ \lceil \text{median}(\Theta) \rceil, & \text{else} \end{cases}, \tag{4}$$

where $\Theta$ is the set of continuous angles $\theta$ resulting in the given blur kernel and where $\lceil \cdot \rceil$ is the ceiling operator necessary to define an integer angle in cases that $\Theta$ contains an even number of elements. Note that this definition of $\phi$ has two special cases. If $0 \in \Theta$, the kernel is consistent with a horizontal line [see Fig. 2(a)] and we assign $\phi = 0$. Similarly, if $90 \in \Theta$, the kernel is consistent with a vertical line [see Fig. 2(b)] and we assign $\phi = 90$. These special cases avoid the median operator assigning an erroneous label to a horizontal or vertical line.
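
The labeling rule of Eq. (4) is short enough to express directly. The sketch below assumes that the set $\Theta$ is supplied as a collection of integer angles in degrees; the function name `quantize_angle` is hypothetical.

```python
import math
import statistics

def quantize_angle(thetas):
    """Map the set of continuous angles sharing one kernel to a single label (Eq. 4)."""
    thetas = sorted(thetas)
    if 0 in thetas:       # kernel consistent with a horizontal line
        return 0
    if 90 in thetas:      # kernel consistent with a vertical line
        return 90
    # The ceiling yields an integer when the set has an even number of elements.
    return math.ceil(statistics.median(thetas))

print(quantize_angle(range(0, 31)))    # angles sharing the horizontal kernel
print(quantize_angle(range(31, 60)))   # angles sharing the diagonal kernel
print(quantize_angle([10, 11]))        # even-sized set, median 10.5
```

Without the two special cases, the first call would return the median of 0 to 30, labeling a horizontal kernel with a nonzero angle.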

We compute all possible $(r, \phi)$ combinations by generating the blur kernel for $r = 2, 3, \ldots, 100$ and $\theta = 0, 1, \ldots, 90$ and computing the resulting blur kernel angle $\phi$ using Eq. (4). We span only positive angles $\theta \in [0, 90]$ since kernels with negative angles will be symmetric versions of the kernels with positive angles. The resulting values $r$ and $\phi$ that yield unique discrete pixel lines are denoted as $(r, \phi)_{\text{u}}$. These parameter values are also used as labels to train a network to predict the blur parameters from a blurry image (Sec. 4) using $(r, \phi) = (1, 0)$ as labels for a non-blurred image.

Figure 3(a) illustrates the $(r, \phi)_{\text{u}}$ combinations as a scatter plot, where we note the existence of gaps where a continuous line cannot be described by a pixel line. As expected, gaps are larger for smaller lengths, indicating limited unique pixel lines (blur kernels) for shorter blur lengths. Figure 3(b) plots the number of unique angles $\phi$ versus length $r$ where we note that a line must have approximately 70 pixels or more in order to represent all orientations $\phi = -89, -88, \ldots, 90$. This exploration of the $(r, \phi)$ space covered 17,820 possible $(r, \theta)$ lines (99 lengths and 180 angles) resulting in 13,034 unique $(r, \phi)_{\text{u}}$ kernels; this is more than an order of magnitude larger than other explorations, e.g., the 361 kernels in [22].

3.3 COCO Blurred Dataset

We use the 2014 COCO dataset [27] to create a blurred dataset for training and validating a model for blind estimation of length and angle given only a blurry image. The 2014 COCO dataset consists of a training dataset with 82,783 images and a validation set of 40,504 images. The training dataset is used to create a blurred training dataset and the validation dataset is split (with no overlap) to create blurred validation and test datasets.

Figure 4: Representation of COCO images versus length and angle. (a) Number of unique COCO images per angle in training and validation/testing datasets. (b) Number of unique COCO images per length in training and validation/testing datasets.

The blurred dataset is created by defining a blur generator that loops through the labels $(r, \phi)_{\text{u}}$, creating the corresponding blur kernel using the skimage.draw function (see Sec. 3.2), and convolving the blur kernel with an image from the COCO dataset. A random COCO image is chosen (without replacement) for each length $r$ to provide a variety of images in the blurred dataset. There is a dataset imbalance in lengths and/or angles represented in the blurred dataset due to the uneven representation of angles for shorter lengths (see Fig. 3). We improve the balance by creating a minimum of 175 blurred images per length, implying that more unique COCO images are used for shorter blur lengths. Each run of the blur generator creates 21,789 blurred images. The creation of the training set used 12 parallel threads of the blur generator, each operating on a subset of the COCO training dataset, creating a total of 261,468 blurred images. The validation and testing sets were each created with one run of the blur generator operating on the validation and testing splits of COCO validation dataset, creating a total of 21,789 blurred images for each. Figure 4 shows the number of unique COCO images versus length and angle for the training, validation, and testing datasets. We notice a spike in the number of unique images for length $r = 1$ and for angle $\phi = 0$ due to the non-blurred images that are part of the dataset. In general, more unique images are used for shorter lengths as the blur generator needs to loop through more images to complete the minimum threshold of images per length. We notice also that small length blur kernels create peaks at angles of $\phi \in \{-45, 0, 45, 90\}$ since smaller length kernels are limited in the angles they are capable of representing.
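
The balancing arithmetic can be sketched under one plausible reading of the generator: each source image for a given length is blurred once per available angle, and new images are drawn until the 175-image minimum is reached. That reading, the function name, and the example angle counts other than the four angles available at the shortest length are assumptions for illustration only.

```python
import math

MIN_BLURRED_PER_LENGTH = 175  # minimum number of blurred images per length

def images_needed(num_angles):
    """Source images required so that num_angles blurs per image reach the minimum."""
    return math.ceil(MIN_BLURRED_PER_LENGTH / num_angles)

for num_angles in (4, 50, 180):
    n = images_needed(num_angles)
    print(num_angles, "angles ->", n, "source images,", n * num_angles, "blurred images")

# Training set size: 12 generator runs of 21,789 blurred images each.
training_total = 12 * 21_789
print(training_total)
```

Under this reading, a length with only four representable angles consumes many more unique source images than a length that can represent all 180 angles, which is the trend reported for Fig. 4.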

Figure 5: Distribution of blurred images for angle and length. (a) Number of blurred images per angle. (b) Number of blurred images per length.

There is an interdependence between angle and length, resulting in an inability to completely balance both length and angle in the blurred dataset. Figure 5 shows the distribution of labeled blurred images with respect to angle and length. Our choice to generate a minimum of 175 blurred images per length in the blur generator favors a more uniform length distribution [Fig. 5(b)]. This, however, results in peaks in the angle distribution [Fig. 5(a)] due to the over-representation of certain angles for smaller lengths (see Fig. 3). A choice to favor a more uniform angle distribution, however, would result in smaller representation of shorter blur lengths, creating a more problematic data imbalance. While there are other choices that could be made regarding data imbalance here, we find that this blurred dataset can be used to train a network to accurately predict both length and angle.

4 Uniform Blur Prediction

We formulate blur length and angle estimation as a deep learning regression problem using the VGG16 [6] architecture as the backbone of our model. VGG16 was trained from scratch with TensorFlow’s default layer initializations to predict both the length and angle parameters for a uniform linear motion blurred image. The last layer of VGG16 was modified to have two output nodes (corresponding to prediction of length and angle) with sigmoid activations. Since the sigmoid activation outputs values in the range $[0, 1]$, length and angle are normalized to be in the range $[0, 1]$ to train the model. The predicted length and angle parameters are re-scaled to their native ranges ($r \in [1, 100]$, $\phi \in (-90, 90]$) for validation.
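
The label normalization and re-scaling can be sketched as follows. Min-max scaling over the stated native ranges is an assumption, since the exact mapping is not specified here, and the function names are hypothetical.

```python
R_MIN, R_MAX = 1.0, 100.0        # native length range
PHI_MIN, PHI_MAX = -90.0, 90.0   # native angle range in degrees

def normalize(r, phi):
    """Map (r, phi) into [0, 1] so the labels match the sigmoid output range."""
    return (r - R_MIN) / (R_MAX - R_MIN), (phi - PHI_MIN) / (PHI_MAX - PHI_MIN)

def denormalize(r_norm, phi_norm):
    """Map network outputs in [0, 1] back to native length and angle units."""
    return (r_norm * (R_MAX - R_MIN) + R_MIN,
            phi_norm * (PHI_MAX - PHI_MIN) + PHI_MIN)

# The non-blurred label (r, phi) = (1, 0) maps to (0.0, 0.5) under this scaling.
print(normalize(1, 0))
print(denormalize(*normalize(50, -45)))
```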

Due to the fixed input size of VGG16 ($224 \times 224$ pixels), we created a TensorFlow data generator to randomly crop the blurred COCO images. We crop instead of resizing the image to maintain accuracy in the blur angle labels since a resize could change the aspect ratio of the image and thus the blur angle. If a blurred image was smaller than the input size for VGG16, it was skipped and not used in training; in total there were 892 training, 5 validation, and 13 test images that were skipped. This results in less than 0.3% of images skipped for each dataset.
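
The crop-or-skip logic can be sketched in plain Python. This is not the TensorFlow data generator itself; images are represented as nested lists, and the function name and the `None` return value for skipped images are assumptions for illustration.

```python
import random

CROP = 224  # VGG16 input size in pixels

def random_crop(image, size=CROP, rng=random):
    """Return a random size x size crop, or None if the image is too small."""
    height, width = len(image), len(image[0])
    if height < size or width < size:
        return None  # image is skipped rather than resized
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]

large = [[0] * 300 for _ in range(250)]   # 250 x 300 image
small = [[0] * 300 for _ in range(200)]   # height below 224
crop = random_crop(large)
print(len(crop), len(crop[0]))
print(random_crop(small))
```

Cropping leaves the pixel grid untouched, so a blur drawn at angle $\phi$ keeps that angle in the crop, whereas a resize with unequal horizontal and vertical scale factors would not.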

We use the Adam optimizer with a learning rate of 0.1 and epsilon of 0.1 and the mean squared error (MSE) as the loss function. A batch size of 50 was empirically determined to mitigate convergence issues. The model is trained for up to 50 epochs, saving the weights for the best model throughout training; training is terminated early if the MSE performance has not improved within the previous 5 epochs. Training takes about 25 minutes per epoch and about 12 hours to fully train on an NVIDIA RTX-3090.
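The stopping rule can be made concrete with a small framework-independent sketch that replays a list of per-epoch MSE values. Whether the monitored MSE is computed on the training or the validation set is not stated above, so the input here is simply "the monitored MSE"; the function name is hypothetical.

```python
def train_with_early_stopping(epoch_losses, max_epochs=50, patience=5):
    """Replay per-epoch MSE values and report when training would stop.

    Returns (best_epoch, best_loss, stop_epoch), with epochs counted from 1.
    """
    best_loss = float("inf")
    best_epoch = 0
    stop_epoch = 0
    for epoch, loss in enumerate(epoch_losses[:max_epochs], start=1):
        stop_epoch = epoch
        if loss < best_loss:
            # This is the point at which the best weights would be saved.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement within the previous `patience` epochs
    return best_epoch, best_loss, stop_epoch
```

For example, if the monitored MSE last improves at epoch 3, this sketch stops at epoch 8 and the weights saved at epoch 3 are the ones retained.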

We performed multiple trainings of the network for different noise levels, resulting in four different models. The first model is trained with no noise and the other three models are trained with different levels of additive white Gaussian noise with variance $\sigma^{2}\in\{0.001,0.01,0.1\}$, corresponding to signal-to-noise ratio (SNR) values of $\{30,20,10\}$ dB, respectively. Noise is added in our data generator after the blurred image is cropped and normalized to the range $[0,1]$. Each model is tested by predicting blur parameters for the noiseless test set and for three additional test runs in which three noise levels are added to the test set.
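The noise model and the quoted SNR values can be sketched as below. The SNR helper assumes unit signal power, which is the assumption under which $\sigma^{2}\in\{0.001,0.01,0.1\}$ maps to $\{30,20,10\}$ dB; both function names are hypothetical, and whether the noisy image is clipped back to $[0,1]$ is not stated, so no clipping is applied.

```python
import numpy as np

def add_gaussian_noise(image, variance, rng):
    """Add zero-mean white Gaussian noise of the given variance to an image in [0, 1]."""
    if variance == 0:
        return image.copy()
    noise = rng.normal(loc=0.0, scale=np.sqrt(variance), size=image.shape)
    return image + noise  # no clipping; the text does not say whether values are clipped

def snr_db(variance, signal_power=1.0):
    """SNR in dB for a given noise variance, assuming unit signal power."""
    return 10.0 * np.log10(signal_power / variance)
```

Applying the noise after cropping and normalization means the variance is expressed relative to the $[0,1]$ intensity range rather than the original 8-bit range.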

5 Experiments and Results

This section presents results using the $R^{2}$ coefficient of determination metric for evaluating the regression model. Next, we consider a scenario with additive noise and present results of our model’s predictions. Finally, we compare with other methods of blur kernel estimation [9, 14, 25] and evaluate by assessing deblurred images using the error ratio score as introduced by [7].

5.1 Metrics

We validate the performance of blur estimation using metrics that measure the accuracy of the parameter estimation itself and also the quality of an image deblurred using the estimated kernel.

5.1.1 Accuracy of Parameter Estimation

In testing, we measure performance using the coefficient of determination, $R^{2}$, which measures the goodness of fit between actual known values of variable $y_{i}$ and estimated values $x_{i}$:

$$R^{2}=1-\frac{\sum_{i=1}^{n}(y_{i}-x_{i})^{2}}{\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}},\tag{5}$$

where $\bar{y}$ is the mean of the known variables $y_{i}$ and $n$ is the number of samples [29]. The numerator $\sum_{i=1}^{n}(y_{i}-x_{i})^{2}$ is the sum of squares of the residual prediction errors and the denominator $\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}$ is proportional to the variance of the known data. A perfect model will have zero residual errors and thus an $R^{2}=1$. A naïve model that always predicts the average of the data $\bar{y}$ will have equal numerator and denominator and thus an $R^{2}=0$. Models that have predictions worse than the naïve model will have an $R^{2}<0$.
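A direct NumPy transcription of the coefficient of determination illustrates the three regimes described above. The function name `r2_score` is chosen here for illustration and the sample values are made up.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y_true, dtype=float)
    x = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - x) ** 2)         # residual sum of squares (numerator)
    ss_tot = np.sum((y - y.mean()) ** 2)  # spread of the known data (denominator)
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score(y, y)                     # zero residuals
naive = r2_score(y, [2.5, 2.5, 2.5, 2.5])    # always predicts the mean
worse = r2_score(y, [4.0, 3.0, 2.0, 1.0])    # worse than the mean predictor
```

Here `perfect` evaluates to 1, `naive` to 0, and `worse` to a negative value, matching the interpretation given in the text.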

5.1.2 Quality of Deblurred Image

In addition to studying the accuracy of blur kernel estimation, we measure the quality of the image deblurred using the estimated blur kernel in a non-blind deblurring method (see Sec. 5.3). To measure the quality of the deblurred image, we use the error ratio as presented in [7]. The error ratio is motivated by the fact that, even with a perfectly estimated blur kernel, one may not be able to perfectly predict the latent sharp image. The error ratio $E_{\hat{K}}/E_{K}$ is computed by considering the error $E_{\hat{K}}$ between the true sharp image and the latent sharp image recovered using the estimated blur kernel $\hat{K}$ and the error $E_{K}$ between the true sharp image and the latent sharp image recovered using the true blur kernel $K$. An error ratio of 1 indicates that the images deblurred using the estimated and true kernel are identical.

We use the sum of squared differences (SSD) as in [7] to define the error ratio. Error ratios are generally presented as a cumulative histogram for error ratios binned in the range $[1,4]$. As noted in [7], SSD error ratios above 2 tend to indicate significant perceptually noticeable distortion present in the image deconvolved with the estimated kernel as compared to the image deconvolved with the true kernel.
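The SSD error ratio and its cumulative histogram can be sketched in a few lines. This is an illustrative sketch rather than the evaluation code used for the results; the function names and the choice of thresholds are assumptions.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two images of equal shape."""
    return float(np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2))

def error_ratio(sharp, deblurred_est, deblurred_true):
    """E_Khat / E_K: error with the estimated kernel over error with the true kernel."""
    return ssd(sharp, deblurred_est) / ssd(sharp, deblurred_true)

def cumulative_fraction(ratios, thresholds):
    """Fraction of images whose error ratio is at or below each threshold."""
    ratios = np.asarray(ratios, dtype=float)
    return [float(np.mean(ratios <= t)) for t in thresholds]
```

A method whose cumulative fraction rises quickly near a ratio of 1 produces kernels that deblur almost as well as the ground truth kernel, which is how the cumulative histograms in this section are read.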

5.2 Accuracy of Parameter Estimation

5.2.1 Noise-Free Predictions

A VGG16 based model (Sec. 4) was first trained on blurred images without any additive noise. Scatter plots of estimated versus actual length and angle are shown in Fig. 6 along with the corresponding $R^{2}$ scores. We find the model to be highly accurate for prediction of both length and angle as demonstrated by the $R^{2}$ scores of 0.9869 and 0.9935, respectively. In Fig. 6(a) we note a larger spread in estimated values as the length increases. This is not surprising, as larger blur lengths are expected to be more difficult to accurately estimate. In Fig. 6(b) we note certain angles have a larger spread in estimated values. This is most notable for $\phi\in\{-45,0,45,90\}$, but can be noted for other angles. This is due to errors in prediction for the smaller length kernels which have a limited set of angles. It is important to recall, however, that many of these incorrect angle predictions will result in a correct blur kernel. As an example, an image blurred with kernel parameters $(r,\phi)=(2,0)$ may have an angle prediction of $\hat{\phi}=27$. However, since blur kernels of length $r=2$ can only be represented by $\phi\in\{-45,0,45,90\}$, the blur kernel created with parameters $(r,\phi)=(2,27)$ will generate a blur kernel estimate of $(r,\phi)_{\text{u}}=(2,0)$ which is correct.

Figure 6: Scatterplots of estimated versus actual values for (a) length and (b) angle. Individual estimates are represented as blue points and the best fit linear line is plotted in red. (a) Scatterplot of estimated versus actual length, $R^{2}=0.9869$. (b) Scatterplot of estimated versus actual angle, $R^{2}=0.9935$.

5.2.2 Predictions with Additive Noise

We used the model from Sec. 5.2.1, trained on noise-free blurred images, and tested it on images with additive white Gaussian noise with variance $\sigma^{2}\in\{0.001,0.01,0.1\}$, corresponding to SNR values of $\{30,20,10\}$ dB, respectively. Results for this experiment are shown in the first row of Tables 1 and 2 for length and angle prediction, respectively. We note the estimation of the length parameter is more susceptible to noise than angle, resulting in a complete failure of prediction ($R^{2}$ scores less than 0) for even the smallest level of additive noise, $\sigma^{2}=0.001$. We hypothesize that additive noise can alter the intensity distribution along the blur path in a blurry image, creating the appearance of artificially shorter or longer blur paths. Those blur paths, however, will likely retain more characteristics of their angle for the same level of noise.

Table 1: $R^{2}$ score for length prediction for training and testing under different levels of additive noise. Rows give the training $\sigma^{2}$ and columns give the testing $\sigma^{2}$.

| Training $\sigma^{2}$ | Testing $\sigma^{2}=0$ | Testing $\sigma^{2}=0.001$ | Testing $\sigma^{2}=0.01$ | Testing $\sigma^{2}=0.1$ |
|---|---|---|---|---|
| 0 | 0.9869 | -0.26 | -3.17 | -3.17 |
| 0.001 | 0.9607 | 0.9557 | -1.14 | -3.12 |
| 0.01 | 0.9527 | 0.9539 | 0.9523 | -2.91 |
| 0.1 | 0.8923 | 0.8932 | 0.8960 | 0.8772 |

Table 2: $R^{2}$ score for angle prediction for training and testing under different levels of additive noise. Rows give the training $\sigma^{2}$ and columns give the testing $\sigma^{2}$.

| Training $\sigma^{2}$ | Testing $\sigma^{2}=0$ | Testing $\sigma^{2}=0.001$ | Testing $\sigma^{2}=0.01$ | Testing $\sigma^{2}=0.1$ |
|---|---|---|---|---|
| 0 | 0.9935 | 0.8772 | 0.3935 | 0 |
| 0.001 | 0.9758 | 0.9754 | 0.6509 | -0.16 |
| 0.01 | 0.9733 | 0.9735 | 0.9682 | 0.0128 |
| 0.1 | 0.8999 | 0.9009 | 0.9010 | 0.8834 |

We trained three additional models using noisy blurred images with additive white Gaussian noise with variance $\sigma^{2}\in\{0.001,0.01,0.1\}$ and tested each of those models on noise-free and noisy images, i.e., for $\sigma^{2}\in\{0,0.001,0.01,0.1\}$. Results for those three models are in the second through fourth rows of Tables 1 and 2 for length and angle prediction, respectively. We again note a higher sensitivity to noise in the prediction of length (Table 1) than angle (Table 2). We further note that the $R^{2}$ score decreases by $\sim 0.1$ for the model trained on the highest level of noise $\sigma^{2}=0.1$ and tested on noise-free images, compared to the noise-free model tested on noise-free images, but that the same model is robust to varying levels of noise. Finally, we note that models trained on noisy data appear to be robust to noise levels less than or equal to the noise level on which they are trained. This implies that a single model trained on a single noise level can yield accurate predictions even for smaller noise levels not seen in training.
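The robustness pattern can be read directly off the reported scores by separating each table into entries where the testing noise does not exceed the training noise (lower triangle, including the diagonal) and entries where it does (upper triangle). The sketch below uses the values from Tables 1 and 2; the helper name is hypothetical.

```python
import numpy as np

# Rows: training variance; columns: testing variance, both in the order 0, 0.001, 0.01, 0.1.
r2_length = np.array([
    [0.9869, -0.26, -3.17, -3.17],
    [0.9607, 0.9557, -1.14, -3.12],
    [0.9527, 0.9539, 0.9523, -2.91],
    [0.8923, 0.8932, 0.8960, 0.8772],
])
r2_angle = np.array([
    [0.9935, 0.8772, 0.3935, 0.0],
    [0.9758, 0.9754, 0.6509, -0.16],
    [0.9733, 0.9735, 0.9682, 0.0128],
    [0.8999, 0.9009, 0.9010, 0.8834],
])

def split_by_noise(table):
    """Return (scores with test noise <= train noise, scores with test noise > train noise)."""
    lower = table[np.tril_indices_from(table)]
    upper = table[np.triu_indices_from(table, k=1)]
    return lower, upper

length_seen, length_unseen = split_by_noise(r2_length)
angle_seen, angle_unseen = split_by_noise(r2_angle)
```

For length, the smallest score in the lower triangle is 0.8772 while every upper-triangle score is negative; for angle, the smallest lower-triangle score is 0.8834 and the upper-triangle scores degrade more gradually, consistent with angle being less sensitive to noise.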

Most other methods [14, 11, 23, 17] that test under additive noise test up to a Gaussian noise of $\sigma^{2}=0.01$ to simulate sensor noise, while some [9, 18] test noise up to $\sigma^{2}=0.02$. In comparison, our model is tested up to $\sigma^{2}=0.1$ and demonstrates a higher tolerance for noise which can be an advantage when modeling atmospheric turbulence.

5.3 Quality of Deblurred Images

5.3.1 Deconvolution and Comparison Methods

We quantify the performance of our blur parameter estimation by using the expected patch log likelihood (EPLL) method [30] to deconvolve the images using our predicted motion blur kernel and the ground truth blur kernel. We used the Python implementation available at [31], replacing the Gaussian kernel with a linear motion blur kernel. EPLL only accepts square, odd-sized blur kernels, necessitating a padding of kernels to be square and odd-sized. We symmetrically zero pad the shortest side of the kernel to keep the kernel centered and pad to match the size of the longest side. If the longest side is even, however, this results in an even-sized square kernel. We use a similar approach to [32], creating four odd-sized kernels each zero padded with the line asymmetrically offset toward a different corner. The blurred image is deconvolved with each of the four kernels and the average of the four deconvolved images is considered the resulting deblurred image.
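The padding scheme can be sketched as below. This is one reading of the description above rather than the exact code used for the results: placing any leftover row or column after the kernel when the padding cannot be split evenly is an assumption, and the helper names are hypothetical.

```python
import numpy as np

def pad_to_square(kernel):
    """Zero-pad the shorter side so the kernel is square with the longest side's size."""
    h, w = kernel.shape
    n = max(h, w)
    top, left = (n - h) // 2, (n - w) // 2  # leftover padding goes after the kernel
    out = np.zeros((n, n), dtype=kernel.dtype)
    out[top:top + h, left:left + w] = kernel
    return out

def odd_sized_variants(square):
    """Return odd-sized kernels: the input itself, or four corner-offset paddings."""
    n = square.shape[0]
    if n % 2 == 1:
        return [square]
    variants = []
    for top in (0, 1):        # extra zero row below (0) or above (1)
        for left in (0, 1):   # extra zero column right (0) or left (1)
            out = np.zeros((n + 1, n + 1), dtype=square.dtype)
            out[top:top + n, left:left + n] = square
            variants.append(out)
    return variants

# Example: a horizontal blur of length 4 stored as a 1 x 4 kernel.
kernel = np.full((1, 4), 0.25)
variants = odd_sized_variants(pad_to_square(kernel))
```

Each of the four variants is then used for deconvolution and the four deblurred images are averaged, so that the half-pixel shift introduced by any single offset is balanced by the others.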

Additionally, we compare performance with the blur kernel estimation methods from Levin et al. [14], Krishnan et al. [9], and Carbajal et al. [25]. These works also include methods for estimating the deblurred image. For fairness of comparisons, we use only the blur kernel estimate from [14, 9, 25] and use those blur kernel estimates in the same EPLL [30] deblurring framework as described above. For the method in [14], we used the Matlab implementation provided at [33], using the deconv_diagfe_filt_sps function with a kernel prediction size of $(101,101)$ to allow prediction of the largest kernel size in our blurred dataset. Since the method in [14] estimates a blur kernel for each of the three channels in a color image, we applied EPLL to each channel using the kernel estimated for that channel. For the method in [9], we used the Matlab implementation provided at [34], again with kernel size of $(101,101)$. For the method in [25], we used the Python implementation provided at [35] with the fixed kernel size of $(33,33)$; we note that this will put this method at a disadvantage in predicting longer blur lengths. Additionally, we estimated a single blur kernel for the uniformly blurred images by using the image-averaged mixing coefficients in the superposition of the kernel bases; this allows us to use the same EPLL deblurring framework for comparison to the other methods.

5.3.2 Reduced Test Set

Due to the computational complexity of the EPLL deconvolution [30], as well as the kernel estimation methods in [9, 14, 25], we generated reduced-size test datasets for these experiments. From the test dataset (Sec. 3.3), two subsets are created to span length and angle, respectively. The first subset, subsequently referred to as Length 5 (L5), uses 5 randomly selected blurred images for each length, totaling 500 images. The second subset, subsequently referred to as Angle 3 (A3), uses 3 randomly selected blurred images for each angle, totaling 540 images. All images in these reduced test sets are noise free.
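The construction of the L5 and A3 subsets amounts to drawing a fixed number of images per label value. A minimal sketch, assuming the labels are available as a plain list and using a hypothetical helper `sample_per_label`:

```python
import random
from collections import defaultdict

def sample_per_label(labels, per_label, seed=0):
    """Pick `per_label` random item indices for every distinct label value."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_label):
        # random.sample raises ValueError if a label has fewer than per_label items
        chosen.extend(rng.sample(by_label[label], per_label))
    return chosen
```

With 100 distinct lengths and 5 images per length this yields 500 indices; the total of 540 for A3 at 3 images per angle implies 180 distinct angle values.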

5.3.3 Error Ratio Comparisons

We calculated SSD error ratios for our proposed blur kernel estimation as well as those from Levin et al. [14], Krishnan et al. [9], and Carbajal et al. [25] as seen in Fig. 7. Recall that an error ratio of 1 indicates that deblurring with the estimated kernel results in an identical image to that deblurred with the ground truth kernel. An error ratio of 2 or higher is considered unacceptable as it was shown in [7] that such SSD error ratios indicate the presence of significant perceptually noticeable distortions in the latent image estimated using the estimated kernel. An error ratio $<1$ indicates that the image deblurred with the estimated kernel achieves a better match to the true sharp image than deblurring with the true kernel.

Figure 7: Error ratios for the (a) A3 sub-dataset and (b) L5 sub-dataset. The bin at 1.0 includes error ratios $\leq 1.0$ and the bin at 3.0 includes error ratios $\geq 3.0$. The results are presented for the proposed method, Krishnan et al. [9], Levin et al. [14], and Carbajal et al. [25].

Our proposed method has the highest cumulative histogram values for the error ratio at 1 for both A3 and L5 datasets and yields an error ratio of 1.25 or less for most images in the A3 and L5 test sets. This means that our model is able to predict more accurate kernels compared to the other methods. Since our method creates the kernel using linear parameters, we are less prone to additive noise in the kernel. Levin et al.’s [14] kernel prediction has noise added in the kernel since many of the pixels that are supposed to be zero are instead small numbers close to zero. This noise can be seen to affect its results in kernel prediction. Krishnan et al. [9] thresholds the small elements of the kernel to zero which increases robustness to noise. Carbajal et al. [25] has competitive performance for error ratios $\geq 1.25$; the method is at a disadvantage at longer blur lengths due to its limitation to $33\times 33$ kernels as noted above which may contribute to the diminished performance at the lowest error ratio.

Figure 8: Deblurred images for image blurred with a kernel of length 5 and angle 17. All images are deblurred with the EPLL method [30]. (a) Image deblurred using kernel estimated with our proposed method, error ratio 1.2858. (b) Image deblurred using kernel estimated with the method in Krishnan et al. [9], error ratio 1.1320. (c) Image deblurred using kernel estimated with the method in Levin et al. [14], error ratio 1.5607. (d) Image deblurred using kernel estimated with the method in Carbajal et al. [25], error ratio 1.0396. (e) Original sharp image. (f) Image deblurred with the ground truth kernel. (g) Blurred image.

Figure 9: Deblurred images for sharp input image. All images are deblurred with the EPLL method [30]. (a) Image deblurred using kernel estimated with our proposed method, error ratio 1.5813. (b) Image deblurred using kernel estimated with the method in Krishnan et al. [9], error ratio 1.7345. (c) Image deblurred using kernel estimated with the method in Levin et al. [14], error ratio 4.4847. (d) Image deblurred using kernel estimated with the method in Carbajal et al. [25], error ratio 1.3388. (e) Original sharp image. (f) Image deblurred with the ground truth kernel.

Figures 8 and 9 show two qualitative examples of images deblurred using the kernels estimated with our proposed method and the methods in [9, 14, 25]. The example in Fig. 8 uses a blurred image as an input and the example in Fig. 9 has a sharp image as an input for demonstration of the methods in the absence of blur. We note similar qualitative results between the deblurred images using our estimated blur kernel and the blur kernels estimated by the methods in [9, 25], and that these methods yield results similar to the image deblurred with the ground truth kernel. We do note, however, some ringing in the images deblurred using the kernel from [9]; this is particularly noticeable in the areas around the upper power line in the image in Fig. 8(b) and the darker snow shadows in Fig. 9(b). The blur kernel estimated by the method in [14] introduces significant artifacts, especially apparent in Fig. 9(c) but also apparent in Fig. 8(c) as dark regions in the sign. The blur kernel estimated by [25] yields very good qualitative results, perhaps indicating the advantage of a kernel that is not constrained to linear motion blur when considering very large image regions. Overall, the proposed method of kernel estimation appears to yield deblurred images close to those that could be achieved with the ground truth kernel, which validates this approach as the foundation for future work in estimation of motion-blur parameters in spatially-varying blur.

6 Conclusions

In this paper, we have studied in detail the limitation in representation of linear blur kernels, particularly for shorter blur lengths. We find an interdependence between length and angle in representing blur kernels, meaning that developing a dataset that is balanced in both length and angle is not possible. Furthermore, while much existing research in linear blur prediction has implicitly assumed square odd-sized kernels, we relax this assumption and allow for non-square even-sized kernels. A blurred dataset was created from the 2014 COCO dataset using a suite of blur kernels for length $r\in[1,100]$ and angle $\phi\in(-90,90]$, providing a more comprehensive variety of blur kernels than previously studied.

This thorough definition of linear motion blur and the development of a blurred dataset allowed us to train a regression deep learning model instead of a classification model as other deep learning methods implement. With regression we can estimate more accurate blur kernel parameters which are not limited by the model needing a priori information about the kernel size. We demonstrate excellent performance in estimation of blur kernel parameters with a coefficient of determination $R^{2}\geq 0.89$ for both length and angle prediction. The robustness of the estimation to additive noise was studied for a wider range of noise than previously considered, $\sigma^{2}\in\{0.001,0.01,0.1\}$, corresponding to SNR values of $\{30,20,10\}$ dB. The regression CNN was found to be very robust to noise, with a $10\%$ drop in the $R^{2}$ metric for a $10\%$ Gaussian noise (which we note is an order of magnitude larger than the noise level of 1% commonly studied). We further note that the model trained on 10% Gaussian noise was robust to noise levels less than 10%, indicating that a single model can serve across a wide range of noise scenarios. Using the estimated blur kernels in a non-blind deblurring method, we find SSD error ratios of 1.25 or less for most test images, significantly outperforming the MAP-based comparison methods and competitive with a recent deep-learning based method.

In future work, we will use this exploration of linear motion blur kernels as a baseline and foundation for spatially-varying blurs. We will expand this to a patch level approach to decompose a spatially-varying blur image into a superposition of locally uniform blur patches. Extending the approach to non-uniform blur will provide a better means to estimate the spatially varying blur resulting from atmospheric turbulence in images. Additionally, it would be interesting to directly compare our approach to other deep learning approaches which either directly estimate length and angle or which assume locally linear motion blur kernels, e.g., [22, 24, 15, 26].

Disclosures

The authors have no conflicts of interest to disclose.

Code, Data, and Materials Availability

The code necessary to implement the proposed method of kernel estimation as well as to generate the blurred dataset from the publicly available COCO dataset is available in a GitHub repository at https://github.com/DuckDuckPig/Regression_Blur.

Acknowledgments

The authors gratefully acknowledge Office of Naval Research grant N00014-21-1-2430 which supported this work.

References

  • [1] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman, “Removing camera shake from a single photograph,” in ACM SIGGRAPH 2006 Papers, SIGGRAPH ’06, (New York, NY, USA), pp. 787–794, Association for Computing Machinery, 2006.
  • [2] S. Dai and Y. Wu, “Motion from blur,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, 2008.
  • [3] B. R. Hunt, A. L. Iler, C. A. Bailey, and M. A. Rucci, “Synthesis of atmospheric turbulence point spread functions by sparse and redundant representations,” Optical Engineering, vol. 57, no. 2, p. 024101, 2018.
  • [4] É. Thiébaut, L. Dénis, F. Soulez, and R. Mourya, “Spatially variant PSF modeling and image deblurring,” in Adaptive Optics Systems V, vol. 9909, pp. 2211–2220, SPIE, 2016.
  • [5] Y. Bahat, N. Efrat, and M. Irani, “Non-uniform blind deblurring by reblurring,” in 2017 IEEE International Conference on Computer Vision (ICCV), pp. 3306–3314, 2017.
  • [6] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [7] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1964–1971, IEEE, 2009.
  • [8] L. Xu, S. Zheng, and J. Jia, “Unnatural L0 sparse representation for natural image deblurring,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013.
  • [9] D. Krishnan, T. Tay, and R. Fergus, “Blind deconvolution using a normalized sparsity measure,” in CVPR 2011, pp. 233–240, 2011.
  • [10] S. Cho and S. Lee, “Fast motion deblurring,” in ACM SIGGRAPH Asia 2009 Papers, SIGGRAPH Asia ’09, (New York, NY, USA), Association for Computing Machinery, 2009.
  • [11] T. S. Cho, S. Paris, B. K. P. Horn, and W. T. Freeman, “Blur kernel estimation using the radon transform,” in CVPR 2011, pp. 241–248, 2011.
  • [12] J. H. Money and S. H. Kang, “Total variation minimizing blind deconvolution with shock filter reference,” Image and Vision Computing, vol. 26, no. 2, pp. 302–314, 2008.
  • [13] J. Jia, “Single image motion deblurring using transparency,” in 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, 2007.
  • [14] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Efficient marginal likelihood optimization in blind deconvolution,” in CVPR 2011, pp. 2657–2664, 2011.
  • [15] R. Yan and L. Shao, “Blind image blur estimation via deep learning,” IEEE Transactions on Image Processing, vol. 25, no. 4, pp. 1910–1921, 2016.
  • [16] Y. Zhang, Y. Xiang, and L. Bai, “Generative adversarial network for deblurring of remote sensing image,” in 2018 26th International Conference on Geoinformatics, pp. 1–4, 2018.
  • [17] L. Li, J. Pan, W.-S. Lai, C. Gao, N. Sang, and M.-H. Yang, “Learning a discriminative prior for blind image deblurring,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [18] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce, “Non-uniform deblurring for shaken images,” International Journal of Computer Vision, vol. 98, pp. 168–186, 2012.
  • [19] M. Hirsch, C. J. Schuler, S. Harmeling, and B. Schölkopf, “Fast removal of non-uniform camera shake,” in 2011 International Conference on Computer Vision, pp. 463–470, 2011.
  • [20] M. Hirsch, S. Sra, B. Schölkopf, and S. Harmeling, “Efficient filter flow for space-variant multiframe blind deconvolution,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 607–614, IEEE, 2010.
  • [21] D. Krishnan and R. Fergus, “Fast image deconvolution using hyper-Laplacian priors,” Advances in Neural Information Processing Systems, vol. 22, 2009.
  • [22] J. Sun, W. Cao, Z. Xu, and J. Ponce, “Learning a convolutional neural network for non-uniform motion blur removal,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
  • [23] X. Xu, J. Pan, Y.-J. Zhang, and M.-H. Yang, “Motion blur kernel estimation via deep learning,” IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 194–205, 2018.
  • [24] A. V. Nasonov and A. A. Nasonova, “Linear blur parameters estimation using a convolutional neural network,” Pattern Recognition and Image Analysis, vol. 32, no. 3, pp. 611–615, 2022.
  • [25] G. Carbajal, P. Vitoria, M. Delbracio, P. Musé, and J. Lezama, “Non-uniform blur kernel estimation via adaptive basis decomposition,” arXiv preprint arXiv:2102.01026, 2021.
  • [26] D. Gong, J. Yang, L. Liu, Y. Zhang, I. Reid, C. Shen, A. van den Hengel, and Q. Shi, “From motion blur to motion flow: A deep learning solution for removing heterogeneous motion blur,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [27] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740–755, Springer, 2014.
  • [28] J. E. Bresenham, “Algorithm for computer control of a digital plotter,” IBM Syst. J., vol. 4, pp. 25–30, Mar 1965.
  • [29] D. Chicco, M. J. Warrens, and G. Jurman, “The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation,” PeerJ Computer Science, vol. 7, 2021.
  • [30] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in 2011 International Conference on Computer Vision, pp. 479–486, IEEE, 2011.
  • [31] R. Friedman, “EPLL PyTorch Implementation.” https://github.com/friedmanroy/torchEPLL. Accessed: 2023-02-05.
  • [32] S. Wu, G. Wang, P. Tang, F. Chen, and L. Shi, “Convolution with even-sized kernels and symmetric padding,” in Advances in Neural Information Processing Systems, vol. 32, 2019.
  • [33] A. Levin, Y. Weiss, F. Durand, and W. Freeman, “Efficient marginal likelihood optimization in blind deconvolution.” https://webee.technion.ac.il/people/anat.levin/. Accessed: 2023-02-05.
  • [34] D. Krishnan, T. Tay, and R. Fergus, “Blind deconvolution using a normalized sparsity measure.” https://dilipkay.wordpress.com/blind-deconvolution/. Accessed: 2023-02-05.
  • [35] G. Carbajal, P. Vitoria, M. Delbracio, and P. Musé, “Non-uniform motion blur kernel estimation via adaptive decomposition.” https://github.com/GuillermoCarbajal/NonUniformBlurKernelEstimation. Accessed: 2024-02-12.