Estimation of motion blur kernel parameters using regression convolutional neural networks^1,2

Luis G. Varela, Laura E. Boucheron, Steven Sandoval, David Voelz, and Abu Bucker Siddik
Klipsch School of Electrical and Computer Engineering
New Mexico State University
Las Cruces, NM 88001, USA
{varelal,lboucher,spsandov,davvoelz,siddik}@nmsu.edu

Abstract

Many deblurring and blur kernel estimation methods use a maximum a posteriori (MAP) approach or deep learning-based classification techniques to sharpen an image and/or predict the blur kernel. We propose a regression approach using convolutional neural networks (CNNs) to predict parameters of linear motion blur kernels, the length and orientation of the blur. We analyze the relationship between length and angle of linear motion blur that can be represented as digital filter kernels. A large dataset of blurred images is generated using a suite of blur kernels and used to train a regression CNN for prediction of length and angle of the motion blur. The coefficients of determination for estimation of length and angle are found to be greater than or equal to 0.89, even under the presence of significant additive Gaussian noise, up to a variance of 10% (SNR of 10 dB). Using our estimated kernel in a non-blind image deblurring method, the sum of squared differences error ratio demonstrates higher cumulative histogram values than comparison methods, with most test images yielding an error ratio of less than or equal to 1.25.

¹ The official version of this paper appears as Luis G. Varela, Laura E. Boucheron, Steven Sandoval, David Voelz, Abu Bucker Siddik, “Estimation of motion blur kernel parameters using regression convolutional neural networks,” J. Electron. Imaging 33(2), 023062 (2024); doi: 10.1117/1.JEI.33.2.023062.

² Copyright 2024 Society of Photo-Optical Instrumentation Engineers. One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.

1 Introduction

Linear motion blur has been studied as a model for camera shake, camera platform movement, and moving objects during imaging [1, 2]. The studies presented within were designed as an initial exploration of a regression convolutional neural network (CNN) to estimate linear motion blur parameters from blurry images. The effects of atmospheric turbulence in an image are generally represented by a spatially-varying blur [3, 4], while recent work has demonstrated that spatially-varying blurred images can be modeled as a superposition of locally linear motion blurs [5]. One way to parametrize linear motion blur kernels is by length and orientation. In this work, we study the ability to accurately estimate the length and angle parameters of a uniform linear motion blurred image. The results may then be used as a foundation upon which to build methods to estimate spatially-varying blur parameters and to serve as a baseline in subsequent studies.

A uniformly blurred image can be described by

$$I_{\text{b}}=K*I_{\text{s}}+N, \tag{1}$$

where $I_{\text{b}}$ is the blurred image, $K$ is the blur kernel, $I_{\text{s}}$ is the sharp latent image, $N$ is additive noise, and $*$ is the convolution operator. This formulation assumes a uniform blur because the same kernel $K$ is applied across the entire image.
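As a concrete illustration of Eq. (1), the following NumPy sketch blurs a grayscale image with a single kernel and adds zero-mean Gaussian noise. The function name `apply_uniform_blur`, the edge-replicating border handling, and the anchor chosen for even-sized kernels are assumptions made for this example; the text does not specify them.

```python
import numpy as np


def apply_uniform_blur(sharp, kernel, noise_var=0.0, rng=None):
    """Return K * I_s + N for a 2D grayscale image (illustrative sketch)."""
    sharp = np.asarray(sharp, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    H, W = sharp.shape
    kh, kw = kernel.shape

    # Assumption: replicate border pixels so the output keeps the input size.
    top, left = kh // 2, kw // 2
    padded = np.pad(sharp, ((top, kh - 1 - top), (left, kw - 1 - left)), mode="edge")

    # Convolution equals correlation with the kernel flipped along both axes.
    flipped = kernel[::-1, ::-1]
    blurred = np.zeros((H, W))
    for i in range(kh):
        for j in range(kw):
            blurred += flipped[i, j] * padded[i:i + H, j:j + W]

    # Additive zero-mean Gaussian noise N with variance noise_var.
    if noise_var > 0:
        rng = np.random.default_rng() if rng is None else rng
        blurred = blurred + rng.normal(0.0, np.sqrt(noise_var), size=blurred.shape)
    return blurred
```

Because the same `kernel` is applied at every pixel, the sketch is uniform in the sense described above; a spatially varying blur would need a different kernel per pixel or patch.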

The main contributions of this paper are a thorough exploration of linear motion blur kernels and development of a regression CNN for blind prediction of motion blur parameters from blurry images. We study the space of motion blur parametrized by length and angle, specifically the subset of linear motion blur that can be described as 2D discrete motion blur kernels. Furthermore, we include motion blur kernels that are not described as square odd-sized kernels and study complications that arise in defining 2D discrete motion blur kernels. To that end we create a dataset of blurred images spanning a large range of possible blur kernels for use in training and testing a regression CNN based on the VGG16 [6] network to predict linear motion blur kernel parameters. The network is analyzed using the coefficient of determination $R^{2}$ score, which is a common metric to quantify performance of regression analysis, as well as a deconvolved error ratio (the ratio of the error when deconvolving using the predicted blur kernel to the error when deconvolving using the actual blur kernel) [7].

The organization of this paper is as follows. Section 2 gives a discussion on previous work including classical deblurring and deep learning deblurring methods. Section 3 contains a detailed description of the exploration of the linear blur kernel parameter space as well as the creation of a blurred dataset for deep learning training. Section 4 presents the proposed regression CNN for blur parameter prediction and Sec. 5 presents results of our regression prediction method along with comparison to previous work. Finally, in Sec. 6 we provide a conclusion and briefly discuss our future work.

2 Related Work

Deblurring and blur kernel estimation have been extensively studied in computer vision, with applications in recovering a sharp image from blurry images caused by camera shake, fast-moving objects in the frame, or atmospheric turbulence. Prior to deep learning methods, many researchers used a maximum a posteriori (MAP) approach for both blur kernel estimation and deblurring. Two commonly used varieties of MAP are explicit edge prediction based approaches like those in [10, 11, 12, 13, 1] and implicit regularization as seen in [8, 9]. While many approaches have used MAP to estimate both the latent image and the blur kernel, Levin et al. [14, 7] showed that this approach tends to favor the no-blur explanation (i.e., that the kernel is an impulse and the “deblurred” image is the blurry image) instead of converging to the true blur kernel and latent sharp image. Moreover, it is advocated in [14, 7] that estimating only the blur kernel is a better approach since there are fewer parameters to determine than if one were to also estimate the latent sharp image. Levin et al. [14, 7] use an expectation-maximization (EM) framework to optimize kernel prediction with either a Gaussian or sparse prior. More recently, methods have considered deep learning approaches to estimate the blur kernel [15], estimate the latent sharp image [16], or both [17].

In the first variety of MAP-based methods, edge-based approaches in a MAP framework use extracted edges as an image prior in the MAP optimization. Cho et al. [10] introduced a gradient computation using derivatives of the image to compute edges. Money & Kang [12] used shock filters for edge detection. Cho et al. [11] used the Radon transform for edge-based analysis; image deblurring may use a MAP algorithm or the inverse Radon transform informed by the detected edges. Jia [13] used object boundary transparency as an estimation of edge location. Fergus et al. [1] introduced natural image statistics defined by distributions of image gradients as a prior.

In the second variety of MAP-based methods, implicit regularization MAP approaches use different regularization terms to enforce desired image priors. Xu et al. [8] incorporated a regularization term to approximate the $L_{0}$ cost which improves computational speeds over alternative implicit sparse regularizations. The framework in [8] alternated between estimating the latent sharp image and the blur kernel in each iteration. Krishnan et al. [9] used a ratio of $L_{1}/L_{2}$ norms as a regularization to estimate the kernel. This helps with the attenuation of high frequencies that blur introduces in an image.

Although many of the previous works discussed above involve the estimation of generalized blur kernels, some work specializes in motion blur kernel prediction. Whyte et al. [18] proposed a new method to estimate motion blur by parametrizing a geometric model in terms of rotational velocity of the camera during exposure time. They modified the Fergus et al. [1] algorithm to implement non-uniform deblurring. Hirsch et al. [19] combined projective motion path blur models with efficient filter flow [20] to estimate motion blur at a patch level and deblur by modifying the Krishnan & Fergus [21] algorithm.

Recent methods using deep learning have used various architectures including convolutional neural networks (CNNs) [22, 23, 17, 15, 24], encoder-decoder networks [25], generative adversarial networks (GANs) [16], and fully convolutional networks (FCNs) [26] to tackle the problems of blur kernel prediction and deblurring. Sun et al. [22] used a CNN to classify the best-fit motion blur kernel for each image patch. They used 73 different motion blur kernels and were able to expand to 361 kernels by rotation of their input image. Xu et al. [23] used a CNN to recover sharp edges from blurred images and then estimated the blur kernel from the blurry and recovered edges via a MAP optimization. Li et al. [17] used a mixture of deep learning and MAP estimation to deblur an image, using a binary classification CNN trained to predict the probability of whether the input is blurry or sharp. The classifier is used as a regularization term of the latent image prior in a MAP framework to deblur. Yan & Shao [15] used a two step process to predict blur parameters: first, a deep neural network classified the type of blur and second, a general regression neural network predicted the parameters of the blur. Nasonov and Nasonova [24] used a CNN to predict length and angle for linear motion blur with several formulations of the output values, although they noted difficulty in the simultaneous estimation of length and angle. Carbajal et al. [25] used an encoder and two decoders to predict motion kernel bases for the image and mixing coefficients for each pixel. Zhang et al. [16] used a conditional GAN to deblur images, combining an adversarial loss and a perceptually-motivated content loss. Gong et al. [26] used an FCN to predict a motion flow map where each pixel has its own motion vector estimate. 
While many of these approaches are able to handle spatially varying blur, we focus on a thorough exploration of motion kernel parameters in uniform blur; this will serve as a framework for spatially varying blur in future work.

There are three main contributions of this work. First, we study complications that arise in defining a motion blur kernel and, in particular, we analyze which motion blur kernels exist within different kernel shapes rather than using square odd-sized kernels, as is often implicitly assumed in implementations, likely for ease of coding. Second, by exploring all motion blur kernel possibilities for a suite of length and angle combinations, we train a CNN which uses regression prediction instead of classification. By formulating the prediction as a regression, we train two output nodes (one for length and one for angle) which can span the full range of possible parameter values and can be trained on any granularity of parameters. Use of a classification network would necessitate 13,034 output nodes, one per unique kernel, for the granularity considered here (99 lengths and 180 angles) and modification of the network architecture to train on any other granularity or post-hoc computation (e.g., as in [22]) to define predictions at another granularity. Use of regression also may alleviate some of the issues in simultaneous prediction of length and angle noted in [24]. Third, we expand the range of additive noise beyond that studied for other methods [14, 11, 23, 17, 9, 18] to analyze Gaussian noise up to a variance of 10% (10 dB SNR).

3 Blur Kernels and Blurred Dataset

A linear blur kernel can be parametrized by the length and orientation angle of a line. To formulate a discrete blur kernel for application to a digital image using Eq. (1), we specifically distinguish between a continuous line and a discrete pixel line. Next, we explore angle and length combinations that exist within linear motion blur kernels. Finally, we describe how the 2014 COCO dataset [27] is used to create a new blurred dataset for deep learning training and testing.

3.1 Continuous and Pixel Lines

Here, the term “line” refers to a line segment, $r$ denotes a Euclidean distance (of a continuous or discrete line), $r_{\infty}$ is a Chebyshev distance for a discrete line, $\theta\in(-90,90]$ denotes the orientation (angle) of a continuous line with respect to the horizontal axis in units of degrees, and $\phi\in(-90,90]$ is the angle of a discrete pixel line with the same conventions. For a given length $r$, the set of possible continuous line angles is denoted $\Theta\subseteq(-90,90]$ and the set of possible discrete pixel line angles is $\Phi\subseteq\Theta$. A line in a 2D continuous domain $\mathbb{R}^{2}$ is a 1D object with zero width and length $r$. On the other hand, a pixel line in a 2D discrete domain $\mathbb{Z}^{2}$ is defined on a discrete pixel grid, which gives the line a non-zero width and length $r$. Figure 1 shows a continuous line with parameters $(r,\theta)=(3,90)$ and a pixel line with parameters $(r,\phi)=(3,90)$.

Figure 1: (a) A continuous line with parameters $(r,\theta)=(3,90^{\circ})$ and (b) pixel line with parameters $(r,\phi)=(3,90^{\circ})$. Note that the pixel line has a width of 1 due to its definition on a 2D discrete grid.

One limitation in defining pixel lines is the interrelation between length and angle. In essence, a shorter pixel line will have a limited number of unambiguous angles. Figure 2(a) illustrates how multiple continuous lines with angle $\theta\in[0,30]$ can be interpreted by the same horizontal pixel line of length $r_{\infty}=2$ . Similarly, a pixel line of length $r_{\infty}=2$ and angle $\phi=45$ [Fig. 2(c)] can describe many intermediate lines with $\theta\in(30,60)$ . The four pixel lines in Fig. 2 show the only pixel lines that can be described for length $r_{\infty}=2$ . This means that, for shorter length motion blurs, there will be gaps in the angles that can be represented. For example, a length $r_{\infty}=2$ pixel line can only have angles $\phi\in\{-45,0,45,90\}$ as shown in Fig. 2.

Conventional convolutional kernels are typically assumed to be square and odd in dimension. The preference for square kernels is related to the horizontal and vertical symmetry of features. The preference for odd-sized kernels is to have an unambiguous center point, as the center point is commonly assumed to be associated with the pixel being processed. The assumption of square or odd-sized kernels in the blur model of Eq. (1) is not mathematically necessary, however, and we consider blur kernels that may be non-square and/or even in one or more dimensions.

3.2 Blur Kernel Creation

To explore the gaps in angles resulting from representation of continuous lines as pixel lines, we assume a range of Euclidean lengths $r=2,3,\ldots,100$ and angles $\theta=-89,-88,\ldots,90$ , and gather the resulting unique $(r,\phi)$ discrete line parameter pairs, illustrated in Fig. 3. We assume the $h\times w$ kernel corresponding to an $(r,\theta)$ continuous line has $h$ and $w$ defined as

$$h=\begin{cases}\lceil r\sin(\theta)\rceil,&\sin(\theta)\neq 0\\ 1,&\sin(\theta)=0\end{cases}, \tag{2}$$

$$w=\begin{cases}\lceil r\cos(\theta)\rceil,&\cos(\theta)\neq 0\\ 1,&\cos(\theta)=0\end{cases}, \tag{3}$$

where $\lceil\cdot\rceil$ is the ceiling operator. The conditions $\sin(\theta)=0$ and $\cos(\theta)=0$ explicitly define a height or width of one for the horizontal and vertical cases, respectively. For example, $(r,\theta)=(2,0)$ yields the $1\times 2$ grid of the horizontal pixel line in Fig. 2(a). We use $r$ as the Euclidean distance labels and draw an $(r,\theta)$ continuous line (e.g., the red lines in Fig. 2) through an $h\times w$ 2D discrete pixel grid (e.g., the $1\times 2$, $2\times 1$, and $2\times 2$ pixel grids in Fig. 2), resulting in a discrete pixel line (e.g., the white pixels in Fig. 2). The ceiling operator introduces a quantization error; since $r$ may result in a line ending mid-pixel, the ceiling operator corresponds to the choice to use the pixel where $r$ ends as part of the motion blur kernel. The worst case for the quantization error is for angles $\theta=\pm 45$ with the Euclidean and Chebyshev distances differing by a factor of $\sqrt{2}$.

We use the line function from the skimage.draw library in Python to draw a continuous line through the pixel grid, resulting in a discrete pixel line consistent with the Bresenham digital line algorithm [28]. If $\theta$ is positive, the line is drawn from the lower-left corner $(h,0)$ to the upper-right corner $(0,w)$ , e.g., Fig. 2(c), and if $\theta$ is negative, the line is drawn from the upper-left corner $(0,0)$ to the lower-right corner $(h,w)$ , e.g., Fig. 2(d).
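The kernel construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the kernel height follows $|\sin(\theta)|$ and the width follows $|\cos(\theta)|$ (so that $\theta=0$ gives a $1\times r$ horizontal kernel), it uses a small hand-written Bresenham routine in place of `skimage.draw.line` so that only NumPy is required (pixel choices may differ from skimage in tie cases), and it normalizes the kernel to unit sum. The function names, the rounding guard, and the use of the last valid indices $(h-1, w-1)$ as corner coordinates are likewise assumptions.

```python
import math
import numpy as np


def kernel_shape(r, theta_deg):
    """Kernel height and width for a line of length r at angle theta (degrees).

    Assumption: height follows |sin(theta)| and width follows |cos(theta)|,
    so that theta = 0 gives a 1 x r horizontal kernel.
    """
    t = math.radians(theta_deg)
    # Round before the ceiling so float noise (e.g. 2*cos(60 deg) evaluating
    # slightly above 1) does not add a spurious pixel; a guard of this sketch.
    h = math.ceil(round(r * abs(math.sin(t)), 9))
    w = math.ceil(round(r * abs(math.cos(t)), 9))
    return max(h, 1), max(w, 1)


def bresenham(r0, c0, r1, c1):
    """Integer Bresenham line between two (row, col) pixels, endpoints included."""
    dc, dr = abs(c1 - c0), -abs(r1 - r0)
    sc = 1 if c0 < c1 else -1
    sr = 1 if r0 < r1 else -1
    err = dc + dr
    pixels = []
    while True:
        pixels.append((r0, c0))
        if r0 == r1 and c0 == c1:
            return pixels
        e2 = 2 * err
        if e2 >= dr:
            err += dr
            c0 += sc
        if e2 <= dc:
            err += dc
            r0 += sr


def motion_blur_kernel(r, theta_deg):
    """Normalized h x w kernel with a pixel line drawn corner to corner."""
    h, w = kernel_shape(r, theta_deg)
    if theta_deg >= 0:
        pixels = bresenham(h - 1, 0, 0, w - 1)  # lower-left to upper-right
    else:
        pixels = bresenham(0, 0, h - 1, w - 1)  # upper-left to lower-right
    kernel = np.zeros((h, w))
    for row, col in pixels:
        kernel[row, col] = 1.0
    return kernel / kernel.sum()  # assumption: unit-sum normalization
```

Under these assumptions, `motion_blur_kernel(2, 10)` and `motion_blur_kernel(2, 0)` return the same $1\times 2$ kernel, which is the angle ambiguity for short lengths discussed in Sec. 3.1.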

For each length $r=2,3,\ldots,100$ , we generate a discrete blur kernel associated with a continuous line of length $r$ and angle $\theta=0,1,\ldots,90$ . Noting that multiple angles $\theta$ may result in the same discrete blur kernel (see Fig. 2), a single discrete angle $\phi$ must be defined for each unique discrete blur kernel for use as a training label. We thus quantize the continuous angle $\theta$ for a given blur kernel as

$$\phi=\begin{cases}0,&0\in\Theta\\ 90,&90\in\Theta\\ \lceil\text{median}(\Theta)\rceil,&\text{else}\end{cases}, \tag{4}$$

where $\Theta$ is the set of continuous angles $\theta$ resulting in the given blur kernel and where $\lceil\cdot\rceil$ is the ceiling operator necessary to define an integer angle in cases that $\Theta$ contains an even number of elements. Note that this definition of $\phi$ has two special cases. If $0\in\Theta$ , the kernel is consistent with a horizontal line [see Fig. 2(a)] and we assign $\phi=0$ . Similarly, if $90\in\Theta$ , the kernel is consistent with a vertical line [see Fig. 2(b)] and we assign $\phi=90$ . These special cases avoid the median operator assigning an erroneous label to a horizontal or vertical line.
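Eq. (4) is simple enough to state directly in code. The sketch below assumes that $\Theta$ is given as a collection of integer angles in degrees that all produce the same kernel; the function name `quantize_angle` is hypothetical.

```python
import math
from statistics import median


def quantize_angle(thetas):
    """Label phi for the set of integer angles (degrees) sharing one kernel, per Eq. (4)."""
    thetas = list(thetas)
    if 0 in thetas:
        return 0   # kernel is consistent with a horizontal line
    if 90 in thetas:
        return 90  # kernel is consistent with a vertical line
    # The ceiling yields an integer label when the set has an even number of elements.
    return math.ceil(median(thetas))
```

For the length-2 case discussed in Sec. 3.1, the integer angles $0,\ldots,30$ map to $\phi=0$ and the integer angles $31,\ldots,59$ map to $\phi=45$, since 45 is the median of that set.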

We compute all possible $(r,\phi)$ combinations by generating the blur kernel for $r=2,3,\ldots,100$ and $\theta=0,1,\ldots,90$ and computing the resulting blur kernel angle $\phi$ using Eq. (4). We span only positive angles $\theta\in[0,90]$ since kernels with negative angles will be symmetric versions of the kernels with positive angles. The resulting values $r$ and $\phi$ that yield unique discrete pixel lines are denoted as $(r,\phi)_{\text{u}}$ . These parameter values are also used as labels to train a network to predict the blur parameters from a blurry image (Sec. 4) using $(r,\phi)=(1,0)$ as labels for a non-blurred image.

Figure 3 illustrates the $(r,\phi)_{\text{u}}$ combinations as a scatter plot, where we note the existence of gaps where a continuous line cannot be described by a pixel line. As expected, gaps are larger for smaller lengths, indicating limited unique pixel lines (blur kernels) for shorter blur lengths. Figure 3 also plots the number of unique angles $\phi$ versus length $r$, where we note that a line must have approximately 70 pixels or more in order to represent all orientations $\phi=-89,-88,\ldots,90$. This exploration of the $(r,\phi)$ space considered 17,820 possible $(r,\theta)$ lines, resulting in 13,034 unique $(r,\phi)_{\text{u}}$ kernels; this is more than an order of magnitude larger than other explorations, e.g., the 361 kernels in [22].

3.3 COCO Blurred Dataset

We use the 2014 COCO dataset [27] to create a blurred dataset for training and validating a model for blind estimation of length and angle given only a blurry image. The 2014 COCO dataset consists of a training dataset with 82,783 images and a validation set of 40,504 images. The training dataset is used to create a blurred training dataset and the validation dataset is split (with no overlap) to create blurred validation and test datasets.

The blurred dataset is created by defining a blur generator that loops through the labels $(r,\phi)_{\text{u}}$, creating the corresponding blur kernel using the line function from skimage.draw (see Sec. 3.2), and convolving the blur kernel with an image from the COCO dataset. A random COCO image is chosen (without replacement) for each length $r$ to provide a variety of images in the blurred dataset. There is a dataset imbalance in lengths and/or angles represented in the blurred dataset due to the uneven representation of angles for shorter lengths (see Fig. 3). We improve the balance by creating a minimum of 175 blurred images per length, implying that more unique COCO images are used for shorter blur lengths. Each run of the blur generator creates 21,789 blurred images. The creation of the training set used 12 parallel threads of the blur generator, each operating on a subset of the COCO training dataset, creating a total of 261,468 blurred images. The validation and testing sets were each created with one run of the blur generator operating on the validation and testing splits of the COCO validation dataset, creating a total of 21,789 blurred images for each. Figure 4 shows the number of unique COCO images versus length and angle for the training, validation, and testing datasets. We notice a spike in the number of unique images for length $r=1$ and for angle $\phi=0$ due to the non-blurred images that are part of the dataset. In general, more unique images are used for shorter lengths, as the blur generator needs to loop through more images to reach the minimum threshold of images per length. We notice also that small length blur kernels create peaks at angles $\phi\in\{-45,0,45,90\}$ since smaller length kernels are limited in the angles they are capable of representing.

There is an interdependence between angle and length, resulting in an inability to completely balance both length and angle in the blurred dataset. Figure 5 shows the distribution of labeled blurred images with respect to angle and length. Our choice to generate a minimum of 175 blurred images per length in the blur generator favors a more uniform length distribution [Fig. 5(b)]. This, however, results in peaks in the angle distribution [Fig. 5(a)] due to the over-representation of certain angles for smaller lengths (see Fig. 3). A choice to favor a more uniform angle distribution, however, would result in smaller representation of shorter blur lengths, creating a more problematic data imbalance. While there are other choices that could be made regarding data imbalance here, we find that this blurred dataset can be used to train a network to accurately predict both length and angle.

4 Uniform Blur Prediction

We formulate blur length and angle estimation as a deep learning regression problem using the VGG16 [6] architecture as the backbone of our model. VGG16 was trained from scratch with TensorFlow’s default layer initializations to predict both the length and angle parameters for a uniform linear motion blurred image. The last layer of VGG16 was modified to have two output nodes (corresponding to prediction of length and angle) with sigmoid activations. Since the sigmoid activation outputs values in the range $[0,1]$ , length and angle are normalized to be in the range $[0,1]$ to train the model. The predicted length and angle parameters are re-scaled to their native ranges ( $r\in[1,100]$ , $\phi\in(-90,90]$ ) for validation.
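The label scaling described above can be sketched as a pair of helper functions. The text states only that the labels are normalized to $[0,1]$ and later re-scaled; the linear min-max mapping, the constant names, and the function names below are assumptions for illustration.

```python
R_MIN, R_MAX = 1.0, 100.0       # length range stated in the text
PHI_MIN, PHI_MAX = -90.0, 90.0  # angle range; -90 itself never occurs as a label


def normalize_labels(r, phi):
    """Map (r, phi) to [0, 1] x [0, 1]; linear min-max scaling is an assumption."""
    return (r - R_MIN) / (R_MAX - R_MIN), (phi - PHI_MIN) / (PHI_MAX - PHI_MIN)


def denormalize_labels(r_norm, phi_norm):
    """Invert normalize_labels to recover length in pixels and angle in degrees."""
    return R_MIN + r_norm * (R_MAX - R_MIN), PHI_MIN + phi_norm * (PHI_MAX - PHI_MIN)
```

With this mapping, the non-blurred label $(r,\phi)=(1,0)$ becomes $(0, 0.5)$, and a sigmoid output pair is converted back to pixels and degrees with `denormalize_labels`.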

Due to the fixed input size of VGG16 ( $224\times 224$ pixels), we created a TensorFlow data generator to randomly crop the blurred COCO images. We crop instead of resizing the image to maintain accuracy in the blur angle labels since a resize could change the aspect ratio of the image and thus the blur angle. If a blurred image was smaller than the input size for VGG16, it was skipped and not used in training; in total there were 892 training, 5 validation, and 13 test images that were skipped. This results in less than $0.3\%$ of images skipped for each dataset.
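A minimal sketch of the cropping step is shown below, assuming images are NumPy arrays in height-by-width (optionally by channel) layout. The function name `random_crop` and the use of `None` to signal a skipped image are assumptions of this example, not details of the authors' TensorFlow data generator.

```python
import numpy as np

CROP = 224  # VGG16 input size


def random_crop(image, size=CROP, rng=None):
    """Return a random size x size crop, or None if the image is too small."""
    h, w = image.shape[:2]
    if h < size or w < size:
        return None  # such images are skipped rather than resized
    rng = np.random.default_rng() if rng is None else rng
    top = int(rng.integers(0, h - size + 1))   # upper bound is exclusive
    left = int(rng.integers(0, w - size + 1))
    return image[top:top + size, left:left + size]
```

Cropping leaves the pixel grid untouched, so the blur length in pixels and the blur angle of the crop match the labels of the full image.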

We use the Adam optimizer with a learning rate of 0.1 and epsilon of 0.1 and the mean squared error (MSE) as the loss function. A batch size of 50 was empirically determined to mitigate convergence issues. The model is trained for up to 50 epochs, saving the weights for the best model throughout training; training is terminated early if the MSE has not improved within the previous 5 epochs. Training takes about 25 minutes per epoch and about 12 hours to fully train on an NVIDIA RTX-3090.
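The stopping rule can be illustrated independently of any deep learning framework. The sketch below replays a list of per-epoch MSE values and reports which epoch would be kept and when training would stop; the function name is hypothetical, and whether the monitored MSE is computed on the training or the validation set is not stated in the text.

```python
def early_stopping_trace(epoch_losses, max_epochs=50, patience=5):
    """Return (best_epoch, stop_epoch), both 1-indexed, for a sequence of MSE values."""
    best_loss, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(epoch_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # the best weights would be saved here
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch             # no improvement in the last `patience` epochs
    return best_epoch, min(len(epoch_losses), max_epochs)
```

At roughly 25 minutes per epoch, the reported 12 hours of training corresponds to about 29 epochs, which suggests that early stopping ended training before the 50-epoch limit.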

We performed multiple trainings of the network for different noise levels, resulting in four different models. The first model is trained with no noise and the other three models are trained with different levels of additive white Gaussian noise with variance $\sigma^{2}\in\{0.001,0.01,0.1\}$, corresponding to signal-to-noise ratio (SNR) values of $\{30,20,10\}$ dB, respectively. Noise is added in our data generator after the blurred image is cropped and normalized to the range $[0,1]$. Each model is tested by predicting blur parameters for the noiseless test set and for three additional test runs in which the three noise levels are added to the test set.
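The stated correspondence between noise variance and SNR can be checked with a short sketch. It assumes a unit signal power reference, $\text{SNR}=10\log_{10}(1/\sigma^{2})$, which reproduces the 30, 20, and 10 dB values quoted above; the function names are hypothetical, and whether the noisy image is clipped back to $[0,1]$ is not stated, so no clipping is applied here.

```python
import numpy as np


def snr_db(noise_var, signal_power=1.0):
    """SNR in dB; unit signal power is an assumption that reproduces the stated pairs."""
    return 10.0 * np.log10(signal_power / noise_var)


def add_awgn(image01, noise_var, rng=None):
    """Add zero-mean white Gaussian noise of the given variance to an image in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, np.sqrt(noise_var), size=np.shape(image01))
    return np.asarray(image01, dtype=float) + noise  # no clipping in this sketch
```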

5 Experiments and Results

This section presents results using the $R^{2}$ coefficient of determination metric for evaluating the regression model. Next, we consider a scenario with additive noise and present results for our model’s predictions. Finally, we compare with other methods of blur kernel estimation [9, 14, 25] and evaluate the deblurred images using the error ratio score introduced in [7].

5.1 Metrics

We validate the performance of blur estimation using metrics that measure the accuracy of the parameter estimation itself and also the quality of an image deblurred using the estimated kernel.

5.1.1 Accuracy of Parameter Estimation

In testing, we measure performance using the coefficient of determination, $R^{2}$ , which measures the goodness of fit between actual known values of variable $y_{i}$ and estimated values $x_{i}$ :

$$R^{2}=1-\frac{\sum_{i=1}^{n}(y_{i}-x_{i})^{2}}{\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}}, \tag{5}$$

where $\bar{y}$ is the mean of the known variables $y_{i}$ and $n$ is the number of samples [29]. The numerator $\sum_{i=1}^{n}(y_{i}-x_{i})^{2}$ is the sum of squares of the residual prediction errors and the denominator $\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}$ is proportional to the variance of the known data. A perfect model will have zero residual errors and thus an $R^{2}=1$ . A naïve model that always predicts the average of the data $\bar{y}$ will have equal numerator and denominator and thus an $R^{2}=0$ . Models that have predictions worse than the naïve model will have an $R^{2}<0$ .
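Eq. (5) and the three reference cases above can be reproduced with a few lines of NumPy. The function name `r2_score` is chosen for this sketch; it is written out explicitly here instead of calling a library routine.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination per Eq. (5): y_true are known values, y_pred are estimates."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # proportional to the variance of y_true
    return 1.0 - ss_res / ss_tot
```

For known values `[1, 2, 3, 4]`, predicting them exactly gives 1, always predicting the mean 2.5 gives 0, and predicting the reversed sequence `[4, 3, 2, 1]` gives a negative score.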

5.1.2 Quality of Deblurred Image

In addition to studying the accuracy of blur kernel estimation, we measure the quality of the image deblurred using the estimated blur kernel in a non-blind deblurring method (see Sec. 5.3). To measure the quality of the deblurred image, we use the error ratio as presented in [7]. The error ratio is motivated by the fact that, even with a perfectly estimated blur kernel, one may not be able to perfectly predict the latent sharp image. The error ratio $E_{\hat{K}}/E_{K}$ is computed by considering the error $E_{\hat{K}}$ between the true sharp image and the latent sharp image recovered using the estimated blur kernel $\hat{K}$ and the error $E_{K}$ between the true sharp image and the latent sharp image recovered using the true blur kernel $K$. An error ratio of 1 indicates that the images deblurred using the estimated and true kernel are identical.

We use the sum of squared differences (SSD) as in [7] to define the error ratio. Error ratios are generally presented as a cumulative histogram for error ratios binned in the range $[1,4]$ . As noted in [7], SSD error ratios above 2 tend to indicate significant perceptually noticeable distortion present in the image deconvolved with the estimated kernel as compared to the image deconvolved with the true kernel.
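The SSD error ratio and its cumulative histogram can be sketched as follows. The function names and the particular thresholds are hypothetical choices within the $[1,4]$ range mentioned above, and the deblurred images are assumed to be supplied by whichever non-blind deconvolution method is in use.

```python
import numpy as np


def ssd(a, b):
    """Sum of squared differences between two images of equal shape."""
    return float(np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2))


def error_ratio(sharp, deblurred_est, deblurred_true):
    """E_{K_hat} / E_K: error with the estimated kernel over error with the true kernel."""
    return ssd(sharp, deblurred_est) / ssd(sharp, deblurred_true)


def cumulative_histogram(ratios, thresholds=(1.0, 1.25, 1.5, 2.0, 3.0, 4.0)):
    """Fraction of images whose error ratio is at or below each threshold."""
    ratios = np.asarray(ratios, dtype=float)
    return {t: float(np.mean(ratios <= t)) for t in thresholds}
```

A method whose cumulative histogram rises faster has more test images with error ratios close to 1, which is the sense in which higher cumulative histogram values indicate better kernel estimates.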

5.2 Accuracy of Parameter Estimation

5.2.1 Noise-Free Predictions

A VGG16 based model (Sec. 4) was first trained on blurred images without any additive noise. Scatter plots of estimated versus actual length and angle are shown in Fig. 6 along with the corresponding $R^{2}$ scores. We find the model to be highly accurate for prediction of both length and angle as demonstrated by the $R^{2}$ scores of 0.9869 and 0.9935, respectively. In Fig. 6(a) we note a larger spread in estimated values as the length increases. This is not surprising, as larger blur lengths are expected to be more difficult to accurately estimate. In Fig. 6(b) we note certain angles have a larger spread in estimated values. This is most notable for $\phi\in\{-45,0,45,90\}$ , but can be noted for other angles. This is due to errors in prediction for the smaller length kernels which have a limited set of angles. It is important to recall, however, that many of these incorrect angle predictions will result in a correct blur kernel. As an example, an image blurred with kernel parameters $(r,\phi)=(2,0)$ may have an angle prediction of $\hat{\phi}=27$ . However, since blur kernels of length $r=2$ can only be represented by $\phi\in\{-45,0,45,90\}$ , the blur kernel created with parameters $(r,\phi)=(2,27)$ will generate a blur kernel estimate of $(r,\phi)_{\text{u}}=(2,0)$ which is correct.

5.2.2 Predictions with Additive Noise

We used the model from Sec. 5.2.1, trained on noise-free blurred images, and tested it on images with additive white Gaussian noise with variance $\sigma^{2}\in\{0.001,0.01,0.1\}$ , corresponding to SNR values of $\{30,20,10\}$ dB, respectively. Results for this experiment are shown in the first row of Tables 1 and 2 for length and angle prediction, respectively. We note the estimation of the length parameter is more susceptible to noise than angle, resulting in a complete failure of prediction ( $R^{2}$ scores less than 0) for even the smallest level of additive noise, $\sigma^{2}=0.001$ . We hypothesize that additive noise can alter the intensity distribution along the blur path in a blurry image, creating the appearance of artificially shorter or longer blur paths. Those blur paths, however, will likely retain more characteristics of their angle for the same level of noise.

Table 1: $R^{2}$ score for length prediction for training and testing under different levels of additive noise.

	Testing $\sigma^{2}$
Training $\sigma^{2}$	0	0.001	0.01	0.1
0	0.9869	-0.26	-3.17	-3.17
0.001	0.9607	0.9557	-1.14	-3.12
0.01	0.9527	0.9539	0.9523	-2.91
0.1	0.8923	0.8932	0.8960	0.8772

Table 2: $R^{2}$ score for angle prediction for training and testing under different levels of additive noise.

	Testing $\sigma^{2}$
Training $\sigma^{2}$	0	0.001	0.01	0.1
0	0.9935	0.8772	0.3935	0
0.001	0.9758	0.9754	0.6509	-0.16
0.01	0.9733	0.9735	0.9682	0.0128
0.1	0.8999	0.9009	0.9010	0.8834

We trained three additional models using noisy blurred images with additive white Gaussian noise with variance $\sigma^{2}\in\{0.001,0.01,0.1\}$ and tested each of those models on noise-free and noisy images, i.e., for $\sigma^{2}\in\{0,0.001,0.01,0.1\}$ . Results for those three models are in the second through fourth rows of Tables 1 and 2 for length and angle prediction, respectively. We again note a higher sensitivity to noise in the prediction of length (Table 1) than angle (Table 2). We further note that the $R^{2}$ score decreases by $\sim 0.1$ for the model trained on the highest level of noise $\sigma^{2}=0.1$ and tested on noise-free images, compared to the noise-free model tested on noise-free images, but that the same model is robust to varying levels of noise. Finally we note that models trained on noisy data appear to be robust to noise levels less than or equal to the noise level on which they are trained. This implies that a single model trained on a single noise level can yield accurate predictions even for smaller noise levels not seen in training.

Most other methods [14, 11, 23, 17] that test under additive noise test up to a Gaussian noise variance of $\sigma^{2}=0.01$ to simulate sensor noise, while some [9, 18] test noise up to $\sigma^{2}=0.02$. In comparison, our model is tested up to $\sigma^{2}=0.1$ and demonstrates a higher tolerance for noise, which can be an advantage when modeling atmospheric turbulence.

5.3 Quality of Deblurred Images

5.3.1 Deconvolution and Comparison Methods

We quantify the performance of our blur parameter estimation by using the expected patch log likelihood (EPLL) method [30] to deconvolve the images using our predicted motion blur kernel and the ground truth blur kernel. We used the Python implementation available at [31], replacing the Gaussian kernel with a linear motion blur kernel. EPLL only accepts square, odd-sized blur kernels, necessitating a padding of kernels to be square and odd-sized. We symmetrically zero pad the shortest side of the kernel to keep the kernel centered and pad to match the size of the longest side. If the longest side is even, however, this results in an even-sized square kernel. We use a similar approach to [32], creating four odd-sized kernels each zero padded with the line asymmetrically offset toward a different corner. The blurred image is deconvolved with each of the four kernels and the average of the four deconvolved images is considered the resulting deblurred image.
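The padding and averaging steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, `deconvolve` stands in for any non-blind deconvolution routine (the interface of [31] is not reproduced here), and whenever a dimension needs an odd amount of padding the leftover row or column is assigned according to the corner toward which the line is offset.

```python
import numpy as np


def odd_square_variants(kernel):
    """Zero pad a 2-D kernel to odd-sized square kernels.

    If all padding can be split evenly, one centered kernel is returned.
    Otherwise each variant places the leftover row and/or column on a
    different side, offsetting the line toward a different corner
    (assumption: this is how the asymmetric offsets are realized).
    """
    h, w = kernel.shape
    n = max(h, w)
    if n % 2 == 0:
        n += 1  # smallest odd size that contains the kernel
    pad_h, pad_w = n - h, n - w
    variants, seen = [], set()
    for shift_down in (0, 1):
        for shift_right in (0, 1):
            top = pad_h // 2 + (pad_h % 2) * shift_down
            left = pad_w // 2 + (pad_w % 2) * shift_right
            if (top, left) in seen:
                continue  # this offset does not change the result
            seen.add((top, left))
            variants.append(
                np.pad(kernel, ((top, pad_h - top), (left, pad_w - left))))
    return variants


def average_deblur(blurred, kernel, deconvolve):
    """Deconvolve with every padded variant and average the results."""
    results = [deconvolve(blurred, k) for k in odd_square_variants(kernel)]
    return np.mean(results, axis=0)


if __name__ == "__main__":
    diagonal = np.eye(4) / 4.0  # toy even-sized square kernel
    for k in odd_square_variants(diagonal):
        print(k.shape, round(float(k.sum()), 6))
```

In this sketch an even-sized square kernel such as the $4\times 4$ example yields four $5\times 5$ variants, while a kernel that is already square and odd-sized is returned unchanged as a single variant.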

Additionally, we compare performance with the blur kernel estimation methods from Levin et al. [14], Krishnan et al. [9], and Carbajal et al. [25]. These works also include methods for estimating the deblurred image. For fairness of comparison, we use only the blur kernel estimates from [14, 9, 25] and use those estimates in the same EPLL [30] deblurring framework as described above. For the method in [14], we used the Matlab implementation provided at [33], using the deconv_diagfe_filt_sps function with a kernel prediction size of $(101,101)$ to allow prediction of the largest kernel size in our blurred dataset. Since the method in [14] estimates a blur kernel for each of the three channels in a color image, we applied EPLL to each channel using the kernel estimated for that channel. For the method in [9], we used the Matlab implementation provided at [34], again with a kernel size of $(101,101)$. For the method in [25], we used the Python implementation provided at [35] with the fixed kernel size of $(33,33)$; we note that this will put this method at a disadvantage in predicting longer blur lengths. Additionally, for the method in [25], we estimated a single blur kernel for the uniformly blurred images by using the image-averaged mixing coefficients in the superposition of the kernel bases; this allows us to use the same EPLL deblurring framework for comparison to the other methods.
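The reduction of a per-pixel kernel estimate to a single kernel by averaging mixing coefficients can be sketched as below. The array shapes, the function name `global_kernel`, and the final renormalization are assumptions made for illustration; they are not taken from the implementation at [35].

```python
import numpy as np


def global_kernel(bases, mixing):
    """Collapse a per-pixel superposition of kernel bases into one kernel.

    bases:  array of shape (B, K, K), one K x K kernel per basis element.
    mixing: array of shape (B, H, W), per-pixel mixing coefficients.
    """
    weights = mixing.mean(axis=(1, 2))             # image-averaged, shape (B,)
    kernel = np.tensordot(weights, bases, axes=1)  # weighted sum, shape (K, K)
    return kernel / kernel.sum()                   # assumption: unit-sum kernel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bases = rng.random((4, 33, 33))
    bases /= bases.sum(axis=(1, 2), keepdims=True)  # each basis sums to 1
    mixing = rng.random((4, 16, 16))
    mixing /= mixing.sum(axis=0, keepdims=True)     # coefficients sum to 1 per pixel
    kernel = global_kernel(bases, mixing)
    print(kernel.shape, round(float(kernel.sum()), 6))
```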

5.3.2 Reduced Test Set

Due to the computational complexity of the EPLL deconvolution [30], as well as the kernel estimation methods in [9, 14, 25], we generated reduced-size test datasets for these experiments. From the test dataset (Sec. 3.3), two subsets are created to span length and angle, respectively. The first subset, subsequently referred to as Length 5 (L5), uses $5$ randomly selected blurred images for each of the 100 lengths, totaling $500$ images. The second subset, subsequently referred to as Angle 3 (A3), uses $3$ randomly selected blurred images for each of the 180 angles, totaling $540$ images. All images in these reduced test sets are noise free.
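The construction of the L5 and A3 subsets amounts to sampling a fixed number of images per parameter value, which can be sketched as follows. The metadata layout, the function name `sample_per_group`, the fixed seed, and the assumption of integer-valued lengths and angles are illustrative. The toy metadata also contains every length and angle combination, which ignores the interdependence between length and angle in the actual dataset.

```python
import random
from collections import defaultdict


def sample_per_group(records, key, k, seed=0):
    """Randomly pick k records for every distinct value of records[i][key]."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    subset = []
    for value in sorted(groups):
        subset.extend(rng.sample(groups[value], k))  # k draws without replacement
    return subset


# Toy metadata (hypothetical): one blurred image per (length, angle) pair.
records = [
    {"id": f"img_{r}_{phi}", "length": r, "angle": phi}
    for r in range(1, 101)      # lengths 1..100
    for phi in range(-89, 91)   # angles -89..90 degrees
]

l5 = sample_per_group(records, "length", 5)  # 5 images per length
a3 = sample_per_group(records, "angle", 3)   # 3 images per angle

if __name__ == "__main__":
    print(len(l5), len(a3))
```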

5.3.3 Error Ratio Comparisons

We calculated SSD error ratios for our proposed blur kernel estimation as well as those from Levin et al. [14], Krishnan et al. [9], and Carbajal et al. [25] as seen in Fig. 7. Recall that an error ratio of 1 indicates that deblurring with the estimated kernel results in an identical image to that deblurred with the ground truth kernel. An error ratio of 2 or higher is considered unacceptable as it was shown in [7] that such SSD error ratios indicate the presence of significant perceptually noticeable distortions in the latent image estimated using the estimated kernel. An error ratio $<1$ indicates that the image deblurred with the estimated kernel achieves a better match to the true sharp image than deblurring with the true kernel.
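For concreteness, the error ratio discussed above can be computed as in the following minimal sketch, which follows the definition in [7]: the SSD between the image deblurred with the estimated kernel and the sharp image, divided by the SSD between the image deblurred with the ground truth kernel and the sharp image. The function names and the toy arrays are illustrative assumptions.

```python
import numpy as np


def ssd(a, b):
    """Sum of squared differences between two images of equal shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum((a - b) ** 2))


def error_ratio(deblurred_est, deblurred_true, sharp):
    """SSD error ratio: estimated-kernel error over ground-truth-kernel error."""
    return ssd(deblurred_est, sharp) / ssd(deblurred_true, sharp)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sharp = rng.random((32, 32))
    # Toy stand-ins for deblurred images: the sharp image plus small errors.
    with_true_kernel = sharp + 0.01 * rng.standard_normal(sharp.shape)
    with_est_kernel = sharp + 0.012 * rng.standard_normal(sharp.shape)
    print(error_ratio(with_est_kernel, with_true_kernel, sharp))
```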

Our proposed method has the highest cumulative histogram values for the error ratio at 1 for both the A3 and L5 datasets and yields an error ratio of 1.25 or less for most images in the A3 and L5 test sets. This means that our model is able to predict more accurate kernels compared to the other methods. Since our method creates the kernel from the predicted length and angle parameters, the resulting kernel is less prone to additive noise. Levin et al.’s [14] kernel prediction contains noise since many of the pixels that are supposed to be zero are instead small numbers close to zero. This noise can be seen to affect its results in kernel prediction. Krishnan et al. [9] threshold the small elements of the kernel to zero, which increases robustness to noise. Carbajal et al. [25] has competitive performance for error ratios $\geq$ 1.25; the method is at a disadvantage at longer blur lengths due to its limitation to $33\times 33$ kernels, as noted above, which may contribute to the diminished performance at the lowest error ratio.
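The cumulative histogram values referred to above are the fraction of test images whose error ratio is at or below a given threshold. A minimal sketch is given below; the function name `cumulative_fraction` and the example ratios are made up for illustration and are not results from the experiments.

```python
import numpy as np


def cumulative_fraction(ratios, thresholds):
    """Fraction of images with error ratio <= each threshold."""
    ratios = np.asarray(ratios, dtype=float)
    return np.array([(ratios <= t).mean() for t in thresholds])


if __name__ == "__main__":
    toy_ratios = [0.9, 1.0, 1.2, 1.6, 2.5]  # illustrative values only
    thresholds = [1.0, 1.25, 2.0]
    for t, frac in zip(thresholds, cumulative_fraction(toy_ratios, thresholds)):
        print(f"error ratio <= {t}: {frac:.2f} of images")
```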

Figures 8 and 9 show two qualitative examples of images deblurred using the kernels estimated with our proposed method and the methods in [9, 14, 25]. The example in Fig. 8 uses a blurred image as an input and the example in Fig. 9 has a sharp image as an input for demonstration of the methods in the absence of blur. We note similar qualitative results between the deblurred images using our estimated blur kernel and the blur kernels estimated by the methods in [9, 25], and that these methods yield similar results to the image deblurred with the truth kernel. We do note, however, some ringing in the images deblurred using the kernel from [9]; this is particularly noticeable in the areas around the upper power line in the image in Fig. 8(b) and the darker snow shadows in Fig. 9(b). The blur kernel estimated by the method in [14] introduces significant artifacts, especially apparent in Fig. 9(c) but also apparent in Fig. 8(c) as dark regions in the sign. The blur kernel estimated by [25] yields very good qualitative results, perhaps indicating the advantage of a kernel that is not constrained to linear motion blur when considering very large image regions. Overall, the proposed method of kernel estimation appears to yield deblurred images close to those that could be achieved with the ground truth kernel, which validates this approach as the foundation for future work in estimation of motion-blur parameters in spatially-varying blur.

6 Conclusions

In this paper, we have studied in detail the limitation in representation of linear blur kernels, particularly for shorter blur lengths. We find an interdependence between length and angle in representing blur kernels, meaning that developing a dataset that is balanced in both length and angle is not possible. Furthermore, while much existing research in linear blur prediction has implicitly assumed square odd-sized kernels, we relax this assumption and allow for non-square even-sized kernels. A blurred dataset was created from the 2014 COCO dataset using a suite of blur kernels for length $r\in[1,100]$ and angle $\phi\in(-90,90]$, providing a more comprehensive variety of blur kernels than previously studied.

This thorough definition of linear motion blur and the development of a blurred dataset allowed us to train a regression deep learning model instead of a classification model as other deep learning methods implement. With regression we can estimate more accurate blur kernel parameters which are not limited by the model needing a priori information about the kernel size. We demonstrate excellent performance in estimation of blur kernel parameters with a coefficient of determination $R^{2}\geq 0.89$ for both length and angle prediction. The robustness of the estimation to additive noise was studied for a wider range of noise than previously considered, $\sigma^{2}\in\{0.001,0.01,0.1\}$, corresponding to SNR values of $\{30,20,10\}$ dB. The regression CNN was found to be very robust to noise, with a $10\%$ drop in the $R^{2}$ metric for a $10\%$ Gaussian noise (which we note is an order of magnitude larger than the noise level of 1% commonly studied). We further note that the model trained on 10% Gaussian noise was robust to noise levels less than 10%, indicating that a single model can serve across a wide range of noise scenarios. Using the estimated blur kernels in a non-blind deblurring method, we find SSD error ratios of 1.25 or less for most test images, significantly outperforming the MAP-based comparison methods and competitive with a recent deep-learning-based method.

In future work, we will use this exploration of linear motion blur kernels as a baseline and foundation for spatially-varying blurs. We will expand this to a patch level approach to decompose a spatially-varying blur image into a superposition of locally uniform blur patches. Extending the approach to non-uniform blur will provide a better means to estimate the spatially varying blur resulting from atmospheric turbulence in images. Additionally, it would be interesting to directly compare our approach to other deep learning approaches which either directly estimate length and angle or which assume locally linear motion blur kernels, e.g., [22, 24, 15, 26].

Disclosures

The authors have no conflicts of interest to disclose.

Code, Data, and Materials Availability

The code necessary to implement the proposed method of kernel estimation, as well as to generate the blurred dataset from the publicly available COCO dataset, is available in a GitHub repository at https://github.com/DuckDuckPig/Regression_Blur.

Acknowledgments

The authors gratefully acknowledge Office of Naval Research grant N00014-21-1-2430 which supported this work.

References

[1] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman, “Removing camera shake from a single photograph,” in ACM SIGGRAPH 2006 Papers, SIGGRAPH ’06, (New York, NY, USA), pp. 787–794, Association for Computing Machinery, 2006.
[2] S. Dai and Y. Wu, “Motion from blur,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, 2008.
[3] B. R. Hunt, A. L. Iler, C. A. Bailey, and M. A. Rucci, “Synthesis of atmospheric turbulence point spread functions by sparse and redundant representations,” Optical Engineering, vol. 57, no. 2, p. 024101, 2018.
[4] É. Thiébaut, L. Dénis, F. Soulez, and R. Mourya, “Spatially variant PSF modeling and image deblurring,” in Adaptive Optics Systems V, vol. 9909, pp. 2211–2220, SPIE, 2016.
[5] Y. Bahat, N. Efrat, and M. Irani, “Non-uniform blind deblurring by reblurring,” in 2017 IEEE International Conference on Computer Vision (ICCV), pp. 3306–3314, 2017.
[6] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[7] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1964–1971, IEEE, 2009.
[8] L. Xu, S. Zheng, and J. Jia, “Unnatural L0 sparse representation for natural image deblurring,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013.
[9] D. Krishnan, T. Tay, and R. Fergus, “Blind deconvolution using a normalized sparsity measure,” in CVPR 2011, pp. 233–240, 2011.
[10] S. Cho and S. Lee, “Fast motion deblurring,” in ACM SIGGRAPH Asia 2009 Papers, SIGGRAPH Asia ’09, (New York, NY, USA), Association for Computing Machinery, 2009.
[11] T. S. Cho, S. Paris, B. K. P. Horn, and W. T. Freeman, “Blur kernel estimation using the radon transform,” in CVPR 2011, pp. 241–248, 2011.
[12] J. H. Money and S. H. Kang, “Total variation minimizing blind deconvolution with shock filter reference,” Image and Vision Computing, vol. 26, no. 2, pp. 302–314, 2008.
[13] J. Jia, “Single image motion deblurring using transparency,” in 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, 2007.
[14] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Efficient marginal likelihood optimization in blind deconvolution,” in CVPR 2011, pp. 2657–2664, 2011.
[15] R. Yan and L. Shao, “Blind image blur estimation via deep learning,” IEEE Transactions on Image Processing, vol. 25, no. 4, pp. 1910–1921, 2016.
[16] Y. Zhang, Y. Xiang, and L. Bai, “Generative adversarial network for deblurring of remote sensing image,” in 2018 26th International Conference on Geoinformatics, pp. 1–4, 2018.
[17] L. Li, J. Pan, W.-S. Lai, C. Gao, N. Sang, and M.-H. Yang, “Learning a discriminative prior for blind image deblurring,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[18] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce, “Non-uniform deblurring for shaken images,” International Journal of Computer Vision, vol. 98, pp. 168–186, 2012.
[19] M. Hirsch, C. J. Schuler, S. Harmeling, and B. Schölkopf, “Fast removal of non-uniform camera shake,” in 2011 International Conference on Computer Vision, pp. 463–470, 2011.
[20] M. Hirsch, S. Sra, B. Schölkopf, and S. Harmeling, “Efficient filter flow for space-variant multiframe blind deconvolution,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 607–614, IEEE, 2010.
[21] D. Krishnan and R. Fergus, “Fast image deconvolution using hyper-Laplacian priors,” Advances in Neural Information Processing Systems, vol. 22, 2009.
[22] J. Sun, W. Cao, Z. Xu, and J. Ponce, “Learning a convolutional neural network for non-uniform motion blur removal,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[23] X. Xu, J. Pan, Y.-J. Zhang, and M.-H. Yang, “Motion blur kernel estimation via deep learning,” IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 194–205, 2018.
[24] A. V. Nasonov and A. A. Nasonova, “Linear blur parameters estimation using a convolutional neural network,” Pattern Recognition and Image Analysis, vol. 32, no. 3, pp. 611–615, 2022.
[25] G. Carbajal, P. Vitoria, M. Delbracio, P. Musé, and J. Lezama, “Non-uniform blur kernel estimation via adaptive basis decomposition,” arXiv preprint arXiv:2102.01026, 2021.
[26] D. Gong, J. Yang, L. Liu, Y. Zhang, I. Reid, C. Shen, A. van den Hengel, and Q. Shi, “From motion blur to motion flow: A deep learning solution for removing heterogeneous motion blur,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[27] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740–755, Springer, 2014.
[28] J. E. Bresenham, “Algorithm for computer control of a digital plotter,” IBM Syst. J., vol. 4, pp. 25–30, Mar 1965.
[29] D. Chicco, M. J. Warrens, and G. Jurman, “The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation,” PeerJ Computer Science, vol. 7, 2021.
[30] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in 2011 International Conference on Computer Vision, pp. 479–486, IEEE, 2011.
[31] R. Friedman, “EPLL PyTorch Implementation.” https://github.com/friedmanroy/torchEPLL. Accessed: 2023-02-05.
[32] S. Wu, G. Wang, P. Tang, F. Chen, and L. Shi, “Convolution with even-sized kernels and symmetric padding,” in Advances in Neural Information Processing Systems, vol. 32, 2019.
[33] A. Levin, Y. Weiss, F. Durand, and W. Freeman, “Efficient marginal likelihood optimization in blind deconvolution.” https://webee.technion.ac.il/people/anat.levin/. Accessed: 2023-02-05.
[34] D. Krishnan, T. Tay, and R. Fergus, “Blind deconvolution using a normalized sparsity measure.” https://dilipkay.wordpress.com/blind-deconvolution/. Accessed: 2023-02-05.
[35] G. Carbajal, P. Vitoria, M. Delbracio, and P. Musé, “Non-uniform motion blur kernel estimation via adaptive decomposition.” https://github.com/GuillermoCarbajal/NonUniformBlurKernelEstimation. Accessed: 2024-02-12.
