IDA-UIE: An Iterative Framework for Deep Network-based Degradation Aware Underwater Image Enhancement

Pranjali Singh
Centre for Intelligent Cyber Physical Systems, Indian Institute of Technology Guwahati,
[email protected]
&

Prithwijit Guha
Dept. of Electronics and Electrical Engg., Indian Institute of Technology Guwahati,
[email protected]

Abstract

Underwater image quality is affected by fluorescence, low illumination, absorption, and scattering. Recent works in underwater image enhancement have proposed different deep network architectures to handle these problems. Most of these works have proposed a single network to handle all the challenges. We believe that deep networks trained for specific conditions deliver better performance than a single network learned from all degradation cases. Accordingly, the first contribution of this work lies in the proposal of an iterative framework where a single dominant degradation condition is identified and resolved. This proposal considers the following eight degradation conditions: low illumination, low contrast, haziness, blurred image, presence of noise, and color imbalance in each of the three color channels. A deep network is designed to identify the dominant degradation condition. Accordingly, an appropriate deep network is selected for degradation condition-specific enhancement. The second contribution of this work is the construction of degradation condition-specific datasets from good quality images of two standard datasets (UIEB and EUVP). These datasets are used to learn the condition-specific enhancement networks. The proposed approach is found to outperform nine baseline methods on UIEB and EUVP datasets.

Keywords Image Enhancement $\cdot$ Deep Neural Network $\cdot$ Underwater Image Enhancement

1 Introduction

Poor visibility conditions in the world’s oceans have limited our understanding of these environments. To address this challenge, underwater image enhancement techniques are employed [1]. With approximately 70% of the Earth’s surface covered by water, there is increasing interest in exploring underwater realms. Clear images are essential for monitoring marine species, underwater mountains, and plants. Additionally, the effects of color in underwater images are significant. Light reflection varies greatly depending on the sea’s structure, with water capable of bending light to create crinkle patterns or diffusing it. The quality of underwater photos is influenced by several factors, including restricted visibility range, uneven lighting, unwanted noise, and reduced color fidelity [2].

1.1 Application

Underwater image enhancement has numerous practical applications in various fields, including oceanography, underwater archaeology, underwater robotics, underwater exploration, and more [3]. Some specific applications are outlined below:

1. Marine Life: Underwater image enhancement aids in the identification and tracking of marine life, such as fish, corals, and other organisms. This is crucial for scientific research on the health and behavior of underwater ecosystems [2].

2. Oceanography: Enhanced underwater images improve the study of ocean currents, tides, and underwater topography.

3. Underwater Archaeology: Enhancing images of submerged structures, like shipwrecks, assists in identifying and studying historical artifacts and structures.

4. Underwater Security and Surveillance: The accuracy and effectiveness of security and surveillance systems in underwater environments are enhanced through underwater image enhancement. This aids in detecting and tracking intruders, suspicious objects, and potential threats to underwater infrastructure, such as pipelines and oil rigs [2].

5. Underwater Robotics: Underwater robots, such as remotely operated vehicles (ROVs) and autonomous underwater vehicles (AUVs), are equipped with cameras and sensors for object detection and navigation. Underwater image enhancement improves the quality of images captured by these sensors, facilitating the detection and tracking of marine life, underwater structures, and potential hazards.

6. Underwater Photography and Videography: Enhancing the quality of underwater images and videos makes them more appealing to audiences and enhances the immersive experience. This is particularly important for promoting dive sites and other underwater attractions.

7. Underwater Mapping and Navigation: Image enhancement increases the accuracy and detail of maps and navigation systems used for underwater exploration and research, aiding in the discovery and exploration of new dive sites and other underwater environments.

8. Underwater Tourism through Virtual Reality: Enhancing the quality of images and videos used for virtual reality (VR) experiences provides a more immersive and realistic experience for users, enabling safe and realistic exploration of underwater environments [4].

Improving the quality of underwater images can significantly impact our understanding of the underwater world and enhance our ability to explore and interact with it [2].

Figure 1: Application areas of underwater image processing, highlighting its critical roles in marine life identification, oceanography, underwater archaeology, security and surveillance, robotics, photography and videography, mapping and navigation, and virtual reality tourism [5].

1.2 Challenges

Light attenuation refers to the reduction in light intensity as it travels through a medium, resulting from absorption, scattering, and reflection by particles and molecules within that medium. The degree of light attenuation is influenced by the medium’s properties, such as its composition, density, and scattering characteristics.

In water, light attenuation is significantly greater than in air due to the higher density and greater concentration of particles and molecules. Water molecules, suspended particles, and dissolved substances like salts and organic matter all contribute to the attenuation of light, as illustrated in Fig 2.

The extent of light attenuation in water varies with the wavelength of the light. Longer wavelengths, such as red and infrared light, are absorbed more strongly than shorter wavelengths, such as blue and green light. This phenomenon explains why objects underwater appear bluer and darker compared to their appearance in air: red light is absorbed within a short distance, while blue and green light penetrate farther.

In contrast, light attenuation in air is much lower due to the lower density and smaller concentration of particles and molecules in the atmosphere. However, atmospheric conditions like fog, haze, and pollution can also contribute to light attenuation.

The underwater environment encompasses areas submerged in water, whether in natural or artificial bodies such as oceans, seas, reservoirs, rivers, or aquifers. It is the cradle of life on Earth and is vital for sustaining diverse life forms, serving as a natural habitat for numerous organisms. Many human activities occur within accessible regions of the underwater environment. Consequently, understanding the characteristics of the underwater imaging model is essential for conducting research across various fields [33].

1.2.1 Absorption and Scattering

The Lambert-Beer empirical law states that the decay in light intensity depends on the properties of the medium through which it travels. In water, light intensity decays exponentially through a process known as attenuation. Attenuation results from the combined effects of absorption and scattering, leading to a loss of light energy and a change in the direction of electromagnetic energy. This attenuation poses a significant challenge for underwater imaging by creating a hazy effect that complicates image processing applications in marine environments. In clear water, attenuation limits visibility to approximately 20 meters, whereas in turbid water, visibility is reduced to only 5 meters. Additionally, light absorption in water varies with wavelength; as depth increases, different colors of light are absorbed at different rates. Red, with the longest wavelength, is absorbed first, while blue, with the shortest wavelength, penetrates the farthest, resulting in a bluish tint in underwater images as shown in Fig 3.

In an underwater medium, the presence of dust particles leads to scattering phenomena. When light reflects off an object’s external surface and reaches the camera, it interacts with the floating particles in the medium, causing a scattering effect. There are two types of scattering that affect underwater images: forward scattering and backward scattering as shown in Fig 4.

The model is based on the principles of linear superposition and the water medium modeling defined in the Jaffe–McGlamery model [2]. The irradiance entering the camera is a linear combination of three distinct components: the direct component ( $E_{d}$ ), the forward-scattered component ( $E_{f}$ ), and the backscatter component ( $E_{b}$ ). The total irradiance ( $E_{T}$ ) can be expressed as follows:

$$E_{T}=E_{d}+E_{f}+E_{b} \tag{1}$$

The direct component, denoted as $E_{d}$, refers to the light that is reflected by an object and reaches the camera without undergoing any scattering. Forward scatter, represented by $E_{f}$, occurs when the light reflected from an object is scattered at small angles on its way to the camera. In contrast, backscatter, represented by $E_{b}$, occurs when light is scattered towards the camera by particles in the water without having reached the object. Models of this kind are often used for image restoration, but they are computationally expensive and lead to long execution times.

$E_{d}$ signifies the light that is directly reflected by the object without any scattering in the water. This component is particularly beneficial for underwater imaging and can be expressed as:

$$E_{d}(x,y)=E(x,y)\,e^{-c\,d(x,y)} \tag{2}$$

Here, $E(x,y)$ represents the irradiance at position $(x,y)$. The total attenuation coefficient $c$ of the medium quantifies the combined effects of scattering and absorption on light loss within the medium. The variable $d(x,y)$ denotes the distance between the object and the camera. Furthermore, $E_{f}$ refers to the forward scatter component, which is light reflected by an object and scattered at a small angle before reaching the camera:

$$E_{f}(x,y)=E_{d}(x,y)\ast g(x,y) \tag{3}$$

Here, $\ast$ denotes the convolution operator and $g(x,y)$ represents the point spread function (PSF). To avoid the mathematically complex deconvolution that PSF estimation entails, researchers typically assume that the underwater scene is close to the camera, thereby neglecting the impact of forward scattering.

$E_{b}$ represents the backscattered light reflected by particles in the water. This component does not include light from the object itself, as it is primarily caused by the scattering of floating particles. $B_{\infty}$ denotes the underwater background light.

$$E_{b}(x,y)=B_{\infty}(\lambda)\left(1-e^{-c\,d(x,y)}\right) \tag{4}$$
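As a rough illustration of Eqs. (1), (2), and (4), the sketch below evaluates the direct and backscatter components for a single pixel, neglecting forward scattering as discussed above. The function name and the numeric values chosen for $c$, $d$, and $B_{\infty}$ are illustrative assumptions, not measurements from this work.

```python
import math

def observed_irradiance(E, c, d, B_inf):
    """Direct plus backscatter irradiance for one pixel (forward scatter neglected)."""
    t = math.exp(-c * d)        # fraction of light surviving the path
    E_d = E * t                 # direct component, Eq. (2)
    E_b = B_inf * (1.0 - t)     # backscatter component, Eq. (4)
    return E_d + E_b            # Eq. (1) with E_f set to zero

# Hypothetical values: scene irradiance 0.8, background light 0.3, c = 0.2 per meter
near = observed_irradiance(E=0.8, c=0.2, d=1.0, B_inf=0.3)
far = observed_irradiance(E=0.8, c=0.2, d=20.0, B_inf=0.3)
print(near, far)
```

With increasing distance the direct term decays and the observed value approaches the background light, which is the hazy appearance described earlier.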

1.2.2 Suspended Particles

The presence of suspended particles in water can be mathematically modeled using the radiative transfer equation, which describes the interaction of light with matter. For underwater images, this equation can model the propagation of light through the water column, including the effects of scattering and absorption by suspended particles [46].

A common approach to enhancing underwater images involves using a dehazing algorithm that estimates the transmission map of the image. This map represents the fraction of light that has successfully transmitted through the water column. The transmission map can be estimated using the following equation:

$$t(x)=e^{-\beta d(x)} \tag{5}$$

where $t(x)$ is the transmission at pixel $x$ , $d(x)$ is the distance between the camera and pixel $x$ , and $\beta$ is the scattering coefficient of the water. The scattering coefficient depends on the concentration and size distribution of suspended particles in the water and can be estimated using empirical or theoretical models.

Once the transmission map is estimated, it can be used to remove the effects of haze and recover the original colors and contrast of the image using the following equation:

$$J(x)=\frac{I(x)-A}{t(x)}+A \tag{6}$$

where $J(x)$ is the recovered intensity at pixel $x$, $I(x)$ is the observed intensity of the image at pixel $x$, $A$ represents the atmospheric light (the background light observed where the transmission approaches zero), and $t(x)$ is the estimated transmission at pixel $x$.
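The following minimal sketch applies Eqs. (5) and (6) to a single pixel. It assumes the standard haze formation model, observed = clear · t + A · (1 − t), to synthesize the hazy value; that forward model, the lower bound `t_min`, and all numeric values are assumptions introduced here for illustration.

```python
import math

def transmission(beta, d):
    """Eq. (5): fraction of light transmitted over distance d."""
    return math.exp(-beta * d)

def recover(I, A, t, t_min=0.1):
    """Eq. (6): invert the haze model; t_min (an assumption) avoids dividing by ~0."""
    return (I - A) / max(t, t_min) + A

# Hypothetical pixel: clear intensity J, background light A, distance d in meters
J, A, beta, d = 0.6, 0.9, 0.15, 5.0
t = transmission(beta, d)
I = J * t + A * (1.0 - t)     # assumed forward (haze formation) model
J_hat = recover(I, A, t)
print(t, I, J_hat)
```

When the transmission and background light are known exactly, the inversion returns the clear intensity; in practice both are estimates, so the recovery is only approximate.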

Color correction algorithms can also be employed to compensate for the color distortion caused by suspended particles. A common approach is to estimate the color of the ambient light in the underwater environment using a white-balancing algorithm and then adjust the color balance of the image accordingly.
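One simple white-balancing rule is the gray-world assumption, which scales each channel so that its mean matches the overall mean. The sketch below uses that rule purely as an example; the text does not state which white-balancing algorithm is intended, and the pixel values are hypothetical.

```python
def gray_world(pixels):
    """Scale each channel so its mean matches the overall mean (gray-world assumption)."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [tuple(min(1.0, p[c] * gains[c]) for c in range(3)) for p in pixels]

# Hypothetical bluish-green pixels as (R, G, B) values in [0, 1]
img = [(0.10, 0.50, 0.60), (0.20, 0.40, 0.50), (0.15, 0.45, 0.55)]
balanced = gray_world(img)
print(balanced)
```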

In summary, the key to mathematically enhancing underwater images lies in modeling the effects of suspended particles on light transmission using the radiative transfer equation and applying appropriate image enhancement techniques to mitigate these effects.

1.2.3 Non-Uniform Illumination

Absorption and scattering of light in water can lead to blurriness, reduced contrast, and an overall decline in image quality. These effects are further exacerbated in high-turbidity underwater conditions or when powerful artificial light sources are used [51]. Such light sources can cause non-uniform lighting, resulting in reflections that obscure image details and create bright spots, as shown in Fig 5.

A common method to model non-uniform illumination in underwater environments is by using the Beer-Lambert law. This law describes how light intensity attenuates as it travels through a medium, stating that the intensity of light decreases exponentially with distance:

$$I=I_{0}\,e^{-kd} \tag{7}$$

where $I$ is the intensity of the light after passing through the medium, $I_{0}$ is the initial intensity of the light, $k$ is the extinction coefficient of the medium (a measure of how much the medium absorbs or scatters light), and $d$ is the distance the light has traveled through the medium.

In underwater environments, the extinction coefficient can vary depending on factors such as water depth, water clarity, and the presence of suspended particles or plankton. Thus, the Beer-Lambert law can be used to model the non-uniformity of underwater illumination [36].
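To make the exponential decay in Eq. (7) concrete, the sketch below tabulates the remaining intensity per color channel at a few distances. The extinction coefficients are hypothetical values chosen only so that red decays fastest; they are not taken from this work.

```python
import math

def attenuate(I0, k, d):
    """Eq. (7): intensity remaining after travelling distance d."""
    return I0 * math.exp(-k * d)

# Hypothetical extinction coefficients per meter; red is assumed to be the largest
k = {"red": 0.40, "green": 0.07, "blue": 0.03}
for distance in (1, 5, 10):
    remaining = {ch: round(attenuate(1.0, kc, distance), 3) for ch, kc in k.items()}
    print(distance, remaining)
```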

Other factors contributing to non-uniform illumination in underwater environments include the angle of incidence of the light, the direction and intensity of light fluorescence, and the presence of shadows and reflections. Modeling these factors may require more complex mathematical formulas, such as ray tracing or radiative transfer models.

1.2.4 Fluorescence

Fluorescence is a phenomenon where certain materials absorb light at one wavelength and emit it at a longer wavelength. However, underwater image processing provides methods to overcome these challenges. As shown in Fig 6, visual information can be combined with quantitative assessment to effectively address these issues.

To address these challenges, various techniques and algorithms have been developed for underwater image enhancement. These include classical methods such as histogram equalization and Retinex, as well as deep learning-based approaches utilizing CNNs, GANs, and U-Net [47]. These techniques aim to improve contrast, sharpness, and color balance in images while minimizing the effects of scattering, absorption, and other factors. However, there is still significant work required to further enhance the quality and clarity of underwater images, particularly under challenging conditions [29].

2 Major Contribution

Most existing works have designed a single deep network for image quality improvement. In contrast, this work proposes an Iterative Framework for Degradation Aware Underwater Image Enhancement (IDA-UIE).

IDA-UIE identifies a dominant degradation condition and appropriately enhances it. Correction of one degradation may reveal another degradation condition. Thus, the enhanced image is further subjected to degradation identification and subsequent enhancement. This system attempts to improve the image quality through degradation-aware enhancement iterations.
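The control flow of such a degradation-aware loop can be sketched as follows. This is only a schematic under stated assumptions: `identify` and `enhancers` are hypothetical stand-ins for the trained networks, the `"none"` label and the `max_iters` cap are assumed stopping rules, and the toy "image" is a dictionary of degradation severities rather than pixel data.

```python
def ida_uie(image, identify, enhancers, max_iters=8):
    """Repeatedly correct the dominant degradation until none is reported."""
    history = []
    for _ in range(max_iters):
        label = identify(image)            # stand-in for the degradation classifier
        if label == "none":                # image judged not degraded: stop
            break
        image = enhancers[label](image)    # condition-specific enhancement
        history.append(label)
    return image, history

# Toy stand-ins (assumptions): an "image" is a dict of degradation severities
def toy_identify(img):
    label, severity = max(img.items(), key=lambda kv: kv[1])
    return label if severity > 0.1 else "none"

def make_fixer(label):
    return lambda img: {**img, label: 0.0}

labels = ["low_illumination", "haze", "noise"]
toy_enhancers = {name: make_fixer(name) for name in labels}
result, applied = ida_uie(
    {"low_illumination": 0.7, "haze": 0.4, "noise": 0.05},
    toy_identify,
    toy_enhancers,
)
print(applied)
```

In the toy run, correcting the dominant condition exposes the next most severe one, which is then handled in the following iteration.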

This section details the significant contributions made in this project, which focus on enhancing underwater images through an innovative framework and specialized deep networks.

1. Iterative Framework for Degradation Aware Underwater Image Enhancement: One of the primary contributions is the proposal of an iterative framework specifically designed for degradation-aware underwater image enhancement. Traditional methods often employ a single deep network to improve image quality. However, these approaches can fall short when dealing with the complex and varied degradation types found in underwater images.

Our iterative framework, named Iterative Degradation Aware Underwater Image Enhancement (IDA-UIE), addresses this by identifying the dominant degradation condition in an image and enhancing it accordingly. The process is iterative because enhancing one type of degradation can reveal another underlying issue. Thus, after the initial enhancement, the image is re-evaluated for additional degradations, which are then corrected in subsequent iterations. This iterative approach gradually improves the image quality through multiple refinement steps.

2. Deep Network for Identifying Dominant Degradation: To support the iterative framework, we designed a deep network, denoted as $\mathbf{\Phi}_{DC}$, for identifying the dominant degradation in underwater images. This network is critical as it drives the entire enhancement process by pinpointing the most significant degradation affecting the image.

The $\mathbf{\Phi}_{DC}$ network is trained to recognize eight specific types of degradation: low illumination, low contrast, haziness, blur, noise, and color imbalances in the red, green, and blue channels. Additionally, it can identify if an image is not degraded. This identification step is crucial for ensuring that each image receives the appropriate type of enhancement.

3. Eight Deep Networks for Condition-Specific Underwater Image Enhancement: Following the identification of the dominant degradation, the framework employs one of eight specialized deep networks to enhance the image. Each of these networks is tailored to address a specific type of degradation:

$\mathbf{\Phi}_{IC}$: Illumination Correction - Enhances images with low illumination, improving visibility and detail.

$\mathbf{\Phi}_{CE}$: Contrast Enhancement - Increases the contrast in images, making features more distinguishable.

$\mathbf{\Phi}_{DH}$: Dehazing - Removes haziness to clarify images.

$\mathbf{\Phi}_{DB}$: Deblurring - Sharpens images to correct blur.

$\mathbf{\Phi}_{DN}$: Denoising - Reduces noise to produce cleaner images.

$\mathbf{\Phi}_{CBR}$: Color Balance for Red Channel - Corrects color imbalances in the red channel.

$\mathbf{\Phi}_{CBB}$: Color Balance for Blue Channel - Corrects color imbalances in the blue channel.

$\mathbf{\Phi}_{CBG}$: Color Balance for Green Channel - Corrects color imbalances in the green channel.

Each network has been designed and trained for its specific enhancement task, so that the iterative framework can improve various aspects of underwater images.

4. Construction of Two Datasets with Condition-Specific Degradations: To train the nine deep networks (one for degradation identification and eight for specific enhancements), we constructed two extensive datasets: UIEB-D8 and EUVP-X-D8. These datasets are based on standard underwater image datasets but have been augmented with condition-specific degradations to simulate real-world underwater conditions more accurately.

Each image in these datasets has been systematically degraded to reflect one of the eight targeted conditions. This detailed and condition-specific dataset construction ensures that the networks are well-trained to recognize and correct each type of degradation effectively.

UIEB-D8 Dataset The UIEB-D8 dataset is derived from the UIEB dataset [14], with images subjected to controlled degradations to create training examples for each of the eight conditions. This dataset provides a robust foundation for training the enhancement networks.

EUVP-X-D8 Dataset Similarly, the EUVP-X-D8 dataset is based on the EUVP dataset [13] and includes images with various degradations. By using these two diverse datasets, the networks are trained to handle a wide range of underwater image conditions, enhancing their generalizability and effectiveness.
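The idea of pairing a good-quality image with a condition-specific degraded copy can be sketched as below. The degradation operators, their parameters (a 0.3 dimming gain, a 0.4 red-channel gain, Gaussian noise with standard deviation 0.05), and the condition names are all assumptions made for illustration; the actual degradation procedures used to build UIEB-D8 and EUVP-X-D8 are not specified in this section.

```python
import random

def _clip(v):
    return min(1.0, max(0.0, v))

def degrade(pixels, condition, rng=None):
    """Apply one synthetic degradation to a list of (R, G, B) pixels in [0, 1]."""
    rng = rng or random.Random(0)
    if condition == "low_illumination":
        return [tuple(_clip(0.3 * v) for v in p) for p in pixels]   # global dimming
    if condition == "red_imbalance":
        return [(_clip(0.4 * p[0]), p[1], p[2]) for p in pixels]    # suppress red only
    if condition == "noise":
        return [tuple(_clip(v + rng.gauss(0.0, 0.05)) for v in p) for p in pixels]
    raise ValueError(f"unknown condition: {condition}")

clean = [(0.8, 0.6, 0.4), (0.5, 0.5, 0.5)]
# Each entry is a (degraded input, clean target) training pair for one condition
pairs = {c: (degrade(clean, c), clean)
         for c in ("low_illumination", "red_imbalance", "noise")}
print(pairs["red_imbalance"][0])
```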

3 Related Work

This section presents a classification and summary of existing techniques for enhancing underwater images, mainly categorized into traditional and deep learning-based methods. The underwater image enhancement (UIE) techniques are broadly categorized in Figure 8.

3.1 Traditional Methods

Traditional methods include both model-based and non-model methods. Non-model methods, such as the histogram algorithm, enhance visual effects through pixel adjustments without considering imaging principles. Model-based methods, also known as image restoration techniques, estimate the relationship between clear, blurry, and transmission images based on an imaging model to produce clear images. An example of a model-based method is the dark channel prior (DCP) algorithm, as shown in Figure fig:technique [2].

3.1.1 Image Denoising

Image denoising is a technique used to reduce or remove noise from digital images. It aims to improve the visual quality of an image by suppressing unwanted noise while preserving important details and structures [27].

3.1.2 Contrast Enhancement Techniques

Image quality is often evaluated based on the level of contrast in the image. Contrast refers to the difference in luminance reflected from two adjacent planes and is a key factor in making objects distinguishable from the background. Vision is more sensitive to contrast than absolute luminance, which allows us to perceive the world despite variations in illumination conditions. If an image has highly concentrated contrast in a particular range, such as being very dark, critical information may be lost in those areas. Therefore, optimizing the contrast is necessary to represent all the details in the input image. To address issues related to contrast in underwater image processing, numerous algorithms for achieving contrast enhancement have been developed [2].

3.1.3 Color Correction Techniques

The colors present in underwater images are mainly blue and green due to their shorter wavelengths. The histogram distribution of these images indicates that the green channel’s mean is more significant than that of the red channel, and the RGB channels’ distribution range does not cover the full range of [0, 255]. To correct the issue of color cast, color correction techniques are used to improve the visual information content of underwater images. A manual correction approach is found to be better than automatic enhancement techniques in terms of the significance level. An enhancement method that uses fuzzy logic and bacterial foraging optimization is proposed to remove the color cast, which gives better results than existing algorithms. Additionally, a method for non-uniform illumination correction is proposed, which uses maximum-likelihood estimation to map the image to Rayleigh distribution. An adaptive linear stretch method that adjusts regions with low light distributions with a threshold depending on the histogram is also proposed.

3.1.4 Histogram Equalization Method

Underwater images often require enhancement for improved quality, and several methods are available in the literature to address this issue. One such method [31] employs the HSV color space, a V-channel transform, and histogram equalization. Initially, the RGB image is separated into its R, G, and B components and then converted into the HSV color space. The V component is then stretched within a specified interval before the image is converted back to the RGB color space. Histogram equalization is then applied to each of the R, G, and B components, and the components are combined to form a color image. Finally, a Gaussian low-pass filter is applied to the image. The authors compare the method with other studies using the mean value and entropy metrics and report a significant improvement in underwater image quality [31].
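The histogram equalization step on a single channel can be sketched as follows. This covers only that one step, using the common cumulative-distribution mapping; the V-channel stretch and the Gaussian low-pass filter are omitted, and the sample intensities are hypothetical.

```python
def equalize(values, levels=256):
    """Histogram-equalize a flat list of integer intensities in [0, levels - 1]."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)   # CDF value at the darkest occupied level
    n = len(values)
    if n == cdf_min:                          # constant channel: nothing to spread
        return list(values)
    scale = (levels - 1) / (n - cdf_min)
    return [round((cdf[v] - cdf_min) * scale) for v in values]

channel = [50, 50, 52, 55, 60, 60, 60, 61]    # narrow, low-contrast intensity range
stretched = equalize(channel)
print(stretched)
```

The narrow input range is spread over the full intensity range while the ordering of the intensities is preserved.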

3.1.5 CLAHE

Adaptive histogram equalization (AHE) was originally developed for the enhancement of low-contrast images [34]. Ordinary AHE tends to over-amplify the contrast, and hence the noise, in near-constant regions of the image. Contrast-limited adaptive histogram equalization (CLAHE) is a variant of AHE in which contrast amplification is limited to reduce this noise amplification [44]. In CLAHE, the contrast-limiting procedure is applied to each neighborhood from which the transformation function is derived. Rather than operating on the whole image, CLAHE divides the image into small regions called tiles and performs contrast enhancement on each tile [4]. These tiles are then recombined to obtain the overall enhanced image. CLAHE can be applied to both grayscale and color images [30] [4].
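The contrast-limiting step for one tile can be illustrated by clipping the tile histogram and redistributing the clipped excess. The sketch below is a simplified single-pass version with a hypothetical 8-bin histogram and clip limit; practical implementations may redistribute iteratively and interpolate between neighboring tiles.

```python
def clip_histogram(hist, clip_limit):
    """Clip bin counts at clip_limit and spread the excess evenly over all bins."""
    excess = sum(max(0, h - clip_limit) for h in hist)
    clipped = [min(h, clip_limit) for h in hist]
    share, remainder = divmod(excess, len(hist))
    # The first `remainder` bins receive one extra count so the total is preserved
    return [h + share + (1 if i < remainder else 0) for i, h in enumerate(clipped)]

tile_hist = [0, 2, 30, 3, 1, 0, 0, 0]     # hypothetical tile histogram with one spike
limited = clip_histogram(tile_hist, clip_limit=10)
print(limited)
```

Flattening the spike bounds the slope of the cumulative histogram, which is what limits contrast amplification in that tile.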

3.1.6 Retinex Based Method

Underwater images often suffer from low contrast and color distortion due to the variable attenuation of light and non-uniform absorption of red, green, and blue components. To address these issues, a Retinex-based approach for underwater image enhancement has been proposed. The approach involves using contrast-limited adaptive histogram equalization (CLAHE) to enhance the contrast of the darker components of the underwater image while limiting noise, which may blur visual information. Next, a Retinex-based enhancement is performed on the CLAHE-processed image to restore distorted colors [30] [4]. To restore distorted edges and achieve smoothing of the blurred parts of the image, bilateral filtering is performed on the Retinex-processed image. To optimize the individual strengths of CLAHE, Retinex, and bilateral filtering algorithms within a single framework, suitable parameter values are determined. Comparing the performance of the proposed approach with existing methods, both qualitatively and quantitatively, indicates that it results in better enhancement of underwater images [29].

3.1.7 Dark Channel Prior

Haze arises from particles suspended in bodies of water, such as sand, minerals, and plankton. This phenomenon disrupts the clarity of underwater images by reducing contrast, causing poor visibility, absorbing natural light, and limiting color variation. Enhancing the quality and visibility of underwater images therefore requires a dehazing process [22]. The Dark Channel Prior (DCP) algorithm capitalizes on the observation that most local patches in haze-free outdoor images contain pixels with very low intensity in at least one color channel. With DCP, underwater images are reported to exhibit significantly improved visibility and color accuracy, with reduced computational complexity and more efficient dehazing.

Underwater images experience distortions primarily due to light dispersion and color effects. The dispersion of light and its scattering in water reduces the visibility and contrast of captured images. Additionally, color changes caused by the presence of particles such as sand, minerals, and plankton in the water, along with the absorption and scattering of natural light, further impact underwater images. When light reflects from objects in the water, it encounters suspended particles, leading to light absorption and scattering [22]. To address these issues, the DCP method estimates the atmospheric light and utilizes a mathematical function to handle both sky and non-sky regions. It identifies affected patches in the images, estimates the scene depth, and removes the haze to enhance the clarity of the image. To improve the accuracy of the depth map generated by the block-based dark channel prior, image matting is employed. This combination of techniques enhances accuracy and enables more precise identification of object contours [22]. The application of image matting to the underwater depth map, derived through the general dark-channel methodology, represents a novel approach. The following section presents other existing works in this field.
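The dark channel itself is straightforward to compute: for each pixel, take the minimum over the color channels and then over a local patch. The sketch below shows only this step on a hypothetical 2×3 image; atmospheric-light estimation, transmission estimation, and matting are not included, and the `radius` parameter is an illustrative choice.

```python
def dark_channel(image, radius=1):
    """Per-pixel minimum over color channels and a (2*radius+1)^2 neighborhood."""
    h, w = len(image), len(image[0])
    min_rgb = [[min(image[y][x]) for x in range(w)] for y in range(h)]
    dark = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            dark[y][x] = min(min_rgb[j][i] for j in ys for i in xs)
    return dark

# Hypothetical image of (R, G, B) tuples in [0, 1]; the right column is "hazy"
img = [[(0.9, 0.1, 0.2), (0.5, 0.6, 0.7), (0.8, 0.8, 0.9)],
       [(0.3, 0.4, 0.0), (0.6, 0.7, 0.8), (0.9, 0.8, 0.8)]]
dc = dark_channel(img, radius=0)
print(dc)
```

Pixels with a large dark-channel value, such as those in the right column, have no low-intensity channel nearby and are therefore candidates for haze.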

3.1.8 Other Methods

Underwater images often suffer from low contrast and poor visibility, making it crucial to enhance them before further processing. Image enhancement techniques aim to improve the quality and contrast of degraded underwater photos and videos. Standard cameras used for capturing underwater scenes face challenges such as limited available light, low resolution, and blurriness, necessitating the improvement of the initial images or videos obtained from image processing equipment. Researchers have proposed various solutions to address these challenges.

One commonly used approach for enhancing underwater images is the dark channel prior (DCP), which aims to improve the Peak Signal to Noise Ratio (PSNR). However, DCP has significant drawbacks, including the tendency to darken images, reduce contrast, and introduce halo effects. To overcome these limitations, the suggested technique incorporates contrast-limited adaptive histogram equalization (CLAHE) and the Adaptive Color Correction technique.

To evaluate the proposed approach, experiments were conducted using photographs obtained from the Japan Agency for Marine-Earth Science and Technology (JAMSTEC) as well as from the internet. Performance measures such as the measure of entropy (MOE), measure of enhancement (EME), mean square error (MSE), and PSNR were used during the evaluation. The results demonstrate that the proposed framework outperforms other methods in terms of MSE and PSNR, achieving values of 0.26 and 32, respectively.

Mean Filter The mean filter is a method employed to decrease image noise. It performs a local averaging operation, making it one of the most basic linear filters. In this technique, each pixel’s value is substituted with the average value of all the pixels in its surrounding neighborhood. If $f(i,j)$ denotes the noisy image, the smoothed image $g(x,y)$ is obtained as

$$g(x,y)=\frac{1}{n}\sum_{(i,j)\in S}f(i,j) \tag{8}$$

where $S$ is the neighborhood of pixel $(x,y)$ and $n$ is the number of pixels in $S$.
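A direct implementation of Eq. (8) on a small grayscale image might look as follows. The square neighborhood, its clipping at the image border, and the sample values are assumptions made for this illustration.

```python
def mean_filter(image, radius=1):
    """Replace each pixel with the average of its (border-clipped) square neighborhood."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            S = [image[j][i]
                 for j in range(max(0, y - radius), min(h, y + radius + 1))
                 for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(S) / len(S)   # (1/n) * sum over S
    return out

noisy = [[10, 10, 10],
         [10, 100, 10],
         [10, 10, 10]]
smoothed = mean_filter(noisy)
print(smoothed)
```

The outlier in the center is reduced, but its energy is spread into the neighboring pixels, which is the blurring side effect of linear averaging.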

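Bilateral Filter A bilateral filter is a non-linear image-smoothing filter that preserves edges while reducing noise. It operates by replacing the intensity of each pixel with a weighted average of the intensities of nearby pixels. The weights are Gaussian functions of both the spatial distance and the intensity difference between pixels:

$$BF[I]_{p}=\frac{1}{W_{p}}\sum_{q\in S}G_{\sigma_{s}}(\|p-q\|)\,G_{\sigma_{r}}(|I_{p}-I_{q}|)\,I_{q} \tag{9}$$

where $W_{p}=\sum_{q\in S}G_{\sigma_{s}}(\|p-q\|)\,G_{\sigma_{r}}(|I_{p}-I_{q}|)$ normalizes the weights so that they sum to one.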
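The edge-preserving behavior is easiest to see on a one-dimensional signal. The sketch below applies the bilateral weighting to a noisy step edge; dividing by the accumulated weight makes the result a weighted average. The window radius, the two sigma values, and the test signal are hypothetical choices.

```python
import math

def gaussian(x, sigma):
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

def bilateral_1d(signal, radius=2, sigma_s=1.0, sigma_r=10.0):
    """Bilateral filter on a 1-D signal: spatial weight times range weight, normalized."""
    out = []
    for p, Ip in enumerate(signal):
        num = den = 0.0
        for q in range(max(0, p - radius), min(len(signal), p + radius + 1)):
            w = gaussian(p - q, sigma_s) * gaussian(Ip - signal[q], sigma_r)
            num += w * signal[q]
            den += w                      # accumulated weight used for normalization
        out.append(num / den)
    return out

step = [10, 12, 9, 11, 100, 102, 99, 101]   # noisy step edge
filtered = bilateral_1d(step)
print([round(v, 2) for v in filtered])
```

Samples on the far side of the edge receive a negligible range weight, so each side is smoothed separately and the step stays sharp.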

Gaussian Filter A Gaussian Filter serves as a low-pass filter employed to diminish noise (high-frequency components) and blur specific areas within an image. This filter is constructed as an Odd-sized Symmetric Kernel (a Matrix in Digital Image Processing terms), which is applied to each pixel in the Region of Interest to achieve the intended outcome. The kernel is designed to be gentle regarding significant colour changes (edges), as the pixels near the centre of the kernel hold more significance in determining the final value compared to those at the edges.

$$G(x,y)=\frac{1}{2\pi\sigma^{2}}e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}} \tag{10}$$
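An odd-sized kernel can be built by sampling Eq. (10) on an integer grid centered at the origin. The renormalization to unit sum in the sketch below is a common practical step that is assumed here rather than taken from the text, and the kernel size and $\sigma$ are illustrative.

```python
import math

def gaussian_kernel(size, sigma):
    """Sample Eq. (10) on an odd-sized grid and normalize the weights to sum to 1."""
    if size % 2 != 1:
        raise ValueError("kernel size must be odd")
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
               / (2.0 * math.pi * sigma * sigma)
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

k = gaussian_kernel(3, sigma=1.0)
for row in k:
    print([round(v, 4) for v in row])
```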

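Median Filter The median filter, frequently employed in digital filtering, is a non-linear technique aimed at eliminating noise from images or signals. It replaces each pixel with the median of the values in its neighborhood, which suppresses isolated outliers such as salt-and-pepper noise. It serves as a common pre-processing step to enhance subsequent processing outcomes, such as image edge detection.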
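A minimal one-dimensional median filter is sketched below on a hypothetical signal containing two impulse outliers. The window radius and the truncation of the window at the signal borders are illustrative assumptions.

```python
from statistics import median

def median_filter_1d(signal, radius=1):
    """Replace each sample with the median of its (border-clipped) window."""
    out = []
    for p in range(len(signal)):
        window = signal[max(0, p - radius): p + radius + 1]
        out.append(median(window))
    return out

spiky = [10, 11, 255, 12, 10, 0, 11]   # 255 and 0 are impulse outliers
cleaned = median_filter_1d(spiky)
print(cleaned)
```

Unlike averaging, the median discards the outliers entirely instead of spreading them over neighboring samples.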

3.2 Deep Learning

Here, a CNN-based network is proposed for enhancing underwater images; it learns a mapping that estimates the color-corrected image and the transmission map without requiring extra labels on the target source. A pixel-disrupting strategy suppresses the interference of tiny textures in local patches, which improves convergence speed and accuracy during learning. The framework is trained on a synthetic dataset of 200,000 underwater images generated with the underwater imaging model presented in the same work, and demonstrates superior generalization ability on real-source underwater images.

Deep underwater image enhancement algorithms can be categorized into two primary types: CNN-based and GAN-based algorithms. The CNN-based algorithms focus on preserving the authenticity of the original underwater image, while the GAN-based algorithms strive to enhance the visual quality of the images. However, this classification is simplistic, so we classify the networks based on their architectural distinctions.

3.2.1 Encoder-Decoder Models

The following models benefit from the well-known encoder–decoder architecture to advance underwater image enhancement research. P2P Network Recently, [21] proposed an approach to improve the quality of underwater images using pixel-to-pixel (P2P) networks. Their model, resembling REDNet [20], adopts a symmetric architecture consisting of an encoder and a decoder. The encoder is constructed with three convolutional layers, while the decoder is formed by three deconvolutional layers. ReLU activation is applied to each network element except for the last one as shown in Fig 9.

To train the model, the authors utilized a dataset of 3359 real-world underwater images. They introduced degradation levels by adding 30, 50, and 70 ml of milk to 1 $m^{3}$ of water, representing low, medium, and high degradation, respectively. Among the dataset, 10,000 images were chosen for training purposes, and an additional 2000 images were reserved for testing.

To achieve a data-driven image enhancement model, the hyperparameters of the network play a crucial role. The convolutional part retains the first three layers while discarding the fully connected layers. The reason is that fully connected layers map features from two dimensions to one, primarily as input to classifiers. The objective here, however, is a pixel-to-pixel network for image enhancement, which differs from classification tasks. Using fully connected layers would discard important two-dimensional information, making them unsuitable for underwater image enhancement [21]. Additionally, the authors chose to abandon the pooling layers. Although pooling and unpooling layers can benefit object recognition and semantic segmentation by sharpening object edges, they are unnecessary and detrimental for image enhancement and denoising tasks. This is primarily because pooling layers produce denser feature maps through a many-to-one mapping, which loses spatial information within a receptive field. Furthermore, the corresponding unpooling layers introduce considerable noise: during unpooling, only one value originates from the original feature map, while the remaining values are artificially generated (typically filled with zeros) [21].

3.2.2 U-Net

The improvement of U-Net is based on network structure. The specific structure is shown in Fig. 10. The convolutional block attention module (CBAM), an attention mechanism that combines spatial and channel attention, was added to the first U-Net. Because it applies attention along both the channel and spatial dimensions, CBAM can be embedded into most current mainstream networks, and it improves the feature extraction ability of the model without significantly increasing the amount of computation or the number of parameters. One U-Net estimates a latent image representing the underwater image after compensating for the red light, and another U-Net estimates the transmission image from the input grey-scale image. The CBAM was added to avoid losing details during network mapping [28]. The first U-Net consists of an encoder stage and a decoder stage. The encoder stage consists of five network layers, with each layer containing two convolution layers. A kernel size of 3 is used in each convolutional layer, and each convolutional layer is followed by a LeakyReLU activation function and a BatchNorm2d function.

A combination of multi-scale structural similarity (MS-SSIM) and L1 is used as the loss function. To calculate SSIM, the size of the Gaussian kernel used to compute the image mean and variance is particularly crucial. If the kernel is too small, the SSIM loss cannot maintain the local structure of the image well, and artifacts appear. If the kernel is too large, the network generates noise at the edges of the image.
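To make the role of the Gaussian window concrete, the following NumPy sketch mixes a structural-similarity term with an L1 term. It is a simplification under stated assumptions: it uses single-scale SSIM rather than the multi-scale variant, the mixing weight `alpha` is hypothetical, and the window defaults (`size=11`, `sigma=1.5`) are common SSIM defaults rather than values reported for this network.

```python
import numpy as np

def gaussian_window(size: int, sigma: float) -> np.ndarray:
    """1D Gaussian weights that sum to one (used separably)."""
    ax = np.arange(size) - size // 2
    w = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()

def blur(img: np.ndarray, win: np.ndarray) -> np.ndarray:
    """Separable Gaussian blur with 'valid' borders (output shrinks by size - 1)."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, win, mode="valid"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, win, mode="valid"), 0, rows)

def ssim(x, y, size=11, sigma=1.5, L=1.0, k1=0.01, k2=0.03) -> float:
    """Mean single-scale SSIM of two grey-scale images with dynamic range L."""
    win = gaussian_window(size, sigma)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_x, mu_y = blur(x, win), blur(y, win)          # local means
    var_x = blur(x * x, win) - mu_x ** 2             # local variances
    var_y = blur(y * y, win) - mu_y ** 2
    cov = blur(x * y, win) - mu_x * mu_y             # local covariance
    s = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return float(s.mean())

def ssim_l1_loss(pred, target, alpha=0.84, **ssim_kwargs) -> float:
    """alpha * (1 - SSIM) + (1 - alpha) * L1; alpha is a hypothetical weight."""
    l1 = float(np.mean(np.abs(pred - target)))
    return alpha * (1.0 - ssim(pred, target, **ssim_kwargs)) + (1.0 - alpha) * l1

rng = np.random.default_rng(0)
target = rng.random((32, 32))
pred = np.clip(target + 0.1 * rng.standard_normal((32, 32)), 0.0, 1.0)
loss_same = ssim_l1_loss(target, target)
loss_noisy = ssim_l1_loss(pred, target)
```

Changing `size` and `sigma` changes how local the mean and variance estimates are, which is the trade-off the paragraph above describes.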

3.2.3 Conditional Generative Adversarial Network

Underwater images are crucial for obtaining and interpreting information about the underwater environment. The reliability of underwater intelligent systems depends on high-quality underwater images. Unfortunately, these images often suffer from low contrast, color casts, blurring, low light, and uneven illumination, which severely limit their usefulness. To address this issue, numerous methods have been proposed, including those that utilize deep learning technologies. However, the performance of these methods is often unsatisfactory due to a lack of sufficient training data and effective network structures [26].

To tackle these challenges, [26] proposes a conditional generative adversarial network (cGAN) for enhancing underwater images. The approach uses a multi-scale generator to produce clear underwater images and a dual discriminator to capture local and global semantic information, ensuring that the generated results are both realistic and natural. Experimental results, obtained from both real-world and synthetic underwater images, show that the method outperforms existing state-of-the-art underwater image enhancement methods [26].

Multi-Scale Generator. cGAN's multi-scale generator comprises three main components: a multi-scale feature extraction unit, a feature refinement unit, and a residual map estimation unit. The multi-scale feature extraction unit is constructed using three sets of multi-scale convolutions with different kernel sizes (7x7, 5x5, and 3x3), each set consisting of five convolutional layers with increasing filter numbers ranging from 16 to 256 [26]. A non-linear activation ReLU follows each convolutional layer. The multi-scale feature extractor aims to obtain statistical information from inputs on various scales by acquiring different receptive fields. The multi-scale features are then down-sampled by half of their original size, concatenated, and fed to the feature refinement unit to capture global features and reduce computational costs. The refined features are processed through successive convolutional layers before being down-sampled and fed to three successive convolutional layers, each with 64 filters, and then up-sampled to their original size. Finally, a residual map is estimated by a convolutional layer without non-linear activation, which is used to achieve the final enhanced result via element-wise addition. Zero padding is applied to each convolutional layer to maintain input and output sizes. With the exception of the multi-scale feature extractor's convolutional layers, all convolutional layers have 3x3 kernel sizes. Unlike the common encoder-decoder and cGAN network structures, the generator includes a multi-scale feature extraction unit designed to enhance network capability and adapt to varying underwater sources. Additionally, the generator has a shallow and lightweight structure and does not use skip connections.

Dual Discriminator. The dual discriminator comprises two sub-discriminators with identical network structures but different weights. Additionally, the inputs to these sub-discriminators have different sizes: one is the original size, while the other is half the original size. The dual discriminator aims to guide the generator in producing realistic images at both the global semantic and local detail levels. This design is necessary because the existing discriminator cannot effectively guide the generator to create realistic details. By providing multi-resolution inputs to different discriminators, the visual quality of the results can be improved. Specifically, the sub-discriminator contains eight convolutional layers with an increasing number of 3x3 filters, increasing from 64 to 512 by a factor of 2. Strided convolutions are used to reduce the image resolutions, and the 512 feature maps are fed to two fully connected layers to predict the probability of the inputs being real or fake. Unlike the multi-scale generator, the first convolution in the sub-discriminator is followed by Leaky ReLU non-linear activation, while the other convolutions are followed by batch normalization and Leaky ReLU. The last fully connected layer uses the Sigmoid non-linear activation to predict the probability, which is commonly used in image classification tasks. These two sub-discriminators are employed to guide the multi-scale generator [26].

3.2.4 Cycle GAN

A variation of the standard GAN network structure is the cycle-consistent adversarial network (CycleGAN), which uses two mirror-symmetric GAN generators and two matching discriminators arranged in a ring network. The CycleGAN framework involves training two generators, denoted G and F, along with two discriminators, $D_{x}$ and $D_{y}$ . The generators G and F learn the mappings from the X domain to the Y domain and from the Y domain to the X domain, respectively. For the input image and the produced image to correspond, the necessary conditions are F(G(x)) $\approx$ x and G(F(y)) $\approx$ y. CycleGAN enforces these conditions with a cycle-consistency loss function. This network structure overcomes the challenge faced by GANs that require paired data for training, and performs well with underwater photos that do not have paired data [33]. CycleGAN is designed for unpaired image-to-image translation, where the task is to translate images from a source domain X to a target domain Y. It consists of two GANs, one for translating from domain X to Y and one from Y to X [32]. In Eqs. (11) and (12), the discriminators $D_{x}$ and $D_{y}$ are written $D_{A}$ and $D_{B}$, and the generators G and F are written $G_{A}$ and $G_{B}$. The two discriminators represent the functions:

D_{A}:X\to R;D_{B}:Y\to R

(11)

and the two generators represent the functions:

G_{A}:X\to Y;G_{B}:Y\to X

(12)
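The cycle-consistency conditions F(G(x)) ≈ x and G(F(y)) ≈ y can be written as an L1 penalty. The sketch below uses toy affine functions as stand-ins for the two generators (G for X → Y and F for Y → X, i.e. $G_{A}$ and $G_{B}$ in Eq. (12)); the function names and the toy mappings are purely illustrative, and the adversarial terms are omitted.

```python
import numpy as np

def cycle_consistency_loss(G, F, x, y) -> float:
    """Mean L1 error of the two round trips x -> G -> F and y -> F -> G."""
    forward = np.mean(np.abs(F(G(x)) - x))    # X -> Y -> X
    backward = np.mean(np.abs(G(F(y)) - y))   # Y -> X -> Y
    return float(forward + backward)

def toy_G(img):
    """Stand-in generator X -> Y (a fixed affine colour shift)."""
    return 0.5 * img + 0.1

def toy_F_inverse(img):
    """Stand-in generator Y -> X that exactly undoes toy_G."""
    return (img - 0.1) / 0.5

def toy_F_identity(img):
    """Stand-in generator Y -> X that ignores the shift applied by toy_G."""
    return img

x = np.linspace(0.0, 1.0, 16).reshape(4, 4)   # sample from domain X
y = toy_G(x)                                  # sample from domain Y
loss_consistent = cycle_consistency_loss(toy_G, toy_F_inverse, x, y)
loss_inconsistent = cycle_consistency_loss(toy_G, toy_F_identity, x, y)
```

The loss vanishes only when the two mappings invert each other, which is what lets training proceed without paired images.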

Discriminator The structure of the discriminators used in CycleGAN is rather conventional: fully convolutional neural networks built from five layer blocks, each of which has an instance normalization layer, a Leaky ReLU layer, and a 2D convolution layer with a kernel size of 4x4 and a stride of 2 (except the output block, which uses a Sigmoid layer as activation) [48]. Each of the first five layer blocks halves the size of the image and increases the number of channels. The feature map thus has 512 channels and a size of 16x16 after the fifth layer (the model input has a size of 256x256). The output layer block finally combines all 512 channels into a single 16x16 channel.

Generator A generator's goal is to transform the input image and produce the result as its output. Its network structure is made up of three parts: an encoder, a transformer, and a decoder. The encoder reduces the size of the input images while increasing the number of channels. It is made up of 3 layer blocks, similar to the discriminator, with a 2D convolution layer, an instance normalization layer, and a Leaky ReLU layer in each block. The first layer block only expands the input to 64 channels; it has no effect on the image's size. Each of the next 2 layer blocks, however, reduces the input size by 50% while increasing the number of channels. The transformer then receives the encoded input [45] [47]. The transformer maintains the input's size while adding the needed characteristics [43]. It has six ResNet blocks, also known as residual network blocks. Each ResNet block has two layer blocks: the first has a Leaky ReLU layer, an instance normalization layer, and a 2D convolution layer (with stride=1); the second has a 2D convolution layer (with stride=1) and an instance normalization layer. The decoder receives the transformed input after that [41]. To create the final output image, the decoder enlarges the input back to its original size and collapses all channels into RGB. Two transpose convolution layers are stacked to accomplish the enlargement. A transpose convolution layer can be thought of as a simple combination of a 2D up-sampling layer and a 2D convolution layer with stride=1. In general, it reduces the number of channels while increasing the size of the input. The output layer eventually receives the 256x256 data with 64 channels generated by the two transpose convolution layers, collapses the channels into RGB, and produces the final output image [38] [39].
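The size bookkeeping of the generator can be checked with the standard output-size formulas for convolution and transpose convolution. In the sketch below the kernel sizes, paddings, and the intermediate channel widths (128 and 256) are assumptions chosen so that each stride-2 block halves or doubles the resolution; only the 256x256 input, the 64-channel first block, the six residual blocks, and the 64-channel 256x256 decoder output come from the description above.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a 2D convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size: int, kernel: int, stride: int,
                       padding: int, output_padding: int) -> int:
    """Spatial output size of a 2D transpose convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def generator_shapes(size: int = 256):
    """Return (channels, size) after each stage of the assumed generator."""
    shapes = [(3, size)]                       # RGB input
    # Encoder: first block keeps the size, the next two halve it.
    size = conv_out(size, 7, 1, 3); shapes.append((64, size))
    size = conv_out(size, 3, 2, 1); shapes.append((128, size))
    size = conv_out(size, 3, 2, 1); shapes.append((256, size))
    # Transformer: six residual blocks preserve channels and size.
    for _ in range(6):
        size = conv_out(size, 3, 1, 1); shapes.append((256, size))
    # Decoder: two transpose convolutions double the size each time.
    size = conv_transpose_out(size, 3, 2, 1, 1); shapes.append((128, size))
    size = conv_transpose_out(size, 3, 2, 1, 1); shapes.append((64, size))
    # Output layer collapses the 64 channels into RGB.
    size = conv_out(size, 7, 1, 3); shapes.append((3, size))
    return shapes

shapes = generator_shapes(256)
```

Under these assumptions the encoder reduces 256x256 to 64x64, the residual blocks keep 64x64, and the decoder returns to 256x256 with 64 channels before the RGB output layer.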

Deep learning techniques have shown promise in enhancing underwater images, but there are gaps in the research that need to be addressed. One challenge is the high number of parameters involved, leading to overfitting and reduced generalization ability. There is a need for research to develop efficient deep-learning models with fewer parameters that still achieve good performance. Another gap is the interpretability of deep learning models, which are often considered "black-box" models, making it difficult to understand how they make decisions. There is a need for developing methods to interpret these models to identify strengths and weaknesses and improve performance. Overall, the research gap in deep learning for underwater image enhancement is in developing efficient and interpretable models with good performance.

Several methods have been proposed to tackle challenges in Underwater Image Enhancement (UIE). Challenges like light attenuation and scattering often result in color casts and diminished visibility [15]. One particularly noteworthy approach introduced a novel quality assessment method centered around colorfulness, contrast, and visibility metrics, providing an effective means to evaluate UIE outcomes [15]. However, the diverse underwater landscapes pose a challenge to existing color constancy methods. To address this, an adaptive UIE technique leveraging hue channel statistics and deep learning networks trained on authentic datasets with ground truth annotations was developed in [16].

Texture and color enhancement are pivotal for effective underwater image enhancement, and the Texture-Aware and Color-Consistent Network (TACC-Net) has emerged as a standout performer in this regard. By decoupling features to enhance texture and ensure color consistency, TACC-Net has significantly improved visual quality [17]. Meanwhile, issues such as light absorption and turbulence continue to impair image quality in underwater target imaging, affecting clarity and resolution. To address these challenges, a study has proposed a block mixed filter denoising technique and underscored the importance of objective quality evaluation for image enhancement methods [18].

3.3 Baseline Methods:

3.3.1 Fusion Based

The work in [23] introduces a novel strategy to enhance underwater videos and images using fusion principles. The unique aspect of this strategy is that it derives both the inputs and the weight measures solely from the degraded version of the image, without the need for specialized hardware or prior knowledge of underwater conditions or scene structure.

The approach involves the derivation of two inputs from the original underwater image or frame. The first input is a color-corrected version that addresses the color distortion commonly caused by underwater environments. The second input is a contrast-enhanced version, which aims to improve the visibility of details often lost in the hazy underwater images. These inputs help to mitigate the color and contrast issues inherent in underwater imaging [23].

Additionally, four weight maps are defined to increase the visibility of distant objects, which are usually degraded due to medium scattering and absorption in underwater environments. These weight maps help in selectively emphasizing important features in the image, enhancing the overall clarity.

The fusion framework integrates these inputs and weight maps to produce an enhanced image. This approach ensures that the finest details and edges in the image are significantly improved. The enhanced images and videos are characterized by a reduced noise level, as effective edge-preserving noise reduction strategies are applied to minimize noise while retaining important details. Dark areas in the image are better exposed, making hidden details more visible. The overall contrast of the image is enhanced, making it more visually appealing and informative.
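The core of the fusion step, blending several derived inputs with per-pixel weight maps, can be sketched as a normalized weighted sum. This is a simplified single-scale illustration: the function name is hypothetical, the way the inputs and weight maps are derived is not shown, and any multi-scale blending used in [23] is ignored.

```python
import numpy as np

def fuse(inputs, weight_maps, eps: float = 1e-8) -> np.ndarray:
    """Per-pixel weighted blend of K inputs with K weight maps."""
    stack = np.stack(inputs).astype(np.float64)          # shape (K, H, W)
    weights = np.stack(weight_maps).astype(np.float64)   # shape (K, H, W)
    # Normalise so the weights at each pixel sum to (almost exactly) one.
    norm = weights / (weights.sum(axis=0, keepdims=True) + eps)
    return (norm * stack).sum(axis=0)

# Toy stand-ins for the colour-corrected and contrast-enhanced inputs.
color_corrected = np.array([[0.2, 0.4], [0.6, 0.8]])
contrast_enhanced = np.array([[0.4, 0.4], [0.2, 1.0]])
ones = np.ones((2, 2))
zeros = np.zeros((2, 2))

fused_equal = fuse([color_corrected, contrast_enhanced], [ones, ones])
fused_first = fuse([color_corrected, contrast_enhanced], [ones, zeros])
```

With equal weights the result is the plain average of the two inputs; a weight map that is large only where an input is reliable lets that input dominate locally.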

For videos, the framework also ensures temporal coherence between adjacent frames. This means that the enhancement process maintains consistency across frames, preventing flickering or abrupt changes that can distract viewers.

The utility of this enhancement technique is demonstrated across several challenging applications, showing its versatility and effectiveness in various underwater imaging scenarios.

3.3.2 UGAN-Based

Autonomous underwater vehicles (AUVs) rely on a variety of sensors, including acoustic, inertial, and visual sensors, for intelligent decision-making. Among these, vision is particularly attractive due to its non-intrusive, passive nature and high information content, especially at shallower depths. However, several factors adversely affect the quality of visual data obtained underwater. Light refraction and absorption, suspended particles in the water, and color distortion all contribute to producing noisy and distorted images. Consequently, AUVs that depend on visual sensing face significant challenges and often exhibit poor performance on vision-driven tasks [50].

The work in [50] proposes a method to enhance the quality of visual underwater scenes using Generative Adversarial Networks (GANs). The goal is to improve the visual input for vision-driven behaviors further down the autonomy pipeline of AUVs. GANs are well-suited for this task because of their ability to generate high-quality images that closely resemble real-world scenes, making them ideal for underwater image restoration [50].

The key challenges in underwater visual data include light refraction and absorption, suspended particles, and color distortion. Underwater environments significantly alter light paths, causing refraction and absorption that lead to reduced clarity and visibility. Particles in the water scatter light, creating a hazy appearance and further degrading image quality. The underwater medium absorbs different wavelengths of light at different rates, causing color distortions that affect the accuracy of visual data [50].

The proposed method leverages the power of GANs to address these challenges. GANs consist of two neural networks: a generator and a discriminator. The generator creates enhanced images from the degraded input, while the discriminator evaluates the authenticity of the generated images, driving the generator to produce increasingly realistic enhancements. This adversarial process results in images that are not only visually appealing but also more useful for subsequent vision-driven tasks.

To train the GANs effectively, a dataset specifically tailored for underwater image restoration is required. Recently proposed methods allow for the generation of such datasets by simulating various underwater conditions and degradations. This synthetic dataset includes images with different types of distortions commonly found in underwater environments, providing a comprehensive training set for the GANs.

For visually-guided underwater robots, improving the quality of visual data can lead to increased safety and reliability. Enhanced visual perception enables AUVs to perform better in tasks such as navigation, object detection, and diver tracking. The proposed GAN-based approach not only generates visually appealing images but also enhances the accuracy of vision-driven algorithms.

The effectiveness of the proposed method is demonstrated through both quantitative and qualitative evaluations. Enhanced images show significant improvements in clarity, color accuracy, and detail preservation compared to the original degraded images. Additionally, these improvements translate to increased accuracy for a diver tracking algorithm, showcasing the practical benefits of the enhanced visual data.

3.3.3 FUnIE-GAN

In [13], a conditional generative adversarial network-based model is presented for real-time underwater image enhancement. The model's adversarial training is supervised by an objective function that evaluates perceptual image quality based on global content, color, local texture, and style information. A large-scale dataset, EUVP, is introduced, consisting of paired and unpaired collections of underwater images of varying quality, captured using seven different cameras under various visibility conditions during oceanic explorations and human-robot collaborative experiments.

Several qualitative and quantitative evaluations were performed, demonstrating that the proposed model effectively learns to enhance underwater image quality from both paired and unpaired training datasets. The enhanced images improve the performance of standard models for underwater object detection, human pose estimation, and saliency prediction. These results validate the suitability of the proposed model for real-time preprocessing in the autonomy pipeline of visually-guided underwater robots[13].

3.3.4 Deep SESR

In [12], the simultaneous enhancement and super-resolution (SESR) problem for underwater robot vision is introduced and tackled, providing an efficient solution for near real-time applications. The proposed solution, Deep SESR, is a generative model based on a residual-in-residual network that learns to restore perceptual image qualities at 2x, 3x, or 4x higher spatial resolution. The model is trained using a multi-modal objective function that addresses chrominance-specific underwater color degradation, lack of image sharpness, and loss in high-level feature representation. Additionally, the model is supervised to learn salient foreground regions in the image, which guides it to enhance global contrast.

An end-to-end training pipeline is designed to jointly learn saliency prediction and SESR on a shared hierarchical feature space for fast inference. This approach ensures that the model can process images quickly, making it suitable for near real-time applications [12].

The same work [12] also introduces UFO-120, the first dataset designed to facilitate large-scale SESR learning, containing over 1500 training samples and a benchmark test set of 120 samples. Experimental evaluations on UFO-120 and other standard datasets demonstrate that Deep SESR outperforms existing solutions for underwater image enhancement and super-resolution. The model's generalization performance is validated on several test cases, including underwater images with diverse spectral and spatial degradation levels and terrestrial images with unseen natural objects.

Furthermore, the computational feasibility of Deep SESR for single-board deployments is analyzed, demonstrating its operational benefits for visually-guided underwater robots. The model’s ability to enhance and super-resolve images in near real-time provides significant advantages for underwater robotics, enabling more accurate and reliable visual perception in challenging underwater environments.

3.3.5 Water-Net

Underwater image enhancement is vital for marine engineering and aquatic robotics, but existing algorithms are mainly tested on synthetic datasets or limited real-world images. To evaluate these algorithms’ real-world performance, a comprehensive perceptual study using large-scale real-world images is conducted. This study introduces the Underwater Image Enhancement Benchmark (UIEB), containing 950 real-world underwater images, with 890 having corresponding reference images and 60 considered challenging due to the lack of satisfactory references [14].

The study also proposes Water-Net, an underwater image enhancement network trained on the UIEB [24]. The benchmark evaluations and Water-Net demonstrate the strengths and limitations of current algorithms, providing insights for future research. This work advances the assessment and benchmarking of underwater image enhancement algorithms, contributing to the field’s progress [14].

3.3.6 MSSCE-GAN

Enhancing underwater images is crucial for applications such as underwater exploration. Traditional methods often rely on paired underwater and reference images for training, which are challenging to acquire. These methods frequently suffer from information loss, resulting in blurred details and limited applicability across diverse underwater conditions [49].

The work in [49] introduces a novel approach using the Multi-Scale Structural and Color Enhanced Generative Adversarial Network (MSSCE-GAN) for unpaired underwater image enhancement. The method includes modules for detail feature recovery and attention enhancement, addressing various distortions prevalent in underwater imagery.

Key to this approach is its ability to generate superior enhanced images without requiring paired training data. Experimental evaluations demonstrate significant improvements over existing techniques in terms of effectiveness and generalizability across multiple underwater image datasets.

3.3.7 Deep WaveNet

Underwater images typically suffer from low contrast and significant color distortions due to varying light attenuation as it travels through water. This phenomenon affects different colors asymmetrically, complicating image restoration tasks. Despite numerous attempts using deep learning for underwater image restoration (UIR), existing methods often overlook this asymmetry in network design [11].

The work in [11] makes two contributions to address these challenges in UIR. Firstly, it proposes adapting receptive field sizes based on the wavelength-dependent attenuation of color channels, aiming for improved performance. Secondly, it incorporates an attentive skip mechanism to refine multi-contextual features effectively, enhancing model representational power while suppressing irrelevant features.

The proposed framework, Deep WaveNet, is optimized using pixel-wise and feature-based cost functions. Extensive experiments demonstrate its superiority over state-of-the-art methods on benchmark datasets. Furthermore, the study validates the enhanced images through various high-level vision tasks, such as underwater image semantic segmentation and diver’s 2D pose estimation [11].

4 Dataset

4.1 UIEB

The Underwater Image Enhancement Benchmark (UIEB) dataset comprises 950 real-world underwater images, each with a size of $256\times 256$ pixels. Among these, 890 images have corresponding reference images available for evaluation, while the remaining 60 images lack satisfactory reference images and are therefore considered challenging, as summarized in Table 1. This dataset serves as a crucial resource for conducting comprehensive studies on underwater image enhancement algorithms, enabling both qualitative and quantitative assessments of algorithm performance.

Table 1: Summary of Underwater Dataset UIEB [14]

Dataset Characteristics	Details
Number of Real-world Images	950
Number of Images with Reference	890
Number of Challenging Images	60

4.2 EUVP

4.2.1 Paired Dataset

Underwater Dark:

This dataset comprises 5550 pairs of images for training, each with a size of $256\times 256$ pixels. Each pair consists of a poor-quality or gray image and the corresponding enhanced or colored image. The two images of a pair share the same filename. Additionally, 570 images are set aside for validation. In total, the dataset contains 11,670 images, as shown in Table 2.

Underwater ImageNet:

The Underwater ImageNet dataset consists of 3700 pairs of images for training, each with a size of $256\times 256$ pixels. As in the Underwater Dark dataset, each pair contains a poor-quality image and an enhanced or better-quality image. The filenames for corresponding pairs match. The dataset also includes 1270 images for validation, resulting in a total of 8670 images.

Underwater Scenes:

This dataset comprises 2185 pairs of images, each with a size of $320\times 240$ pixels for training, with each pair containing a poor-quality image and a corresponding enhanced or better-quality image. The filenames for corresponding pairs are consistent. Additionally, 130 images are allocated for validation purposes. In total, the dataset encompasses 4500 images.

Table 2: Summary of Underwater Datasets EUVP (paired data) [13]

Dataset Name	Training Pairs	Validation	Total Images
Underwater Dark	5550 pairs	570	11670
Underwater ImageNet	3700 pairs	1270	8670
Underwater Scenes	2185 pairs	130	4500
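The totals in Table 2 follow from counting two images per training pair plus the validation images. A small check of that arithmetic is given below; the dictionary layout and function name are chosen here only for illustration.

```python
# Training pairs and validation images per EUVP paired subset (from Table 2).
EUVP_PAIRED = {
    "Underwater Dark": {"pairs": 5550, "validation": 570},
    "Underwater ImageNet": {"pairs": 3700, "validation": 1270},
    "Underwater Scenes": {"pairs": 2185, "validation": 130},
}

def total_images(pairs: int, validation: int) -> int:
    """Each training pair contributes two images; validation images count once."""
    return 2 * pairs + validation

totals = {name: total_images(**row) for name, row in EUVP_PAIRED.items()}
for name, total in totals.items():
    print(f"{name}: {total} images")
```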

4.2.2 Unpaired Data

The unpaired training data contain 3195 poor-quality images and 3140 images of enhanced or better quality. These images come in sizes of $960\times 540$ , $640\times 480$ , and $320\times 240$ pixels. Additionally, 330 images are allocated for validation. The images are not paired, meaning that there is no one-to-one correspondence between the poor-quality and enhanced-quality images. This arrangement allows models to be trained to enhance image quality without relying on direct paired examples, as shown in Table 3.

Table 3: Distribution of images in the dataset (Unpaired data)

Poor quality	Good quality	Validation	Total Images
3195	3140	330	6665

5 Performance Evaluation

Underwater image quality assessment is a challenging task. Image quality assessment (IQA) methods aim to evaluate the quality of images accurately and automatically. IQA approaches are broadly classified into (a) objective and (b) subjective image quality assessment. Subjective image quality assessments are expensive and time-consuming and hence not suitable for real-time applications. Objective assessment techniques use statistical and mathematical models based on the human visual system (HVS) to automatically estimate image quality. Based on the availability of the original image, objective IQA methods can be classified into three categories: (1) full reference IQA (FR), where the reference image is available, (2) reduced reference IQA (RR), where partial information of the reference image is available, and (3) no reference IQA (NR), in which the reference image is not available. In addition to the standard performance evaluation parameters, specialized metrics have been proposed in the literature to assess underwater image quality effectively.

The performance of various underwater image enhancement and restoration techniques is analyzed using different qualitative and quantitative parameters. The qualitative evaluation involves the visual enhancement of the image by comparing histograms. The quantitative performance framework deals with various quality metric parameters which include:

• Mean square error (MSE): MSE computes the mean of the squared error between the enhanced and the original image. The lower the MSE, the better the quality (low error). It is given as:

MSE=\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\big{[}F(i,j)-E(i,j)\big{]}^{2}

(13)

where F(i, j) is the original image, E(i, j) is the enhanced image, and M × N is image size.

•

Peak-signal-to-noise ratio (PSNR): It is a measure of the peak error and is computed as

PSNR=20\log_{10}\Big{(}\frac{MAX_{F}}{\sqrt{MSE}}\Big{)}

(14)

where $MAX_{F}$ is the maximum pixel value of the image, which is 255 for an 8-bit gray level image.

•

Entropy : Entropy is a measure of information content present in the image and is given as:

H(F)=-\sum_{i=0}^{255}p_{i}\log_{2}p_{i}

(15)

where $p_{i}$ is the probability of occurrence of intensity i at a pixel in image F

•

Structure similarity index measure (SSIM): SSIM measures the similarity between original image patches and enhanced patches at locations x and y from three aspects: brightness, contrast, and structure

SSIM(F,E)=\frac{\big(2\mu_{x}\mu_{y}+C_{1}\big)\big(2\sigma_{xy}+C_{2}\big)}{\big(\mu_{x}^{2}+\mu_{y}^{2}+C_{1}\big)\big(\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2}\big)}

(16)

where $\mu_{x},\mu_{y}$ are the mean values and $\sigma_{x},\sigma_{y}$ are the standard deviation values of the pixels in patch x and y respectively. $\sigma_{xy}$ is the covariance of patches x and y, and $C_{1}=(k_{1}L)^{2}$ and $C_{2}=(k_{2}L)^{2}$ are small constants that avoid instability when the denominator is close to zero. L is the dynamic range of pixel values, $k_{1}=0.01$ and $k_{2}=0.03$ . The higher the SSIM value, the smaller the distortion and the better the enhancement.

•

Colour enhancement factor (CEF): It helps in the representation of the effect of enhancement and is given as

CEF=\frac{CM(\tilde{I})}{CM(I)}

(17)

$CM(I)=\sqrt{\sigma_{\alpha}^{2}+\sigma_{\beta}^{2}}+0.3\sqrt{\mu_{\alpha}^{2}+\mu_{\beta}^{2}}$ where $\sigma_{\alpha},\sigma_{\beta}$ represent the standard deviations and $\mu_{\alpha},\mu_{\beta}$ are the average values of $\alpha$ and $\beta$ respectively. $CM(\tilde{I})$ is computed on the enhanced image and $CM(I)$ on the original image.

•

Contrast to noise ratio (CNR): This metric describes the amplitude of the signal relative to the surrounding noise in an image. CNR is computed by using

CNR(I,I^{\prime})=\frac{(\mu_{i}-\mu_{n})}{\sigma_{n}}

(18)

$\mu_{i}$ represents the mean value of original image and $\mu_{n}$ is mean value of enhanced image and $\sigma_{n}$ denotes the standard deviation.

•

Image enhancement metric (IEM): This metric gives information about the sharpness and the improvement in the contrast after the process of enhancement. It is computed as follows

IEM=\frac{\sum_{l=1}^{k1}\sum_{m=1}^{k2}\sum_{n=1}^{8}|I_{e,c}^{m,l}-I_{e,n}^{m,l}|}{\sum_{l=1}^{k1}\sum_{m=1}^{k2}\sum_{n=1}^{8}|I_{o,c}^{m,l}-I_{o,n}^{m,l}|}

(19)

k1 and k2 denote the numbers of non-overlapping blocks, and o and e represent the original and enhanced images respectively. $I_{o,c}^{m,l}$ and $I_{e,c}^{m,l}$ denote the intensities of the centre pixel of block (m, l), while $I_{o,n}^{m,l}$ and $I_{e,n}^{m,l}$ are the intensities of the eight neighbours of the centre pixel.

•

Absolute mean brightness error(AMBE): AMBE helps to compute the brightness content that is preserved after the process of image enhancement. It is given as

AMBE(o,e)=|\mu_{o}-\mu_{e}|

(20)

where $\mu_{o}$ and $\mu_{e}$ are the mean intensities of the original and enhanced images respectively. Since AMBE is the absolute difference between these means, smaller values indicate better preservation of brightness.

•

Spatial spectral entropy based quality index (SSEQ): SSEQ is a highly efficient no reference (NR) IQA model. It can assess the quality of an image that is distorted across various distortion categories. SSEQ can be calculated by

E=-\sum_{i}\sum_{j}P_{i,j}\log_{2}P_{i,j}

(21)

where P(i, j) is the spectral probability map given as

P(i,j)=\frac{C(i,j)^{2}}{\sum_{i}\sum_{j}C(i,j)^{2}}

(22)

C is a coefficient matrix computed on (i,j) pixels.

•

Measure of enhancement (EME): EME calculates the contrast of the images and aids in the optimum selection of processing parameters. It is computed as:

EME_{m_{1}m_{2}}=\max\Big(\frac{1}{m_{1}m_{2}}\sum_{l=1}^{m_{1}}\sum_{n=1}^{m_{2}}20\log\frac{X_{max;n,l}^{\omega}}{X_{min;n,l}^{\omega}}\Big)

(23)

where $X_{max;n,l}^{\omega}$ and $X_{min;n,l}^{\omega}$ represent the maximum value and minimum value of the image within the block $\omega_{n,l}$ .

•

Root mean square error (RMSE): RMSE is computed by calculating the square root of MSE. It is given as

RMSE=\sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\big[F(i,j)-E(i,j)\big]^{2}}

(24)

•

Measure of enhancement by entropy (EMEE): EMEE is computed by

EMEE_{m_{1}m_{2}}=\max\Big(\frac{1}{m_{1}m_{2}}\sum_{l=1}^{m_{1}}\sum_{n=1}^{m_{2}}\alpha\Big(\frac{X_{max;n,l}^{\theta}}{X_{min;n,l}^{\theta}}\Big)^{\alpha}\ln\frac{X_{max;n,l}^{\theta}}{X_{min;n,l}^{\theta}}\Big)

(25)

Good image quality is indicated by a high value of EMEE. $m_{1}$ and $m_{2}$ represent the numbers of blocks into which the image is divided.

•

Underwater color image quality evaluation metric (UCIQE): UCIQE was specifically designed to quantify the effects of non-uniform color cast, low contrast and issues of blurring that affect underwater images. UCIQE for an image X in CIELab space is calculated as:

UCIQE=c_{1}\times\sigma_{chroma}+c_{2}\times contrast_{l}+c_{3}\times\mu_{saturation}

(26)

where $c_{1}$ , $c_{2}$ and $c_{3}$ are the weighting coefficients, $\sigma_{chroma}$ denotes the standard deviation of chroma, $contrast_{l}$ is the contrast of luminance and $\mu_{saturation}$ is the average value of saturation. Higher values of UCIQE signify that the image has a good balance among chroma, contrast, and saturation.

•

Underwater Image Colorfulness Measure (UICM): Underwater images often exhibit a color-casting problem wherein colors are gradually attenuated based on their wavelength as the water depth increases. The color red, which has the longest wavelength, disappears first, resulting in a bluish or greenish appearance of the images. In addition, inadequate lighting conditions can also lead to significant color de-saturation. To address this, an effective algorithm for enhancing underwater images must ensure good color rendition. The human visual system (HVS) captures colors in the opponent color plane, and hence the chrominance components RG and YB, which are associated with the two opponent color planes, are utilized in the UICM technique.

RG=R-G

(27)

YB=\frac{R+G}{2}-B

(28)

Due to the heavy noise in underwater images, the traditional statistical values are not suitable for measuring their colorfulness. As a result, asymmetric alpha-trimmed statistical values are used instead. The mean can be expressed as:

\mu_{\alpha,RG}=\frac{1}{K-T_{\alpha L}-T_{\alpha R}}\sum_{i=T_{\alpha L}+1}^{K-T_{\alpha R}}Intensity_{RG,i}

(29)

The second-order statistic, the variance $\sigma^{2}$ , is given by:

\sigma^{2}_{\alpha,RG}=\frac{1}{N}\sum_{p=1}^{N}(Intensity_{RG,p}-\mu_{\alpha,RG})^{2}

(30)

The overall colorfulness metric used for measuring underwater image colorfulness is:

UICM=-0.0268\sqrt{\mu^{2}_{\alpha,RG}+\mu^{2}_{\alpha,YB}}+0.1586\sqrt{\sigma^{2}_{\alpha,RG}+\sigma^{2}_{\alpha,YB}}

(31)

•

Underwater Image Sharpness Measure (UISM): Sharpness pertains to the quality of preserving fine details and edges in an image. In underwater images, forward scattering often causes significant blurring, resulting in a loss of image sharpness. To quantify sharpness on edges, the Sobel edge detector is first applied to each RGB color component, and the resulting edge map is multiplied with the original image to generate a grayscale edge map. This preserves only the pixels on the edges of the original underwater image. The enhancement measure estimation (EME) method is suitable for images that have uniform backgrounds and exhibit non-periodic patterns. Hence, EME is utilized to measure the sharpness of these edges. The UISM is:

UISM=\sum_{c=1}^{3}\lambda_{c}EME(grayscaleedge_{c})

(32)

EME=\frac{2}{k1k2}\sum_{l=1}^{k1}\sum_{k=1}^{k2}\log\frac{I_{max,k,l}}{I_{min,k,l}}

(33)

•

Underwater Image Contrast Measure (UIConM): Studies have demonstrated a correlation between contrast and underwater visual capabilities, including stereoscopic acuity. In the case of underwater imagery, contrast deterioration is typically attributed to backward scattering. The intensity image is evaluated using the logAMEE measure to determine the contrast.

UIConM=logAMEE(intensity)

(34)

The logAMEE is given by

logAMEE=\frac{2}{k1k2}\sum_{l=1}^{k1}\sum_{k=1}^{k2}\frac{I_{max,k,l}\ominus I_{min,k,l}}{I_{max,k,l}\oplus I_{min,k,l}}*\log\frac{I_{max,k,l}\ominus I_{min,k,l}}{I_{max,k,l}\oplus I_{min,k,l}}

(35)

•

Underwater image quality measure (UIQM): UIQM is based on the human visual system model and works without a reference image. UIQM comprises three main measures, UICM the underwater image colorfulness measure, UISM the underwater image sharpness measure, and UIConM the underwater image contrast measure. UIQM is calculated as follows:

UIQM=Coeff_{1}*UICM+Coeff_{2}*UISM+Coeff_{3}*UIConM

(36)

Higher values of UIQM indicate good levels of enhancement.

•

Colourfulness contrast fog density index (CCF): CCF is a no-reference IQA metric proposed to predict underwater color image quality. It is a weighted combination of a colorfulness index, a contrast index, and a fog density index, computed as,

CCF=\omega_{1}*Colorfulness+\omega_{2}*Contrast+\omega_{3}*Fogdensity

(37)

The CCF computation thus accounts for the loss of colorfulness due to absorption, blurring due to forward scattering, and fog density due to backward scattering.

•

Average gradient (AG): Average gradient is a full reference metric that is used to define the sharpness of the given image. It represents the rate of change of the minute details present in the image. It is computed as,

AG=\frac{1}{(L-1)(M-1)}\sum_{i=1}^{L-1}\sum_{j=1}^{M-1}\sqrt{\big(\nabla_{x}I(i,j)\big)^{2}+\big(\nabla_{y}I(i,j)\big)^{2}}

(38)

where L and M denote the width and height of the image and $\nabla_{x}$ and $\nabla_{y}$ represent the gradient in the x and y directions respectively [2].

•

Patch based contrast quality index (PCQI): PCQI is defined as,

PCQI(i,j)=\frac{1}{P}\sum_{k=1}^{P}l_{r}(i_{k},j_{k})l_{s}(i_{k},j_{k})l_{t}(i_{k},j_{k})

(39)

where P is the number of patches present in the image and $l_{r}$ , $l_{s}$ , and $l_{t}$ represent the comparison functions. Higher values of PCQI indicate good contrast.
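Several of the simpler measures listed above (MSE, PSNR, entropy and AMBE) can be sketched directly with NumPy. The snippet below is a minimal illustration, assuming 8-bit grayscale arrays; the function names are illustrative and are not taken from the original implementation.

```python
import numpy as np

def mse(f, e):
    """Mean square error between original f and enhanced e (Eq. 13)."""
    f = np.asarray(f, dtype=np.float64)
    e = np.asarray(e, dtype=np.float64)
    return float(np.mean((f - e) ** 2))

def psnr(f, e, max_val=255.0):
    """Peak signal-to-noise ratio in dB (Eq. 14); infinite for identical images."""
    err = mse(f, e)
    if err == 0.0:
        return float("inf")
    return float(20.0 * np.log10(max_val / np.sqrt(err)))

def entropy(f):
    """Shannon entropy of the 8-bit intensity histogram (Eq. 15)."""
    hist = np.bincount(np.asarray(f, dtype=np.uint8).ravel(), minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]  # terms with p_i = 0 contribute nothing to the sum
    return float(-np.sum(p * np.log2(p)))

def ambe(o, e):
    """Absolute mean brightness error (Eq. 20)."""
    return float(abs(np.mean(o, dtype=np.float64) - np.mean(e, dtype=np.float64)))
```

The images are converted to floating point before subtraction so that unsigned 8-bit arithmetic does not wrap around.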

In this section, we conducted experiments on two datasets, UIEB and EUVP, to evaluate the performance of various underwater image enhancement methods in terms of both qualitative and quantitative metrics. The UIEB dataset comprises 890 real underwater images, while the EUVP dataset contains paired and unpaired compilations of underwater images. We selected five images from each dataset for evaluation purposes. We used several typical methods for underwater image enhancement, including AHE, CLAHE, ICM, UCM, Gray World, Wavelet fusion, and the Recursive adaptive histogram modification method.

6 Datasets: UIEB-D8 and EUVP-X-D8

This work has used the two standard datasets UIEB [14] and EUVP [13] that are available in the public domain and are widely used in UIE research. The UIEB dataset has 890 paired images where each pair consists of a good quality image along with a degraded one. EUVP dataset has both paired and unpaired images. EUVP has three different paired datasets – Underwater Dark, Underwater ImageNet and Underwater Scenes.

6.1 Formation of Datasets

To diversify the dataset, 8 different degradation techniques were applied to the ground truth images:

6.1.1 Illumination Degradation

Low illumination in images can result from various factors such as poor lighting conditions. Simulating low illumination is crucial for testing the robustness of image processing algorithms in real-world scenarios. The degradation is achieved by reducing the overall brightness of the image, mimicking the effect of dim lighting conditions. This reduction in brightness can lead to loss of detail and reduced visibility of objects in the image. The variation of illumination is shown in Fig 22.

The equation to simulate low illumination is as follows:

I_{ID}(x,y)=s_{b}\times I(x,y),\quad\forall s_{b}\sim U(a,b)

(40)

where $I_{ID}$ is the modified image after applying varying illumination and $s_{b}$ is a scale factor drawn from the uniform distribution $U(a,b)$ . This factor determines the extent of brightness reduction. A lower value results in a darker image.
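The illumination degradation of Eq. (40) can be sketched as follows. This is a minimal illustration, assuming 8-bit images; the range limits `a` and `b` are placeholder values, since the values used in this work are not stated here.

```python
import numpy as np

def degrade_illumination(image, a=0.3, b=0.7, rng=None):
    """Scale all pixel values by s_b ~ U(a, b), as in Eq. (40).

    a and b are hypothetical defaults chosen only for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    s_b = rng.uniform(a, b)
    out = s_b * image.astype(np.float64)
    # Round and clip back to the valid 8-bit range.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8), s_b
```

Returning the sampled factor alongside the image makes it possible to record the severity of each generated sample.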

6.1.2 Contrast Degradation

High contrast simulates images with intense differences between light and dark areas. This effect is achieved by adjusting the pixel values to increase the contrast. The equation for increasing contrast is:

I_{CD}(x,y)=\alpha\times I(x,y)+\beta,\quad\forall\alpha\sim U(a,b),\quad\beta=m

(41)

By multiplying the pixel values by alpha and adding beta, the contrast of the image is increased while also adjusting its brightness as shown in Fig 23. This results in an image $I_{CD}(x,y)$ with intensified differences between light and dark areas, creating a high contrast effect.
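The linear transform of Eq. (41) can be sketched as below. This is an illustrative sketch only: the defaults for `a`, `b` and `m` are assumptions, and the result is clipped to the 8-bit range, a step that Eq. (41) does not state explicitly.

```python
import numpy as np

def degrade_contrast(image, a=0.5, b=1.5, m=0.0, rng=None):
    """Apply alpha * I + beta with alpha ~ U(a, b) and beta = m (Eq. 41).

    a, b and m are hypothetical defaults chosen only for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    alpha = rng.uniform(a, b)
    out = alpha * image.astype(np.float64) + m
    # Clipping keeps the result a valid 8-bit image (an added assumption).
    return np.clip(np.rint(out), 0, 255).astype(np.uint8), alpha
```

With this form, a gain above 1 stretches the intensity differences and a gain below 1 compresses them, so the sampled range of alpha decides the direction of the contrast change.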

6.1.3 Hazy Degradation

The hazy effect simulates the presence of haze or fog in the image. It is achieved by adding a semi-transparent haze layer over the original image. The mathematical expression for applying the haze effect to the image is:

I_{DH}(x,y)=(1-\gamma)\times I(x,y)+\gamma\times\gamma_{c}(x,y),\quad\forall\gamma\sim U(a,b),\quad\forall\gamma_{c}\sim U(l,m)

(42)

Here, $\gamma_{c}(x,y)$ is a haze layer with the same dimensions as the original image, where each pixel is set to the randomly generated haze color.

The original image and the haze layer are blended using the formula above to generate $I_{DH}$ . The original image is multiplied by $(1-\gamma)$ to reduce its intensity, and the haze layer is multiplied by $\gamma$ to control the strength of the haze effect as shown in Fig 24. The resulting image represents the original scene with the added haze effect. This process mimics the visual appearance of images captured in hazy conditions, where distant objects appear less distinct due to scattering of light by haze particles in the atmosphere.
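The blending of Eq. (42) can be sketched as follows. This is a minimal illustration that assumes an RGB image and one random haze colour per image; the ranges `(a, b)` and `(l, m)` are placeholder values, not the settings used in this work.

```python
import numpy as np

def add_haze(image, a=0.3, b=0.7, l=150.0, m=255.0, rng=None):
    """Blend an RGB image with a constant-colour haze layer (Eq. 42).

    gamma ~ U(a, b) sets the haze strength and the haze colour is drawn
    per channel from U(l, m); all four limits are hypothetical defaults.
    """
    rng = np.random.default_rng() if rng is None else rng
    gamma = rng.uniform(a, b)
    haze_color = rng.uniform(l, m, size=3)
    # Every pixel of the haze layer carries the same random colour.
    haze_layer = np.broadcast_to(haze_color, image.shape)
    out = (1.0 - gamma) * image.astype(np.float64) + gamma * haze_layer
    return np.clip(np.rint(out), 0, 255).astype(np.uint8), gamma
```

Because the blend is a convex combination of two valid intensities, the result stays within the 8-bit range and the clipping only guards against rounding.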

6.1.4 Blurry Degradation

The blurry effect is simulated using a Gaussian blur filter applied to the entire image. The equation for applying Gaussian blur to an image is as follows:

G(x,y)=\frac{1}{2\pi\sigma^{2}}\exp\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right)

(43)

where $(x,y)$ are the coordinates in the kernel, and $\sigma$ is the standard deviation of the Gaussian distribution. The Gaussian kernel $G(x,y)$ is typically normalized so that the sum of all elements equals 1. The convolution operation between the input image $I$ and the Gaussian kernel $G$ is denoted by $I*G$ . It’s defined as:

(I*G)(x,y)=\sum_{i=-k}^{k}\sum_{j=-l}^{l}I(x-i,y-j)\cdot G(i,j)

(44)

where $(x,y)$ are the coordinates of the output pixel, $(i,j)$ are the offsets within the kernel, and $I(x-i,y-j)$ is the intensity of the input pixel at that offset. The convolution operation involves sliding the Gaussian kernel over each pixel of the input image and computing a weighted sum of pixel intensities in the neighborhood defined by the kernel as shown in Fig 25. The blurred image $I_{DB}(x,y)$ is defined as:

I_{DB}(x,y)=(I*G)(x,y)

(45)

The GaussianBlur function convolves the image with the kernel $G(x,y)$ to compute the blurred result. The standard deviation of $G(x,y)$ is implicitly determined by the kernel size. Overall, the Gaussian blur operation smooths out the sharp transitions between pixel values in the input image, resulting in a blurred version of the original image. The degree of blurring is controlled by the standard deviation $\sigma$ of the Gaussian kernel, with larger values of $\sigma$ resulting in more significant blurring.
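The kernel of Eq. (43) and the blur of Eq. (45) can be sketched in plain NumPy. This is an illustrative sketch, not the library routine mentioned in the text: it assumes a single-channel image, an odd kernel size, an explicitly supplied $\sigma$, and edge replication at the borders.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Sampled 2-D Gaussian (Eq. 43), normalized so its elements sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def gaussian_blur(image, size=5, sigma=1.0):
    """Blur a grayscale image with a size x size Gaussian kernel (Eq. 45)."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    img = image.astype(np.float64)
    padded = np.pad(img, ((pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(img)
    # Accumulate shifted copies weighted by the kernel; the kernel is
    # symmetric, so correlation and convolution give the same result.
    for i in range(size):
        for j in range(size):
            out += k[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out
```

Normalizing the kernel keeps the mean brightness unchanged, so a constant image is returned unaltered.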

6.1.5 Noisy Degradation

Noise is modeled by a Gaussian (normal) distribution and is added to the pixel values of the image to simulate the effect of random fluctuations in the image acquisition process or transmission. Mathematically, the Gaussian noise density $\mathcal{N}(x;\mu,\sigma)$ can be expressed as:

\mathcal{N}(x;\mu,\sigma)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}

(46)

$x$ is the random variable representing the noise amplitude. $\mu$ is the mean (average) of the distribution, indicating the central tendency of the noise values. It’s typically set to 0 for zero-mean noise. $\sigma$ is the standard deviation of the distribution, which controls the spread or variability of the noise values around the mean. It determines the scale of the noise.

The additive noise $N(x,y)$ is drawn independently at each pixel coordinate (x,y) from a Gaussian distribution with mean $\mu=0$ and standard deviation $\sigma$ (with $\sigma=1$ giving the standard normal $\mathcal{N}(0,1)$ ). The degraded image $I_{ND}(x,y)$ , resulting from the addition of Gaussian noise to the original image followed by clipping to the valid intensity range, is defined as:

I_{ND}(x,y)=\begin{cases}0&\text{if }I(x,y)+N(x,y)<0\\ I(x,y)+N(x,y)&\text{if }0\leq I(x,y)+N(x,y)\leq 255\\ 255&\text{if }I(x,y)+N(x,y)>255\end{cases}

(47)

where $N(x,y)\sim\mathcal{N}(0,\sigma^{2})$ .

This equation illustrates how additive Gaussian noise alters the pixel values of the original image, resulting in a degraded image with stochastic fluctuations that mimic real-world imaging artifacts. Adjusting the standard deviation parameter $\sigma$ controls the intensity and spread of the noise, influencing the perceptual quality of the degraded image as shown in Fig 26.
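The noise model of Eq. (47) can be sketched as follows. This is a minimal illustration, assuming 8-bit images; the default value of `sigma` is a placeholder rather than the setting used in this work.

```python
import numpy as np

def add_gaussian_noise(image, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise and clip to [0, 255] (Eq. 47).

    sigma is a hypothetical default chosen only for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=image.shape)
    noisy = image.astype(np.float64) + noise
    # np.clip implements the three cases of Eq. (47) in a single call.
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)
```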

6.1.6 Color Balance Degradation

Consider an original image $I$ represented as a three-dimensional array where each pixel contains intensity values for red, green, and blue channels.

\begin{bmatrix}R_{11}&G_{11}&B_{11}\\ R_{12}&G_{12}&B_{12}\\ \vdots&\vdots&\vdots\\ R_{mn}&G_{mn}&B_{mn}\end{bmatrix}

where $R_{ij}$ , $G_{ij}$ and $B_{ij}$ represent the intensity values of red, green, and blue channels respectively at pixel (i,j), and m and n denote the dimensions of the image.

1. Reddish Tint: The reddish tint degradation replicates a deviation in color balance within the image, biasing the color distribution towards red tones. This deviation can occur due to several factors such as environmental lighting, white balance inaccuracies, or sensor characteristics. When a reddish tint afflicts an image, the prominence of red hues intensifies while the relative contributions of green and blue hues decrease. To induce the tint, the red channel is blended towards its maximum value of 255 by a random factor, while the green and blue channels are left unchanged (Eq. 48). The matrix below isolates the red channel, which is the channel being modified.

\begin{bmatrix}R_{11}&0&0\\ R_{12}&0&0\\ \vdots&\vdots&\vdots\\ R_{mn}&0&0\end{bmatrix}

$R_{ij}$ represents the intensity value of the red channel at pixel (i,j), and $I_{\text{reddish}}(x,y,0)$ is defined as:

I_{\text{reddish}}(x,y,0)=\begin{cases}0,&\text{if }I(x,y,0)\times(1-\text{Factor})+255\times\text{Factor}<0\\ I(x,y,0)\times(1-\text{Factor})+255\times\text{Factor},&\text{if }0\leq I(x,y,0)\times(1-\text{Factor})+255\times\text{Factor}\leq 255\\ 255,&\text{if }I(x,y,0)\times(1-\text{Factor})+255\times\text{Factor}>255\end{cases}

(48)

	$\displaystyle\forall\,\text{Factor}\sim U(a,b);$
	$\displaystyle I_{\text{reddish}}(x,y,1)=I(x,y,1);$
	$\displaystyle I_{\text{reddish}}(x,y,2)=I(x,y,2);$

2. Greenish Tint: The greenish tint degradation emulates an imbalance in color distribution within the image, favoring green hues. This effect can arise from various factors such as environmental lighting conditions, inaccuracies in white balance, or characteristics of the imaging sensor. When an image is affected by a greenish tint, the intensity of green hues is accentuated while the relative contributions of red and blue hues diminish. To introduce a greenish tint, the green channel is blended towards its maximum value of 255 by a random factor, while the red and blue channels are left unchanged (Eq. 49). The matrix below isolates the green channel, which is the channel being modified.

\begin{bmatrix}0&G_{11}&0\\ 0&G_{12}&0\\ \vdots&\vdots&\vdots\\ 0&G_{mn}&0\end{bmatrix}

$G_{ij}$ represents the intensity value of the green channel at pixel (i,j), and $I_{\text{greenish}}(x,y,1)$ is defined as:

I_{\text{greenish}}(x,y,1)=\begin{cases}0,&\text{if }I(x,y,1)\times(1-\text{Factor})+255\times\text{Factor}<0\\ I(x,y,1)\times(1-\text{Factor})+255\times\text{Factor},&\text{if }0\leq I(x,y,1)\times(1-\text{Factor})+255\times\text{Factor}\leq 255\\ 255,&\text{if }I(x,y,1)\times(1-\text{Factor})+255\times\text{Factor}>255\end{cases}

(49)

	$\displaystyle\forall\,\text{Factor}\sim U(a,b);$
	$\displaystyle I_{\text{greenish}}(x,y,0)=I(x,y,0);$
	$\displaystyle I_{\text{greenish}}(x,y,2)=I(x,y,2);$

3. Bluish Tint: The bluish tint degradation emulates a color imbalance in the image, skewing the color distribution towards blue hues. This phenomenon can occur due to various factors such as lighting conditions, white balance inaccuracies, or sensor characteristics. When an image is affected by a bluish tint, the dominance of blue increases while the relative contributions of red and green diminish. To introduce a bluish tint, the blue channel is blended towards its maximum value of 255 by a random factor, while the red and green channels are left unchanged (Eq. 50). The matrix below isolates the blue channel, which is the channel being modified.

\begin{bmatrix}0&0&B_{11}\\ 0&0&B_{12}\\ \vdots&\vdots&\vdots\\ 0&0&B_{mn}\end{bmatrix}

$B_{ij}$ represents the intensity value of the blue channel at pixel (i,j), and $I_{\text{bluish}}(x,y,2)$ is defined as:

I_{\text{bluish}}(x,y,2)=\begin{cases}0,&\text{if }I(x,y,2)\times(1-\text{Factor})+255\times\text{Factor}<0\\ I(x,y,2)\times(1-\text{Factor})+255\times\text{Factor},&\text{if }0\leq I(x,y,2)\times(1-\text{Factor})+255\times\text{Factor}\leq 255\\ 255,&\text{if }I(x,y,2)\times(1-\text{Factor})+255\times\text{Factor}>255\end{cases}

(50)

	$\displaystyle\forall\,\text{Factor}\sim U(a,b);$
	$\displaystyle I_{\text{bluish}}(x,y,0)=I(x,y,0);$
	$\displaystyle I_{\text{bluish}}(x,y,1)=I(x,y,1);$

Here, $I_{CB_{R,G,B}}$ denotes the modified image with an amplified red, green, or blue hue. The process increases the intensity of the selected channel across the image, effectively introducing a reddish, greenish, or bluish cast.
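The three tint degradations of Eqs. (48) to (50) share one form and differ only in the channel index. The sketch below assumes an 8-bit image in RGB channel order (index 0 is red, as in the equations); the range limits `a` and `b` are placeholder values.

```python
import numpy as np

def add_tint(image, channel, a=0.2, b=0.6, rng=None):
    """Blend one colour channel towards 255 by Factor ~ U(a, b).

    channel is 0 (red), 1 (green) or 2 (blue); the other two channels are
    copied unchanged. a and b are hypothetical defaults.
    """
    rng = np.random.default_rng() if rng is None else rng
    factor = rng.uniform(a, b)
    out = image.astype(np.float64)  # astype returns a copy of the input
    out[..., channel] = out[..., channel] * (1.0 - factor) + 255.0 * factor
    return np.clip(np.rint(out), 0, 255).astype(np.uint8), factor
```

For example, `add_tint(img, channel=2)` would produce the bluish variant of an image `img`.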

6.2 Dataset Distribution:

Tables 4 and 5 provide statistics for the UIEB-D8 and EUVP-X-D8 datasets, detailing the distribution of images with eight types of degradation across different subcategories.

In Table 4, the rows represent the source datasets (UIEB and the EUVP subsets), with the number of reference images given in parentheses. Each degradation group (Illumination, Contrast, Hazy, Blurry, Noisy, and Reddish/Greenish/Bluish) is split into three subcategories (a, b, and c). The total number of degraded images in each dataset is listed in the last column.

For the UIEB dataset, which contains 890 images, the images are distributed almost evenly among the three subcategories (a, b, c) of each degradation group, with 296 to 298 images per subcategory. The total count across all eight degradation types is 7120.

The EUVP_P (U_Dark) dataset consists of 3138 images, with each subcategory having exactly 1046 images, leading to a total image count of 25,104.

The EUVP_P (U_ImageNet) dataset contains 3700 images, with each subcategory having 1233 or 1234 images, making the total number of images 29,600.

The EUVP_P (U_Scenes) dataset has 2185 images, with each subcategory having 728 or 729 images, resulting in a total of 17,480 images.

The EUVP_Un dataset includes 3140 images, with each subcategory having 1046 to 1048 images, totaling 25,120 images.

Table 5 summarizes the overall counts of referenced and degraded images: there are 13,053 referenced images and 104,424 degraded images. Together, the tables show how images are categorized and distributed among the degradation types and their subcategories, helping to understand the quantity of images available for each type of degradation.

Table 4: Dataset statistics for various types of image degradation

Dataset	Illumination			Contrast			Hazy			Blurry			Noisy			Reddish/ Greenish/ Bluish			Total
	a	b	c	a	b	c	a	b	c	a	b	c	a	b	c	a	b	c
UIEB (890)	296	296	298	296	296	298	296	296	298	296	296	298	296	296	298	296	296	298	7120
EUVP_P (U_Dark) (3138)	1046	1046	1046	1046	1046	1046	1046	1046	1046	1046	1046	1046	1046	1046	1046	1046	1046	1046	25,104
EUVP_P (U_ImageNet) (3700)	1233	1233	1234	1233	1233	1234	1233	1233	1234	1233	1233	1234	1233	1233	1234	1233	1233	1234	29,600
EUVP_P (U_Scenes) (2185)	728	728	729	728	728	729	728	728	729	728	728	729	728	728	729	728	728	729	17,480
EUVP_Un (3140)	1046	1046	1048	1046	1046	1048	1046	1046	1048	1046	1046	1048	1046	1046	1048	1046	1046	1048	25,120

Table 5 summarizes the overall counts of referenced and degraded images:

Table 5: Total Referenced and Degraded Images

Category	Count
Total Referenced Image	13,053
Total Degraded Image	104,424
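The totals in Tables 4 and 5 follow from simple arithmetic on the reference image counts, which the sketch below reproduces. The three-way split rule shown here is an assumption: it is one rule that matches the per-subcategory counts in Table 4, and the actual partitioning procedure may differ.

```python
# Reference image counts, taken from the first column of Table 4.
REFERENCE_COUNTS = {
    "UIEB": 890,
    "EUVP_P (U_Dark)": 3138,
    "EUVP_P (U_ImageNet)": 3700,
    "EUVP_P (U_Scenes)": 2185,
    "EUVP_Un": 3140,
}
NUM_DEGRADATIONS = 8  # eight degradation conditions per reference image

def split_three(n):
    """Split n images into subcategories (a, b, c); c takes the remainder."""
    base = n // 3
    return base, base, n - 2 * base

degraded_counts = {name: n * NUM_DEGRADATIONS for name, n in REFERENCE_COUNTS.items()}
total_reference = sum(REFERENCE_COUNTS.values())
total_degraded = sum(degraded_counts.values())
```

Here `total_reference` and `total_degraded` correspond to the two rows of Table 5, and `degraded_counts` to the last column of Table 4.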

7 Methodology

The proposed Iterative framework for Degradation Aware Underwater Image Enhancement (IDA-UIE) progressively enhances the input image. In each iteration, a degradation classifier network $\mathbf{\Phi}_{DC}$ identifies the dominant degradation condition in the image. Being degradation aware helps in choosing the corresponding deep network that enhances the image by removing the effect of that degradation. Removing the currently dominant degradation might reveal the effect of another degradation. Thus, the output image is processed again in the next iteration for further enhancement. The process can be continued until (a) the classifier flags the absence of any degradation or (b) a maximum number of iterations is completed. This work uses the second criterion to limit the maximum number of floating point operations. Here, IDA-UIE is operated with a maximum of 3 iterations. The sub-networks used for degradation classification and subsequent enhancement are described next.
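The iterative procedure described above can be sketched as a short control loop. This is an illustrative sketch only: `classify` and `enhancers` are hypothetical stand-ins for $\mathbf{\Phi}_{DC}$ and the degradation-specific enhancement networks, and the label `"none"` for the no-degradation class is an assumed name.

```python
NO_DEGRADATION = "none"  # assumed label for the "no degradation" class

def ida_uie(image, classify, enhancers, max_iters=3):
    """Iteratively remove the dominant degradation from an image.

    classify:  callable mapping an image to a degradation label
    enhancers: dict mapping each degradation label to an enhancement callable
    Returns the final image and the list of degradations that were handled.
    """
    history = []
    for _ in range(max_iters):
        label = classify(image)
        if label == NO_DEGRADATION:
            break  # stopping criterion (a): nothing left to fix
        image = enhancers[label](image)
        history.append(label)
    # Leaving the loop without a break is stopping criterion (b).
    return image, history
```

The loop includes both stopping criteria; with `max_iters=3` it matches the iteration budget stated above.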

7.1 Design of Degradation Classification and Enhancement Networks

Degradation Classification Network – This network $\mathbf{\Phi}_{DC}$ identifies the category (one of eight classes) of dominant degradation in an underwater image. Additionally, it recognizes the absence of degradation. Thus, it is an $8+1=9$ category classifier. It is trained on the UIEB-D8 and EUVP-D8 datasets. The network (Fig 32) is a custom neural network implemented in PyTorch for image classification, with 9 output classes by default. The network starts with an initial $3\times 3$ convolution layer, followed by two parallel paths: a $1\times 1$ convolution and a $3\times 3$ convolution. The outputs of these paths are concatenated and passed through another $1\times 1$ convolution layer. This result is added to the initial convolution output, similar to a residual connection.

Next, the network uses weighted average pooling to reduce the feature maps to a $1\times 1$ spatial dimension. The pooled features are then flattened and passed through a fully connected layer to produce the final classification output. This architecture combines convolutional layers, parallel processing paths, and pooling to effectively extract and classify features from the input image. The network architecture has a cascade of two modules (Fig 31), each containing $1\times 1$ and $3\times 3$ convolution kernels in parallel with residual connections. The convolution layer output is flattened and processed by fully connected layers for the final classification. A Winner-Take-All strategy is applied to select the dominant degradation. Accordingly, a suitable deep network is selected for image enhancement. This is an iterative process which checks for different degradations. If no degradation is detected, the iteration stops. The network is shown in Figure 31 and Table 7.
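The Winner-Take-All step amounts to picking the class with the highest score among the nine classifier outputs. The sketch below illustrates this with NumPy; the class names and their order are assumptions made for illustration and are not taken from the trained network.

```python
import numpy as np

# Hypothetical class order: eight degradations plus "no degradation".
CLASSES = ["illumination", "contrast", "haze", "blur", "noise",
           "reddish", "greenish", "bluish", "none"]

def dominant_degradation(logits):
    """Return the winning class label and the softmax probabilities."""
    logits = np.asarray(logits, dtype=np.float64)
    z = logits - logits.max()  # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return CLASSES[int(np.argmax(probs))], probs
```

The softmax does not change which class wins; it is included only so that the scores can be read as probabilities.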

7.1.1 Ablation Study

Training of Models:

In the first ablation study (Figure 32), we designed a network constructed using a single convolutional layer, represented as $CL(3\times 3\times 3@128;1,1)$ . This layer aims to extract and assimilate information from the input image. A LeakyReLU activation function follows the initial convolution operation, enhancing the network’s ability to capture non-linear features. Here, $CL(m\times n\times k@q;s,p)$ refers to $q$ number of $m\times n\times k$ convolution kernels with stride $s$ and padding $p$ , followed by LeakyReLU. Following this, we employed another layer, $CS(128\times 128\times 3@3;1,1)$ , which involves a convolution followed by a Sigmoid activation to produce the final output. Despite these efforts, the results shown in Table 8 indicate that this architecture did not yield satisfactory performance.
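The $CL(m\times n\times k@q;s,p)$ notation can be related to layer sizes with two standard formulas: the spatial output size of a convolution and its number of learnable parameters. The helper names below are illustrative, and the bias term is an assumption, since the text does not say whether biases are used.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_param_count(m, n, k, q, bias=True):
    """Learnable parameters of q kernels of size m x n x k (plus optional biases)."""
    return m * n * k * q + (q if bias else 0)
```

For instance, a $3\times 3$ kernel with stride 1 and padding 1 preserves the spatial size, while stride 2 roughly halves it.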

For the second ablation study (Figure 33), we expanded the network by incorporating additional convolutional layers, specifically $CL(3\times 3\times 3@128;1,1)$ and $CL(128\times 128\times 3@256;1,1)$ . These layers were designed to further extract and assimilate information from the input image. As before, a LeakyReLU activation function was applied after each convolution operation to enhance feature capture. Subsequently, we added a layer $CS(256\times 256\times 3@3;1,1)$ , involving a convolution followed by a Sigmoid activation. However, despite these modifications, the results presented in Table 9 still showed that the architecture did not achieve satisfactory performance.

In the third ablation study (Figure 34), we explored the use of fully connected layers ( $FCL$ ) encapsulated with LeakyReLU activations. The output from this setup was connected to a fully connected layer with Sigmoid activation ( $FCS$ ) and then unflattened to reconstruct the enhanced image matching the original image dimensions $(h\times w\times 3)$ . While fully connected layers are capable of modeling complex, non-linear transformations necessary to enhance various degradations, the results in Table 10 indicated that this approach also did not yield satisfactory performance.

In the fourth ablation study (Figure 35), we designed a deep network $\mathbf{\Phi}_{IC}$ specifically for enhancing images with low illumination. This network is constructed through a cascade of two fully connected layers ( $FCL$ ) encapsulated with LeakyReLU. The output from this setup is connected to a fully connected layer with Sigmoid activation ( $FCS$ ) and then unflattened to match the original image dimensions $(h\times w\times 3)$ . Low illumination correction often involves adjusting the overall brightness and contrast of the image, which can be efficiently learned by fully connected layers as they consider all pixel values at once. Fully connected layers can model complex, non-linear transformations that might be needed to enhance the illumination of the entire image, especially when the correction requires considering the entire image context. This approach has shown good results in illumination correction, as evidenced in Table 11.

The illumination-specific network demonstrated promising results, guiding the development of more advanced network structures that integrate the strengths of fully connected layers.

In the fifth ablation study, we constructed a network (Figure 37) using a convolutional layer represented as $CL(3\times 3\times 3@64;1,1)$ . This layer aims to extract and assimilate information from the input image. Following the convolution operation, a LeakyReLU activation function is applied. Subsequently, a transposed convolutional layer, denoted as $CTS(64\times 64\times 3@3;1,1)$ , is employed, which includes a Sigmoid activation. Despite these efforts, the results shown in Table 12 indicate that this architecture did not yield satisfactory performance.

In the sixth ablation study, we further refined the network $\mathbf{\Phi}_{CB_{RGB}}$ (Figure 37) to address color imbalances in images. This network is constructed through a cascade of convolutional layers: $CL(3\times 3\times 3@64;1,1)$ and $CL(3\times 3\times 64@64;1,1)$ . These layers aim to extract and assimilate information from the input image, with a LeakyReLU activation function applied after each convolution operation. Following this, a transposed convolutional layer, $CTS(64\times 64\times 3@3;1,1)$ , is employed, incorporating a Sigmoid activation as shown in Table 13.

The $\mathbf{\Phi}_{CB_{RGB}}$ network is particularly well-suited for color correction tasks due to its simple yet effective architecture. It leverages convolutional operations to capture local color relationships and uses non-linear activations to learn complex color mappings. The transposed convolutional layer helps maintain the original image resolution. This end-to-end learning approach makes the network adaptable to various color correction challenges, demonstrating its potential in addressing color imbalances effectively.
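As a rough check on the size of this network, the parameter count of the three layers can be worked out by hand. The sketch below assumes that every layer uses a $3\times 3$ kernel with a bias term, including the transposed convolution mapping 64 channels back to 3; the helper name is hypothetical. Under these assumptions the total is 40,451 parameters, or about 0.04 M, which is in line with the 0.04 M listed for the color-correction sub-networks in Table 21.

```python
def conv_params(kernel, c_in, c_out, bias=True):
    """Parameters of a 2-D (or transposed 2-D) convolution with a square kernel."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


# (name, kernel size, input channels, output channels)
# The 3x3 kernel of the transposed layer is an assumption.
LAYERS = [
    ("CL 3->64", 3, 3, 64),
    ("CL 64->64", 3, 64, 64),
    ("CTS 64->3", 3, 64, 3),
]

total = 0
for name, kernel, c_in, c_out in LAYERS:
    n = conv_params(kernel, c_in, c_out)
    total += n
    print(f"{name}: {n}")

print(f"total: {total} ({total / 1e6:.4f} M)")
```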

Through these ablation studies, we observed the varying efficacy of different architectures in tackling specific image enhancement challenges. While some architectures did not perform satisfactorily, the insights gained guided the refinement and development of more advanced network structures capable of addressing complex underwater image enhancement tasks.

In the seventh ablation study, the deep network, as shown in Figure 38, utilizes an encoder-decoder architecture. The encoder includes convolutional layers $CL(3\times 3\times 3@32;1,1)$ and one fewer $CL(3\times 3\times 32@64;2,1)$ layer, reducing the parameter count. After encoding, the feature map is flattened and passed through a fully connected layer with Sigmoid activation ( $FCS$ ), reducing the dimensionality to 100. The decoder reconstructs the image from this latent representation using fully connected layers followed by transposed convolutional layers. Initially, the latent vector (size 100) is expanded to 500 dimensions using $FCS$ , and a Sigmoid activation function scales the pixel values between 0 and 1, restoring the image to its original dimensions $(h\times w\times 3)$ . Despite these efforts, the results, as shown in Table 14, indicate unsatisfactory performance.
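The effect of the two encoder layers on spatial resolution follows from the standard convolution output-size formula. The sketch below assumes that the trailing pair in the $CL(\cdot;s,p)$ notation denotes stride and padding, and uses a hypothetical $64\times 64$ input purely for illustration; neither assumption is confirmed in this section.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(size, layers):
    """Return the spatial size after each (kernel, stride, padding) layer."""
    sizes = [size]
    for kernel, stride, padding in layers:
        size = conv_out_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


# Assumed reading: CL(3x3x3@32;1,1) then CL(3x3x32@64;2,1).
encoder = [(3, 1, 1), (3, 2, 1)]
print(trace(64, encoder))  # [64, 64, 32]
```

Under this reading, the stride-1 layer preserves resolution and the stride-2 layer halves it, so the flattened feature map fed to the fully connected layer still grows with the input area.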

The advantages of using convolution layers in the encoder include:

1. Spatial Hierarchies: The convolutional encoder captures essential spatial features, making it effective for dehazing tasks that rely on understanding spatial dependencies.
2. Compact Representation: The output of convolution layers represents a high-level, compact representation of the input image, transforming it into a manageable latent space that facilitates efficient reconstruction.
3. Detail Preservation: Fully connected layers alone were insufficient for capturing the spatial details required for high-quality dehazing and contrast correction, leading to suboptimal performance.

In the eighth ablation study, the deep network (Figure 39 and Table 15) employs an encoder-decoder framework. The encoder includes convolutional layers $CL(3\times 3\times 3@32;1,1)$ . This network includes one fewer $CL(3\times 3\times 32@64;2,1)$ layer, reducing the parameter count. Once the image is encoded, the feature map is flattened and passed through a fully connected layer with Sigmoid activation ( $FCS$ ), reducing the dimensionality to 100. The decoder then reconstructs the image from this latent representation using fully connected layers followed by transposed convolutional layers. The latent vector (size 100) is initially expanded to 500 dimensions using $FCS$ , followed by a Sigmoid activation function to scale the pixel values between 0 and 1, restoring the output to the original image dimensions $(h\times w\times 3)$ . However, the results shown in Table 15 indicate unsatisfactory performance.

In the ninth ablation study, the deep network $\mathbf{\Phi}_{DB}$ (Figure 40 and Table 16), realized in an encoder-decoder framework, gives satisfactory results on the blurry dataset. The encoder includes convolutional layers $CL(3\times 3\times 3@32;1,1)$ . This network includes one fewer $CL(3\times 3\times 32@64;2,1)$ layer, reducing the parameter count. Once the image is encoded, the feature map is flattened and passed through a fully connected layer ( $FCL$ ), reducing the dimensionality to 100. The decoder then reconstructs the image from this latent representation using fully connected layers followed by transposed convolutional layers. Initially, the latent vector (size 100) is expanded to 500 dimensions using $FCL$ , followed by another fully connected layer that adjusts the output to the original image dimensions $(h\times w\times 3)$ . Finally, a Sigmoid activation function is applied to scale the pixel values between 0 and 1, ensuring the deblurred image maintains proper intensity levels.

Deep Network for Dehazing and Enhancing High-contrast Images – Ideally, two networks, $\mathbf{\Phi}_{DH}$ for dehazing and $\mathbf{\Phi}_{CE}$ for contrast enhancement, would be designed to handle these respective tasks. However, the $\mathbf{\Phi}_{DH}$ network often produces noisy images for high contrast and hazy datasets. To address this, we have designed a single network, $\mathbf{\Phi}_{DHCE}$ , which performs both dehazing and contrast enhancement (shown in Figure 41 and Table 17).

The model architecture comprises convolutional layers ( $CL$ ) in the encoder. The initial layer is represented as $CL(3\times 3\times 3@32;1,1)$ , where each convolution ( $C$ ) is paired with LeakyReLU ( $L$ ) for non-linearity. This encoder downsamples the input image, producing a compressed latent representation that retains essential details while eliminating noise. This representation is further refined with two additional layers: $CL(3\times 3\times 32@64;1,1)$ and $CL(3\times 3\times 64@128;1,1)$ , each followed by LeakyReLU activation.

The encoded feature map is then flattened and passed through a fully connected layer ( $FCL$ ), reducing the dimensionality to 100. The decoder reconstructs the dehazed and enhanced image from this latent representation. The decoder includes fully connected layers followed by transposed convolutional layers. Initially, the latent vector (size 100) is expanded to 500 dimensions using $FCL$ . This is followed by another fully connected layer, which expands the output to match the original image dimensions $(h\times w\times 3)$ . Finally, a sigmoid activation function scales the pixel values between 0 and 1, ensuring a properly reconstructed image that is both dehazed and enhanced.

In the eleventh ablation study (Figure 42 and Table 18), the network architecture begins with an encoder that vectorizes the incoming $h\times w\times c$ image. This vectorized image is then sequentially processed through four fully connected layers ( $FCL$ s). The input dimensionality is progressively reduced: starting from $h\times w\times c$ to 128 neurons in the first layer, then 64 neurons in the second layer, and finally 32 neurons in the third layer, creating a compact latent space representation. The decoder’s objective is to reconstruct the image from this compressed latent space, mirroring the encoder’s structure by expanding the 32-dimensional latent vector back to the original image size $h\times w\times c$ . A Sigmoid activation function is applied to the final output to ensure pixel values are scaled between 0 and 1.

The choice of a network comprising fully connected layers for this task is driven by several key reasons:

1. Effective Noise Reduction: Fully connected layers in the encoder-decoder architecture inherently filter out noise during training. They achieve this by learning a compressed representation of the input data that focuses on essential features while disregarding noise, which tends to be less structured and influential in the learning process.
2. Direct Feature Mapping: Unlike convolutional layers that excel at capturing spatial hierarchies and local features, fully connected layers treat each pixel uniformly across the image. This uniform treatment allows them to effectively learn and map the relationship between noisy input and clean output without heavily relying on spatial dependencies.
3. Compact Representation: By reducing the dimensionality of the input through sequential linear transformations in the encoder, the model learns to encapsulate relevant image features in a more condensed form. This latent representation tends to minimize noise components, leading to clearer and more refined reconstructions in the decoder.
4. Flexibility and Reconstruction Quality: The fully connected layers in the decoder enable flexible and nonlinear reconstruction of the denoised image. This capability ensures that the model can generate smooth and visually appealing outputs by effectively filling in missing or distorted information caused by noise.
5. Proven Effectiveness: Empirical evidence and research in image processing tasks, including denoising, demonstrate that fully connected autoencoders can achieve impressive results. This is also evident from our experiments on the datasets used. They significantly reduce noise levels while preserving important image details, making them a reliable choice for enhancing image quality.

Deep Network for Denoising – The deep network ( $\mathbf{\Phi}_{DN}$ ) (Figure 43 and Table 19) for denoising the data follows a similar architecture. The encoder vectorizes the incoming $h\times w\times c$ image and processes it through a sequence of four $FCL$ s. The input dimensionality is progressively reduced: from $h\times w\times c$ to 128 neurons in the first layer, 64 neurons in the second, 32 neurons in the third, and finally 16 neurons in the fourth layer, creating a compact latent space representation. The decoder then reconstructs the image from this compressed latent space, mirroring the encoder’s structure by expanding the 16-dimensional latent vector back to the original image size $h\times w\times c$ . Finally, a Sigmoid activation function is applied to the output, ensuring the pixel values are scaled between 0 and 1.
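The size of this fully connected encoder-decoder can be estimated from the layer widths given above. The sketch below assumes a $256\times 256\times 3$ input and one bias per output neuron; the input resolution is not stated in this section, so it is an assumption chosen for illustration. With that choice the count comes to roughly 50.55 M parameters, which agrees with the value listed for the Noisy sub-network in Table 21.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def autoencoder_params(input_dim, widths):
    """Parameter count of a mirrored fully connected encoder-decoder."""
    encoder = [input_dim, *widths]   # e.g. D -> 128 -> 64 -> 32 -> 16
    decoder = encoder[::-1]          # mirrored: 16 -> 32 -> 64 -> 128 -> D
    pairs = list(zip(encoder, encoder[1:])) + list(zip(decoder, decoder[1:]))
    return sum(dense_params(a, b) for a, b in pairs)


# Assumed input resolution (not given in the text): 256 x 256 x 3.
INPUT_DIM = 256 * 256 * 3
total_dn = autoencoder_params(INPUT_DIM, [128, 64, 32, 16])
print(f"{total_dn} parameters ({total_dn / 1e6:.2f} M)")
```

Almost all of these parameters sit in the first and last layers, which connect the full-resolution image to the 128-neuron layer. The narrow inner layers contribute only a few thousand parameters each.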

Figure 44 illustrates the selection of models based on their Peak Signal-to-Noise Ratio (PSNR) values. PSNR measures the quality of a reconstructed image relative to its reference version, with higher values indicating better performance. The figure compares the PSNR of the candidate models, thereby guiding the selection of the most effective model for each enhancement task.
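For reference, PSNR follows directly from the mean-squared error once the peak pixel value is fixed. The sketch below assumes pixel values scaled to $[0,1]$ (peak value 1.0), consistent with the Sigmoid outputs described above. The function names and the example PSNR values are hypothetical and are not taken from the reported tables.

```python
import math


def psnr_from_mse(mse, max_value=1.0):
    """PSNR in dB for a given mean-squared error and peak pixel value."""
    if mse <= 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def select_best(psnr_by_model):
    """Return the name of the candidate with the highest PSNR."""
    return max(psnr_by_model, key=psnr_by_model.get)


if __name__ == "__main__":
    print(psnr_from_mse(0.01))
    # Hypothetical candidates, for illustration only.
    print(select_best({"model_a": 30.1, "model_b": 35.2}))
```

Note that a dataset-level PSNR obtained by averaging per-image PSNR values generally differs from the PSNR of the averaged MSE, so the two columns of a results table need not satisfy this formula exactly.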

8 Experimental Results and Discussion

Baseline Methods – The proposed approach IDA-UIE is benchmarked on the UIEB [14] and EUVP [13] datasets against nine state-of-the-art methods. On the UIEB dataset, IDA-UIE is compared with WaterNet [24], Fusion-based [23], MSSCE-GAN [49], and Deep WaveNet [11]. On the EUVP dataset, IDA-UIE is compared with UGAN [6], UGAN-P [6], Funie-GAN [25], Funie-GAN-UP [25], Deep SESR [12], and Deep WaveNet [11].

Evaluation Metrics – This work has incorporated both reference and reference-less image quality metrics for quantitative performance analysis. The following evaluation metrics are used – Mean-Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), SSIM, Underwater Image Quality Measure (UIQM) [9], Natural Image Quality Evaluator (NIQE) [10], Patch-based Contrast Quality Index (PCQI) [8], Underwater Image Sharpness Measure (UISM) [9], Average Entropy (E), Average Gradient, Underwater Image Contrast Measure (UIConM) [9], and Underwater Color Image Quality Evaluation (UCIQE) [7]. Additionally, the sub-network sizes (parameters in millions) and the associated floating point operations (in GFLOPs) are reported.
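Average Entropy is listed here without a formula. One common definition is the Shannon entropy of the pixel-intensity histogram, sketched below for a flat list of integer intensities. This is an assumed definition for illustration; the exact variant used in the experiments (for example, per channel or over all channels jointly) is not specified in this section.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of the empirical intensity distribution."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Two equally likely intensity levels carry one bit of information.
    print(shannon_entropy([0, 0, 255, 255]))
```

A higher entropy indicates that intensities are spread over more levels, which is why it is often read as a proxy for the amount of visible detail.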

8.1 Quantitative Performance Analysis –

For qualitative evaluation, we present the results obtained by applying the aforementioned methods to a single image from the UIEB and EUVP datasets and analyze the histograms to assess the effects of enhancement. The degradation classifier was trained on the combined UIEB-D8 and EUVP-X-D8 datasets and achieved an overall accuracy of $97.63\%$ . The first performance analysis experiment studies the proportion of images categorized under each dominant degradation condition (or absence of degradation) for the UIEB and EUVP test sets. The results are reported in Table 20 as percentages for all three iterations. The second experiment evaluates the performance of the individual sub-networks. The image enhancement sub-networks were trained on the combined training subsets of the UIEB-D8 and EUVP-UWD-D8 datasets, and their performance was validated on the combined test sets of UIEB-D8 and EUVP-UWD-D8. The results of this experiment are reported in Table 21, which lists the network sizes (parameters in millions) and floating point operations (in GFLOPs) along with the enhancement performance (in terms of MSE and PSNR). The third experiment compares the performance of the proposed model IDA-UIE with four baseline approaches. The results are reported in terms of PSNR and SSIM in Table 22. The fourth experiment compares IDA-UIE with six state-of-the-art approaches. The results are reported in Table 23 in terms of eleven different evaluation metrics.

Table 20: The degradation classifier identifies the need for one of illumination correction (IC), contrast enhancement (CE), dehazing (DH), deblurring (DB), denoising (DN), or color imbalance correction in the red (CBR), green (CBG), or blue (CBB) channel. It may also detect the case where no further enhancement (NE) is needed. The table presents the proportion of images (reported in percentage) from the UIEB and EUVP test sets that are assigned to each kind of dominant degradation correction (or NE) in all three iterations.

	IC		CE		DH		DB		DN		CBR		CBG		CBB		NE
	UIEB	EUVP	UIEB	EUVP	UIEB	EUVP	UIEB	EUVP	UIEB	EUVP	UIEB	EUVP	UIEB	EUVP	UIEB	EUVP	UIEB	EUVP
Iteration 1	11.79%	12.04%	11.11%	11.77%	9.92%	11.69%	10.70%	11.71%	10.48%	11.76%	10.42%	12.36%	10.58%	11.71%	10.58%	11.61%	14.38%	5.37%
Iteration 2	11.11%	12.13%	11.04%	11.87%	10.86%	11.65%	10.70%	11.87%	10.86%	12.00%	11.01%	8.90%	10.04%	11.78%	10.98%	11.99%	13.35%	4.79%
Iteration 3	10.42%	18.56%	10.73%	11.58%	9.89%	11.78%	10.73%	11.67%	11.01%	11.85%	10.36%	11.51%	10.79%	11.78%	10.92%	11.75%	15.01%	6.11%

Table 21: Individual performance of the image enhancement sub-networks. The network parameters (in millions (M)), floating point operations (in GFLOPs), and performance (in terms of MSE and PSNR) are reported.

Degradation	Parameter(M)	GFLOPs	MSE	PSNR
Bluish	0.04	0.166	0.00058	36.78
Reddish	0.04	0.166	0.00052	36.44
Greenish	0.04	0.166	0.00009	37.67
Noisy	50.55	0.050	4.65e-06	48.72
Contrast	151.07	0.822	0.00006	39.81
Blurry	203.43	0.571	0.00006	38.02
Illumination	50.54	0.050	4.88e-06	49.33
Hazy	151.07	0.822	0.00008	40.27

Table 22: Comparison of the proposed model IDA-UIE with four state-of-the-art approaches on the UIEB dataset.

Method	GFLOPs	PSNR	SSIM
WaterNet [24]	12.37	19.11	0.79
Fusion-based [23]	34.98	21.23	0.78
MSSCE-GAN [49]	192	21.62	0.81
Deep WaveNet [11]	18.15	21.68	0.80
IDA-UIE (ours)	16.83	28.87	0.90

8.2 Qualitative Performance Analysis –

The qualitative performance analysis of the proposed model IDA-UIE is presented in Figures 45 and 46. Sample images from the UIEB and EUVP test sets are progressively enhanced by correcting the dominant degradation in each iteration. The final output obtained after three iterations is visually compared against the ground-truth good quality image.

Table 23: Comparison of the proposed model IDA-UIE with state-of-the-art approaches on the EUVP dataset in terms of different performance metrics.

Method	GFLOPs	MSE	PSNR	SSIM	UIQM	NIQE	PCQI	UISM	Entropy	AG	UIConM	UCIQE
UGAN [6]	143	0.36	26.55	0.80	2.89	49.90	0.700	6.84	7.52	7.48	0.79	0.581
UGAN-P [6]	143	0.36	26.54	0.80	2.93	50.17	0.704	6.83	7.54	7.58	0.79	0.590
Funie-GAN [25]	70.34	0.39	26.22	0.79	2.97	50.51	0.706	6.90	7.55	8.58	0.84	0.590
Funie-GAN-UP [25]	70.34	0.60	25.22	0.78	2.93	52.87	0.702	6.86	7.80	7.80	0.79	0.588
Deep SESR [12]	30	0.34	27.08	0.80	3.09	55.68	0.679	7.06	7.40	7.57	0.78	0.572
Deep WaveNet [11]	18.15	0.29	28.62	0.83	3.04	44.89	0.694	7.06	7.38	7.00	0.77	0.559
IDA-UIE (ours)	16.83	0.0005	33.75	0.91	3.89	40.34	0.876	9.34	9.45	8.78	0.89	0.784

Figure 47 plots Peak Signal-to-Noise Ratio (PSNR) against frequency values, which are used to assess the quality of image enhancement methods. Higher PSNR values typically indicate better image quality. The region inside the red square highlights the failure cases, where the image enhancement method did not perform well. In these instances, the PSNR values are significantly lower, indicating that the enhanced images still contain substantial noise or distortion and thus fail to achieve the desired quality improvements. This analysis helps in identifying specific conditions or frequencies where the enhancement method needs further improvement.

9 Failure Case

When a specific type of degradation is severe, the model struggles to eliminate it. As the enhancement process is sequential, with iteration 2 and iteration 3 depending on the results of iteration 1, any shortcomings in the initial enhancement adversely impact the subsequent iterations. Consequently, the failure to adequately enhance the image in the first iteration propagates through the sequence, leading to progressively degraded results. This issue is illustrated in Figure 48, where the cascading effect of the initial enhancement failure is evident in the overall quality of the enhanced images.
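The sequential dependence between iterations can be made concrete with a short sketch of the classify-then-enhance loop. This is a simplified illustration under stated assumptions: the classifier and enhancers are passed in as hypothetical callables, the toy stand-ins below are not the trained networks, and stopping early on the NE label is an assumed behavior rather than one confirmed in the text.

```python
def iterative_enhance(image, classify, enhancers, max_iters=3):
    """Repeatedly correct the dominant degradation reported by `classify`.

    `classify` maps an image to a label such as "IC", "CE", ... or "NE".
    `enhancers` maps each degradation label to an enhancement function.
    """
    history = []
    for _ in range(max_iters):
        label = classify(image)
        if label == "NE":  # assumed: nothing left to correct, stop early
            break
        # Each iteration consumes the previous iteration's output, so an
        # imperfect correction here is inherited by every later iteration.
        image = enhancers[label](image)
        history.append(label)
    return image, history


# Toy stand-ins: an "image" is a list of brightness values in [0, 1].
def toy_classify(img):
    return "IC" if sum(img) / len(img) < 0.5 else "NE"


toy_enhancers = {"IC": lambda img: [min(1.0, v + 0.2) for v in img]}

if __name__ == "__main__":
    result, steps = iterative_enhance([0.2, 0.2], toy_classify, toy_enhancers)
    print(steps)
```

Because the classifier at iteration 2 only sees the output of iteration 1, a poor first correction can also change which degradation is judged dominant next, which is one way the cascading failure described above can arise.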

10 Conclusion

This paper presents an iterative framework for enhancing underwater images with degradation awareness, which identifies and enhances the dominant degradation condition using specific enhancement networks. Unlike single-network approaches, IDA-UIE progressively performs degradation-aware enhancements. A classifier identifies one of eight degradation types (including low illumination, low contrast, haziness, blur, noise, and color imbalances), or no degradation, and deploys the corresponding enhancement network. Trained on condition-specific degradations applied to UIEB and EUVP datasets, IDA-UIE outperforms nine state-of-the-art methods on eleven evaluation metrics.

This framework can also be adapted for general image enhancement problems by incorporating condition classifiers and specific enhancement sub-networks, with future research focusing on designing lightweight networks for each component.

References

[1] Pooja Sahu, Neelesh Gupta, and Neetu Sharma. A survey on underwater image enhancement techniques. International Journal of Computer Applications, 87(13), 2014.
[2] S. Raveendran, M. D. Patil, and G. K. Birajdar. Underwater image enhancement: a comprehensive review, recent trends, challenges and applications. Artificial Intelligence Review, 54:5413–5467, 2021.
[3] Oscar C. Au, Lin Sun, Ruobing Zou, Wei Dai, and Si** Li. An improved method for color images enhancement considering HVS. In 2012 International Conference on Audio, Language and Image Processing, pages 117–122. IEEE, 2012.
[4] Z. A. Hasibuan, P. N. Andono, D. Pujiono, R. I. M. Setiadi, et al. Contrast limited adaptive histogram equalization for underwater image matching optimization use SURF. In Journal of Physics: Conference Series, volume 1803, number 1, page 012008. IOP Publishing, 2021.
[5] Achmad Basuki and Nana Ramadijanti. Improving auto level method for enhancement of underwater images. In 2016 International Conference on Knowledge Creation and Intelligent Computing (KCIC), pages 120–125. IEEE, 2016.
[6] Cameron Fabbri, Md Jahidul Islam, and Junaed Sattar. Enhancing underwater imagery using generative adversarial networks. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 7159–7165. IEEE, 2018.
[7] Miao Yang and Arcot Sowmya. An underwater color image quality evaluation metric. IEEE Transactions on Image Processing, 24(12):6062–6071, 2015.
[8] Shiqi Wang, Kede Ma, Hojatollah Yeganeh, Zhou Wang, and Weisi Lin. A patch-structure representation method for quality assessment of contrast changed images. IEEE Signal Processing Letters, 22(12):2387–2390, 2015.
[9] Karen Panetta, Chen Gao, and Sos Agaian. Human-visual-system-inspired underwater image quality measures. IEEE Journal of Oceanic Engineering, 41(3):541–551, 2015.
[10] Anish Mittal, Rajiv Soundararajan, and Alan C. Bovik. Making a "completely blind" image quality analyzer. IEEE Signal Processing Letters, 20(3):209–212, 2012.
[11] Prasen Sharma, Ira Bisht, and Arijit Sur. Wavelength-based attributed deep neural network for underwater image restoration. ACM Transactions on Multimedia Computing, Communications and Applications, 19(1):1–23, 2023.
[12] Md Jahidul Islam, Peigen Luo, and Junaed Sattar. Simultaneous enhancement and super-resolution of underwater imagery for improved visual perception. arXiv preprint arXiv:2002.01155, 2020.
[13] M. J. Islam, Y. Xia, and J. Sattar. Fast underwater image enhancement for improved visual perception. IEEE Robotics and Automation Letters, 5(2):3227–3234, 2020.
[14] C. Li, C. Guo, W. Ren, R. Cong, J. Hou, S. Kwong, and D. Tao. An underwater image enhancement benchmark dataset and beyond. IEEE Transactions on Image Processing, 29:4376–4389, 2019.
[15] X. Yi, Q. Jiang, and W. Zhou. No-reference quality assessment of underwater image enhancement. Displays, 81:102586, 2024.
[16] Y. Li, D. Li, Z. Gao, S. Wang, Q. Jiao, et al. Underwater image enhancement utilizing adaptive color correction and model conversion for dehazing. Optics & Laser Technology, 169:110039, 2024.
[17] S. Hu, Z. Cheng, G. Fan, M. Gan, and C. L. P. Chen. Texture-aware and color-consistent learning for underwater image enhancement. Journal of Visual Communication and Image Representation, 98:104051, 2024.
[18] S. Xiao, X. Shen, Z. Zhang, J. Wen, M. Xi, and J. Yang. Underwater image classification based on image enhancement and information quality evaluation. Displays, 82:102635, 2024.
[19] R. Zheng, J. Miao, H. Zhang, X. Liu, and D. Tan. An illumination adaptive underwater image enhancement method. In International Conference on Algorithm, Imaging Processing, and Machine Vision (AIPMV 2023), volume 12969, pages 442–449, 2024.
[20] X. Mao, C. Shen, and Y.-B. Yang. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. Advances in Neural Information Processing Systems, 29, 2016.
[21] X. Sun, L. Liu, Q. Li, J. Dong, E. Lima, and R. Yin. Deep pixel-to-pixel network for underwater image enhancement and restoration. IET Image Processing, 13(3):469–474, 2019.
[22] V. S. Kumar. An underwater image dehazing method using dark channel prior. Journal, vol. XX, pp. XX-XX, 2020.
[23] Cosmin Ancuti, Codruta Orniana Ancuti, Tom Haber, and Philippe Bekaert. Enhancing underwater images and videos by fusion. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 81–88. IEEE, 2012.
[24] Chongyi Li, Chunle Guo, Wenqi Ren, Runmin Cong, Junhui Hou, Sam Kwong, and Dacheng Tao. An Underwater Image Enhancement Benchmark Dataset and Beyond. IEEE Transactions on Image Processing, 29:4376–4389, 2020.
[25] Md Jahidul Islam, Youya Xia, and Junaed Sattar. Fast underwater image enhancement for improved visual perception. IEEE Robotics and Automation Letters, 5(2):3227–3234, 2020.
[26] M. Yang, K. Hu, Y. Du, Z. Wei, Z. Sheng, and J. Hu. Underwater image enhancement based on conditional generative adversarial network. Signal Processing: Image Communication, 81:115723, 2020.
[27] K. Hu, C. Weng, Y. Zhang, J. **, and Q. Xia. An overview of underwater vision enhancement: from traditional methods to recent deep learning. Journal of Marine Science and Engineering, 10(2):241, 2022.
[28] Z. Wang, X. Xue, L. Ma, and X. Fan. Underwater image enhancement based on dual U-net. In 2020 8th International Conference on Digital Home (ICDH), pages 141–146, 2020.
[29] W. N. J. H. W. Yussof, M. S. Hitam, E. A. Awalludin, and Z. Bachok. Performing contrast limited adaptive histogram equalization technique on combined color models for underwater image enhancement. International Journal of Interactive Digital Media, 1(1):1–6, 2013.
[30] Najmul Hassan, Sami Ullah, Naeem Bhatti, Hasan Mahmood, and Muhammad Zia. The Retinex based improved underwater image enhancement. Multimedia Tools and Applications, 80:1839–1857, 2021.
[31] Omer Deperlioglu, Utku Kose, and G. Emre Guraksin. Underwater image enhancement with HSV and histogram equalization. Image, 1(4):461–465, 2018.
[32] Raj S. M. Alex, S. Deepa, and M. H. Supriya. Underwater image enhancement using CLAHE in a reconfigurable platform. In OCEANS 2016 MTS/IEEE Monterey, pages 1–5. IEEE, 2016.
[33] Cosmin Ancuti, Codruta Orniana Ancuti, Tom Haber, and Philippe Bekaert. Enhancing underwater images and videos by fusion. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 81–88. IEEE, 2012.
[34] Kamil Zakwan Mohd Azmi, Ahmad Shahrizan Abdul Ghani, Zulkifli Md Yusof, and Zuwairie Ibrahim. Natural-based underwater image color enhancement through fusion of swarm-intelligence algorithm. Applied Soft Computing, 85:105810, 2019.
[35] Bastiaan J. Boom, Phoenix X. Huang, Cigdem Beyan, Concetto Spampinato, Simone Palazzo, Jiyin He, Emmanuelle Beauxis-Aussalet, Sun-In Lin, Hsiu-Mei Chou, Gayathri Nadarajan, et al. Long-term underwater camera surveillance for monitoring and analysis of fish populations. VAIB12, 2012.
[36] Diksha Garg, Naresh Kumar Garg, and Munish Kumar. Underwater image enhancement using blending of CLAHE and percentile methodologies. Multimedia Tools and Applications, 77:26545–26561, 2018.
[37] Evgin Goceri. Challenges and recent solutions for image segmentation in the era of deep learning. In 2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA), pages 1–6. IEEE, 2019.
[38] Evgin Goceri. Skin disease diagnosis from photographs using deep learning. In VipIMAGE 2019: Proceedings of the VII ECCOMAS Thematic Conference on Computational Vision and Medical Image Processing, October 16–18, 2019, Porto, Portugal, pages 239–246. Springer, 2019.
[39] Manuel Gonzalez-Rivero, Oscar Beijbom, Alberto Rodriguez-Ramirez, Dominic E. P. Bryant, Anjani Ganase, Yeray Gonzalez-Marrero, Ana Herrera-Reveles, Emma V. Kennedy, Catherine J. S. Kim, Sebastian Lopez-Marcano, et al. Monitoring of coral reefs using artificial intelligence: A feasible and cost-effective approach. Remote Sensing, 12(3):489, 2020.
[40] Minjun Hou, Risheng Liu, Xin Fan, and Zhongxuan Luo. Joint residual learning for underwater image enhancement. In 2018 25th IEEE International Conference on Image Processing (ICIP), pages 4043–4047. IEEE, 2018.
[41] Kashif Iqbal, Michael Odetayo, Anne James, Rosalina Abdul Salam, and Abdullah Zawawi Hj Talib. Enhancing the low quality images using unsupervised colour correction method. In 2010 IEEE International Conference on Systems, Man and Cybernetics, pages 1703–1709. IEEE, 2010.
[42] Md Jahidul Islam, Youya Xia, and Junaed Sattar. Fast underwater image enhancement for improved visual perception. IEEE Robotics and Automation Letters, 5(2):3227–3234, 2020.
[43] Geir Johnsen, Martin Ludvigsen, Asgeir Sørensen, and Lars Martin Sandvik Aas. The use of underwater hyperspectral imaging deployed on remotely operated vehicles-methods and applications. IFAC-PapersOnLine, 49(23):476–481, 2016.
[44] Chongyi Li, Jichang Guo, Chunle Guo, Runmin Cong, and Jiachang Gong. A hybrid method for underwater image correction. Pattern Recognition Letters, 94:62–67, 2017.
[45] Dawei Li, Lihong Xu, and Huanyu Liu. Detection of uneaten fish food pellets in underwater images for aquaculture. Aquacultural Engineering, 78:85–94, 2017.
[46] Lixiong Liu, Bao Liu, Hua Huang, and Alan Conrad Bovik. No-reference image quality assessment based on spatial and spectral entropies. Signal Processing: Image Communication, 29(8):856–863, 2014.
[47] Jianru Li and Yujie Li. Underwater image restoration algorithm for free-ascending deep-sea tripods. Optics & Laser Technology, 110:129–134, 2019.
[48] Sanparith Marukatat. Image enhancement using local intensity distribution equalization. EURASIP Journal on Image and Video Processing, 2015(1):1–18, 2015.
[49] Lingxin Zhang, Youkun Chen, Jie Lan, and Yuzhen Niu. MSSCE-GAN: Multi-Scale Structural and Color Enhanced Generative Adversarial Network for Unpaired Underwater Image Enhancement. In 2023 5th International Conference on Frontiers Technology of Information and Computer (ICFTIC), pages 837–841. IEEE, 2023.
[50] Cameron Fabbri, Md Jahidul Islam, and Junaed Sattar. Enhancing Underwater Imagery Using Generative Adversarial Networks. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 7159–7165. IEEE, 2018.
[51] **gyu Lu, Na Li, Shaoyong Zhang, Zhibin Yu, Haiyong Zheng, and Bing Zheng. Multi-scale adversarial network for underwater image restoration. Optics & Laser Technology, 110:105–113, 2019.

Degradation Type	F1 Score
No Degradation	0.824
Bluish	0.757
Blurry	0.869
Contrast	0.802
Greenish	0.924
Hazy	0.816
Illumination	0.820
Noisy	0.843
Reddish	0.826
GFLOPs	1.7448
Number of Parameters	0.0280 M
Test Accuracy	80.14%

Degradation Type	F1 Score
No Degradation	0.9350
Bluish	0.9907
Blurry	0.9898
Contrast	0.9049
Greenish	0.9980
Hazy	0.9554
Illumination	0.9936
Noisy	0.9929
Reddish	0.9827
GFLOPs	15.1666
Number of Parameters	0.2250 M
Test Accuracy	97.63%

Degradation Type	MSE	PSNR
Illumination	0.01	24.97 dB
Hazy	2.06	10.85 dB
Blurry	0.23	23.85 dB
Noisy	0.71	18.36 dB
Contrast	0.501	21.86 dB
Color Balance	0.005	29.78 dB
GFLOPs: 0.4530	Number of Parameters: 0.0070 M

Degradation Type	MSE	PSNR
Illumination	0.06	22.57 dB
Hazy	2.81	10.95 dB
Blurry	0.65	20.85 dB
Noisy	0.35	22.53 dB
Contrast	0.767	21.51 dB
Color Balance	0.01	25.98 dB
GFLOPs: 20.0068	Number of Parameters: 0.3057 M

Degradation Type	MSE	PSNR
Illumination	3.09	14.07 dB
Hazy	3.90	14.08 dB
Blurry	3.90	14.08 dB
Noisy	3.91	14.07 dB
Contrast	3.91	14.07 dB
Color Balance	3.91	14.07 dB
GFLOPs: 0.0503	Number of Parameters: 50.52 M

Degradation Type	MSE	PSNR
Color Balance	0.010	17.71 dB
Hazy	0.017	17.08 dB
Blurry	0.010	18.62 dB
Noisy	0.010	17.71 dB
Contrast	0.014	18.28 dB
Illumination	5.41e-06	52.66 dB
GFLOPs: 0.0503	Number of Parameters: 50.54 M

Degradation Type	MSE	PSNR
Illumination	0.04	24.57 dB
Hazy	2.06	10.85 dB
Blurry	0.82	20.85 dB
Noisy	0.23	22.36 dB
Contrast	0.567	21.64 dB
Color Balance	0.06	23.28 dB
GFLOPs: 0.4530	Number of Parameters: 0.0070 M

Degradation Type	MSE	PSNR
Illumination	0.05	24.27 dB
Hazy	1.02	17.08 dB
Blurry	0.01	22.62 dB
Noisy	0.019	22.35 dB
Contrast	0.507	19.46 dB
Color Balance	0.000035	36.45 dB
GFLOPs: 0.166	Number of Parameters: 0.4045 M

Degradation Type	MSE	PSNR
Illumination	0.02	22.43 dB
Hazy	37.9	4.20 dB
Color Balance	5.79	8.79 dB
Noisy	5.91	12.27 dB
Contrast	38.9	4.09 dB
Blurry	2.58	10.88 dB
GFLOPs: 0.088	Number of Parameters: 72.29 M

Degradation Type	MSE	PSNR
Illumination	0.064	28.32 dB
Hazy	0.165	25.82 dB
Color Balance	0.100	26.98 dB
Noisy	0.130	26.38 dB
Contrast	0.176	25.89 dB
Blurry	0.071	28.42 dB
GFLOPs: 0.138	Number of Parameters: 46.0913 M

Degradation Type	MSE	PSNR
Illumination	0.001	38.55 dB
Hazy	0.00276	34.04 dB
Color Balance	0.0014	38.45 dB
Noisy	0.0002	39.05 dB
Contrast	0.001	38.67 dB
Blurry	4.06e-05	40.02 dB
GFLOPs: 0.571	Number of Parameters: 203.430 M

Degradation Type	MSE	PSNR
Color Balance	$3.185\times 10^{-5}$	40.96 dB
Blurry	$2.098\times 10^{-5}$	38.92 dB
Noisy	$2.28\times 10^{-5}$	40.38 dB
Illumination	$2.18\times 10^{-5}$	42.46 dB
Contrast	$1.49\times 10^{-5}$	46.39 dB
Hazy	$1.16\times 10^{-5}$	46.03 dB
GFLOPs: 0.822	Number of Parameters: 151.07 M

Degradation Type	MSE	PSNR
Illumination	4.14	13.82 dB
Hazy	1.02	12.78 dB
Blurry	0.07	23.82 dB
Color Balance	0.02	25.24 dB
Contrast	0.56	19.38 dB
Noisy	0.00056	36.72 dB
GFLOPs: 0.05055	Number of Parameters: 50.54 M