IDA-UIE: An Iterative Framework for Deep Network-based Degradation Aware Underwater Image Enhancement

Pranjali Singh
Centre for Intelligent Cyber Physical Systems, Indian Institute of Technology Guwahati,
[email protected]
Prithwijit Guha
Dept. of Electronics and Electrical Engg., Indian Institute of Technology Guwahati,
[email protected]
Abstract

Underwater image quality is affected by fluorescence, low illumination, absorption, and scattering. Recent works in underwater image enhancement have proposed different deep network architectures to handle these problems. Most of these works have proposed a single network to handle all the challenges. We believe that deep networks trained for specific conditions deliver better performance than a single network learned from all degradation cases. Accordingly, the first contribution of this work is an iterative framework in which a single dominant degradation condition is identified and resolved at each step. This proposal considers the following eight degradation conditions: low illumination, low contrast, haziness, blur, noise, and color imbalance in each of the three color channels. A deep network is designed to identify the dominant degradation condition. Accordingly, an appropriate deep network is selected for degradation condition-specific enhancement. The second contribution of this work is the construction of degradation condition-specific datasets from good quality images of two standard datasets (UIEB and EUVP). These datasets are used to learn the condition-specific enhancement networks. The proposed approach is found to outperform nine baseline methods on the UIEB and EUVP datasets.

Keywords: Image Enhancement · Deep Neural Network · Underwater Image Enhancement

1 Introduction

Poor visibility conditions in the world’s oceans have limited our understanding of these environments. To address this challenge, underwater image enhancement techniques are employed [1]. With approximately 70% of the Earth’s surface covered by water, there is increasing interest in exploring underwater realms. Clear images are essential for monitoring marine species, underwater mountains, and plants. Additionally, the effects of color in underwater images are significant. Light reflection varies greatly depending on the sea’s structure, with water capable of bending light to create crinkle patterns or diffusing it. The quality of underwater photos is influenced by several factors, including restricted visibility range, uneven lighting, unwanted noise, and reduced color fidelity [2].

1.1 Application

Underwater image enhancement has numerous practical applications in various fields, including oceanography, underwater archaeology, underwater robotics, underwater exploration, and more [3]. Some specific applications are outlined below:

  1. Marine Life: Underwater image enhancement aids in the identification and tracking of marine life, such as fish, corals, and other organisms. This is crucial for scientific research on the health and behavior of underwater ecosystems [2].

  2. Oceanography: Enhanced underwater images improve the study of ocean currents, tides, and underwater topography.

  3. Underwater Archaeology: Enhancing images of submerged structures, like shipwrecks, assists in identifying and studying historical artifacts and structures.

  4. Underwater Security and Surveillance: The accuracy and effectiveness of security and surveillance systems in underwater environments are enhanced through underwater image enhancement. This aids in detecting and tracking intruders, suspicious objects, and potential threats to underwater infrastructure, such as pipelines and oil rigs [2].

  5. Underwater Robotics: Underwater robots, such as remotely operated vehicles (ROVs) and autonomous underwater vehicles (AUVs), are equipped with cameras and sensors for object detection and navigation. Underwater image enhancement improves the quality of images captured by these sensors, facilitating the detection and tracking of marine life, underwater structures, and potential hazards.

  6. Underwater Photography and Videography: Enhancing the quality of underwater images and videos makes them more appealing to audiences and enhances the immersive experience. This is particularly important for promoting dive sites and other underwater attractions.

  7. Underwater Mapping and Navigation: Image enhancement increases the accuracy and detail of maps and navigation systems used for underwater exploration and research, aiding in the discovery and exploration of new dive sites and other underwater environments.

  8. Underwater Tourism through Virtual Reality: Enhancing the quality of images and videos used for virtual reality (VR) experiences provides a more immersive and realistic experience for users, enabling safe and realistic exploration of underwater environments [4].

Improving the quality of underwater images can significantly impact our understanding of the underwater world and enhance our ability to explore and interact with it [2].

Figure 1: Application areas of underwater image processing, highlighting its critical roles in marine life identification, oceanography, underwater archaeology, security and surveillance, robotics, photography and videography, mapping and navigation, and virtual reality tourism [5].

1.2 Challenges

Light attenuation refers to the reduction in light intensity as it travels through a medium, resulting from absorption, scattering, and reflection by particles and molecules within that medium. The degree of light attenuation is influenced by the medium’s properties, such as its composition, density, and scattering characteristics.

In water, light attenuation is significantly greater than in air due to the higher density and greater concentration of particles and molecules. Water molecules, suspended particles, and dissolved substances like salts and organic matter all contribute to the attenuation of light, as illustrated in Fig 2.

The extent of light attenuation in water varies with the wavelength of the light. Longer wavelengths, such as red and infrared light, are attenuated more strongly than shorter wavelengths, such as blue and green light. This phenomenon explains why objects underwater appear bluer and darker compared to their appearance in air: red light is absorbed within a short distance, whereas blue and green light penetrate farther.

In contrast, light attenuation in air is much lower due to the lower density and smaller concentration of particles and molecules in the atmosphere. However, atmospheric conditions like fog, haze, and pollution can also contribute to light attenuation through scattering.

Figure 2: Challenges in underwater imaging include significant light attenuation due to absorption, scattering, and reflection by water molecules, suspended particles, and dissolved substances. The attenuation varies with wavelength, causing longer wavelengths like red to be absorbed more strongly than shorter wavelengths like blue and green. Additionally, underwater images are affected by fluorescence, non-uniform illumination, and reduced visibility, making it essential to enhance image quality for better exploration and study of underwater environments [5].

The underwater environment encompasses areas submerged in water, whether in natural or artificial bodies such as oceans, seas, reservoirs, rivers, or aquifers. It is the cradle of life on Earth and is vital for sustaining diverse life forms, serving as a natural habitat for numerous organisms. Many human activities occur within accessible regions of the underwater environment. Consequently, understanding the characteristics of the underwater imaging model is essential for conducting research across various fields [33].

1.2.1 Absorption and Scattering

The Lambert-Beer empirical law states that the decay in light intensity depends on the properties of the medium through which it travels. In water, light intensity decays exponentially through a process known as attenuation. Attenuation results from the combined effects of absorption and scattering, leading to a loss of light energy and a change in the direction of electromagnetic energy. This attenuation poses a significant challenge for underwater imaging by creating a hazy effect that complicates image processing applications in marine environments. In clear water, attenuation limits visibility to approximately 20 meters, whereas in turbid water, visibility is reduced to only 5 meters. Additionally, light absorption in water varies with wavelength; as depth increases, different colors of light are absorbed at different rates. Red, with the longest wavelength, is absorbed first, while blue, with the shortest wavelength, penetrates the farthest, resulting in a bluish tint in underwater images as shown in Fig 3.

Figure 3: Light attenuation in underwater environments, illustrating the exponential decay of light intensity due to absorption and scattering. The diagram shows how red light, with the longest wavelength, is absorbed first, while blue light, with the shortest wavelength, penetrates the farthest, resulting in a bluish tint in underwater images [2]

In an underwater medium, the presence of dust particles leads to scattering phenomena. When light reflects off an object’s external surface and reaches the camera, it interacts with the floating particles in the medium, causing a scattering effect. There are two types of scattering that affect underwater images: forward scattering and backward scattering as shown in Fig 4.

Figure 4: Absorption and scattering in underwater environments, showing how light interacts with floating particles. The diagram illustrates the effects of forward scattering and backward scattering on the visibility and clarity of underwater images [2].

The model is based on the principles of linear superposition and the water medium modeling defined in the Jaffe–McGlamery model [2]. The irradiance entering the camera is a linear combination of three distinct components: the direct component ($E_d$), the forward-scattered component ($E_f$), and the backscatter component ($E_b$). The total irradiance ($E_T$) can be expressed as follows:

$$E_T = E_d + E_f + E_b \qquad (1)$$

The direct component, denoted as $E_d$, refers to the light that is reflected by an object and reaches the camera without undergoing any scattering. Forward scatter, represented by $E_f$, occurs when the light reflected from an object is scattered at a small angle on its way to the camera. In contrast, backscatter, represented by $E_b$, occurs when light is scattered towards the camera by particles in the water without having reached the object. Models of this kind are often used for image restoration, but they are computationally demanding and lead to longer execution times.

$E_d$ signifies the light that is directly reflected by the object without any scattering in the water. This component is particularly beneficial for underwater imaging and can be expressed as:

$$E_d(x,y) = E(x,y)\, e^{-c\, d(x,y)} \qquad (2)$$

The expression $E(x,y)$ represents the irradiance at position $(x,y)$. The total attenuation coefficient $c$ of the medium quantifies the combined effects of scattering and absorption on light loss within the medium. The variable $d(x,y)$ denotes the distance between the object and the camera. Furthermore, $E_f$ refers to the forward scatter component, which is light reflected by an object and scattered at a small angle before reaching the camera:

$$E_f(x,y) = E_d(x,y) \ast g(x,y) \qquad (3)$$

The symbol $\ast$ denotes the convolution operator, and $g(x,y)$ represents the point spread function (PSF). To avoid the mathematically complex issue of solving the deconvolution through PSF estimation, researchers typically assume that the underwater scene is close to the camera, thereby neglecting the impact of forward scattering.

$E_b$ represents the backscattered light reflected by particles in the water. This component does not include light from the object itself, as it is primarily caused by the scattering of floating particles. $B_{\infty}$ denotes the underwater background light.

$$E_b(x,y) = B_{\infty}(\lambda)\left(1 - e^{-c\, d(x,y)}\right) \qquad (4)$$
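To make Eqs. (1), (2), and (4) concrete, the following minimal sketch evaluates the simplified model for a single pixel and a single color channel. It assumes, as discussed above, that the forward-scatter term $E_f$ is neglected. The function name and all numeric values are illustrative assumptions and are not taken from the referenced model implementation.

```python
import math


def observed_irradiance(scene, distance, c, background):
    """Simplified Jaffe-McGlamery model for one pixel and one channel.

    Computes E_T = E_d + E_b, with the forward-scatter term E_f neglected.
    scene      : E(x, y), irradiance leaving the object
    distance   : d(x, y), object-to-camera distance
    c          : total attenuation coefficient of the medium
    background : B_inf, underwater background light
    """
    attenuation = math.exp(-c * distance)            # e^{-c d(x, y)}
    direct = scene * attenuation                     # Eq. (2)
    backscatter = background * (1.0 - attenuation)   # Eq. (4)
    return direct + backscatter


# Hypothetical values: scene irradiance 0.8, background light 0.3, c = 0.4 per metre.
for d in (0.0, 2.0, 10.0):
    print(d, round(observed_irradiance(0.8, d, 0.4, 0.3), 4))
```

At zero distance the observed value equals the scene irradiance, and at large distances it approaches the background light, which is the hazy appearance described in the text.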

1.2.2 Suspended Particles

The presence of suspended particles in water can be mathematically modeled using the radiative transfer equation, which describes the interaction of light with matter. For underwater images, this equation can model the propagation of light through the water column, including the effects of scattering and absorption by suspended particles [46].

A common approach to enhancing underwater images involves using a dehazing algorithm that estimates the transmission map of the image. This map represents the fraction of light that has successfully transmitted through the water column. The transmission map can be estimated using the following equation:

$$t(x) = e^{-\beta d(x)} \qquad (5)$$

where $t(x)$ is the transmission at pixel $x$, $d(x)$ is the distance between the camera and the scene point imaged at pixel $x$, and $\beta$ is the scattering coefficient of the water. The scattering coefficient depends on the concentration and size distribution of suspended particles in the water and can be estimated using empirical or theoretical models.

Once the transmission map is estimated, it can be used to remove the effects of haze and recover the original colors and contrast of the image using the following equation:

$$J(x) = \frac{I(x) - A}{t(x)} + A \qquad (6)$$

where $J(x)$ is the recovered intensity at pixel $x$, $I(x)$ is the observed intensity of the image at pixel $x$, $A$ represents the atmospheric (background) light, and $t(x)$ is the estimated transmission at pixel $x$.
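The following sketch illustrates Eqs. (5) and (6) for a single pixel value. It assumes the standard haze formation model, in which the observed value is the scene value weighted by the transmission plus the background light weighted by one minus the transmission. The lower bound `t_min` is a common practical safeguard against division by very small transmissions and is an assumption here, as are the function names and numeric values.

```python
import math


def transmission(beta, distance):
    """Eq. (5): fraction of light that survives the path to the camera."""
    return math.exp(-beta * distance)


def hazy_pixel(scene, t, background):
    """Forward haze model (assumed): observed = scene * t + background * (1 - t)."""
    return scene * t + background * (1.0 - t)


def recover_pixel(observed, t, background, t_min=0.1):
    """Eq. (6): invert the haze model; t_min is an assumed safeguard."""
    t = max(t, t_min)
    return (observed - background) / t + background


# Hypothetical values: beta = 0.2 per metre, distance 5 m, background light 0.4.
t = transmission(0.2, 5.0)
observed = hazy_pixel(0.7, t, 0.4)
print(round(t, 4), round(observed, 4), round(recover_pixel(observed, t, 0.4), 4))
```

When the transmission and background light are known exactly, the inversion returns the original scene value. In practice both quantities are estimated, so the recovery is only approximate.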

Color correction algorithms can also be employed to compensate for the color distortion caused by suspended particles. A common approach is to estimate the color of the ambient light in the underwater environment using a white-balancing algorithm and then adjust the color balance of the image accordingly.
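As one possible instance of such a white-balancing step, the sketch below uses the gray-world assumption, under which the average color of a scene is taken to be neutral gray. The choice of gray-world is an assumption made for illustration; the text does not name a specific white-balancing algorithm, and the function name and pixel values are hypothetical.

```python
def gray_world_balance(pixels):
    """Gray-world white balance on a list of (r, g, b) floats in [0, 1].

    Each channel is scaled so that its mean matches the mean over all channels.
    """
    n = len(pixels)
    means = [sum(p[ch] for p in pixels) / n for ch in range(3)]
    gray = sum(means) / 3.0
    # Guard against an all-zero channel, which cannot be rescaled.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [tuple(min(1.0, p[ch] * gains[ch]) for ch in range(3)) for p in pixels]


# Hypothetical bluish-green pixels, as is typical for underwater scenes.
balanced = gray_world_balance([(0.1, 0.4, 0.6), (0.2, 0.5, 0.7)])
print(balanced)
```

The red channel receives the largest gain in this example because it has the lowest mean, which compensates for the blue-green cast.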

In summary, the key to mathematically enhancing underwater images lies in modeling the effects of suspended particles on light transmission using the radiative transfer equation and applying appropriate image enhancement techniques to mitigate these effects.

1.2.3 Non-Uniform Illumination

Absorption and scattering of light in water can lead to blurriness, reduced contrast, and an overall decline in image quality. These effects are further exacerbated in high-turbidity underwater conditions or when powerful artificial light sources are used [51]. Such light sources can cause non-uniform lighting, resulting in reflections that obscure image details and create bright spots as shown in Fig 5.

Figure 5: Non-uniform illumination and the presence of suspended particles in water, demonstrating how absorption and scattering lead to blurriness, reduced contrast, and loss of image quality. High turbidity and powerful artificial light sources exacerbate these effects, causing reflections and bright spots that obscure image details [35]

A common method to model non-uniform illumination in underwater environments is by using the Beer-Lambert law. This law describes how light intensity attenuates as it travels through a medium, stating that the intensity of light decreases exponentially with distance:

$$I = I_0\, e^{-k d} \qquad (7)$$

where $I$ is the intensity of the light after passing through the medium, $I_0$ is the initial intensity of the light, $k$ is the extinction coefficient of the medium (a measure of how much the medium absorbs or scatters light), and $d$ is the distance the light has traveled through the medium.

In underwater environments, the extinction coefficient can vary depending on factors such as water depth, water clarity, and the presence of suspended particles or plankton. Thus, the Beer-Lambert law can be used to model the non-uniformity of underwater illumination [36].
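A small numeric sketch of Eq. (7) is given below. The per-channel extinction coefficients are hypothetical values chosen only to reflect the qualitative ordering described earlier (red attenuates fastest, blue slowest); they are not measured constants.

```python
import math


def beer_lambert(i0, k, d):
    """Eq. (7): intensity remaining after travelling distance d."""
    return i0 * math.exp(-k * d)


def distance_for_fraction(k, fraction):
    """Distance at which I / I0 has dropped to the given fraction."""
    return -math.log(fraction) / k


# Hypothetical extinction coefficients in 1/m, for illustration only.
K = {"red": 0.6, "green": 0.1, "blue": 0.05}

for channel, k in K.items():
    remaining = beer_lambert(1.0, k, 5.0)
    half_distance = distance_for_fraction(k, 0.5)
    print(channel, round(remaining, 3), round(half_distance, 2))
```

Because the extinction coefficient varies with water conditions, evaluating Eq. (7) with a spatially varying $k$ or $d$ gives a simple model of non-uniform illumination across the image.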

Other factors contributing to non-uniform illumination in underwater environments include the angle of incidence of the light, the direction and intensity of light fluorescence, and the presence of shadows and reflections. Modeling these factors may require more complex mathematical formulas, such as ray tracing or radiative transfer models.

1.2.4 Fluorescence

Fluorescence is a phenomenon where certain materials absorb light at one wavelength and emit it at a longer wavelength. However, underwater image processing provides methods to overcome these challenges. As shown in Fig 6, visual information can be combined with quantitative assessment to effectively address these issues.

Figure 6: Illustration of fluorescence, a phenomenon where certain materials absorb light at one wavelength and emit it at a longer wavelength, commonly affecting underwater images by causing color distortions. [34]

To address these challenges, various techniques and algorithms have been developed for underwater image enhancement. These include classical methods such as histogram equalization and Retinex, as well as deep learning-based approaches utilizing CNNs, GANs, and U-Net [47]. These techniques aim to improve contrast, sharpness, and color balance in images while minimizing the effects of scattering, absorption, and other factors. However, there is still significant work required to further enhance the quality and clarity of underwater images, particularly under challenging conditions [29].

2 Major Contribution

Most existing works have designed a single deep network for image quality improvement. In contrast, this work proposes an Iterative Framework for Degradation Aware Underwater Image Enhancement (IDA-UIE).

IDA-UIE identifies a dominant degradation condition and appropriately enhances it. Correction of one degradation may reveal another degradation condition. Thus, the enhanced image is further subjected to degradation identification and subsequent enhancement. This system attempts to improve the image quality through degradation-aware enhancement iterations.
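The control flow described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the classifier and enhancers are passed in as plain callables, the label `"none"` stands for an image judged not degraded, and the iteration cap `max_iters` is a hypothetical safeguard. The toy classifier and enhancers operate on a flat list of intensities and exist only to make the example runnable.

```python
from typing import Callable, Dict, List, Tuple

Image = List[float]  # placeholder image type used only in this sketch


def ida_uie(
    image: Image,
    classify: Callable[[Image], str],
    enhancers: Dict[str, Callable[[Image], Image]],
    max_iters: int = 8,
) -> Tuple[Image, List[str]]:
    """Repeatedly identify the dominant degradation and apply its enhancer."""
    history: List[str] = []
    for _ in range(max_iters):          # assumed cap on the number of iterations
        label = classify(image)
        if label == "none":             # no degradation detected: stop
            break
        image = enhancers[label](image)
        history.append(label)
    return image, history


# Toy stand-ins for the degradation classifier and two enhancement networks.
def toy_classify(img: Image) -> str:
    if sum(img) / len(img) < 0.35:
        return "low_illumination"
    if max(img) - min(img) < 0.5:
        return "low_contrast"
    return "none"


def toy_brighten(img: Image) -> Image:
    return [min(1.0, 2.0 * v) for v in img]


def toy_stretch(img: Image) -> Image:
    lo, hi = min(img), max(img)
    return [(v - lo) / (hi - lo) for v in img]


result, steps = ida_uie(
    [0.1, 0.2, 0.3],
    toy_classify,
    {"low_illumination": toy_brighten, "low_contrast": toy_stretch},
)
print(steps)
```

In the toy run, correcting the low illumination exposes a low-contrast condition that is handled in the next iteration, which mirrors the motivation for iterating.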

This section details the significant contributions made in this project, which focus on enhancing underwater images through an innovative framework and specialized deep networks.

  1. Iterative Framework for Degradation Aware Underwater Image Enhancement: One of the primary contributions is the proposal of an iterative framework specifically designed for degradation-aware underwater image enhancement. Traditional methods often employ a single deep network to improve image quality. However, these approaches can fall short when dealing with complex and varied degradation types found in underwater images.

    Our iterative framework, named Iterative Degradation Aware Underwater Image Enhancement (IDA-UIE), addresses this by identifying the dominant degradation condition in an image and enhancing it accordingly. The process is iterative because enhancing one type of degradation can reveal another underlying issue. Thus, after the initial enhancement, the image is re-evaluated for additional degradations, which are then corrected in subsequent iterations. This iterative approach ensures a comprehensive enhancement process, gradually improving the image quality through multiple refinement steps.

  2. Deep Network for Identifying Dominant Degradation: To support the iterative framework, we designed a deep network, denoted as $\mathbf{\Phi}_{DC}$, for identifying the dominant degradation in underwater images. This network is critical as it drives the entire enhancement process by accurately pinpointing the most significant degradation affecting the image.

    The $\mathbf{\Phi}_{DC}$ network is trained to recognize eight specific types of degradation: low illumination, low contrast, haziness, blur, noise, and color imbalances in the red, green, and blue channels. Additionally, it can identify if an image is not degraded. This identification step is crucial for ensuring that each image receives the appropriate type of enhancement.

  3. Eight Deep Networks for Condition-Specific Underwater Image Enhancement: Following the identification of the dominant degradation, the framework employs one of eight specialized deep networks to enhance the image. Each of these networks is tailored to address a specific type of degradation:

    $\mathbf{\Phi}_{IC}$: Illumination Correction - Enhances images with low illumination, improving visibility and detail.

    $\mathbf{\Phi}_{CE}$: Contrast Enhancement - Increases the contrast in images, making features more distinguishable.

    $\mathbf{\Phi}_{DH}$: Dehazing - Removes haziness to clarify images.

    $\mathbf{\Phi}_{DB}$: Deblurring - Sharpens images to correct blur.

    $\mathbf{\Phi}_{DN}$: Denoising - Reduces noise to produce cleaner images.

    $\mathbf{\Phi}_{CBR}$: Color Balance for Red Channel - Corrects color imbalances in the red channel.

    $\mathbf{\Phi}_{CBB}$: Color Balance for Blue Channel - Corrects color imbalances in the blue channel.

    $\mathbf{\Phi}_{CBG}$: Color Balance for Green Channel - Corrects color imbalances in the green channel.

    Each network has been designed and trained for its specific enhancement task, so that the iterative framework can address the different aspects of underwater image degradation.

  4. Construction of Two Datasets with Condition-Specific Degradations: To train the nine deep networks (one for degradation identification and eight for specific enhancements), we constructed two extensive datasets: UIEB-D8 and EUVP-X-D8. These datasets are based on standard underwater image datasets but have been augmented with condition-specific degradations to simulate real-world underwater conditions more accurately.

    Each image in these datasets has been systematically degraded to reflect one of the eight targeted conditions. This detailed and condition-specific dataset construction ensures that the networks are well-trained to recognize and correct each type of degradation effectively.

    UIEB-D8 Dataset: The UIEB-D8 dataset is derived from the UIEB dataset [14], with images subjected to controlled degradations to create training examples for each of the eight conditions. This dataset provides a robust foundation for training the enhancement networks.

    EUVP-X-D8 Dataset: Similarly, the EUVP-X-D8 dataset is based on the EUVP dataset [13] and includes images with various degradations. By using these two diverse datasets, the networks are trained to handle a wide range of underwater image conditions, enhancing their generalizability and effectiveness.
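The idea of deriving condition-specific training examples from good quality images can be sketched as follows. This section does not specify how each degradation is synthesized, so the degradation functions, their parameter values, and the label names below are hypothetical. Haze and blur are omitted because they need depth or spatial filtering, which would lengthen the example.

```python
import random


def clamp(v):
    """Keep an intensity inside the valid range [0, 1]."""
    return min(1.0, max(0.0, v))


def low_illumination(img, gain=0.4):
    return [tuple(clamp(v * gain) for v in px) for px in img]


def low_contrast(img, factor=0.5):
    # Compress intensities towards mid-gray.
    return [tuple(clamp(0.5 + factor * (v - 0.5)) for v in px) for px in img]


def add_noise(img, sigma=0.05, seed=0):
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return [tuple(clamp(v + rng.gauss(0.0, sigma)) for v in px) for px in img]


def color_imbalance(img, channel, gain=0.5):
    # Attenuate a single channel (0 = red, 1 = green, 2 = blue).
    return [
        tuple(clamp(v * gain) if ch == channel else v for ch, v in enumerate(px))
        for px in img
    ]


# Hypothetical label-to-degradation mapping (six of the eight conditions).
DEGRADATIONS = {
    "low_illumination": low_illumination,
    "low_contrast": low_contrast,
    "noise": add_noise,
    "color_imbalance_red": lambda img: color_imbalance(img, 0),
    "color_imbalance_green": lambda img: color_imbalance(img, 1),
    "color_imbalance_blue": lambda img: color_imbalance(img, 2),
}


def build_training_triples(clean_img):
    """Return (label, degraded image, clean target) for each degradation."""
    return [(label, fn(clean_img), clean_img) for label, fn in DEGRADATIONS.items()]


clean = [(0.2, 0.5, 0.8), (0.6, 0.4, 0.1)]  # a tiny two-pixel "image"
for label, degraded, _ in build_training_triples(clean):
    print(label, degraded)
```

Each triple supplies a class label for the degradation classifier and a degraded and clean pair for the matching enhancement network.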

Figure 7: The functional block diagram of the Iterative Framework for Degradation Aware Underwater Image Enhancement (IDA-UIE), illustrating the process of identifying dominant degradations and applying condition-specific enhancements iteratively to improve overall image quality.

3 Related Work

This section presents a classification and summary of existing techniques for enhancing underwater images, which are mainly categorized into traditional and deep learning-based methods. The underwater image enhancement (UIE) techniques are broadly categorized in Figure 8.

Figure 8: Techniques of Underwater Image Enhancement (UIE), categorized into traditional methods and deep learning-based methods, illustrating the various approaches used to improve underwater image quality. [2]

3.1 Traditional Methods

Traditional methods include both model-based and non-model methods. Non-model methods, such as the histogram algorithm, enhance visual effects through pixel adjustments without considering imaging principles. Model-based methods, also known as image restoration techniques, estimate the relationship between clear, blurry, and transmission images based on an imaging model to produce clear images. An example of a model-based method is the dark channel prior (DCP) algorithm, as shown in Figure 8 [2].

3.1.1 Image Denoising

Image denoising is a technique used to reduce or remove noise from digital images. It aims to improve the visual quality of an image by suppressing unwanted noise while preserving important details and structures [27].

3.1.2 Contrast Enhancement Techniques

Image quality is often evaluated based on the level of contrast in the image. Contrast refers to the difference in luminance reflected from two adjacent planes and is a key factor in making objects distinguishable from the background. Vision is more sensitive to contrast than absolute luminance, which allows us to perceive the world despite variations in illumination conditions. If an image has highly concentrated contrast in a particular range, such as being very dark, critical information may be lost in those areas. Therefore, optimizing the contrast is necessary to represent all the details in the input image. To address issues related to contrast in underwater image processing, numerous algorithms for achieving contrast enhancement have been developed [2].

3.1.3 Color Correction Techniques

The colors present in underwater images are mainly blue and green due to their shorter wavelengths. The histogram distribution of these images indicates that the green channel’s mean is more significant than that of the red channel, and the RGB channels’ distribution range does not cover the full range of [0, 255]. To correct the issue of color cast, color correction techniques are used to improve the visual information content of underwater images. A manual correction approach is found to be better than automatic enhancement techniques in terms of the significance level. An enhancement method that uses fuzzy logic and bacterial foraging optimization is proposed to remove the color cast, which gives better results than existing algorithms. Additionally, a method for non-uniform illumination correction is proposed, which uses maximum-likelihood estimation to map the image to Rayleigh distribution. An adaptive linear stretch method that adjusts regions with low light distributions with a threshold depending on the histogram is also proposed.

3.1.4 Histogram Equalization Method

Underwater images often require enhancement for improved quality, and several methods are available in the literature to address this issue. One such method [31] combines the HSV color space, a transform of the V component, and histogram equalization. Initially, the RGB image is separated into its R, G, and B components and converted into the HSV color space. The V component is then stretched within a specified interval before the image is converted back to the RGB color space. Histogram equalization is then applied to each of the R, G, and B components, and the components are recombined to form a color image. Finally, a Gaussian low-pass filter is applied to the image. The method is compared with other studies using the mean value and entropy metrics, and the reported results indicate that it significantly improves underwater image quality [31].
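The histogram equalization step applied to each color component can be sketched as below for a flat list of 8-bit values. This is a generic textbook formulation based on the cumulative distribution function, given as an assumption about the step; it is not the specific implementation used in [31].

```python
def equalize(values, levels=256):
    """Histogram-equalize a list of integer intensities in [0, levels - 1]."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1

    # Cumulative distribution of the intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    n = len(values)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:                 # constant image: nothing to spread out
        return list(values)

    # Map each level so that the occupied range spans [0, levels - 1].
    lut = [
        max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [lut[v] for v in values]


# A narrow, low-contrast range of values is spread over the full 8-bit range.
print(equalize([50, 50, 51, 52]))
```

For a color image, the same function would be applied separately to the R, G, and B components before they are recombined.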

3.1.5 CLAHE

Ordinary adaptive histogram equalization (AHE) tends to over-amplify the contrast in near-constant regions of the image, which also amplifies noise. Contrast-limited adaptive histogram equalization (CLAHE), originally developed for the enhancement of low-contrast images [34], is a variant of AHE in which the contrast amplification is limited to reduce this noise amplification [44]. In CLAHE, the contrast-limiting procedure is applied to each neighborhood from which the transformation function is derived. Rather than operating on the whole image, CLAHE divides it into small regions called tiles and performs contrast enhancement on each tile [4]. The tiles are then rejoined to obtain the overall enhanced image. CLAHE can be applied to both grayscale and color images [30] [4].
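The contrast-limiting step can be illustrated on the histogram of a single tile. The sketch below clips each bin at a limit and redistributes the clipped excess uniformly over all bins, which is one common formulation. The function name and the single-pass redistribution are assumptions made for illustration; tiling and interpolation between tiles are omitted.

```python
def clip_histogram(hist, clip_limit):
    """Clip a tile histogram and redistribute the excess counts uniformly.

    After a single redistribution pass some bins may sit slightly above the
    limit again; practical implementations either accept this or iterate.
    """
    excess = sum(max(0, h - clip_limit) for h in hist)
    clipped = [min(h, clip_limit) for h in hist]

    share, remainder = divmod(excess, len(hist))
    clipped = [h + share for h in clipped]
    for i in range(remainder):       # hand out the leftover counts one by one
        clipped[i] += 1
    return clipped


# A tile dominated by one intensity: the peak is flattened, the total is kept.
print(clip_histogram([10, 0, 2, 0], clip_limit=4))
```

Limiting the peak bounds the slope of the cumulative distribution used as the mapping function, which is what restrains noise amplification in near-constant regions.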

3.1.6 Retinex Based Method

Underwater images often suffer from low contrast and color distortion due to the variable attenuation of light and non-uniform absorption of red, green, and blue components. To address these issues, a Retinex-based approach for underwater image enhancement has been proposed. The approach involves using contrast-limited adaptive histogram equalization (CLAHE) to enhance the contrast of the darker components of the underwater image while limiting noise, which may blur visual information. Next, a Retinex-based enhancement is performed on the CLAHE-processed image to restore distorted colors [30] [4]. To restore distorted edges and achieve smoothing of the blurred parts of the image, bilateral filtering is performed on the Retinex-processed image. To optimize the individual strengths of CLAHE, Retinex, and bilateral filtering algorithms within a single framework, suitable parameter values are determined. Comparing the performance of the proposed approach with existing methods, both qualitatively and quantitatively, indicates that it results in better enhancement of underwater images [29].

3.1.7 Dark Channel Prior

Haze arises from particles suspended in water, such as sand, minerals, and plankton. When light reflected from an object travels through the water, it encounters these particles and is absorbed and scattered, which reduces contrast and visibility and limits color variation. Enhancing the quality and visibility of underwater images therefore requires a dehazing step [22]. The Dark Channel Prior (DCP) builds on the observation that most local patches in haze-free outdoor images contain pixels with very low intensity in at least one color channel. Applied to underwater images, DCP is reported to improve visibility and color accuracy while reducing computational complexity and improving dehazing efficiency [22]. The method estimates the atmospheric light and uses a mathematical function to handle both sky and non-sky regions. It identifies affected patches in the image, estimates the scene depth, and removes the haze to improve clarity. To improve the accuracy of the depth map generated by the block-based dark channel prior, image matting is employed, which enables more precise identification of object contours [22]. The application of image matting to the underwater depth map, derived through the general dark-channel methodology, represents a novel approach. The following section lists other existing works in this field.
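The dark channel itself is simple to compute, as the following NumPy sketch shows. It follows the commonly used DCP formulation (a minimum over color channels followed by a minimum over a local patch, and a transmission estimate of one minus a scaled dark channel). The names `dark_channel`, `estimate_transmission`, `omega`, and `patch` are assumptions for illustration, and the sketch is not the specific pipeline of [22]; in particular, the matting refinement is omitted.

```python
import numpy as np

def dark_channel(image, patch=3):
    """image: H x W x 3 float array in [0, 1]; returns the H x W dark channel."""
    min_rgb = image.min(axis=2)            # minimum over the color channels
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")
    h, w = min_rgb.shape
    dark = np.empty_like(min_rgb)
    for y in range(h):
        for x in range(w):
            # minimum over the local patch centered at (y, x)
            dark[y, x] = padded[y:y + patch, x:x + patch].min()
    return dark

def estimate_transmission(image, airlight, omega=0.95, patch=3):
    """Coarse transmission map; omega keeps a small amount of haze (assumed value)."""
    normalized = image / airlight          # broadcasts over the channel axis
    return 1.0 - omega * dark_channel(normalized, patch)

# Hypothetical inputs: a haze-free patch with a dark channel, and a uniformly hazy one.
clear = np.zeros((4, 4, 3))
clear[..., 0] = 0.9
hazy = np.full((4, 4, 3), 0.8)
airlight = np.array([1.0, 1.0, 1.0])
t_clear = estimate_transmission(clear, airlight)
t_hazy = estimate_transmission(hazy, airlight)
```

A low dark channel yields a transmission close to one (little haze), while a bright dark channel yields a low transmission (dense haze).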

3.1.8 Other Methods

Underwater images often suffer from low contrast and poor visibility, making it crucial to enhance them before further processing. Image enhancement techniques aim to improve the quality and contrast of degraded underwater photos and videos. Standard cameras used for capturing underwater scenes face challenges such as limited available light, low resolution, and blurriness, necessitating the improvement of the initial images or videos obtained from image processing equipment. Researchers have proposed various solutions to address these challenges.

One commonly used approach for enhancing underwater images is the dark channel prior (DCP), which aims to improve the Peak Signal to Noise Ratio (PSNR). However, DCP has significant drawbacks, including the tendency to darken images, reduce contrast, and introduce halo effects. To overcome these limitations, the suggested technique incorporates contrast-limited adaptive histogram equalization (CLAHE) and the Adaptive Color Correction technique.

To evaluate the proposed approach, experiments were conducted using photographs obtained from the Japan Agency for Marine-Earth Science and Technology (JAMSTEC) as well as from the internet. Performance measures such as entropy (MOE), enhancement (EME), mean square error (MSE), and PSNR were used during the evaluation. The results demonstrate that the proposed framework outperforms other methods in terms of MSE and PSNR, achieving values of 0.26 and 32, respectively.
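For reference, MSE and PSNR can be computed as in the sketch below. The peak value of 255 assumes 8-bit images; the scale on which the values reported above were computed is not stated, so the function names and defaults here are illustrative assumptions only.

```python
import numpy as np

def mse(reference, enhanced):
    """Mean squared error between two images of equal shape."""
    diff = reference.astype(np.float64) - enhanced.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit images."""
    err = mse(reference, enhanced)
    if err == 0.0:
        return float("inf")                # identical images
    return float(10.0 * np.log10(max_value ** 2 / err))

# Hypothetical images that differ by 10 gray levels everywhere.
reference = np.full((4, 4), 100, dtype=np.uint8)
enhanced = np.full((4, 4), 110, dtype=np.uint8)
error = mse(reference, enhanced)
quality = psnr(reference, enhanced)
```

Since PSNR is a decreasing function of MSE for a fixed peak value, a method that lowers MSE on the same data also raises PSNR.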

Mean Filter. The mean filter is a method employed to decrease image noise. It performs a local averaging operation, making it one of the most basic linear filters. In this technique, each pixel’s value is replaced with the average value of all the pixels in its surrounding neighborhood. If a noisy image is denoted $f(i,j)$, the smoothed image $g(x,y)$ is obtained as follows, where $S$ is the neighborhood of pixel $(x,y)$ and $n$ is the number of pixels in $S$:

$$g(x,y)=\frac{1}{n}\sum_{(i,j)\in S}f(i,j) \tag{8}$$
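A minimal NumPy sketch of the mean filter, assuming a square neighborhood of side `size` and edge replication at the image border (both are illustrative choices, not prescribed by the text):

```python
import numpy as np

def mean_filter(f, size=3):
    """Replace each pixel with the average of its size x size neighborhood."""
    pad = size // 2
    padded = np.pad(f.astype(np.float64), pad, mode="edge")
    h, w = f.shape
    g = np.zeros((h, w), dtype=np.float64)
    # Sum shifted copies of the image: each shift adds one neighbor of every pixel.
    for dy in range(size):
        for dx in range(size):
            g += padded[dy:dy + h, dx:dx + w]
    return g / (size * size)               # divide by n, the neighborhood size

# Hypothetical test image: a single bright pixel on a dark background.
f = np.zeros((5, 5))
f[2, 2] = 9.0
g = mean_filter(f)
```

The single bright pixel is spread evenly over its 3x3 neighborhood, which shows both the noise-reducing and the blurring effect of the filter.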

Bilateral Filter. A bilateral filter is a non-linear image-smoothing filter that preserves edges while reducing noise. It replaces the intensity of each pixel with a weighted average of the intensities of nearby pixels. The weights are the product of two Gaussians: $G_{\sigma_s}$ decreases with the spatial distance between pixels $p$ and $q$, and $G_{\sigma_r}$ decreases with their intensity difference, so pixels across an edge contribute little. The factor $W_p$ normalizes the weights so that they sum to one.

$$BF[I]_{p}=\frac{1}{W_{p}}\sum_{q\in S}G_{\sigma_{s}}(\|p-q\|)\,G_{\sigma_{r}}(|I_{p}-I_{q}|)\,I_{q},\qquad W_{p}=\sum_{q\in S}G_{\sigma_{s}}(\|p-q\|)\,G_{\sigma_{r}}(|I_{p}-I_{q}|) \tag{9}$$
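The bilateral filter can be sketched in NumPy as a normalized weighted average over a square window. The function name, the window `radius`, and the default values of `sigma_s` and `sigma_r` are assumptions for illustration. The Gaussian prefactors are dropped because they cancel when the weights are normalized.

```python
import numpy as np

def bilateral_filter(image, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Edge-preserving smoothing of a 2-D float image with values in [0, 1]."""
    img = image.astype(np.float64)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    num = np.zeros_like(img)               # weighted sum of neighbor intensities
    den = np.zeros_like(img)               # sum of weights (normalization)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            # Spatial weight: depends only on the offset between p and q.
            spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            # Range weight: depends on the intensity difference |I_p - I_q|.
            range_w = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_r ** 2))
            weight = spatial * range_w
            num += weight * shifted
            den += weight
    return num / den                        # den >= 1 because q = p has weight 1

# Hypothetical step edge: the filter should leave it essentially unchanged.
step = np.zeros((6, 6))
step[:, 3:] = 1.0
smoothed = bilateral_filter(step)
```

With a small `sigma_r`, pixels on the other side of the step receive a negligible weight, so the edge survives while flat regions are still averaged.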

Gaussian Filter. A Gaussian filter is a low-pass filter used to reduce noise (high-frequency components) and blur regions of an image. It is implemented as an odd-sized symmetric kernel (a matrix, in digital image processing terms) that is applied to each pixel in the region of interest. Because pixels near the centre of the kernel receive more weight than those near its border, the filter treats significant colour changes (edges) more gently than a uniform average does.

$$G(x,y)=\frac{1}{2\pi\sigma^{2}}e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}} \tag{10}$$
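A discrete kernel can be obtained by sampling the Gaussian on an odd-sized grid centered at the origin, as in the sketch below. The function name and default arguments are illustrative assumptions; the final renormalization compensates for the mass lost by truncating the Gaussian to a finite window.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Sample the 2-D Gaussian on a size x size grid centered at the origin."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center pixel")
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    kernel = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel /= 2.0 * np.pi * sigma ** 2
    # Renormalize so that filtering does not change the mean brightness.
    return kernel / kernel.sum()

kernel = gaussian_kernel(size=5, sigma=1.0)
```

The resulting kernel is symmetric, sums to one, and has its largest weight at the center.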

Median Filter. The median filter, frequently employed in digital filtering, is a non-linear technique for removing noise from images or signals. It replaces each pixel with the median of the values in its neighborhood. It serves as a common pre-processing step that improves the results of subsequent processing, such as edge detection.
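A minimal NumPy sketch of a median filter, assuming a square window of odd side `size` and edge replication at the border (illustrative choices):

```python
import numpy as np

def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size neighborhood."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    out = np.empty_like(image)
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + size, x:x + size])
    return out

# Hypothetical image with a single "salt" pixel on a flat background.
noisy = np.full((5, 5), 50, dtype=np.uint8)
noisy[2, 2] = 255
cleaned = median_filter(noisy)
```

Unlike the mean filter, the median discards the outlier entirely instead of spreading it over the neighborhood.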

3.2 Deep Learning

One CNN-based approach enhances underwater images by learning a mapping that estimates the color-corrected image and the transmission map without requiring extra labels on the target source. It employs a pixel-disrupting strategy to suppress the interference of tiny textures in local patches, which improves convergence speed and accuracy during learning. The framework is trained on a synthetic dataset of 200,000 underwater images generated with an underwater imaging model and demonstrates superior generalization ability on real-source underwater images.

Deep underwater image enhancement algorithms can be categorized into two primary types: CNN-based and GAN-based algorithms. The CNN-based algorithms focus on preserving the authenticity of the original underwater image, while the GAN-based algorithms strive to enhance the visual quality of the images. However, this classification is simplistic, so we classify the networks based on their architectural distinctions.

3.2.1 Encoder-Decoder Models

The following models use the well-known encoder–decoder architecture to advance underwater image enhancement research.

P2P Network. Recently, [21] proposed an approach to improve the quality of underwater images using pixel-to-pixel (P2P) networks. Their model, resembling REDNet [20], adopts a symmetric architecture consisting of an encoder and a decoder. The encoder is constructed with three convolutional layers, while the decoder is formed by three deconvolutional layers. ReLU activation is applied to each layer except the last one, as shown in Fig. 9.

To train the model, the authors utilized a dataset of 3359 real-world underwater images. They introduced degradation levels by adding 30, 50, and 70 ml of milk to 1 $m^{3}$ of water, representing low, medium, and high degradation, respectively. From the dataset, 10,000 images were chosen for training, and an additional 2000 images were reserved for testing.

Figure 9: Overview of CNN-based and GAN-based algorithms for underwater image enhancement. The CNN-based network learns to estimate color-corrected images and transmission maps without extra labels, employing a pixels-disrupting strategy for improved convergence and accuracy. Encoder-Decoder models, such as the P2P Network, utilize a symmetric architecture with convolutional and deconvolutional layers to enhance image quality. [21]

To achieve a data-driven image enhancement model, the hyper-parameters of the network play a crucial role. The convolutional part retains the first three layers while discarding the fully connected layers. The reason behind this decision is that fully connected layers are designed to map features from two dimensions to one, primarily as input to classifiers. However, the objective is to create a pixel-to-pixel network for image enhancement, which differs from classification tasks. Using fully connected layers would result in the loss of important two-dimensional information, making them unsuitable for underwater image enhancement [21]. Additionally, the authors chose to abandon the pooling layers. Although pooling and unpooling layers can enhance object recognition and semantic segmentation by sharpening object edges, they are unnecessary and detrimental for image enhancement and denoising tasks. This is primarily because pooling layers produce denser feature maps through a many-to-one mapping, causing the loss of spatial information within a receptive field. Furthermore, the corresponding unpooling layers introduce considerable noise. During the unpooling mapping, only one value originates from the original feature map, while the remaining values are artificially generated (typically filled with zeros) [21].

3.2.2 U-Net

The improvement to U-Net concerns the network structure, shown in Fig. 10. The convolutional block attention module (CBAM), an attention mechanism that combines spatial and channel attention, was added to the first U-Net. Because it applies attention along both the channel and spatial dimensions, CBAM can be embedded into most current mainstream networks and improves the feature extraction ability of the model without significantly increasing the amount of computation or the number of parameters. One U-Net estimates a latent image representing the underwater image after compensating for the red light, and another U-Net estimates the transmission image from the input grey-scale image. The CBAM was added to avoid losing details during the network mapping [28]. The first U-Net consists of an encoder stage and a decoder stage. The encoder stage consists of five network layers, each containing two convolution layers. A kernel size of 3 is used in each convolutional layer, and each convolutional layer is followed by a LeakyReLU activation function and a BatchNorm2d function.

Figure 10: Diagram of the improved U-Net structure for underwater image enhancement. The convolutional block attention module (CBAM) is integrated as an attention mechanism to enhance feature extraction by applying attention to both channel and spatial dimensions. This structure includes an encoder stage with five layers, each containing two convolutional layers with a kernel size of 3, followed by LeakyReLU activation and BatchNorm2d functions. The first U-Net estimates the latent image after compensating for red light, while another U-Net estimates the transmission image from the input grayscale image [28].

A combination of multi-scale structural similarity (MS-SSIM) and L1 losses is used as the loss function. When calculating SSIM, the size of the Gaussian kernel used to compute the image mean and variance is particularly crucial. If the kernel is too small, the calculated SSIM loss cannot maintain the local structure of the image well, and artifacts appear. If it is too large, the network generates noise at the edges of the image.
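One common way to write such a combined loss is given below as a sketch. The mixing weight $\alpha$ and the exact form of the combination are assumptions, since they are not specified here. The single-scale SSIM term shows where the Gaussian window enters: the local means $\mu$, variances $\sigma^2$, and covariance $\sigma_{xy}$ are all computed with a Gaussian window whose standard deviation $\sigma_G$ sets the size of the local neighborhood, and MS-SSIM aggregates such terms over several scales.

```latex
% Sketch of a combined loss; \alpha is an assumed mixing weight in [0, 1].
\mathcal{L}(\hat{y}, y) = \alpha \, \bigl(1 - \mathrm{MS\text{-}SSIM}(\hat{y}, y)\bigr)
                        + (1 - \alpha) \, \frac{1}{N} \sum_{i=1}^{N} \lvert \hat{y}_i - y_i \rvert

% Single-scale SSIM between patches x and y; C_1 and C_2 are small stabilizing constants.
% \mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \sigma_{xy} are Gaussian-weighted local statistics.
\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}
                           {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
```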

3.2.3 Conditional Generative Adversarial Network

Underwater images are crucial for obtaining and interpreting information about the underwater environment. The reliability of underwater intelligent systems depends on high-quality underwater images. Unfortunately, these images often suffer from low contrast, color casts, blurring, low light, and uneven illumination, which severely limit their usefulness. To address this issue, numerous methods have been proposed, including those that utilize deep learning technologies. However, the performance of these methods is often unsatisfactory due to a lack of sufficient training data and effective network structures [26].

To tackle these challenges, a conditional generative adversarial network (cGAN) for enhancing underwater images is proposed in [26]. The approach uses a multi-scale generator to produce clear underwater images and a dual discriminator to capture local and global semantic information, ensuring that the generated results are both realistic and natural. Experimental results, obtained from both real-world and synthetic underwater images, show that the method outperforms existing state-of-the-art underwater image enhancement methods [26].

Figure 11: Illustration of the proposed conditional generative adversarial network (cGAN) for enhancing underwater images. This approach employs a multi-scale generator to produce clear underwater images and a dual discriminator to capture both local and global semantic information, ensuring realistic and natural results. Experimental results demonstrate that this method outperforms existing state-of-the-art underwater image enhancement techniques [26].

Multi-Scale Generator. The cGAN’s multi-scale generator comprises three main components: a multi-scale feature extraction unit, a feature refinement unit, and a residual map estimation unit. The multi-scale feature extraction unit is constructed using three sets of multi-scale convolutions with different kernel sizes (7x7, 5x5, and 3x3), each set consisting of five convolutional layers with increasing filter numbers ranging from 16 to 256 [26]. A ReLU non-linear activation follows each convolutional layer. The multi-scale feature extractor aims to obtain statistical information from inputs on various scales by acquiring different receptive fields. The multi-scale features are then down-sampled to half of their original size, concatenated, and fed to the feature refinement unit to capture global features and reduce computational costs. The refined features are processed through successive convolutional layers before being down-sampled and fed to three successive convolutional layers, each with 64 filters, and then up-sampled to their original size. Finally, a residual map is estimated by a convolutional layer without non-linear activation, and this map is added element-wise to obtain the final enhanced result. Zero padding is applied to each convolutional layer to maintain input and output sizes. With the exception of the multi-scale feature extractor’s convolutional layers, all convolutional layers have 3x3 kernels. Unlike the common encoder-decoder and cGAN network structures, the generator includes a multi-scale feature extraction unit designed to enhance network capability and adapt to varying underwater sources. Additionally, the generator has a shallow and lightweight structure and does not use skip connections.

Dual Discriminator. The dual discriminator comprises two sub-discriminators with identical network structures but different weights. The inputs to these sub-discriminators have different sizes: one is the original size, while the other is half the original size. The dual discriminator aims to guide the generator in producing realistic images at both the global semantic and local detail levels. This design is necessary because existing discriminators cannot effectively guide the generator to create realistic details. By providing multi-resolution inputs to different discriminators, the visual quality of the results can be improved. Specifically, each sub-discriminator contains eight convolutional layers with 3x3 filters, whose number increases from 64 to 512 by factors of 2. Strided convolutions are used to reduce the image resolution, and the 512 feature maps are fed to two fully connected layers to predict the probability of the inputs being real or fake. Unlike in the multi-scale generator, the first convolution in the sub-discriminator is followed by a Leaky ReLU non-linear activation, while the other convolutions are followed by batch normalization and Leaky ReLU. The last fully connected layer uses the Sigmoid non-linear activation, which is commonly used in image classification tasks, to predict the probability. These two sub-discriminators are employed to guide the multi-scale generator [26].

3.2.4 Cycle GAN

A variation of the standard GAN structure is the cycle-consistent adversarial network (CycleGAN), which uses two mirror-symmetric generators and two matching discriminators arranged in a ring. The CycleGAN framework trains two generators, denoted G and F, along with two discriminators, $D_x$ and $D_y$. The generators G and F learn the mappings from domain X to domain Y and from domain Y to domain X, respectively. For the input image and the generated image to correspond, the conditions $F(G(x)) \approx x$ and $G(F(y)) \approx y$ must hold. To enforce them, CycleGAN introduces a cycle-consistency loss function. This network structure overcomes the challenge faced by GANs that require paired data for training, and it performs well with underwater photos that do not have paired data [33]. CycleGAN is designed for unpaired image-to-image translation, where the task is to translate images from a source domain X to a target domain Y. It consists of two GANs, one translating from domain X to Y and one from Y to X [32]. The two discriminators represent the functions:

$$D_{A}:X\to\mathbb{R};\quad D_{B}:Y\to\mathbb{R} \tag{11}$$

and the two generators represent the functions:

$$G_{A}:X\to Y;\quad G_{B}:Y\to X \tag{12}$$
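In the notation of Eqs. (11) and (12), where $G_A$ and $G_B$ play the roles of G and F, the cycle-consistency loss and the full objective of the standard CycleGAN formulation can be written as follows. The weight $\lambda$ balances the adversarial and cycle terms; its value is not given here and is left as a free parameter.

```latex
% Cycle-consistency loss: translating to the other domain and back should recover the input.
\mathcal{L}_{cyc}(G_A, G_B) =
    \mathbb{E}_{x \sim p(X)} \bigl[ \lVert G_B(G_A(x)) - x \rVert_1 \bigr]
  + \mathbb{E}_{y \sim p(Y)} \bigl[ \lVert G_A(G_B(y)) - y \rVert_1 \bigr]

% Full objective: D_B judges images in Y (outputs of G_A), D_A judges images in X (outputs of G_B).
\mathcal{L}(G_A, G_B, D_A, D_B) =
    \mathcal{L}_{GAN}(G_A, D_B, X, Y)
  + \mathcal{L}_{GAN}(G_B, D_A, Y, X)
  + \lambda \, \mathcal{L}_{cyc}(G_A, G_B)
```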

Discriminator. The discriminators used in CycleGAN are rather conventional fully convolutional neural networks built from five layer blocks, each of which has an instance normalization layer, a Leaky ReLU layer, and a 2D convolution layer with a kernel size of 4x4 and a stride of 2 (except the output block, which uses a Sigmoid layer as activation) [48]. Each of the first five layer blocks halves the size of the image and increases the number of channels whenever an image is fed into a discriminator. For a model input of size 256x256, the feature map thus has 512 channels and a size of 16x16 after the fifth layer. The output layer block finally combines all 512 channels into a single 16x16 channel.
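The spatial sizes produced by a stack of strided convolutions follow from the standard convolution output-size formula. The sketch below applies it with the 4x4 kernel and stride of 2 mentioned above; the padding of 1 is an assumption, as the padding is not stated. Under this assumption a 256x256 input reaches 16x16 after four halving blocks and 8x8 after five, so the exact resolution depends on which blocks use a stride of 2.

```python
def conv_output_size(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a convolution (padding=1 is an assumed value)."""
    return (size + 2 * padding - kernel) // stride + 1

def sizes_through_blocks(size, n_blocks):
    """Feature-map size after each of n_blocks stride-2 convolution blocks."""
    sizes = []
    for _ in range(n_blocks):
        size = conv_output_size(size)
        sizes.append(size)
    return sizes

sizes = sizes_through_blocks(256, 5)
print(sizes)  # [128, 64, 32, 16, 8]
```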

Figure 12: Structure of the PatchGAN discriminator used in CycleGAN for underwater image enhancement. The discriminator comprises five convolutional blocks, each with instance normalization, leaky ReLU, and 2D convolution layers. The final block uses a Sigmoid activation. Input image size is halved and channels increased at each block, resulting in a 16x16 output with 512 channels [28].

Generator. The generator’s goal is to transform the input image and produce the result as its output. Its network structure is made up of three parts: an encoder, a transformer, and a decoder. While increasing the number of channels, the encoder reduces the size of the input images. It is made up of three layer blocks, similar to those of the discriminator, with a 2D convolution layer, an instance normalization layer, and a Leaky ReLU layer in each block. The first layer block only expands the input to 64 channels; it has no effect on the image size. Each of the next two layer blocks halves the input size while increasing the number of channels. The transformer then receives the encoded input [45] [47]. The transformer maintains the input’s size while adding the needed characteristics [43]. It has six ResNet blocks, also known as residual blocks. Each ResNet block has two layer blocks: the first has a Leaky ReLU layer, an instance normalization layer, and a 2D convolution layer (with stride=1), and the second has a 2D convolution layer (with stride=1) and an instance normalization layer. The decoder receives the transformed input after that [41]. To create the final output image, the decoder enlarges its input back to the original size and collapses all channels into RGB. Two transpose convolution layers are stacked to accomplish the enlargement. A transpose convolution layer can be thought of as a combination of a 2D up-sampling layer and a 2D convolution layer with stride=1. In general, it reduces the number of channels while increasing the size of its input. The output layer finally receives the 256x256 feature map with 64 channels generated by the two transpose convolution layers, collapses the channels into RGB, and outputs the final image [38] [39].

Figure 13: Structure of the generator in CycleGAN for underwater image enhancement. It consists of an encoder, transformer, and decoder. The encoder reduces image size and increases channels, the transformer adds features with six ResNet blocks, and the decoder restores the original size and converts channels to RGB for the final output [28].

Deep learning techniques have shown promise in enhancing underwater images, but there are gaps in the research that need to be addressed. One challenge is the high number of parameters involved, which leads to overfitting and reduced generalization ability. Research is needed to develop efficient deep-learning models with fewer parameters that still achieve good performance. Another gap is the interpretability of deep learning models, which are often considered "black-box" models, making it difficult to understand how they make decisions. Methods for interpreting these models are needed to identify their strengths and weaknesses and to improve performance. Overall, the research gap in deep learning for underwater image enhancement lies in developing efficient and interpretable models with good performance.

Several methods have been proposed to tackle challenges in Underwater Image Enhancement (UIE). Challenges like light attenuation and scattering often result in color casts and diminished visibility [15]. One particularly noteworthy approach introduced a novel quality assessment method centered around colorfulness, contrast, and visibility metrics, providing an effective means to evaluate UIE outcomes [15]. However, the diverse underwater landscapes pose a challenge to existing color constancy methods. To address this, an adaptive UIE technique leveraging hue channel statistics and deep learning networks trained on authentic datasets with ground truth annotations was developed in [16].

Texture and color enhancement are pivotal for effective underwater image enhancement, and the Texture-Aware and Color-Consistent Network (TACC-Net) has emerged as a standout performer in this regard. By decoupling features to enhance texture and ensure color consistency, TACC-Net has significantly improved visual quality [17]. Meanwhile, issues such as light absorption and turbulence continue to impair image quality in underwater target imaging, affecting clarity and resolution. To address these challenges, a study has proposed a block mixed filter denoising technique and underscored the importance of objective quality evaluation for image enhancement methods [18].

3.3 Baseline Methods

3.3.1 Fusion Based

This paper [23] introduces a novel strategy to enhance underwater videos and images using fusion principles. The unique aspect of this strategy is that it derives both the inputs and the weight measures solely from the degraded version of the image, without the need for specialized hardware or prior knowledge of underwater conditions or scene structure.

The approach involves the derivation of two inputs from the original underwater image or frame. The first input is a color-corrected version that addresses the color distortion commonly caused by underwater environments. The second input is a contrast-enhanced version, which aims to improve the visibility of details often lost in the hazy underwater images. These inputs help to mitigate the color and contrast issues inherent in underwater imaging [23].

Additionally, four weight maps are defined to increase the visibility of distant objects, which are usually degraded due to medium scattering and absorption in underwater environments. These weight maps help in selectively emphasizing important features in the image, enhancing the overall clarity.

The fusion framework integrates these inputs and weight maps to produce an enhanced image. This approach ensures that the finest details and edges in the image are significantly improved. The enhanced images and videos are characterized by a reduced noise level, as effective edge-preserving noise reduction strategies are applied to minimize noise while retaining important details. Dark areas in the image are better exposed, making hidden details more visible. The overall contrast of the image is enhanced, making it more visually appealing and informative.
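The core of weight-map-based fusion can be illustrated with a naive per-pixel blend. The sketch assumes that the weight maps of each derived input have already been aggregated into a single map per input, and it blends at a single scale; the function name `fuse` and the stabilizing constant `eps` are illustrative assumptions, and this is not presented as the exact procedure of [23].

```python
import numpy as np

def fuse(inputs, weight_maps, eps=1e-6):
    """Blend K images (H x W x 3) using K per-pixel weight maps (H x W)."""
    stacked = np.stack(inputs).astype(np.float64)        # K x H x W x 3
    weights = np.stack(weight_maps).astype(np.float64)   # K x H x W
    # Normalize so the weights of the K inputs sum to one at every pixel.
    weights = (weights + eps) / np.sum(weights + eps, axis=0, keepdims=True)
    # Weighted sum over the K inputs; the weights broadcast over color channels.
    return np.sum(weights[..., None] * stacked, axis=0)

# Hypothetical inputs: a "color-corrected" and a "contrast-enhanced" version.
color_corrected = np.full((4, 4, 3), 0.2)
contrast_enhanced = np.full((4, 4, 3), 0.8)
w1 = np.full((4, 4), 1.0)
w2 = np.full((4, 4), 3.0)
fused = fuse([color_corrected, contrast_enhanced], [w1, w2])
```

Normalizing the weights keeps the fused intensities within the range of the inputs, and pixels where one input has a larger weight are dominated by that input.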

For videos, the framework also ensures temporal coherence between adjacent frames. This means that the enhancement process maintains consistency across frames, preventing flickering or abrupt changes that can distract viewers.

The utility of this enhancement technique is demonstrated across several challenging applications, showing its versatility and effectiveness in various underwater imaging scenarios.

3.3.2 UGAN-Based

Autonomous underwater vehicles (AUVs) rely on a variety of sensors, including acoustic, inertial, and visual sensors, for intelligent decision-making. Among these, vision is particularly attractive due to its non-intrusive, passive nature and high information content, especially at shallower depths. However, several factors adversely affect the quality of visual data obtained underwater. Light refraction and absorption, suspended particles in the water, and color distortion all contribute to producing noisy and distorted images. Consequently, AUVs that depend on visual sensing face significant challenges and often exhibit poor performance on vision-driven tasks [50].

This paper [50] proposes a method to enhance the quality of visual underwater scenes using Generative Adversarial Networks (GANs). The goal is to improve the visual input for vision-driven behaviors further down the autonomy pipeline of AUVs. GANs are well-suited for this task because of their ability to generate high-quality images that closely resemble real-world scenes, making them ideal for underwater image restoration [50].

The key challenges in underwater visual data include light refraction and absorption, suspended particles, and color distortion. Underwater environments significantly alter light paths, causing refraction and absorption that lead to reduced clarity and visibility. Particles in the water scatter light, creating a hazy appearance and further degrading image quality. The underwater medium absorbs different wavelengths of light at different rates, causing color distortions that affect the accuracy of visual data [50].

The proposed method leverages the power of GANs to address these challenges. GANs consist of two neural networks: a generator and a discriminator. The generator creates enhanced images from the degraded input, while the discriminator evaluates the authenticity of the generated images, driving the generator to produce increasingly realistic enhancements. This adversarial process results in images that are not only visually appealing but also more useful for subsequent vision-driven tasks.

To train the GANs effectively, a dataset specifically tailored for underwater image restoration is required. Recently proposed methods allow for the generation of such datasets by simulating various underwater conditions and degradations. This synthetic dataset includes images with different types of distortions commonly found in underwater environments, providing a comprehensive training set for the GANs.

For visually-guided underwater robots, improving the quality of visual data can lead to increased safety and reliability. Enhanced visual perception enables AUVs to perform better in tasks such as navigation, object detection, and diver tracking. The proposed GAN-based approach not only generates visually appealing images but also enhances the accuracy of vision-driven algorithms.

The effectiveness of the proposed method is demonstrated through both quantitative and qualitative evaluations. Enhanced images show significant improvements in clarity, color accuracy, and detail preservation compared to the original degraded images. Additionally, these improvements translate to increased accuracy for a diver tracking algorithm, showcasing the practical benefits of the enhanced visual data.

3.3.3 FUnIE-GAN

In [13], a conditional generative adversarial network-based model is presented for real-time underwater image enhancement. The model’s adversarial training is supervised by an objective function that evaluates perceptual image quality based on global content, color, local texture, and style information. A large-scale dataset, EUVP, is introduced, consisting of paired and unpaired collections of underwater images of varying quality, captured using seven different cameras under various visibility conditions during oceanic explorations and human-robot collaborative experiments.

Several qualitative and quantitative evaluations were performed, demonstrating that the proposed model effectively learns to enhance underwater image quality from both paired and unpaired training datasets. The enhanced images improve the performance of standard models for underwater object detection, human pose estimation, and saliency prediction. These results validate the suitability of the proposed model for real-time preprocessing in the autonomy pipeline of visually-guided underwater robots [13].

Figure 14: Network architecture of the proposed model, FUnIE-GAN, used for real-time underwater image enhancement. (a) The Generator improves image quality by focusing on global content, color, local texture, and style information. (b) The Discriminator supervises adversarial training using an objective function that evaluates perceptual image quality. This model is trained on the large-scale EUVP dataset, which includes paired and unpaired underwater images captured under various visibility conditions [13].

3.3.4 Deep SESR

In this paper [12], the simultaneous enhancement and super-resolution (SESR) problem for underwater robot vision is introduced and tackled, providing an efficient solution for near real-time applications. The proposed solution, Deep SESR, is a generative model based on a residual-in-residual network that learns to restore perceptual image qualities at 2x, 3x, or 4x higher spatial resolution. The model is trained using a multi-modal objective function that addresses chrominance-specific underwater color degradation, lack of image sharpness, and loss in high-level feature representation. Additionally, the model is supervised to learn salient foreground regions in the image, which guides it to enhance global contrast.

An end-to-end training pipeline is designed to jointly learn saliency prediction and SESR on a shared hierarchical feature space for fast inference. This approach ensures that the model can process images quickly, making it suitable for near real-time applications [12].

The paper [12] also introduces UFO-120, the first dataset designed to facilitate large-scale SESR learning, containing over 1500 training samples and a benchmark test set of 120 samples. Experimental evaluations on UFO-120 and other standard datasets demonstrate that Deep SESR outperforms existing solutions for underwater image enhancement and super-resolution. The model’s generalization performance is validated on several test cases, including underwater images with diverse spectral and spatial degradation levels and terrestrial images with unseen natural objects.

Furthermore, the computational feasibility of Deep SESR for single-board deployments is analyzed, demonstrating its operational benefits for visually-guided underwater robots. The model’s ability to enhance and super-resolve images in near real-time provides significant advantages for underwater robotics, enabling more accurate and reliable visual perception in challenging underwater environments.

3.3.5 Water-Net

Underwater image enhancement is vital for marine engineering and aquatic robotics, but existing algorithms are mainly tested on synthetic datasets or limited real-world images. To evaluate these algorithms’ real-world performance, a comprehensive perceptual study using large-scale real-world images is conducted. This study introduces the Underwater Image Enhancement Benchmark (UIEB), containing 950 real-world underwater images, with 890 having corresponding reference images and 60 considered challenging due to the lack of satisfactory references [14].

The study also proposes Water-Net, an underwater image enhancement network trained on the UIEB [24]. The benchmark evaluations and Water-Net demonstrate the strengths and limitations of current algorithms, providing insights for future research. This work advances the assessment and benchmarking of underwater image enhancement algorithms, contributing to the field’s progress [14].

3.3.6 MSSCE-GAN

Enhancing underwater images is crucial for applications such as underwater exploration. Traditional methods often rely on paired underwater and reference images for training, which are challenging to acquire. These methods frequently suffer from information loss, resulting in blurred details and limited applicability across diverse underwater conditions [49].

This paper [49] introduces a novel approach using the Multi-Scale Structural and Color Enhanced Generative Adversarial Network (MSSCE-GAN) for unpaired underwater image enhancement. The method includes modules for detail feature recovery and attention enhancement, addressing various distortions prevalent in underwater imagery.

Key to this approach is its ability to generate superior enhanced images without requiring paired training data. Experimental evaluations demonstrate significant improvements over existing techniques in terms of effectiveness and generalizability across multiple underwater image datasets.

Figure 15: Network architecture of the Multi-Scale Structural and Color Enhanced Generative Adversarial Network (MSSCE-GAN) for unpaired underwater image enhancement. The model includes detail feature recovery and attention enhancement modules, generating high-quality enhanced images without paired training data. It shows significant improvements over existing methods in multiple underwater image datasets [49].

3.3.7 Deep WaveNet

Underwater images typically suffer from low contrast and significant color distortions due to varying light attenuation as it travels through water. This phenomenon affects different colors asymmetrically, complicating image restoration tasks. Despite numerous attempts using deep learning for underwater image restoration (UIR), existing methods often overlook this asymmetry in network design [11].

This article introduces two novel contributions to address these challenges in UIR. Firstly, it proposes adapting receptive field sizes based on the wavelength-dependent attenuation of color channels, aiming for improved performance. Secondly, it incorporates an attentive skip mechanism to refine multi-contextual features effectively, enhancing model representational power while suppressing irrelevant features.

The proposed framework, Deep WaveNet, is optimized using pixel-wise and feature-based cost functions. Extensive experiments demonstrate its superiority over state-of-the-art methods on benchmark datasets. Furthermore, the study validates the enhanced images through various high-level vision tasks, such as underwater image semantic segmentation and diver’s 2D pose estimation [11].

Figure 16: The proposed model aims to enhance underwater images and achieve super-resolution simultaneously. Section 3 details the integration of CBAM and pixel-shuffle operations within the model. It accepts degraded underwater images as input and produces images that are enhanced both visually and spatially [11].

4 Dataset

4.1 UIEB

The Underwater Image Enhancement Benchmark (UIEB) dataset comprises 950 real-world underwater images, each with a size of 256 × 256 pixels. Among these, 890 images have corresponding reference images available for evaluation, while the remaining 60 images lack satisfactory reference images and are therefore treated as challenging cases, as shown in Table 1. This dataset serves as a crucial resource for conducting comprehensive studies on underwater image enhancement algorithms, enabling both qualitative and quantitative assessments of algorithm performance.

Table 1: Summary of the UIEB dataset [14]
Dataset Characteristics | Details
Number of Real-world Images | 950
Number of Images with Reference | 890
Number of Challenging Images | 60
Figure 17: Sample images from the UIEB dataset, showcasing a diverse range of underwater scenes.

4.2 EUVP

4.2.1 Paired Dataset

Underwater Dark:

This dataset comprises 5550 pairs of images for training, each with a size of 256 × 256 pixels. Each pair consists of a poor-quality (or gray) image and an enhanced (or colored) image, and the two images of a pair share the same filename. Additionally, 570 images are set aside for validation. In total, the dataset contains 11,670 images, as shown in Table 2.

Figure 18: Examples of images from the EUVP_Underwater_Dark dataset

Underwater ImageNet:

The Underwater ImageNet dataset consists of 3700 pairs of images for training, each with a size of 256 × 256 pixels. As in the Underwater Dark dataset, each pair contains a poor-quality image and an enhanced or better-quality image, and the filenames of corresponding images match. The dataset also includes 1270 images for validation, resulting in a total of 8670 images.

Figure 19: Sample images from the EUVP_Underwater_Imagenet dataset

Underwater Scenes:

This dataset comprises 2185 pairs of images for training, each with a size of 320 × 240 pixels. Each pair contains a poor-quality image and a corresponding enhanced or better-quality image, and the filenames of corresponding images are consistent. Additionally, 130 images are allocated for validation. In total, the dataset encompasses 4500 images.

Figure 20: Sample images from the EUVP_Underwater_Scenes dataset
Table 2: Summary of the EUVP paired datasets [13]
Dataset Name | Training Pairs | Validation Images | Total Images
Underwater Dark | 5550 | 570 | 11670
Underwater ImageNet | 3700 | 1270 | 8670
Underwater Scenes | 2185 | 130 | 4500

4.2.2 Unpaired Data

The unpaired training set contains 3195 poor-quality images and 3140 enhanced or better-quality images. These images come in sizes of 960 × 540, 640 × 480, and 320 × 240 pixels. Additionally, 330 images are allocated for validation (Table 3). The images are not paired: there is no one-to-one correspondence between the poor-quality and enhanced-quality images. This arrangement allows models to be trained for image enhancement without relying on direct paired examples.

Figure 21: Sample images from the EUVP_Unpaired dataset
Table 3: Distribution of images in the EUVP unpaired dataset
Poor Quality | Good Quality | Validation | Total Images
3195 | 3140 | 330 | 6665
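
The totals reported in Tables 2 and 3 follow from simple arithmetic: each training pair contributes two images, and validation images are counted once. The short Python check below reproduces these totals from the counts stated above; the dictionary layout and the helper name `paired_total` are illustrative only and are not part of the EUVP release.

```python
# (training pairs, validation images) for each EUVP paired subset, as in Table 2
paired = {
    "Underwater Dark": (5550, 570),
    "Underwater ImageNet": (3700, 1270),
    "Underwater Scenes": (2185, 130),
}


def paired_total(pairs, validation):
    """Each pair holds two images; validation images are counted once."""
    return 2 * pairs + validation


totals = {name: paired_total(*counts) for name, counts in paired.items()}

# Unpaired subset (Table 3): poor quality + good quality + validation
unpaired_total = 3195 + 3140 + 330

for name, total in totals.items():
    print(f"{name}: {total} images")
print(f"Unpaired: {unpaired_total} images")
```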

5 Performance Evaluation

Underwater image quality assessment is challenging because image quality must be evaluated accurately and automatically. Image quality assessment (IQA) approaches are broadly classified into (a) objective and (b) subjective assessment. Subjective assessments are expensive and time-consuming and hence not suitable for real-time applications. Objective assessment techniques use statistical and mathematical models based on the human visual system (HVS) to estimate image quality automatically. Based on the availability of the original image, objective IQA methods fall into three categories: (1) full-reference (FR) IQA, where the reference image is available; (2) reduced-reference (RR) IQA, where only partial information about the reference image is available; and (3) no-reference (NR) IQA, where the reference image is not available. In addition to these standard evaluation parameters, specialized metrics have been proposed in the literature to assess underwater image quality effectively.

The performance of various underwater image enhancement and restoration techniques is analyzed using different qualitative and quantitative parameters. The qualitative evaluation involves visual inspection of the enhanced image and comparison of histograms. The quantitative performance framework deals with various quality metric parameters which include:

  • Mean square error (MSE): MSE computes the average squared error between the enhanced and the original image. The lower the MSE, the better the quality. It is given as:

    $$MSE=\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\big[F(i,j)-E(i,j)\big]^{2} \tag{13}$$

    where F(i, j) is the original image, E(i, j) is the enhanced image, and M × N is the image size.

  • Peak-signal-to-noise ratio (PSNR): It is a measure of the peak error and is computed as

    $$PSNR=20\log_{10}\Big(\frac{MAX_{F}}{\sqrt{MSE}}\Big) \tag{14}$$

    where $MAX_{F}$ is the maximum pixel value of the image, which is 255 for an 8-bit gray-level image.

  • Entropy: Entropy is a measure of the information content present in the image and is given as:

    $$H(F)=-\sum_{i=0}^{255}p_{i}\log_{2}p_{i} \tag{15}$$

    where $p_{i}$ is the probability of occurrence of intensity i at a pixel in image F.

  • Structure similarity index measure (SSIM): SSIM measures the similarity between original image patches and enhanced patches at locations x and y from three aspects: brightness, contrast, and structure:

    $$SSIM(F,E)=\frac{\big(2\mu_{x}\mu_{y}+C_{1}\big)\big(2\sigma_{xy}+C_{2}\big)}{\big(\mu_{x}^{2}+\mu_{y}^{2}+C_{1}\big)\big(\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2}\big)} \tag{16}$$

    where $\mu_{x},\mu_{y}$ are the mean values and $\sigma_{x},\sigma_{y}$ are the standard deviations of the pixels in patches x and y, respectively. $\sigma_{xy}$ is the covariance of patches x and y, and $C_{1}=(k_{1}L)^{2}$ and $C_{2}=(k_{2}L)^{2}$ are small constants that avoid instability when the denominator is close to zero. L is the dynamic range of pixel values, $k_{1}=0.01$ and $k_{2}=0.03$. The higher the SSIM value, the smaller the distortion and the better the enhancement.

  • Colour enhancement factor (CEF): It quantifies the effect of the enhancement on colorfulness and is given as

    $$CEF=\frac{CM(\tilde{I})}{CM(I)} \tag{17}$$

    $$CM(I)=\sqrt{\sigma_{\alpha}^{2}+\sigma_{\beta}^{2}}+0.3\sqrt{\mu_{\alpha}^{2}+\mu_{\beta}^{2}}$$

    where $\sigma_{\alpha}$ and $\sigma_{\beta}$ are the standard deviations and $\mu_{\alpha}$ and $\mu_{\beta}$ are the mean values of $\alpha$ and $\beta$, respectively. $CM(\tilde{I})$ is the colorfulness measure of the enhanced image and $CM(I)$ that of the original image.

  • Contrast to noise ratio (CNR): This metric describes the amplitude of the signal relative to the surrounding noise in an image. CNR is computed as

    $$CNR(I,I^{\prime})=\frac{(\mu_{i}-\mu_{n})}{\sigma_{n}} \tag{18}$$

    where $\mu_{i}$ is the mean value of the original image, $\mu_{n}$ is the mean value of the enhanced image, and $\sigma_{n}$ denotes the standard deviation.

  • Image enhancement metric (IEM): This metric gives information about the sharpness and the improvement in contrast after enhancement. It is computed as follows

    $$IEM=\frac{\sum_{l=1}^{k_{1}}\sum_{m=1}^{k_{2}}\sum_{n=1}^{8}\big|I_{e,c}^{m,l}-I_{e,n}^{m,l}\big|}{\sum_{l=1}^{k_{1}}\sum_{m=1}^{k_{2}}\sum_{n=1}^{8}\big|I_{o,c}^{m,l}-I_{o,n}^{m,l}\big|} \tag{19}$$

    where the image is divided into $k_{1}\times k_{2}$ non-overlapping blocks, and o and e denote the original and enhanced images, respectively. $I_{o,c}^{m,l}$ and $I_{e,c}^{m,l}$ are the intensities of the centre pixel of block (m, l), and $I_{o,n}^{m,l}$ and $I_{e,n}^{m,l}$ are the intensities of its eight neighbours.

  • Absolute mean brightness error (AMBE): AMBE measures how much of the brightness content is preserved after image enhancement. It is given as

    $$AMBE(o,e)=|\mu_{o}-\mu_{e}| \tag{20}$$

    where $\mu_{o}$ and $\mu_{e}$ are the mean brightness values of the original and enhanced images, respectively, so that the equation represents the absolute difference between the two means. Median values of the AMBE metric indicate good preservation of brightness.

  • Spatial spectral entropy based quality index (SSEQ): SSEQ is a highly efficient no-reference (NR) IQA model. It can assess the quality of an image that is distorted across various distortion categories. SSEQ is based on the entropy

    $$E=-\sum_{i}\sum_{j}P(i,j)\log_{2}P(i,j) \tag{21}$$

    where P(i, j) is the spectral probability map given as

    $$P(i,j)=\frac{C(i,j)^{2}}{\sum_{i}\sum_{j}C(i,j)^{2}} \tag{22}$$

    C is a coefficient matrix computed on (i, j) pixels.

  • Measure of enhancement (EME): EME calculates the contrast of the images and aids in the optimum selection of processing parameters. It is computed as:

    $$EME_{m_{1}m_{2}}=\max\Big(\frac{1}{m_{1}m_{2}}\sum_{l=1}^{m_{1}}\sum_{n=1}^{m_{2}}20\log\frac{X_{max;n,l}^{\omega}}{X_{min;n,l}^{\omega}}\Big) \tag{23}$$

    where $X_{max;n,l}^{\omega}$ and $X_{min;n,l}^{\omega}$ represent the maximum and minimum values of the image within the block $\omega_{n,l}$.

  • Root mean square error (RMSE): RMSE is the square root of the MSE. It is given as

    $$RMSE=\sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\big[F(i,j)-E(i,j)\big]^{2}} \tag{24}$$

  • Measure of enhancement by entropy (EMEE): EMEE is computed by

    $$EMEE_{m_{1}m_{2}}=\max\Big(\frac{1}{m_{1}m_{2}}\sum_{l=1}^{m_{1}}\sum_{n=1}^{m_{2}}\alpha\Big(\frac{X_{max;n,l}^{\theta}}{X_{min;n,l}^{\theta}}\Big)^{\alpha}\ln\frac{X_{max;n,l}^{\theta}}{X_{min;n,l}^{\theta}}\Big) \tag{25}$$

    A high value of EMEE indicates good image quality. $m_{1}$ and $m_{2}$ give the number of blocks into which the image is divided.

  • Underwater color image quality evaluation metric (UCIQE): UCIQE was specifically designed to quantify the effects of non-uniform color cast, low contrast, and blurring that affect underwater images. UCIQE for an image X in CIELab space is calculated as:

    $$UCIQE=c_{1}\cdot\sigma_{chroma}+c_{2}\cdot contrast_{l}+c_{3}\cdot\mu_{saturation} \tag{26}$$

    where $c_{1}$, $c_{2}$ and $c_{3}$ are the weighting coefficients, $\sigma_{chroma}$ denotes the standard deviation of chroma, $contrast_{l}$ is the contrast of luminance, and $\mu_{saturation}$ is the average value of saturation. Higher values of UCIQE signify that the image has a good balance among chroma, contrast, and saturation.

  • Underwater Image Colorfulness Measure (UICM): Underwater images often exhibit a color cast because colors are attenuated at different rates, depending on their wavelength, as the water depth increases. Red, which has the longest visible wavelength, disappears first, resulting in a bluish or greenish appearance of the images. In addition, inadequate lighting conditions can lead to significant color desaturation. An effective algorithm for enhancing underwater images must therefore ensure good color rendition. The human visual system (HVS) captures colors in the opponent color plane; hence, the chrominance components RG and YB, which are associated with the two opponent color planes, are used in UICM:

    $$RG=R-G \tag{27}$$
    $$YB=\frac{R+G}{2}-B \tag{28}$$

    Due to the heavy noise in underwater images, the traditional statistical values are not suitable for measuring their colorfulness. As a result, asymmetric alpha-trimmed statistical values are used instead. The mean can be expressed as:

    $$\mu_{\alpha,RG} = \frac{1}{K - T_{\alpha L} - T_{\alpha R}} \sum_{i=T_{\alpha L}+1}^{K - T_{\alpha R}} Intensity_{RG,i}$$ (29)

    The second-order statistic, the variance $\sigma^{2}$, is given by:

    $$\sigma^{2}_{\alpha,RG} = \frac{1}{N} \sum_{p=1}^{N} \left(Intensity_{RG,p} - \mu_{\alpha,RG}\right)^{2}$$ (30)

    The overall metric used for measuring underwater image colorfulness is given by:

    $$UICM = -0.2868 \sqrt{\mu^{2}_{\alpha,RG} + \mu^{2}_{\alpha,YB}} + 0.1586 \sqrt{\sigma^{2}_{\alpha,RG} + \sigma^{2}_{\alpha,YB}}$$ (31)
  • Underwater Image Sharpness Measure (UISM): Sharpness pertains to the preservation of fine details and edges in an image. In underwater images, forward scattering often causes significant blurring, resulting in a loss of image sharpness. To quantify sharpness on edges, the Sobel edge detector is first applied to each RGB color component, and the resulting edge map is multiplied with the original image to generate a grayscale edge map. This preserves only the pixels on the edges of the original underwater image. The enhancement measure estimation (EME) method is suitable for images with uniform backgrounds that exhibit non-periodic patterns; hence, EME is used to measure the sharpness of these edges. The UISM is:

    $$UISM = \sum_{c=1}^{3} \lambda_{c} \, EME(\text{grayscale edge}_{c})$$ (32)
    $$EME = \frac{2}{k_1 k_2} \sum_{l=1}^{k_1} \sum_{k=1}^{k_2} \log \frac{I_{max,k,l}}{I_{min,k,l}}$$ (33)
  • Underwater Image Contrast Measure (UIConM): Studies have demonstrated a correlation between contrast and underwater visual capabilities, including stereoscopic acuity. In the case of underwater imagery, contrast deterioration is typically attributed to backward scattering. The intensity image is evaluated using the logAMEE measure to determine the contrast.

    $$UIConM = \mathrm{logAMEE}(Intensity)$$ (34)

    The logAMEE is defined as:

    $$\mathrm{logAMEE} = \frac{2}{k_1 k_2} \sum_{l=1}^{k_1} \sum_{k=1}^{k_2} \frac{I_{max,k,l} \ominus I_{min,k,l}}{I_{max,k,l} \oplus I_{min,k,l}} \times \log \frac{I_{max,k,l} \ominus I_{min,k,l}}{I_{max,k,l} \oplus I_{min,k,l}}$$ (35)
  • Underwater image quality measure (UIQM): UIQM is based on the human visual system model and works without a reference image. It comprises three measures: the underwater image colorfulness measure (UICM), the underwater image sharpness measure (UISM), and the underwater image contrast measure (UIConM). UIQM is calculated as follows:

    $$UIQM = Coeff_{1} \times UICM + Coeff_{2} \times UISM + Coeff_{3} \times UIConM$$ (36)

    Higher values of UIQM indicate good levels of enhancement.

  • Colourfulness contrast fog density index (CCF): CCF is a no-reference IQA metric proposed to predict underwater color image quality. It is a weighted combination of a colorfulness index, a contrast index, and a fog density index, computed as

    $$CCF = \omega_{1} \times Colorfulness + \omega_{2} \times Contrast + \omega_{3} \times Fogdensity$$ (37)

    The CCF computation thus examines the loss of colorfulness due to absorption, the blurring caused by forward scattering, and the fog density caused by backward scattering.

  • Average gradient (AG): The average gradient is a no-reference metric that quantifies the sharpness of the given image. It represents the rate of change of the fine details present in the image. It is computed as

    $$AG = \frac{1}{(L-1)(M-1)} \sum_{i=1}^{L-1} \sum_{j=1}^{M-1} \sqrt{\big(\nabla_{x} I(i,j)\big)^{2} + \big(\nabla_{y} I(i,j)\big)^{2}}$$ (38)

    where L and M denote the width and height of the image, and $\nabla_{x}$ and $\nabla_{y}$ represent the gradients in the x and y directions, respectively [2].

  • Patch based contrast quality index (PCQI): PCQI is defined as,

    $$PCQI(i,j) = \frac{1}{P} \sum_{k=1}^{P} l_{r}(i_{k},j_{k}) \, l_{s}(i_{k},j_{k}) \, l_{t}(i_{k},j_{k})$$ (39)

    where P is the number of patches present in the image and $l_{r}$, $l_{s}$, and $l_{t}$ represent the comparison functions. Higher values of PCQI indicate good contrast.
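
As a rough illustration of how Eqs. (27) to (31) fit together, the following NumPy sketch computes UICM for an RGB image. The function names, the trimming fractions `alpha_l` and `alpha_r`, and the floor rounding of the trimmed counts are assumptions made for this example; the two coefficients are simply the defaults printed in Eq. (31) and can be overridden.

```python
import numpy as np


def alpha_trimmed_mean(values, alpha_l=0.1, alpha_r=0.1):
    """Asymmetric alpha-trimmed mean of a 1-D sample, in the spirit of Eq. (29)."""
    v = np.sort(np.asarray(values, dtype=np.float64).ravel())
    k = v.size
    t_l = int(np.floor(alpha_l * k))  # samples dropped at the low end (assumed rounding)
    t_r = int(np.floor(alpha_r * k))  # samples dropped at the high end (assumed rounding)
    return v[t_l:k - t_r].mean()


def uicm(rgb, alpha_l=0.1, alpha_r=0.1, c_mean=-0.2868, c_std=0.1586):
    """Colorfulness measure of Eq. (31) for an H x W x 3 image in RGB order."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = (r - g).ravel()              # Eq. (27)
    yb = ((r + g) / 2.0 - b).ravel()  # Eq. (28)
    mu_rg = alpha_trimmed_mean(rg, alpha_l, alpha_r)
    mu_yb = alpha_trimmed_mean(yb, alpha_l, alpha_r)
    var_rg = np.mean((rg - mu_rg) ** 2)  # Eq. (30), taken over all pixels
    var_yb = np.mean((yb - mu_yb) ** 2)
    return c_mean * np.sqrt(mu_rg ** 2 + mu_yb ** 2) + c_std * np.sqrt(var_rg + var_yb)
```

For a uniformly gray image both opponent components vanish, so the measure evaluates to zero under this sketch.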

In this section, we conducted experiments on two datasets, UIEB and EUVP, to evaluate the performance of various underwater image enhancement methods using both qualitative and quantitative metrics. The UIEB dataset comprises 890 real underwater images, while the EUVP dataset contains paired and unpaired collections of underwater images. We selected five images from each dataset for evaluation purposes. We used several typical methods for underwater image enhancement, including AHE, CLAHE, ICM, UCM, Gray World, Wavelet fusion, and the Recursive adaptive histogram modification method.

6 Datasets: UIEB-D8 and EUVP-X-D8

This work uses two standard datasets, UIEB [14] and EUVP [13], which are available in the public domain and are widely used in UIE research. The UIEB dataset has 890 paired images, where each pair consists of a good quality image along with a degraded one. The EUVP dataset has both paired and unpaired images. EUVP has three different paired datasets – Underwater Dark, Underwater ImageNet and Underwater Scenes.

6.1 Formation of Datasets

To diversify the dataset, 8 different degradation techniques were applied to the ground truth images:

6.1.1 Illumination Degradation

Low illumination in images can result from factors such as poor lighting conditions. Simulating low illumination is crucial for testing the robustness of image processing algorithms in real-world scenarios. The degradation is achieved by reducing the overall brightness of the image, mimicking the effect of dim lighting. This reduction in brightness can lead to a loss of detail and reduced visibility of objects in the image. Examples with varying illumination are shown in Fig 22.

The equation to simulate low illumination is as follows:

$$I_{ID}(x,y) = s_{b} \times I(x,y), \quad \forall s_{b} \sim U(a,b)$$ (40)
Figure 22: Image exhibiting illumination degradation, characterized by low lighting and diminished clarity due to varying levels of illumination conditions

where $I_{ID}$ is the degraded image obtained after applying varying illumination and $s_{b}$ is a brightness scaling factor drawn from a uniform distribution over $(a,b)$. This factor determines the extent of brightness reduction. A lower value results in a darker image.
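
A minimal sketch of Eq. (40) is given below, assuming 8-bit images and one scaling factor per image. The function name and the default range `a=0.3`, `b=0.7` are placeholders chosen for illustration and are not values reported in this work.

```python
import numpy as np


def degrade_illumination(image, a=0.3, b=0.7, rng=None):
    """Darken an 8-bit image by a factor s_b ~ U(a, b), following Eq. (40)."""
    rng = np.random.default_rng() if rng is None else rng
    s_b = rng.uniform(a, b)  # one brightness factor for the whole image (assumed)
    out = s_b * image.astype(np.float64)
    # Rounding and clipping keep the result a valid 8-bit image.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8), s_b
```

Passing a seeded `np.random.default_rng(seed)` as `rng` makes the generated degraded images reproducible.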

6.1.2 Contrast Degradation

The high-contrast degradation simulates images with intense differences between light and dark areas. This effect is achieved by adjusting the pixel values to increase the contrast. The equation for increasing contrast is:

$$I_{CD}(x,y) = \alpha \times I(x,y) + \beta, \quad \forall \alpha \sim U(a,b), \quad \beta = m$$ (41)

By multiplying the pixel values by $\alpha$ and adding $\beta$, the contrast of the image is increased while its brightness is also adjusted, as shown in Fig 23. This results in an image $I_{CD}(x,y)$ with intensified differences between light and dark areas, creating a high contrast effect.

Figure 23: Image showcasing high contrast degradations. The pronounced contrast creates sharp distinctions, intensifying visual impact while potentially causing loss of detail in certain areas
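
The contrast adjustment of Eq. (41) can be sketched as follows. The function name, the default range for $\alpha$, and the default offset `beta` are illustrative assumptions; the final clipping to [0, 255] is also an assumption, since Eq. (41) does not state how out-of-range values are handled.

```python
import numpy as np


def degrade_contrast(image, a=1.5, b=2.5, beta=0.0, rng=None):
    """Apply I_CD = alpha * I + beta with alpha ~ U(a, b), following Eq. (41)."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = rng.uniform(a, b)  # contrast gain
    out = alpha * image.astype(np.float64) + beta
    # Clipping to the 8-bit range is an assumption of this sketch.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8), alpha
```

With a gain above one, bright pixels saturate at 255, which is consistent with the loss of detail mentioned in the caption of Fig 23.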

6.1.3 Hazy Degradation

The hazy effect simulates the presence of haze or fog in the image. It is achieved by adding a semi-transparent haze layer over the original image. The mathematical expression for applying the haze effect to the image is:

$$I_{DH}(x,y) = (1-\gamma) \times I(x,y) + \gamma \times \gamma_{c}(x,y), \quad \forall \gamma \sim U(a,b), \quad \forall \gamma_{c} \sim U(l,m)$$ (42)

Here, $\gamma_{c}(x,y)$ is a haze layer with the same dimensions as the original image, where each pixel is set to the randomly generated haze color.

Figure 24: Image showing hazy degradations, the scene appears obscured due to atmospheric haze, resulting in reduced visibility and loss of fine details.

The original image and the haze layer are blended using the formula above to generate $I_{DH}$. The original image is multiplied by $(1-\gamma)$ to reduce its intensity, and the haze layer is multiplied by $\gamma$ to control the strength of the haze effect, as shown in Fig 24. The resulting image represents the original scene with the added haze effect. This process mimics the visual appearance of images captured in hazy conditions, where distant objects appear less distinct due to scattering of light by haze particles in the atmosphere.
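
The blending step of Eq. (42) can be sketched as below. The function name and all default ranges are placeholders, and drawing one haze color per channel for the whole layer is an assumption about how $\gamma_{c}$ is generated.

```python
import numpy as np


def add_haze(image, a=0.3, b=0.7, c_low=180.0, c_high=255.0, rng=None):
    """Blend an H x W x 3 image with a constant-colour haze layer, following Eq. (42)."""
    rng = np.random.default_rng() if rng is None else rng
    gamma = rng.uniform(a, b)                        # haze strength, gamma ~ U(a, b)
    haze_color = rng.uniform(c_low, c_high, size=3)  # gamma_c ~ U(l, m), one value per channel (assumed)
    haze_layer = np.broadcast_to(haze_color, image.shape)
    out = (1.0 - gamma) * image.astype(np.float64) + gamma * haze_layer
    return np.clip(np.rint(out), 0, 255).astype(np.uint8), gamma
```

Larger values of `gamma` push every pixel towards the haze color, so fine detail in the original scene is progressively washed out.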

6.1.4 Blurry Degradation

The blurry effect is simulated using a Gaussian blur filter applied to the entire image. The Gaussian kernel used for blurring is defined as follows:

$$G(x,y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right)$$ (43)

where $(x,y)$ are the coordinates in the kernel, and $\sigma$ is the standard deviation of the Gaussian distribution. The Gaussian kernel $G(x,y)$ is typically normalized so that the sum of all elements equals 1. The convolution operation between the input image $I$ and the Gaussian kernel $G$ is denoted by $I*G$. It is defined as:

$$(I*G)(x,y) = \sum_{i=-k}^{k} \sum_{j=-l}^{l} I(i,j) \cdot G(x-i,y-j)$$ (44)

where $(x,y)$ are the coordinates of the output pixel, $(i,j)$ are the coordinates of the input pixel, and $I(i,j)$ is the intensity of the input pixel. The convolution operation involves sliding the Gaussian kernel over each pixel of the input image and computing a weighted sum of pixel intensities in the neighborhood defined by the kernel, as shown in Fig 25. The blurred image $I_{DB}(x,y)$ is defined as:

$$I_{DB}(x,y) = (I*G)(x,y)$$ (45)

The GaussianBlur function convolves the image with $G(x,y)$ to compute the blurred result. In this implementation, the standard deviation of $G(x,y)$ is implicitly determined by the kernel size. Overall, the Gaussian blur operation smooths out the sharp transitions between pixel values in the input image, resulting in a blurred version of the original image. The degree of blurring is controlled by the standard deviation $\sigma$ of the Gaussian kernel, with larger values of $\sigma$ resulting in more significant blurring.

Figure 25: Blurry degradation
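
To make Eqs. (43) to (45) concrete, the sketch below builds a normalized Gaussian kernel and applies it with plain NumPy. It is not the GaussianBlur routine mentioned above: the function names, the explicit `sigma` argument, the odd kernel size, and the reflect padding at the borders are all assumptions of this example.

```python
import numpy as np


def gaussian_kernel(size=5, sigma=1.0):
    """Sample Eq. (43) on a size x size grid (size odd) and normalise it to sum to 1."""
    half = size // 2
    ax = np.arange(-half, half + 1, dtype=np.float64)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    return kernel / kernel.sum()


def gaussian_blur(image, size=5, sigma=1.0):
    """Blur an H x W or H x W x C 8-bit image, as in Eq. (45)."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    img = image.astype(np.float64)
    pad = [(half, half), (half, half)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad, mode="reflect")  # border handling is an assumption
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    # Accumulate the weighted, shifted copies; the kernel is symmetric,
    # so this correlation equals the convolution of Eq. (44).
    for dy in range(size):
        for dx in range(size):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Because the kernel sums to one, flat regions keep their intensity while isolated bright pixels are spread over their neighborhood.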

6.1.5 Noisy Degradation

Noise is modeled with a Gaussian (normal) distribution and is added to the pixel values of the image to simulate random fluctuations in the image acquisition or transmission process. Mathematically, the Gaussian noise density $\mathcal{N}(x;\mu,\sigma)$ can be expressed as:

$$\mathcal{N}(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$ (46)

Here, $x$ is the random variable representing the noise amplitude. $\mu$ is the mean (average) of the distribution, indicating the central tendency of the noise values; it is typically set to 0 for zero-mean noise. $\sigma$ is the standard deviation of the distribution, which controls the spread or variability of the noise values around the mean and thus determines the scale of the noise.

The additive Gaussian noise $N(x,y)$ is the pixel-wise noise at coordinates $(x,y)$, drawn from a zero-mean Gaussian distribution; for example, $\mathcal{N}(0,1)$ corresponds to mean $\mu=0$ and standard deviation $\sigma=1$. The degraded image $I_{ND}(x,y)$, resulting from the addition of Gaussian noise to the original image, is defined as:

$$I_{ND}(x,y) = \begin{cases} 0 & \text{if } I(x,y)+N(x,y) < 0 \\ I(x,y)+N(x,y) & \text{if } 0 \leq I(x,y)+N(x,y) \leq 255 \\ 255 & \text{if } I(x,y)+N(x,y) > 255 \end{cases}$$ (47)

where $N(x,y) \sim \mathcal{N}(0,\sigma^{2})$.

This equation illustrates how additive Gaussian noise alters the pixel values of the original image, resulting in a degraded image with stochastic fluctuations that mimic real-world imaging artifacts. Adjusting the standard deviation parameter $\sigma$ controls the intensity and spread of the noise, influencing the perceptual quality of the degraded image as shown in Fig 26.

Figure 26: The image exhibits noisy degradation, characterized by the presence of unwanted random variations in pixel intensity.
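
The clipped additive noise model of Eq. (47) can be sketched in a few lines. The function name and the default `sigma` are illustrative assumptions, not settings reported in this work.

```python
import numpy as np


def add_gaussian_noise(image, sigma=10.0, rng=None):
    """Add pixel-wise N(0, sigma^2) noise and clip to [0, 255], following Eq. (47)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=image.shape)  # N(x, y), independent per pixel
    out = image.astype(np.float64) + noise
    # The three cases of Eq. (47) reduce to a clip; rounding maps back to 8 bits.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Setting `sigma=0` returns the input unchanged, which is a convenient sanity check before generating a degraded dataset.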

6.1.6 Color Balance Degradation

Consider an original image $I$ represented as a three-dimensional array where each pixel contains intensity values for red, green, and blue channels.

$$\begin{bmatrix} R_{11} & G_{11} & B_{11} \\ R_{12} & G_{12} & B_{12} \\ \vdots & \vdots & \vdots \\ R_{mn} & G_{mn} & B_{mn} \end{bmatrix}$$

where $R_{ij}$, $G_{ij}$ and $B_{ij}$ represent the intensity values of the red, green, and blue channels respectively at pixel (i,j), and m and n denote the dimensions of the image.

1. Reddish Tint: The reddish tint degradation replicates a deviation in color balance within the image, biasing the color distribution towards red tones. This deviation can occur due to several factors such as environmental lighting, white balance inaccuracies, or sensor characteristics. When a reddish tint afflicts an image, the prominence of red hues intensifies while the contributions of green and blue hues decrease. The blue and green channels are attenuated to induce a reddish tint, while the red channel remains unaltered.

$$\begin{bmatrix} R_{11} & 0 & 0 \\ R_{12} & 0 & 0 \\ \vdots & \vdots & \vdots \\ R_{mn} & 0 & 0 \end{bmatrix}$$

$R_{ij}$ represents the intensity value of the red channel at pixel (i,j), and $I_{\text{reddish}}(x,y,0)$ is defined as:

$$I_{\text{reddish}}(x,y,0) = \begin{cases} 0, & \text{if } I(x,y,0) \times (1-\text{Factor}) + 255 \times \text{Factor} < 0 \\ I(x,y,0) \times (1-\text{Factor}) + 255 \times \text{Factor}, & \text{if } 0 \leq I(x,y,0) \times (1-\text{Factor}) + 255 \times \text{Factor} \leq 255 \\ 255, & \text{if } I(x,y,0) \times (1-\text{Factor}) + 255 \times \text{Factor} > 255 \end{cases}$$ (48)
$$\forall\, \text{Factor} \sim U(a,b);$$
$$I_{\text{reddish}}(x,y,1) = I(x,y,1);$$
$$I_{\text{reddish}}(x,y,2) = I(x,y,2);$$
Figure 27: Image displaying a reddish tint, the overall color tone of the scene is tinged with red, imparting a warm or rosy hue to the entire image with different variations of factor.
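
The channel-wise blend of Eq. (48) can be sketched as below. The sketch follows the equation as written, that is, it moves the selected channel towards 255 and leaves the other two channels untouched. The function name, the default range for Factor, and the RGB channel order (index 0 for red) are assumptions of this example.

```python
import numpy as np


def add_tint(image, channel=0, a=0.2, b=0.6, rng=None):
    """Blend one channel of an H x W x 3 image towards 255 with Factor ~ U(a, b)."""
    rng = np.random.default_rng() if rng is None else rng
    factor = rng.uniform(a, b)
    out = image.astype(np.float64)  # astype returns a copy, so the input is not modified
    out[..., channel] = out[..., channel] * (1.0 - factor) + 255.0 * factor
    # Clipping implements the three cases of Eq. (48).
    return np.clip(np.rint(out), 0, 255).astype(np.uint8), factor
```

Under the same assumptions, calling the function with `channel=1` or `channel=2` would produce the greenish and bluish variants.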

2. Greenish Tint: The greenish tint degradation emulates an imbalance in color distribution within the image, favoring green hues. This effect can arise from various factors such as environmental lighting conditions, inaccuracies in white balance, or characteristics of the imaging sensor. When an image is affected by a greenish tint, the intensity of green hues is accentuated while the contributions of red and blue hues diminish. To introduce a greenish tint, the red and blue color channels are suppressed, while the green channel remains unaffected.

$$\begin{bmatrix} 0 & G_{11} & 0 \\ 0 & G_{12} & 0 \\ \vdots & \vdots & \vdots \\ 0 & G_{mn} & 0 \end{bmatrix}$$

$G_{ij}$ represents the intensity value of the green channel at pixel (i,j), and $I_{\text{greenish}}(x,y,1)$ is defined as:

$$I_{\text{greenish}}(x,y,1) = \begin{cases} 0, & \text{if } I(x,y,1) \times (1-\text{Factor}) + 255 \times \text{Factor} < 0 \\ I(x,y,1) \times (1-\text{Factor}) + 255 \times \text{Factor}, & \text{if } 0 \leq I(x,y,1) \times (1-\text{Factor}) + 255 \times \text{Factor} \leq 255 \\ 255, & \text{if } I(x,y,1) \times (1-\text{Factor}) + 255 \times \text{Factor} > 255 \end{cases}$$ (49)
$$\forall\, \text{Factor} \sim U(a,b);$$
$$I_{\text{greenish}}(x,y,0) = I(x,y,0);$$
$$I_{\text{greenish}}(x,y,2) = I(x,y,2);$$
Refer to caption
Figure 28: The image exhibits a greenish tint, imparting a subtle green hue to the overall color palette with varying Igreenish(x,y,1)subscript𝐼greenish𝑥𝑦1I_{\text{greenish}}(x,y,1)italic_I start_POSTSUBSCRIPT greenish end_POSTSUBSCRIPT ( italic_x , italic_y , 1 ).

3. Bluish Tint: The bluish tint degradation emulates a color imbalance that skews the color distribution towards blue hues. This can occur due to lighting conditions, white balance inaccuracies, or sensor characteristics. When an image is affected by a bluish tint, blue dominates while the relative contribution of red and green diminishes. To introduce a bluish tint, the blue channel is blended towards its maximum value by a random factor, while the red and green channels remain unchanged.

\[
\begin{bmatrix}0&0&B_{11}\\ 0&0&B_{12}\\ \vdots&\vdots&\vdots\\ 0&0&B_{mn}\end{bmatrix}
\]

$B_{ij}$ represents the intensity value of the blue channel at pixel $(i,j)$.

\[
I_{\text{bluish}}(x,y,2)=\begin{cases}0,&\text{if }I(x,y,2)\times(1-\text{Factor})+255\times\text{Factor}<0\\ I(x,y,2)\times(1-\text{Factor})+255\times\text{Factor},&\text{if }0\leq I(x,y,2)\times(1-\text{Factor})+255\times\text{Factor}\leq 255\\ 255,&\text{if }I(x,y,2)\times(1-\text{Factor})+255\times\text{Factor}>255\end{cases} \tag{50}
\]
\[
\forall\,\text{Factor}\sim U(a,b);\quad I_{\text{bluish}}(x,y,0)=I(x,y,0);\quad I_{\text{bluish}}(x,y,1)=I(x,y,1)
\]
Figure 29: Image displaying a bluish tint, noticeable hues of blue saturate the image.

Here, $I_{CB_{R,G,B}}$ denotes the modified image with an amplified red, green, or blue hue. This process increases the intensity of the selected channel across the image, effectively introducing the corresponding color cast.
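The channel-wise tint in the equations above can be sketched in a few lines of plain Python. This is only an illustration: the function names `add_tint` and `sample_factor`, the nested-list image layout, and the rounding to integers are assumptions made here, and the range $(a,b)$ of the uniform factor is not specified in this section.

```python
import random

def add_tint(image, channel, factor):
    """Blend one colour channel of an 8-bit RGB image towards 255.

    image   : list of rows, each row a list of [R, G, B] pixels (0..255)
    channel : 0 = red, 1 = green, 2 = blue
    factor  : blend weight in [0, 1]
    """
    tinted = []
    for row in image:
        new_row = []
        for pixel in row:
            new_pixel = list(pixel)  # the other two channels are copied unchanged
            value = pixel[channel] * (1 - factor) + 255 * factor
            # Clip to the valid 8-bit range (rounding is an assumption of this sketch).
            new_pixel[channel] = int(min(255, max(0, round(value))))
            new_row.append(new_pixel)
        tinted.append(new_row)
    return tinted

def sample_factor(a, b, rng=random):
    """Draw Factor ~ U(a, b); the bounds are hypothetical here."""
    return rng.uniform(a, b)

image = [[[10, 20, 30], [200, 100, 50]]]
bluish = add_tint(image, channel=2, factor=0.2)
greenish = add_tint(image, channel=1, factor=sample_factor(0.1, 0.5))
```

With a fixed factor of 0.2, only the blue values change, while the red and green values are passed through untouched.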

6.2 Dataset Distribution:

Tables 4 and 5 provide statistics for the UIEB-D8 and EUVP-X-D8 datasets, detailing the distribution of images with eight types of degradation across different subcategories.

In Table 4, the rows represent the datasets (UIEB and the EUVP subsets) with their reference image counts in parentheses. Each type of image degradation (Illumination, Contrast, Hazy, Blurry, Noisy, and Reddish/Greenish/Bluish) is split into three subcategories (a, b, and c). The total number of degraded images in each dataset is listed in the last column.

For the UIEB dataset, which contains 890 images, the images are distributed almost evenly among the three subcategories (a, b, c) for each degradation type, with 296 to 298 images per subcategory. The total count across all eight degradation types is 7,120.

The EUVP_P (U_Dark) dataset consists of 3138 images, with each subcategory within the degradation types having exactly 1046 images, leading to a total image count of 25,104.

The EUVP_P (U_ImageNet) dataset contains 3700 images, with each subcategory for the degradation types having 1233 or 1234 images, making the total number of images 29,600.

The EUVP_P (U_Scenes) dataset has 2185 images, with each subcategory for the degradation types having 728 or 729 images, resulting in a total of 17,480 images.

The EUVP_Un dataset includes 3140 images, with each subcategory for the degradation types having 1046 to 1048 images, totaling 25,120 images.

Table 5 summarizes the overall counts of referenced and degraded images: there are 13,053 referenced images and 104,424 degraded images. These statistics show how images are distributed among the degradation types and their subcategories, which helps in understanding the quantity of images available for each type of degradation.

Table 4: Dataset statistics for various types of image degradation. Each cell lists the image counts for subcategories a / b / c.
| Dataset (reference images) | Illumination | Contrast | Hazy | Blurry | Noisy | Reddish/Greenish/Bluish | Total |
|---|---|---|---|---|---|---|---|
| UIEB (890) | 296 / 296 / 298 | 296 / 296 / 298 | 296 / 296 / 298 | 296 / 296 / 298 | 296 / 296 / 298 | 296 / 296 / 298 | 7,120 |
| EUVP_P (U_Dark) (3138) | 1046 / 1046 / 1046 | 1046 / 1046 / 1046 | 1046 / 1046 / 1046 | 1046 / 1046 / 1046 | 1046 / 1046 / 1046 | 1046 / 1046 / 1046 | 25,104 |
| EUVP_P (U_ImageNet) (3700) | 1233 / 1233 / 1234 | 1233 / 1233 / 1234 | 1233 / 1233 / 1234 | 1233 / 1233 / 1234 | 1233 / 1233 / 1234 | 1233 / 1233 / 1234 | 29,600 |
| EUVP_P (U_Scenes) (2185) | 728 / 728 / 729 | 728 / 728 / 729 | 728 / 728 / 729 | 728 / 728 / 729 | 728 / 728 / 729 | 728 / 728 / 729 | 17,480 |
| EUVP_Un (3140) | 1046 / 1046 / 1048 | 1046 / 1046 / 1048 | 1046 / 1046 / 1048 | 1046 / 1046 / 1048 | 1046 / 1046 / 1048 | 1046 / 1046 / 1048 | 25,120 |

Table 5 summarizes the overall counts of referenced and degraded images:

Table 5: Total Referenced and Degraded Images
Category Count
Total Referenced Images 13,053
Total Degraded Images 104,424
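The totals in Tables 4 and 5 can be checked with simple arithmetic: each reference image yields one degraded image per degradation type, and there are eight types. The sketch below reproduces the counts; the helper `split_three` is a hypothetical splitting rule that happens to match the a/b/c counts in Table 4, since the actual assignment procedure is not stated in the text.

```python
# Reference image counts per dataset, as listed in Table 4.
REFERENCE_COUNTS = {
    "UIEB": 890,
    "EUVP_P (U_Dark)": 3138,
    "EUVP_P (U_ImageNet)": 3700,
    "EUVP_P (U_Scenes)": 2185,
    "EUVP_Un": 3140,
}
NUM_DEGRADATION_TYPES = 8  # illumination, contrast, hazy, blurry, noisy, reddish, greenish, bluish

def split_three(n):
    """Hypothetical rule: two equal parts of n // 3, remainder in the third."""
    base = n // 3
    return base, base, n - 2 * base

degraded_counts = {
    name: n * NUM_DEGRADATION_TYPES for name, n in REFERENCE_COUNTS.items()
}
total_reference = sum(REFERENCE_COUNTS.values())
total_degraded = sum(degraded_counts.values())

for name, n in REFERENCE_COUNTS.items():
    print(name, split_three(n), degraded_counts[name])
print(total_reference, total_degraded)
```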

7 Methodology

The proposed Iterative framework for Degradation Aware Underwater Image Enhancement (IDA-UIE) progressively enhances the input image. In each iteration, a degradation classifier network $\mathbf{\Phi}_{DC}$ identifies the dominant degradation condition in the image. This degradation awareness allows the corresponding deep network to be selected for removing the effect of that degradation. Removing the currently dominant degradation might reveal the effect of another degradation. Thus, the output image is processed again in the next iteration for further enhancement. The process can continue until (a) the classifier flags the absence of any degradation or (b) a maximum number of iterations is completed. This work uses the second criterion to bound the number of floating point operations. Here, IDA-UIE is operated with a maximum of 3 iterations. The sub-networks used for degradation classification and subsequent enhancement are described next.
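The iteration described above can be sketched as a short control loop. The names `ida_uie`, `classify`, `enhancers`, and the `"none"` label are hypothetical stand-ins for the classifier $\mathbf{\Phi}_{DC}$ and the condition-specific enhancement networks. The toy "image" is just a dictionary of degradation severities, so that the sketch runs without any deep learning library.

```python
def ida_uie(image, classify, enhancers, max_iterations=3, clean_label="none"):
    """Iteratively remove the dominant degradation.

    classify  : callable returning the dominant degradation label of an image
    enhancers : dict mapping a label to the enhancement callable for it
    """
    history = []
    for _ in range(max_iterations):
        label = classify(image)
        if label == clean_label:        # criterion (a): no degradation detected
            break
        image = enhancers[label](image) # condition-specific enhancement
        history.append(label)
    return image, history               # criterion (b): iteration budget used up

# Toy stand-ins: an "image" is a dict of degradation severities.
def toy_classify(state):
    if not any(v > 0 for v in state.values()):
        return "none"
    return max(state, key=state.get)

toy_image = {"hazy": 0.7, "noisy": 0.4}
toy_enhancers = {
    name: (lambda state, n=name: {**state, n: 0.0}) for name in toy_image
}
result, history = ida_uie(toy_image, toy_classify, toy_enhancers)
```

In this toy run the loop first resolves haze, then noise, and stops on the third pass because the classifier reports no remaining degradation.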

7.1 Design of Degradation Classification and Enhancement Networks

Degradation Classification Network – This network $\mathbf{\Phi}_{DC}$ identifies the category (one of eight classes) of dominant degradation in an underwater image. Additionally, it recognizes the absence of degradation. Thus, it is an $8+1=9$ category classifier. It is trained on the UIEB-D8 and EUVP-D8 datasets. The custom neural network in Fig 32 is implemented in PyTorch for image classification with a default of 9 classes. The network starts with an initial $3\times 3$ convolution layer, followed by two parallel paths: a $1\times 1$ convolution and a $3\times 3$ convolution. The outputs of these paths are concatenated and passed through another $1\times 1$ convolution layer. This result is added to the initial convolution output, similar to a residual connection.

Next, the network uses weighted average pooling to reduce the feature maps to a $1\times 1$ spatial dimension. The pooled features are then flattened and passed through a fully connected layer to produce the final classification output. This architecture combines convolutional layers, parallel processing paths, and pooling to extract and classify features from the input image. The network architecture has a cascade of two modules in Fig 31, each containing $1\times 1$ and $3\times 3$ convolution kernels in parallel with residual connections. The convolution layer output is flattened and processed by fully connected layers for the final classification. A Winner-Take-All strategy is applied to select the dominant degradation. Accordingly, a suitable deep network is selected for image enhancement. This is an iterative process which checks for different degradations. If no degradation is detected, the iteration stops. The network is shown in Figure 31 and Table 7.
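A Winner-Take-All decision over the nine classifier outputs amounts to picking the class with the largest score. The sketch below assumes the classifier emits one raw score (logit) per class and that a softmax is used to turn scores into probabilities; both the softmax and the class ordering (taken from the row order of Table 7) are assumptions of this illustration.

```python
import math

# Hypothetical class order, following the rows of Table 7.
CLASSES = [
    "No Degradation", "Bluish", "Blurry", "Contrast", "Greenish",
    "Hazy", "Illumination", "Noisy", "Reddish",
]

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dominant_degradation(logits):
    """Winner-Take-All: return the label with the highest probability."""
    probs = softmax(logits)
    winner = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[winner], probs[winner]

label, confidence = dominant_degradation(
    [0.1, 0.3, -1.2, 0.0, 0.4, 2.5, 0.2, -0.5, 0.1]
)
```

Since softmax preserves ordering, the selected label is the same as taking the maximum over the raw scores; the probability is only useful as a confidence value.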

Figure 30: Degradation Classifier Architecture through Ablation Study
Degradation Type F1_Score
No Degradation 0.824
Bluish 0.757
Blurry 0.869
Contrast 0.802
Greenish 0.924
Hazy 0.816
Illumination 0.820
Noisy 0.843
Reddish 0.826
GFLOPs 1.7448
Number of Parameters 0.0280
Test Accuracy 80.14%
Table 6: F1 Score for different degradation types as described in Ablation.
Figure 31: A block diagram of the proposed IDA-UIE model includes a classifier and enhancement model, incorporating degradation selection and enhancement model selection modules. It outputs an enhanced image along with performance metrics.
Degradation Type F1 Score
No Degradation 0.9350
Bluish 0.9907
Blurry 0.9898
Contrast 0.9049
Greenish 0.9980
Hazy 0.9554
Illumination 0.9936
Noisy 0.9929
Reddish 0.9827
GFLOPs 15.1666
Number of Parameters 0.2250 M
Test Accuracy 97.63%
Table 7: F1 Scores for different degradation types of the proposed IDA-UIE model

7.1.1 Ablation Study

Training of Models:

In the first ablation study (Figure 32), we designed a network constructed using a single convolutional layer, represented as $CL(3\times 3\times 3@128;1,1)$. This layer aims to extract and assimilate information from the input image. A LeakyReLU activation function follows the initial convolution operation, enhancing the network’s ability to capture non-linear features. Here, $CL(m\times n\times k@q;s,p)$ refers to $q$ convolution kernels of size $m\times n\times k$ with stride $s$ and padding $p$, followed by LeakyReLU. Following this, we employed another layer, $CS(128\times 128\times 3@3;1,1)$, which involves a convolution followed by a Sigmoid activation to produce the final output. Despite these efforts, the results shown in Table 8 indicate that this architecture did not yield satisfactory performance.

For the second ablation study (Figure 33), we expanded the network by incorporating additional convolutional layers, specifically $CL(3\times 3\times 3@128;1,1)$ and $CL(128\times 128\times 3@256;1,1)$. These layers were designed to further extract and assimilate information from the input image. As before, a LeakyReLU activation function was applied after each convolution operation to enhance feature capture. Subsequently, we added a layer $CS(256\times 256\times 3@3;1,1)$, involving a convolution followed by a Sigmoid activation. However, despite these modifications, the results presented in Table 9 still showed that the architecture did not achieve satisfactory performance.
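The parameter counts of the first two ablation models can be estimated from the layer notation. The sketch below assumes that every layer uses $3\times 3$ kernels with a bias term and that the second and later layers take the preceding feature maps (128 or 256 channels) as input; this reading of the notation is an assumption, not something stated in the text. Under it, the counts come to 7,043 and 305,667, which is close to the 0.0070 M and 0.3057 M listed in Tables 8 and 9.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Learnable parameters of one 2-D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)

# Model 1: 3 -> 128 feature maps, then 128 -> 3 output channels.
model1_params = conv_params(3, 3, 3, 128) + conv_params(3, 3, 128, 3)

# Model 2: 3 -> 128 -> 256 feature maps, then 256 -> 3 output channels.
model2_params = (
    conv_params(3, 3, 3, 128)
    + conv_params(3, 3, 128, 256)
    + conv_params(3, 3, 256, 3)
)

print(model1_params, model2_params)
```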

In the third ablation study (Figure 34), we explored the use of fully connected layers ($FCL$) encapsulated with LeakyReLU activations. The output from this setup was connected to a fully connected layer with Sigmoid activation ($FCS$) and then unflattened to reconstruct the enhanced image matching the original image dimensions $(h\times w\times 3)$. While fully connected layers are capable of modeling complex, non-linear transformations necessary to enhance various degradations, the results in Table 10 indicated that this approach also did not yield satisfactory performance.

In the fourth ablation study (Figure 35), we designed a deep network $\mathbf{\Phi}_{IC}$ specifically for enhancing images with low illumination. This network is constructed through a cascade of two fully connected layers ($FCL$) encapsulated with LeakyReLU. The output from this setup is connected to a fully connected layer with Sigmoid activation ($FCS$) and then unflattened to match the original image dimensions $(h\times w\times 3)$. Low illumination correction often involves adjusting the overall brightness and contrast of the image, which can be efficiently learned by fully connected layers as they consider all pixel values at once. Fully connected layers can model complex, non-linear transformations that might be needed to enhance the illumination of the entire image, especially when the correction requires considering the entire image context. This approach has shown good results in illumination correction, as evidenced in Table 11.

The illumination-specific network demonstrated promising results, guiding the development of more advanced network structures that integrate the strengths of fully connected layers.

Figure 32: Model 1: Evolving Network Architecture through Ablation Study
Degradation Type MSE PSNR
Illumination 0.01 24.97 dB
Hazy 2.06 10.85 dB
Blurry 0.23 23.85 dB
Noisy 0.71 18.36 dB
Contrast 0.501 21.86 dB
Color Balance 0.005 29.78 dB
GFLOPs: 0.4530 Number of Parameters: 0.0070 M
Table 8: A table of the enhancement model designed to address degradations through ablation study based on Model 1.
Figure 33: Model 2: Evolving Network Architecture through Ablation Study.
Degradation Type MSE PSNR
Illumination 0.06 22.57 dB
Hazy 2.81 10.95 dB
Blurry 0.65 20.85 dB
Noisy 0.35 22.53 dB
Contrast 0.767 21.51 dB
Color Balance 0.01 25.98 dB
GFLOPs: 20.0068 Number of Parameters: 0.3057 M
Table 9: A table of the enhancement model designed to address degradations through ablation study based on Model 2.
Figure 34: Model 3: Evolving Network Architecture through Ablation Study
Degradation Type MSE PSNR
Illumination 3.09 14.07 dB
Hazy 3.90 14.08 dB
Blurry 3.90 14.08 dB
Noisy 3.91 14.07 dB
Contrast 3.91 14.07 dB
Color Balance 3.91 14.07 dB
GFLOPs 0.0503 Number of Parameters: 50.52 M
Table 10: A table of the enhancement model designed to address degradation through ablation study based on Model 3.
Figure 35: Model 4: Evolving Network Architecture through Ablation Study
Degradation Type MSE PSNR
Color Balance 0.010 17.71 dB
Hazy 0.017 17.08 dB
Blurry 0.010 18.62 dB
Noisy 0.010 17.71 dB
Contrast 0.014 18.28 dB
Illumination 5.41e-06 52.66 dB
GFLOPs: 0.0503 Number of Parameters: 50.54 M
Table 11: A table of the enhancement model designed to address degradation through ablation study based on Model 4.

In the fifth ablation study, we constructed a network (Figure 36) using a convolutional layer represented as $CL(3\times 3\times 3@64;1,1)$. This layer aims to extract and assimilate information from the input image. Following the convolution operation, a LeakyReLU activation function is applied. Subsequently, a transposed convolutional layer, denoted as $CTS(64\times 64\times 3@3;1,1)$, is employed, which includes a Sigmoid activation. Despite these efforts, the results shown in Table 12 indicate that this architecture did not yield satisfactory performance.

In the sixth ablation study, we further refined the network $\mathbf{\Phi}_{CB_{RGB}}$ (Figure 37) to address color imbalances in images. This network is constructed through a cascade of convolutional layers: $CL(3\times 3\times 3@64;1,1)$ and $CL(3\times 3\times 64@64;1,1)$. These layers aim to extract and assimilate information from the input image, with a LeakyReLU activation function applied after each convolution operation. Following this, a transposed convolutional layer, $CTS(64\times 64\times 3@3;1,1)$, is employed, incorporating a Sigmoid activation. The results are shown in Table 13.

The $\mathbf{\Phi}_{CB_{RGB}}$ network is particularly well-suited for color correction tasks due to its simple yet effective architecture. It leverages convolutional operations to capture local color relationships and uses non-linear activations to learn complex color mappings. The transposed convolutional layer helps maintain the original image resolution. This end-to-end learning approach makes the network adaptable to various color correction challenges, demonstrating its potential in addressing color imbalances effectively.

Through these ablation studies, we observed the varying efficacy of different architectures in tackling specific image enhancement challenges. While some architectures did not perform satisfactorily, the insights gained guided the refinement and development of more advanced network structures capable of addressing complex underwater image enhancement tasks.
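The ablation tables report MSE and PSNR for each degradation type. As a reminder of how the two quantities relate for a single image pair, the sketch below uses the standard definition $\text{PSNR}=10\log_{10}(\text{MAX}^2/\text{MSE})$ with pixel values assumed to lie in $[0,1]$. The evaluation protocol behind the tables (value range, per-image averaging) is not stated in this section, so the reported numbers need not follow this exact convention.

```python
import math

def mse(reference, estimate):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)

def psnr(mse_value, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given MSE."""
    if mse_value == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse_value)

reference = [0.0, 0.5, 1.0, 0.25]
estimate = [0.1, 0.5, 0.9, 0.25]
error = mse(reference, estimate)
quality_db = psnr(error)
```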

Figure 36: Model 5: Evolving Network Architecture through Ablation Study
Degradation Type MSE PSNR
Illumination 0.04 24.57 dB
Hazy 2.06 10.85 dB
Blurry 0.82 20.85 dB
Noisy 0.23 22.36 dB
Contrast 0.567 21.64 dB
Color Balance 0.06 23.28 dB
GFLOPs 0.4530 Number of Parameters: 0.0070 M
Table 12: A table of the enhancement model designed to address degradation through ablation study based on Model 5.
Figure 37: Model 6: Evolving Network Architecture through Ablation Study
Degradation Type MSE PSNR
Illumination 0.05 24.27 dB
Hazy 1.02 17.08 dB
Blurry 0.01 22.62 dB
Noisy 0.019 22.35 dB
Contrast 0.507 19.46 dB
Color Balance 0.000035 36.45 dB
GFLOPs 0.166 Number of Parameters: 0.4045 M
Table 13: A table of the enhancement model designed to address degradation through ablation study based on Model 6.

In the seventh ablation study, the deep network, as shown in Figure 38, utilizes an encoder-decoder architecture. The encoder includes convolutional layers $CL(3\times 3\times 3@32;1,1)$ and one fewer $CL(3\times 3\times 32@64;2,1)$ layer, reducing the parameter count. After encoding, the feature map is flattened and passed through a fully connected layer with Sigmoid activation ($FCS$), reducing the dimensionality to 100. The decoder reconstructs the image from this latent representation using fully connected layers followed by transposed convolutional layers. Initially, the latent vector (size 100) is expanded to 500 dimensions using $FCS$, and a Sigmoid activation function scales the pixel values between 0 and 1, restoring the image to its original dimensions $(h\times w\times 3)$. Despite these efforts, the results, as shown in Table 14, indicate unsatisfactory performance.

The advantages of using convolution layers in the encoder include:

  1. Spatial Hierarchies: The convolutional encoder captures essential spatial features, making it effective for dehazing tasks that rely on understanding spatial dependencies.

  2. Compact Representation: The output of convolution layers represents a high-level, compact representation of the input image, transforming it into a manageable latent space that facilitates efficient reconstruction.

  3. Detail Preservation: Fully connected layers alone were insufficient for capturing the spatial details required for high-quality dehazing and contrast correction, leading to suboptimal performance.

In the eighth ablation study, the deep network (Figure 39 and Table 15) employs an encoder-decoder framework. The encoder includes convolutional layers $CL(3\times 3\times 3@32;1,1)$. This network includes one fewer $CL(3\times 3\times 32@64;2,1)$ layer, reducing the parameter count. Once the image is encoded, the feature map is flattened and passed through a fully connected layer with Sigmoid activation ($FCS$), reducing the dimensionality to 100. The decoder then reconstructs the image from this latent representation using fully connected layers followed by transposed convolutional layers. The latent vector (size 100) is initially expanded to 500 dimensions using $FCS$, followed by a Sigmoid activation function to scale the pixel values between 0 and 1, restoring the output to the original image dimensions $(h\times w\times 3)$. However, the results shown in Table 15 indicate unsatisfactory performance.

In the ninth ablation study, the deep network $\mathbf{\Phi}_{DB}$ (Figure 40 and Table 16), realized in an encoder-decoder framework, gives satisfactory results on the blurry dataset. The encoder includes convolutional layers $CL(3\times 3\times 3@32;1,1)$. This network includes one fewer $CL(3\times 3\times 32@64;2,1)$ layer, reducing the parameter count. Once the image is encoded, the feature map is flattened and passed through a fully connected layer ($FCL$), reducing the dimensionality to 100. The decoder then reconstructs the image from this latent representation using fully connected layers followed by transposed convolutional layers. Initially, the latent vector (size 100) is expanded to 500 dimensions using $FCL$, followed by another fully connected layer that adjusts the output to the original image dimensions $(h\times w\times 3)$. Finally, a Sigmoid activation function is applied to scale the pixel values between 0 and 1, ensuring the deblurred image maintains proper intensity levels.
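The encoder layers in these models differ in stride, which determines how large the flattened feature map is before the fully connected layer. The sketch below applies the usual convolution output-size formula, $\lfloor (n+2p-k)/s\rfloor+1$, to the two layer types named above. The input resolution of $256\times 256$ is a hypothetical choice for illustration, since the working resolution is not given in this section.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def encoder_shape(height, width, layers):
    """Propagate (h, w, channels) through a list of conv layer specs.

    Each spec is (kernel, out_channels, stride, padding).
    """
    channels = 3  # RGB input
    for kernel, out_channels, stride, padding in layers:
        height = conv_output_size(height, kernel, stride, padding)
        width = conv_output_size(width, kernel, stride, padding)
        channels = out_channels
    return height, width, channels

# CL(3x3x3@32;1,1) followed by CL(3x3x32@64;2,1), on a hypothetical 256x256 input.
h, w, c = encoder_shape(256, 256, [(3, 32, 1, 1), (3, 64, 2, 1)])
flattened = h * w * c  # length of the vector fed to the fully connected layer
```

A stride-1 layer with padding 1 preserves the spatial size, while the stride-2 layer halves it, so the number of stride-2 layers directly controls the size of the first fully connected layer.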

Figure 38: Model 7: Evolving Network Architecture through Ablation Study
Degradation Type MSE PSNR
Illumination 0.02 22.43 dB
Hazy 37.9 4.20 dB
Color Balance 5.79 8.79 dB
Noisy 5.91 12.27 dB
Contrast 38.9 4.09 dB
Blurry 2.58 10.88 dB
GFLOPs 0.088 Number of Parameters: 72.29 M
Table 14: A table of the enhancement model designed to address degradation through ablation study based on Model 7.
Figure 39: Model 8: Evolving Network Architecture through Ablation Study
Degradation Type MSE PSNR
Illumination 0.064 28.32 dB
Hazy 0.165 25.82 dB
Color Balance 0.100 26.98 dB
Noisy 0.130 26.38 dB
Contrast 0.176 25.89 dB
Blurry 0.071 28.42 dB
GFLOPs 0.138 Number of Parameters: 46.0913 M
Table 15: A table of the enhancement model designed to address degradation through ablation study based on Model 8.
Figure 40: Model 9: Evolving Network Architecture through Ablation Study
Degradation Type MSE PSNR
Illumination 0.001 38.55 dB
Hazy 0.00276 34.04 dB
Color Balance 0.0014 38.45 dB
Noisy 0.0002 39.05 dB
Contrast 0.001 38.67 dB
Blurry 4.06e-05 40.02 dB
GFLOPs 0.571 Number of Parameters: 203.430 M
Table 16: A table of the enhancement model designed to address degradation through ablation study based on Model 9.

Deep Network for Dehazing and Enhancing High-contrast Images – Ideally, two networks, $\mathbf{\Phi}_{DH}$ for dehazing and $\mathbf{\Phi}_{CE}$ for contrast enhancement, would be designed to handle these respective tasks. However, the $\mathbf{\Phi}_{DH}$ network often produces noisy images for high contrast and hazy datasets. To address this, we have designed a single network, $\mathbf{\Phi}_{DHCE}$, which performs both dehazing and contrast enhancement (shown in Figure 41 and Table 17).

The model architecture comprises convolutional layers ($CL$) in the encoder. The initial layer is represented as $CL(3\times 3\times 3@32;1,1)$, where each convolution ($C$) is paired with LeakyReLU ($L$) for non-linearity. This encoder downsamples the input image, producing a compressed latent representation that retains essential details while eliminating noise. This representation is further refined with two additional layers: $CL(3\times 3\times 32@64;1,1)$ and $CL(3\times 3\times 64@128;1,1)$, each followed by LeakyReLU activation.

The encoded feature map is then flattened and passed through a fully connected layer ($FCL$), reducing the dimensionality to 100. The decoder reconstructs the dehazed and enhanced image from this latent representation. The decoder includes fully connected layers followed by transposed convolutional layers. Initially, the latent vector (size 100) is expanded to 500 dimensions using $FCL$. This is followed by another fully connected layer, which expands the output to match the original image dimensions $(h\times w\times 3)$. Finally, a sigmoid activation function scales the pixel values between 0 and 1, ensuring a properly reconstructed image that is both dehazed and enhanced.

Figure 41: Model 10: Evolving Network Architecture through Ablation Study
Degradation Type MSE PSNR
Color Balance $3.185\times 10^{-5}$ 40.96 dB
Blurry $2.098\times 10^{-5}$ 38.92 dB
Noisy $2.28\times 10^{-5}$ 40.38 dB
Illumination $2.18\times 10^{-5}$ 42.46 dB
Contrast $1.49\times 10^{-5}$ 46.39 dB
Hazy $1.16\times 10^{-5}$ 46.03 dB
GFLOPs 0.822 Number of Parameters: 151.07 M
Table 17: A table of the enhancement model designed to address degradation through ablation study based on Model 10.

In the eleventh ablation study (Figure 42 and Table 18), the network architecture begins with an encoder that vectorizes the incoming $h\times w\times c$ image. This vectorized image is then sequentially processed through four fully connected layers ($FCL$s). The input dimensionality is progressively reduced: starting from $h\times w\times c$ to 128 neurons in the first layer, then 64 neurons in the second layer, and finally 32 neurons in the third layer, creating a compact latent space representation. The decoder’s objective is to reconstruct the image from this compressed latent space, mirroring the encoder’s structure by expanding the 32-dimensional latent vector back to the original image size $h\times w\times c$. A Sigmoid activation function is applied to the final output to ensure pixel values are scaled between 0 and 1.

The choice of a network comprising fully connected layers for this task is driven by several key reasons:

  1. Effective Noise Reduction: Fully connected layers in the encoder-decoder architecture inherently filter out noise during training. They achieve this by learning a compressed representation of the input data that focuses on essential features while disregarding noise, which tends to be less structured and influential in the learning process.

  2. Direct Feature Mapping: Unlike convolutional layers that excel at capturing spatial hierarchies and local features, fully connected layers treat each pixel uniformly across the image. This uniform treatment allows them to effectively learn and map the relationship between noisy input and clean output without heavily relying on spatial dependencies.

  3. Compact Representation: By reducing the dimensionality of the input through sequential linear transformations in the encoder, the model learns to encapsulate relevant image features in a more condensed form. This latent representation tends to minimize noise components, leading to clearer and more refined reconstructions in the decoder.

  4. Flexibility and Reconstruction Quality: The fully connected layers in the decoder enable flexible and nonlinear reconstruction of the denoised image. This capability ensures that the model can generate smooth and visually appealing outputs by effectively filling in missing or distorted information caused by noise.

  5. Proven Effectiveness: Empirical evidence and research in image processing tasks, including denoising, demonstrate that fully connected autoencoders can achieve impressive results. This is also evident from our experiments on the datasets used. They significantly reduce noise levels while preserving important image details, making them a reliable choice for enhancing image quality.

Deep Network for Denoising – The deep network ($\mathbf{\Phi}_{DN}$) (Figure 43 and Table 19) for denoising the data follows a similar architecture. The encoder vectorizes the incoming $h \times w \times c$ image and processes it through a sequence of four FCLs. The input dimensionality is progressively reduced: from $h \times w \times c$ to 128 neurons in the first layer, 64 neurons in the second, 32 neurons in the third, and finally 16 neurons in the fourth layer, creating a compact latent space representation. The decoder then reconstructs the image from this compressed latent space, mirroring the encoder’s structure by expanding the 16-dimensional latent vector back to the original image size $h \times w \times c$. Finally, a Sigmoid activation function is applied to the output, ensuring the pixel values are scaled between 0 and 1.
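The layer widths described above determine the parameter count once the input size is known. The following minimal Python sketch counts the weights and biases of such a mirrored fully connected autoencoder. The function name `fc_autoencoder_param_count`, the inclusion of bias terms, and the 256 × 256 × 3 input size are assumptions made for illustration; none of them is stated in this section.

```python
from typing import Sequence


def fc_autoencoder_param_count(h: int, w: int, c: int,
                               hidden: Sequence[int] = (128, 64, 32, 16)) -> int:
    """Count weights and biases of a mirrored fully connected autoencoder."""
    d = h * w * c                # length of the vectorized image
    encoder = [d, *hidden]       # e.g. d -> 128 -> 64 -> 32 -> 16
    decoder = encoder[::-1]      # mirrored: 16 -> 32 -> 64 -> 128 -> d

    def dense_params(sizes: Sequence[int]) -> int:
        # A dense layer has (inputs * outputs) weights plus one bias per output.
        return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

    # The final Sigmoid has no trainable parameters, so it adds nothing here.
    return dense_params(encoder) + dense_params(decoder)


if __name__ == "__main__":
    # 256 x 256 x 3 is an assumed input size, not one given in the text.
    print(fc_autoencoder_param_count(256, 256, 3) / 1e6, "M parameters")
```

Under these assumptions the count is about 50.55 M, almost all of it in the first encoder layer and the last decoder layer. This is close to the parameter count reported for this network, which suggests, but does not confirm, a 256 × 256 × 3 input.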

Figure 42: Model 11: Evolving Network Architecture through Ablation Study

| Degradation Type | MSE | PSNR |
| --- | --- | --- |
| Illumination | 4.14 | 13.82 dB |
| Hazy | 1.02 | 12.78 dB |
| Blurry | 0.07 | 23.82 dB |
| Color Balance | 0.02 | 25.24 dB |
| Contrast | 0.56 | 19.38 dB |
| Noisy | 0.00056 | 36.72 dB |

GFLOPs: 0.05055; Number of Parameters: 50.54 M
Table 18: Enhancement performance of Model 11 of the ablation study on each degradation type.

Figure 43: Model 12: Evolving Network Architecture through Ablation Study

| Degradation Type | MSE | PSNR |
| --- | --- | --- |
| Illumination | 4.14 | 13.82 dB |
| Hazy | 0.08 | 20.78 dB |
| Blurry | 0.02 | 25.38 dB |
| Color Balance | 0.02 | 26.44 dB |
| Contrast | 0.05 | 22.38 dB |
| Noisy | $1.65 \times 10^{-6}$ | 48.72 dB |

GFLOPs: 0.05055; Number of Parameters: 50.54 M
Table 19: Enhancement performance of Model 12 of the ablation study on each degradation type.

Figure 44 illustrates the selection of models based on their Peak Signal-to-Noise Ratio (PSNR) values. PSNR measures the quality of a reconstructed image relative to its reference version, with higher values indicating better performance. The figure compares the PSNR achieved by the candidate models, thereby guiding the selection of the most effective model for image quality enhancement.
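For reference, PSNR can be computed from the mean-squared error between a reference image and its enhanced version. The sketch below is a minimal pure-Python illustration that treats images as flat sequences of pixel values. The function names and the peak value of 1.0 (chosen because the networks end in a Sigmoid) are assumptions, not details taken from the authors' implementation.

```python
import math
from typing import Sequence


def mse(reference: Sequence[float], enhanced: Sequence[float]) -> float:
    """Mean-squared error between two equally sized, flattened images."""
    if len(reference) != len(enhanced) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    return sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)


def psnr(reference: Sequence[float], enhanced: Sequence[float],
         peak: float = 1.0) -> float:
    """PSNR in dB; `peak` is the largest possible pixel value (assumed 1.0)."""
    error = mse(reference, enhanced)
    if error == 0.0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / error)
```

With this definition, a uniform error of 0.1 on a unit-range image corresponds to an MSE of 0.01 and a PSNR of 20 dB.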

Figure 44: Selection of the model based on Peak Signal-to-Noise Ratio (PSNR), illustrating how different models perform in terms of image quality enhancement, with higher PSNR values indicating better performance.

8 Experimental Results and Discussion

Baseline Methods – The proposed approach IDA-UIE is benchmarked on the UIEB [14] and EUVP [13] datasets against nine state-of-the-art methods. On the UIEB dataset, IDA-UIE is compared with WaterNet [24], Fusion-based [23], MSSCE-GAN [49], and Deep WaveNet [11]. On the EUVP dataset, IDA-UIE is compared with UGAN [6], UGAN-P [6], Funie-GAN [25], Funie-GAN-UP [25], Deep SESR [12], and Deep WaveNet [11].

Evaluation Metrics – This work uses both reference and reference-less image quality metrics for quantitative performance analysis. The following evaluation metrics are used – Mean-Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), Underwater Image Quality Measure (UIQM) [9], Natural Image Quality Evaluator (NIQE) [10], Patch-based Contrast Quality Index (PCQI) [8], Underwater Image Sharpness Measure (UISM) [9], Average Entropy (E), Average Gradient (AG), Underwater Image Contrast Measure (UIConM) [9], and Underwater Color Image Quality Evaluation (UCIQE) [7]. Additionally, the sub-network sizes (parameters in millions) and the associated floating point operations (in GFLOPs) are reported.
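As an illustration of one of the reference-less metrics, the sketch below computes the Shannon entropy of an image's intensity histogram. It assumes that Average Entropy (E) refers to this quantity averaged over the test images. The function name and the flat list of integer intensities are hypothetical choices made for this example.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(pixels: Iterable[int]) -> float:
    """Entropy (in bits) of the intensity histogram of a flattened image."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image has no pixels")
    # Sum of p * log2(1 / p) over all intensity levels that occur.
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

A constant image has zero entropy, while an 8-bit image that uses all 256 levels equally often reaches the maximum of 8 bits, so higher values indicate a richer intensity distribution.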

8.1 Quantitative Performance Analysis –

For qualitative evaluation, we present the results obtained by applying the aforementioned methods to a single image from the UIEB and EUVP datasets and analyze the histograms to assess the effects of enhancement. The degradation classifier was trained on the combined UIEB-D8 and EUVP-X-D8 datasets and achieved an overall accuracy of 97.63%.

The first performance analysis experiment studies the proportion of images categorized in different dominant degradation conditions (or absence of degradation) for the UIEB and EUVP test sets. The results are reported in Table 20 as percentages for all three iterations. The second experiment evaluates the performance of the individual sub-networks. The image enhancement sub-networks were trained on the combined training subsets of the UIEB-D8 and EUVP-UWD-D8 datasets, and their performances were validated on the combined test sets of UIEB-D8 and EUVP-UWD-D8. The results of this experiment are reported in Table 21. The network sizes (parameters in millions) and floating point operations (in GFLOPs) are reported along with the enhancement performance (in terms of MSE and PSNR). The third experiment compares the performance of the proposed model IDA-UIE with four baseline approaches on the UIEB dataset. The results are reported in terms of PSNR and SSIM in Table 22. The fourth experiment compares the performance of IDA-UIE with six state-of-the-art approaches on the EUVP dataset. The results are reported in Table 23 in terms of eleven different evaluation metrics.

Table 20: The degradation classifier identifies the need for one of illumination correction (IC), contrast enhancement (CE), dehazing (DH), deblurring (DB), denoising (DN), or color imbalance correction in the red (CBR), green (CBG) or blue (CBB) channel. Additionally, it may detect that no further enhancement (NE) is required. The table presents the proportion of images (in percent) from the UIEB and EUVP test sets that are assigned to each kind of dominant degradation correction (or NE) in all three iterations.

| Iteration | Dataset | IC | CE | DH | DB | DN | CBR | CBG | CBB | NE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | UIEB | 11.79% | 11.11% | 9.92% | 10.70% | 10.48% | 10.42% | 10.58% | 10.58% | 14.38% |
| 1 | EUVP | 12.04% | 11.77% | 11.69% | 11.71% | 11.76% | 12.36% | 11.71% | 11.61% | 5.37% |
| 2 | UIEB | 11.11% | 11.04% | 10.86% | 10.70% | 10.86% | 11.01% | 10.04% | 10.98% | 13.35% |
| 2 | EUVP | 12.13% | 11.87% | 11.65% | 11.87% | 12.00% | 8.90% | 11.78% | 11.99% | 4.79% |
| 3 | UIEB | 10.42% | 10.73% | 9.89% | 10.73% | 11.01% | 10.36% | 10.79% | 10.92% | 15.01% |
| 3 | EUVP | 18.56% | 11.58% | 11.78% | 11.67% | 11.85% | 11.51% | 11.78% | 11.75% | 6.11% |
Table 21: Individual performance of the image enhancement sub-networks. The network parameters (in millions, M), floating point operations (in GFLOPs) and performance (in terms of MSE and PSNR) are reported.

| Degradation | Parameters (M) | GFLOPs | MSE | PSNR |
| --- | --- | --- | --- | --- |
| Bluish | 0.04 | 0.166 | 0.00058 | 36.78 |
| Reddish | 0.04 | 0.166 | 0.00052 | 36.44 |
| Greenish | 0.04 | 0.166 | 0.00009 | 37.67 |
| Noisy | 50.55 | 0.050 | 4.65e-06 | 48.72 |
| Contrast | 151.07 | 0.822 | 0.00006 | 39.81 |
| Blurry | 203.43 | 0.571 | 0.00006 | 38.02 |
| Illumination | 50.54 | 0.050 | 4.88e-06 | 49.33 |
| Hazy | 151.07 | 0.822 | 0.00008 | 40.27 |
Table 22: Comparison of the proposed model IDA-UIE with four state-of-the-art approaches on the UIEB dataset.

| Method | GFLOPs | PSNR | SSIM |
| --- | --- | --- | --- |
| WaterNet [24] | 12.37 | 19.11 | 0.79 |
| Fusion-based [23] | 34.98 | 21.23 | 0.78 |
| MSSCE-GAN [49] | 192 | 21.62 | 0.81 |
| Deep WaveNet [11] | 18.15 | 21.68 | 0.80 |
| IDA-UIE (ours) | 16.83 | 28.87 | 0.90 |

8.2 Qualitative Performance Analysis –

The qualitative performance analysis of the proposed model IDA-UIE is presented in Figures 45 and 46. Sample images from the UIEB and EUVP test sets are progressively enhanced by correcting the dominant degradation in each iteration. The final output obtained after three iterations is visually compared against the ground-truth good quality image.
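The iterative procedure described here can be sketched as a short loop: classify the dominant degradation, apply the matching sub-network, and repeat for up to three iterations. The code below is an illustrative sketch only. The names `iterative_enhance`, `toy_classify` and `toy_brighten` are hypothetical, the toy components stand in for the trained classifier and enhancement networks, and stopping early on the "NE" label is an assumption about how the no-enhancement case is handled.

```python
from typing import Callable, Dict, List, Tuple

Image = List[float]  # toy stand-in for an h x w x c array with values in [0, 1]


def iterative_enhance(image: Image,
                      classify: Callable[[Image], str],
                      enhancers: Dict[str, Callable[[Image], Image]],
                      max_iters: int = 3) -> Tuple[Image, List[str]]:
    """Correct one dominant degradation per iteration, for at most max_iters."""
    history: List[str] = []
    for _ in range(max_iters):
        label = classify(image)          # dominant degradation, or "NE"
        history.append(label)
        if label == "NE":                # assumed: nothing left to enhance
            break
        image = enhancers[label](image)  # condition-specific sub-network
    return image, history


def toy_classify(image: Image) -> str:
    """Hypothetical classifier: flags low illumination ("IC") on dark images."""
    return "IC" if sum(image) / len(image) < 0.4 else "NE"


def toy_brighten(image: Image) -> Image:
    """Hypothetical illumination correction: add a fixed offset and clip."""
    return [min(1.0, p + 0.2) for p in image]


if __name__ == "__main__":
    print(iterative_enhance([0.1, 0.1], toy_classify, {"IC": toy_brighten}))
```

Because each iteration consumes the output of the previous one, an error made by either the classifier or a sub-network in an early iteration is carried into all later iterations.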

Figure 45: Each column corresponds to a different sample image from the UIEB dataset. The first row shows the input degraded image, which is fed as input to the first iteration. The dominant degradation identified in each iteration is shown in square brackets. The second and third rows show the images enhanced in the respective iterations. The last row shows the ground-truth good quality image, which is compared with the output of the third iteration (third row).
Table 23: Comparison of the proposed model IDA-UIE with state-of-the-art approaches on the EUVP dataset in terms of different performance metrics.

| Method | GFLOPs | MSE | PSNR | SSIM | UIQM | NIQE | PCQI | UISM | Entropy | AG | UIConM | UCIQE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UGAN [6] | 143 | 0.36 | 26.55 | 0.80 | 2.89 | 49.90 | 0.700 | 6.84 | 7.52 | 7.48 | 0.79 | 0.581 |
| UGAN-P [6] | 143 | 0.36 | 26.54 | 0.80 | 2.93 | 50.17 | 0.704 | 6.83 | 7.54 | 7.58 | 0.79 | 0.590 |
| Funie-GAN [25] | 70.34 | 0.39 | 26.22 | 0.79 | 2.97 | 50.51 | 0.706 | 6.90 | 7.55 | 8.58 | 0.84 | 0.590 |
| Funie-GAN-UP [25] | 70.34 | 0.60 | 25.22 | 0.78 | 2.93 | 52.87 | 0.702 | 6.86 | 7.80 | 7.80 | 0.79 | 0.588 |
| Deep SESR [12] | 30 | 0.34 | 27.08 | 0.80 | 3.09 | 55.68 | 0.679 | 7.06 | 7.40 | 7.57 | 0.78 | 0.572 |
| Deep WaveNet [11] | 18.15 | 0.29 | 28.62 | 0.83 | 3.04 | 44.89 | 0.694 | 7.06 | 7.38 | 7.00 | 0.77 | 0.559 |
| IDA-UIE (ours) | 16.83 | 0.0005 | 33.75 | 0.91 | 3.89 | 40.34 | 0.876 | 9.34 | 9.45 | 8.78 | 0.89 | 0.784 |
Figure 46: Each column corresponds to a different sample image from the EUVP dataset. The first row shows the input degraded image, which is fed as input to the first iteration. The dominant degradation identified in each iteration is shown in square brackets. The second and third rows show the images enhanced in the respective iterations. The last row shows the ground-truth good quality image, which is compared with the output of the third iteration (third row).

Figure 47 plots Peak Signal-to-Noise Ratio (PSNR) against frequency values, which are used to assess the quality of the enhanced images. Higher PSNR values typically indicate better image quality. The region inside the red square highlights the failure cases, where the enhancement did not perform well. In these instances, the PSNR values are significantly lower, indicating that the enhanced images still contain substantial noise or distortion. This analysis helps identify the specific conditions under which the enhancement method needs further improvement.
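An analysis of this kind can be reproduced from per-image PSNR values by binning them and flagging the low-PSNR images. The sketch below assumes that "frequency" means the number of test images falling in each PSNR bin. The function names, the 5 dB bin width and the 20 dB failure threshold are hypothetical values chosen for illustration, not values taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence


def psnr_histogram(values: Sequence[float], bin_width: float = 5.0) -> Dict[float, int]:
    """Map the lower edge of each PSNR bin to the number of images in it."""
    counts = Counter(bin_width * (v // bin_width) for v in values)
    return dict(sorted(counts.items()))


def failure_cases(values: Sequence[float], threshold: float = 20.0) -> List[int]:
    """Indices of images whose PSNR falls below the (assumed) threshold."""
    return [i for i, v in enumerate(values) if v < threshold]
```

The indices returned by `failure_cases` can then be inspected visually to see which degradation conditions dominate the low-PSNR region.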

Figure 47: Plot showing PSNR vs Frequency values. The region inside red square depicts the failure cases.
Figure 48: Cascading degradation effect: Initial enhancement failure in iteration 1 adversely impacts subsequent iterations, leading to progressively degraded image quality

9 Failure Case

When a particular degradation is severe, the model can struggle to eliminate it. As the enhancement process is sequential, with iterations 2 and 3 depending on the result of iteration 1, any shortcoming in the initial enhancement adversely impacts the subsequent iterations. Consequently, a failure to adequately enhance the image in the first iteration propagates through the sequence, leading to progressively degraded results. This issue is illustrated in Figure 48, where the cascading effect of the initial enhancement failure is evident in the overall quality of the enhanced images.

10 Conclusion

This paper presents an iterative framework for enhancing underwater images with degradation awareness, which identifies and enhances the dominant degradation condition using specific enhancement networks. Unlike single-network approaches, IDA-UIE progressively performs degradation-aware enhancements. A classifier identifies one of eight degradation types (including low illumination, low contrast, haziness, blur, noise, and color imbalances), or no degradation, and deploys the corresponding enhancement network. Trained on condition-specific degradations applied to UIEB and EUVP datasets, IDA-UIE outperforms nine state-of-the-art methods on eleven evaluation metrics.

This framework can also be adapted for general image enhancement problems by incorporating condition classifiers and specific enhancement sub-networks, with future research focusing on designing lightweight networks for each component.

References

  • [1] Pooja Sahu, Neelesh Gupta, and Neetu Sharma. A survey on underwater image enhancement techniques. International Journal of Computer Applications, 87(13), 2014.
  • [2] S. Raveendran, M. D. Patil, and G. K. Birajdar. Underwater image enhancement: a comprehensive review, recent trends, challenges and applications. Artificial Intelligence Review, 54:5413–5467, 2021.
  • [3] Oscar C. Au, Lin Sun, Ruobing Zou, Wei Dai, and Si** Li. An improved method for color images enhancement considering HVS. In 2012 International Conference on Audio, Language and Image Processing, pages 117–122. IEEE, 2012.
  • [4] Z. A. Hasibuan, P. N. Andono, D. Pujiono, R. I. M. Setiadi, et al. Contrast limited adaptive histogram equalization for underwater image matching optimization use SURF. In Journal of Physics: Conference Series, volume 1803, number 1, page 012008. IOP Publishing, 2021.
  • [5] Achmad Basuki and Nana Ramadijanti. Improving auto level method for enhancement of underwater images. In 2016 International Conference on Knowledge Creation and Intelligent Computing (KCIC), pages 120–125. IEEE, 2016.
  • [6] Cameron Fabbri, Md Jahidul Islam, and Junaed Sattar. Enhancing underwater imagery using generative adversarial networks. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 7159–7165. IEEE, 2018.
  • [7] Miao Yang and Arcot Sowmya. An underwater color image quality evaluation metric. IEEE Transactions on Image Processing, 24(12):6062–6071, 2015.
  • [8] Shiqi Wang, Kede Ma, Hojatollah Yeganeh, Zhou Wang, and Weisi Lin. A patch-structure representation method for quality assessment of contrast changed images. IEEE Signal Processing Letters, 22(12):2387–2390, 2015.
  • [9] Karen Panetta, Chen Gao, and Sos Agaian. Human-visual-system-inspired underwater image quality measures. IEEE Journal of Oceanic Engineering, 41(3):541–551, 2015.
  • [10] Anish Mittal, Rajiv Soundararajan, and Alan C. Bovik. Making a "completely blind" image quality analyzer. IEEE Signal Processing Letters, 20(3):209–212, 2012.
  • [11] Prasen Sharma, Ira Bisht, and Arijit Sur. Wavelength-based attributed deep neural network for underwater image restoration. ACM Transactions on Multimedia Computing, Communications and Applications, 19(1):1–23, 2023.
  • [12] Md Jahidul Islam, Peigen Luo, and Junaed Sattar. Simultaneous enhancement and super-resolution of underwater imagery for improved visual perception. arXiv preprint arXiv:2002.01155, 2020.
  • [13] M. J. Islam, Y. Xia, and J. Sattar. Fast underwater image enhancement for improved visual perception. IEEE Robotics and Automation Letters, 5(2):3227–3234, 2020.
  • [14] C. Li, C. Guo, W. Ren, R. Cong, J. Hou, S. Kwong, and D. Tao. An underwater image enhancement benchmark dataset and beyond. IEEE Transactions on Image Processing, 29:4376–4389, 2019.
  • [15] X. Yi, Q. Jiang, and W. Zhou. No-reference quality assessment of underwater image enhancement. Displays, 81:102586, 2024.
  • [16] Y. Li, D. Li, Z. Gao, S. Wang, Q. Jiao, et al. Underwater image enhancement utilizing adaptive color correction and model conversion for dehazing. Optics & Laser Technology, 169:110039, 2024.
  • [17] S. Hu, Z. Cheng, G. Fan, M. Gan, and C. L. P. Chen. Texture-aware and color-consistent learning for underwater image enhancement. Journal of Visual Communication and Image Representation, 98:104051, 2024.
  • [18] S. Xiao, X. Shen, Z. Zhang, J. Wen, M. Xi, and J. Yang. Underwater image classification based on image enhancement and information quality evaluation. Displays, 82:102635, 2024.
  • [19] R. Zheng, J. Miao, H. Zhang, X. Liu, and D. Tan. An illumination adaptive underwater image enhancement method. In International Conference on Algorithm, Imaging Processing, and Machine Vision (AIPMV 2023), volume 12969, pages 442–449, 2024.
  • [20] X. Mao, C. Shen, and Y.-B. Yang. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. Advances in Neural Information Processing Systems, 29, 2016.
  • [21] X. Sun, L. Liu, Q. Li, J. Dong, E. Lima, and R. Yin. Deep pixel-to-pixel network for underwater image enhancement and restoration. IET Image Processing, 13(3):469–474, 2019.
  • [22] V. S. Kumar. An underwater image dehazing method using dark channel prior. Journal, vol. XX, pp. XX-XX, 2020.
  • [23] Cosmin Ancuti, Codruta Orniana Ancuti, Tom Haber, and Philippe Bekaert. Enhancing underwater images and videos by fusion. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 81–88. IEEE, 2012.
  • [24] Chongyi Li, Chunle Guo, Wenqi Ren, Runmin Cong, Junhui Hou, Sam Kwong, and Dacheng Tao. An Underwater Image Enhancement Benchmark Dataset and Beyond. IEEE Transactions on Image Processing, 29:4376–4389, 2020.
  • [25] Md Jahidul Islam, Youya Xia, and Junaed Sattar. Fast underwater image enhancement for improved visual perception. IEEE Robotics and Automation Letters, 5(2):3227–3234, 2020.
  • [26] M. Yang, K. Hu, Y. Du, Z. Wei, Z. Sheng, and J. Hu. Underwater image enhancement based on conditional generative adversarial network. Signal Processing: Image Communication, 81:115723, 2020.
  • [27] K. Hu, C. Weng, Y. Zhang, J. Jin, and Q. Xia. An overview of underwater vision enhancement: from traditional methods to recent deep learning. Journal of Marine Science and Engineering, 10(2):241, 2022.
  • [28] Z. Wang, X. Xue, L. Ma, and X. Fan. Underwater image enhancement based on dual U-net. In 2020 8th International Conference on Digital Home (ICDH), pages 141–146, 2020.
  • [29] W. N. J. H. W. Yussof, M. S. Hitam, E. A. Awalludin, and Z. Bachok. Performing contrast limited adaptive histogram equalization technique on combined color models for underwater image enhancement. International Journal of Interactive Digital Media, 1(1):1–6, 2013.
  • [30] Najmul Hassan, Sami Ullah, Naeem Bhatti, Hasan Mahmood, and Muhammad Zia. The Retinex based improved underwater image enhancement. Multimedia Tools and Applications, 80:1839–1857, 2021.
  • [31] Omer Deperlioglu, Utku Kose, and G. Emre Guraksin. Underwater image enhancement with HSV and histogram equalization. Image, 1(4):461–465, 2018.
  • [32] Raj S. M. Alex, S. Deepa, and M. H. Supriya. Underwater image enhancement using CLAHE in a reconfigurable platform. In OCEANS 2016 MTS/IEEE Monterey, pages 1–5. IEEE, 2016.
  • [33] Cosmin Ancuti, Codruta Orniana Ancuti, Tom Haber, and Philippe Bekaert. Enhancing underwater images and videos by fusion. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 81–88. IEEE, 2012.
  • [34] Kamil Zakwan Mohd Azmi, Ahmad Shahrizan Abdul Ghani, Zulkifli Md Yusof, and Zuwairie Ibrahim. Natural-based underwater image color enhancement through fusion of swarm-intelligence algorithm. Applied Soft Computing, 85:105810, 2019.
  • [35] Bastiaan J. Boom, Phoenix X. Huang, Cigdem Beyan, Concetto Spampinato, Simone Palazzo, Jiyin He, Emmanuelle Beauxis-Aussalet, Sun-In Lin, Hsiu-Mei Chou, Gayathri Nadarajan, et al. Long-term underwater camera surveillance for monitoring and analysis of fish populations. VAIB12, 2012.
  • [36] Diksha Garg, Naresh Kumar Garg, and Munish Kumar. Underwater image enhancement using blending of CLAHE and percentile methodologies. Multimedia Tools and Applications, 77:26545–26561, 2018.
  • [37] Evgin Goceri. Challenges and recent solutions for image segmentation in the era of deep learning. In 2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA), pages 1–6. IEEE, 2019.
  • [38] Evgin Goceri. Skin disease diagnosis from photographs using deep learning. In VipIMAGE 2019: Proceedings of the VII ECCOMAS Thematic Conference on Computational Vision and Medical Image Processing, October 16–18, 2019, Porto, Portugal, pages 239–246. Springer, 2019.
  • [39] Manuel Gonzalez-Rivero, Oscar Beijbom, Alberto Rodriguez-Ramirez, Dominic E. P. Bryant, Anjani Ganase, Yeray Gonzalez-Marrero, Ana Herrera-Reveles, Emma V. Kennedy, Catherine J. S. Kim, Sebastian Lopez-Marcano, et al. Monitoring of coral reefs using artificial intelligence: A feasible and cost-effective approach. Remote Sensing, 12(3):489, 2020.
  • [40] Minjun Hou, Risheng Liu, Xin Fan, and Zhongxuan Luo. Joint residual learning for underwater image enhancement. In 2018 25th IEEE International Conference on Image Processing (ICIP), pages 4043–4047. IEEE, 2018.
  • [41] Kashif Iqbal, Michael Odetayo, Anne James, Rosalina Abdul Salam, and Abdullah Zawawi Hj Talib. Enhancing the low quality images using unsupervised colour correction method. In 2010 IEEE International Conference on Systems, Man and Cybernetics, pages 1703–1709. IEEE, 2010.
  • [42] Md Jahidul Islam, Youya Xia, and Junaed Sattar. Fast underwater image enhancement for improved visual perception. IEEE Robotics and Automation Letters, 5(2):3227–3234, 2020.
  • [43] Geir Johnsen, Martin Ludvigsen, Asgeir Sørensen, and Lars Martin Sandvik Aas. The use of underwater hyperspectral imaging deployed on remotely operated vehicles-methods and applications. IFAC-PapersOnLine, 49(23):476–481, 2016.
  • [44] Chongyi Li, Jichang Guo, Chunle Guo, Runmin Cong, and Jiachang Gong. A hybrid method for underwater image correction. Pattern Recognition Letters, 94:62–67, 2017.
  • [45] Dawei Li, Lihong Xu, and Huanyu Liu. Detection of uneaten fish food pellets in underwater images for aquaculture. Aquacultural Engineering, 78:85–94, 2017.
  • [46] Lixiong Liu, Bao Liu, Hua Huang, and Alan Conrad Bovik. No-reference image quality assessment based on spatial and spectral entropies. Signal Processing: Image Communication, 29(8):856–863, 2014.
  • [47] Jianru Li and Yujie Li. Underwater image restoration algorithm for free-ascending deep-sea tripods. Optics & Laser Technology, 110:129–134, 2019.
  • [48] Sanparith Marukatat. Image enhancement using local intensity distribution equalization. EURASIP Journal on Image and Video Processing, 2015(1):1–18, 2015.
  • [49] Lingxin Zhang, Youkun Chen, Jie Lan, and Yuzhen Niu. MSSCE-GAN: Multi-Scale Structural and Color Enhanced Generative Adversarial Network for Unpaired Underwater Image Enhancement. In 2023 5th International Conference on Frontiers Technology of Information and Computer (ICFTIC), pages 837–841. IEEE, 2023.
  • [50] Cameron Fabbri, Md Jahidul Islam, and Junaed Sattar. Enhancing Underwater Imagery Using Generative Adversarial Networks. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 7159–7165. IEEE, 2018.
  • [51] Jingyu Lu, Na Li, Shaoyong Zhang, Zhibin Yu, Haiyong Zheng, and Bing Zheng. Multi-scale adversarial network for underwater image restoration. Optics & Laser Technology, 110:105–113, 2019.