High-Resolution Hyperspectral Video Imaging Using A Hexagonal Camera Array

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: Retrieving the reflectance spectrum from objects is an essential task for many classification and detection problems, since many materials and processes have a unique spectral behaviour. In many cases, it is highly desirable to capture hyperspectral images due to the high spectral flexibility. Often, it is even necessary to capture hyperspectral videos or at least to be able to record a hyperspectral image at once, also called snapshot hyperspectral imaging, to avoid spectral smearing. For this task, a high-resolution snapshot hyperspectral camera array using a hexagonal shape is introduced. The hexagonal array for hyperspectral imaging uses off-the-shelf hardware, which enables high flexibility regarding employed cameras, lenses and filters. Hence, the spectral range can be easily varied by mounting a different set of filters. Moreover, the concept of using off-the-shelf hardware enables low prices in comparison to other approaches with highly specialized hardware. Since classical industrial cameras are used in this hyperspectral camera array, the spatial and temporal resolution is very high, while recording 37 hyperspectral channels in the range from 400 nm to 760 nm in 10 nm steps. A registration process is required for near-field imaging, which maps the peripheral camera views to the center view. It is shown that this combination using a hyperspectral camera array and the corresponding image registration pipeline is superior in comparison to other popular snapshot approaches. For this evaluation, a synthetic hyperspectral database is rendered. On the synthetic data, the novel approach outperforms its best competitor by more than 3 dB in reconstruction quality. This synthetic data is also used to show the superiority of the hexagonal shape in comparison to an orthogonal-spaced one. Moreover, a real-world high resolution hyperspectral video database is provided.

Submitted 12 July, 2024; originally announced July 2024.
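
As a quick consistency check on the figures quoted in the abstract above, sampling 400 nm to 760 nm inclusive in 10 nm steps does yield 37 channels. The sketch below only enumerates nominal center wavelengths; the function and constant names are illustrative assumptions and say nothing about the authors' actual filter set.

```python
# Nominal channel centers implied by the abstract: 400..760 nm in 10 nm steps.
# The names below are hypothetical; only the three numbers come from the abstract.
START_NM, STOP_NM, STEP_NM = 400, 760, 10


def channel_centers(start=START_NM, stop=STOP_NM, step=STEP_NM):
    """Return the inclusive list of nominal center wavelengths in nm."""
    return list(range(start, stop + step, step))


centers = channel_centers()
print(len(centers), centers[0], centers[-1])
```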

arXiv:2407.05900 [pdf, other]

SVT-AV1 Encoding Bitrate Estimation Using Motion Search Information

Authors: Lena Eichermüller, Gaurang Chaudhari, Ioannis Katsavounidis, Zhijun Lei, Hassene Tmar, Christian Herglotz, André Kaup

Abstract: Enabling high compression efficiency while keeping encoding energy consumption at a low level requires prioritization of which videos need more sophisticated encoding techniques. However, the effects vary highly based on the content, and information on how well a video can be compressed is required. This can be measured by estimating the encoded bitstream size prior to encoding. We identified that the errors between estimated motion vectors from Motion Search, an algorithm that predicts temporal changes in videos, correlate well with the encoded bitstream size. Combining Motion Search with Random Forests, the encoding bitrate can be estimated with a Pearson correlation of above 0.96.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 5 pages, 4 figures, accepted for European Signal Processing Conference (EUSIPCO) 2024
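
The abstract above reports estimator quality as a Pearson correlation between estimated and actual bitrate. As a reminder of what that figure measures, here is a minimal sketch of the coefficient in plain Python. The sample values are made up for illustration and are not data from the paper; the function name is likewise an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and standard deviations, all without the 1/n factor,
    # which cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)


# Hypothetical estimated vs. measured bitrates in kbit/s (illustrative only).
estimated = [510.0, 1200.0, 2300.0, 4100.0]
measured = [480.0, 1350.0, 2250.0, 4000.0]
print(round(pearson(estimated, measured), 3))
```

A value near 1 means the estimates rank and scale the videos almost linearly with the true bitrate; it does not by itself bound the absolute estimation error.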

arXiv:2406.13709 [pdf, other]

A Study on the Effect of Color Spaces in Learned Image Compression

Authors: Srivatsa Prativadibhayankaram, Mahadev Prasad Panda, Jürgen Seiler, Thomas Richter, Heiko Sparenberg, Siegfried Fößel, André Kaup

Abstract: In this work, we present a comparison between color spaces, namely YUV, LAB, and RGB, and their effect on learned image compression. For this we use the structure and color based learned image codec (SLIC) from our prior work, which consists of two branches - one for the luminance component (Y or L) and another for chrominance components (UV or AB). However, for the RGB variant we input all 3 channels in a single branch, similar to most learned image codecs operating in RGB. The models are trained for multiple bitrate configurations in each color space. We report the findings from our experiments by evaluating them on various datasets and compare the results to state-of-the-art image codecs. The YUV model performs better than the LAB variant in terms of MS-SSIM with a Bjøntegaard delta bitrate (BD-BR) gain of 7.5% using VTM intra-coding mode as the baseline. The LAB variant, in contrast, performs better than the YUV model in terms of CIEDE2000, with a BD-BR gain of 8%. Overall, the RGB variant of SLIC achieves the best performance with a BD-BR gain of 13.14% in terms of MS-SSIM and a gain of 17.96% in CIEDE2000 at the cost of a higher model complexity.

Submitted 19 June, 2024; originally announced June 2024.

Comments: Accepted pre-print version for ICIP 2024

arXiv:2406.11284 [pdf, other]

Multispectral Snapshot Image Registration Using Learned Cross Spectral Disparity Estimation and a Deep Guided Occlusion Reconstruction Network

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: Multispectral imaging aims at recording images in different spectral bands. This is extremely beneficial in diverse discrimination applications, for example in agriculture, recycling or healthcare. One approach for snapshot multispectral imaging, which is capable of recording multispectral videos, is by using camera arrays, where each camera records a different spectral band. Since the cameras are at different spatial positions, a registration procedure is necessary to map every camera to the same view. In this paper, we present a multispectral snapshot image registration with three novel components. First, a cross spectral disparity estimation network is introduced, which is trained on a popular stereo database using pseudo spectral data augmentation. Subsequently, this disparity estimation is used to accurately detect occlusions by warping the disparity map in a layer-wise manner. Finally, these detected occlusions are reconstructed by a learned deep guided neural network, which leverages the structure from other spectral components. It is shown that each element of this registration process as well as the final result is superior to the current state of the art. In terms of PSNR, our registration achieves an improvement of over 3 dB. At the same time, the runtime is decreased by a factor of over 3 on a CPU. Additionally, the registration is executable on a GPU, where the runtime can be decreased by a factor of 111. The source code and the data are available at https://github.com/FAU-LMS/MSIR.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.07938 [pdf, other]

On Annotation-free Optimization of Video Coding for Machines

Authors: Marc Windsheimer, Fabian Brand, André Kaup

Abstract: Today, image and video data is not only viewed by humans, but also automatically analyzed by computer vision algorithms. However, current coding standards are optimized for human perception. Emerging from this, research on video coding for machines tries to develop coding methods designed for machines as information sink. Since many of these algorithms are based on neural networks, most proposals for video coding for machines build upon neural compression. So far, optimizing the compression by applying the task loss of the analysis network, for which ground truth data is needed, is achieving the best coding performance. But ground truth data is difficult to obtain and thus an optimization without ground truth is preferred. In this paper, we present an annotation-free optimization strategy for video coding for machines. We measure the distortion by calculating the task loss of the analysis network. Therefore, the predictions on the compressed image are compared with the predictions on the original image, instead of the ground truth data. Our results show that this strategy can even outperform training with ground truth data with rate savings of up to 7.5 %. By using the non-annotated training data, the rate gains can be further increased up to 8.2 %.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 7 pages, 10 figures
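
The idea in the abstract above, measuring distortion as a task loss against the analysis network's own predictions on the original image rather than against ground truth, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract names neither the task nor the loss, so a classification task with a soft-target cross-entropy is assumed here, and every function name, the weight `lam`, and the additive rate-distortion form are hypothetical rather than taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """Cross-entropy of predicted probabilities against soft targets."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))


def annotation_free_distortion(logits_original, logits_compressed):
    """Task loss with the prediction on the original image as the target.

    No ground-truth label is used; this is the assumed stand-in for the
    distortion term described in the abstract.
    """
    target = softmax(logits_original)
    pred = softmax(logits_compressed)
    return cross_entropy(target, pred)


def rate_distortion_loss(rate_bpp, logits_original, logits_compressed, lam=1.0):
    """Assumed Lagrangian training objective: rate + lam * distortion."""
    return rate_bpp + lam * annotation_free_distortion(
        logits_original, logits_compressed
    )


# Hypothetical analysis-network outputs for one image (illustrative only).
original = [2.0, 0.1, -1.0]
mild_compression = [1.8, 0.2, -0.9]
strong_compression = [0.3, 1.5, -0.2]
print(annotation_free_distortion(original, mild_compression))
print(annotation_free_distortion(original, strong_compression))
```

Because the target comes from the network itself, any unlabeled image can contribute to training, which is consistent with the abstract's remark about using non-annotated data.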

arXiv:2405.17866 [pdf, ps, other]

Towards Video Codec Performance Evaluation: A Rate-Energy-Distortion Perspective

Authors: Geetha Ramasubbu, André Kaup, Christian Herglotz

Abstract: The Bjøntegaard Delta rate (BD-rate) objectively assesses the coding efficiency of video codecs using the rate-distortion (R-D) performance but overlooks encoding energy, which is crucial in practical applications, especially for those on handheld devices. Although R-D analysis can be extended to incorporate encoding energy as energy-distortion (E-D), it fails to integrate all three parameters seamlessly. This work proposes a novel approach to address this limitation by introducing a 3D representation of rate, encoding energy, and distortion through surface fitting. In addition, we evaluate various surface fitting techniques based on their accuracy and investigate the proposed 3D representation and its projections. The overlapping areas in projections help in encoder selection and recommend avoiding the slow presets of the older encoders (x264, x265), as the recent encoders (x265, VVenC) offer higher quality for the same bitrate-energy performance and provide a lower rate for the same energy-distortion performance.

Submitted 11 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.12631 [pdf, other]

Efficient Learned Wavelet Image and Video Coding

Authors: Anna Meyer, Srivatsa Prativadibhayankaram, André Kaup

Abstract: Learned wavelet image and video coding approaches provide an explainable framework with a latent space corresponding to a wavelet decomposition. The wavelet image coder iWave++ achieves state-of-the-art performance and has been employed for various compression tasks, including lossy as well as lossless image, video, and medical data compression. However, the approaches suffer from slow decoding speed due to the autoregressive context model used in iWave++. In this paper, we show how a parallelized context model can be integrated into the iWave++ framework. Our experimental results demonstrate a speedup factor of over 350 and 240 for image and video compression, respectively. At the same time, the rate-distortion performance in terms of Bjøntegaard delta bitrate is slightly worse by 1.5% for image coding and 1% for video coding. In addition, we analyze the learned wavelet decomposition by visualizing its subband impulse responses.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 7 pages, 11 figures, submitted to ICIP2024

arXiv:2402.17487 [pdf, other]

Bit Rate Matching Algorithm Optimization in JPEG-AI Verification Model

Authors: Panqi Jia, A. Burakhan Koyuncu, Jue Mao, Ze Cui, Yi Ma, Tiansheng Guo, Timofey Solovyev, Alexander Karabutov, Yin Zhao, Jing Wang, Elena Alshina, Andre Kaup

Abstract: The research on neural network (NN) based image compression has shown superior performance compared to classical compression frameworks. Unlike the hand-engineered transforms in the classical frameworks, NN-based models learn the non-linear transforms providing more compact bit representations, and achieve faster coding speed on parallel devices over their classical counterparts. Those properties evoked the attention of both scientific and industrial communities, resulting in the standardization activity JPEG-AI. The verification model for the standardization process of JPEG-AI is already in development and has surpassed the advanced VVC intra codec. To generate reconstructed images with the desired bits per pixel and assess the BD-rate performance of both the JPEG-AI verification model and VVC intra, bit rate matching is employed. However, the current state of the JPEG-AI verification model experiences significant slowdowns during bit rate matching, resulting in suboptimal performance due to an unsuitable model. The proposed methodology offers a gradual algorithmic optimization for matching bit rates, resulting in a fourfold acceleration and over 1% improvement in BD-rate at the base operation point. At the high operation point, the acceleration increases up to sixfold.

Submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted at (IEEE) PCS 2024; 6 pages

arXiv:2402.17470 [pdf, other]

Bit Distribution Study and Implementation of Spatial Quality Map in the JPEG-AI Standardization

Authors: Panqi Jia, Jue Mao, Esin Koyuncu, A. Burakhan Koyuncu, Timofey Solovyev, Alexander Karabutov, Yin Zhao, Elena Alshina, Andre Kaup

Abstract: Currently, there is a high demand for neural network-based image compression codecs. These codecs employ non-linear transforms to create compact bit representations and facilitate faster coding speeds on devices compared to the hand-crafted transforms used in classical frameworks. The scientific and industrial communities are highly interested in these properties, leading to the standardization effort of JPEG-AI. The JPEG-AI verification model has been released and is currently under development for standardization. Utilizing neural networks, it can outperform the classic codec VVC intra by over 10% BD-rate operating at base operation point. Researchers attribute this success to the flexible bit distribution in the spatial domain, in contrast to VVC intra's anchor that is generated with a constant quality point. However, our study reveals that VVC intra displays a more adaptable bit distribution structure through the implementation of various block sizes. As a result of our observations, we have proposed a spatial bit allocation method to optimize the JPEG-AI verification model's bit distribution and enhance the visual quality. Furthermore, by applying the VVC bit distribution strategy, the objective performance of the JPEG-AI verification model can be further improved, resulting in a maximum gain of 0.45 dB in PSNR-Y.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 5 pages, 3 figures, 4 tables

arXiv:2402.10257 [pdf, other]

Analysis of Neural Video Compression Networks for 360-Degree Video Coding

Authors: Andy Regensky, Fabian Brand, André Kaup

Abstract: With the increasing efforts of bringing high-quality virtual reality technologies into the market, efficient 360-degree video compression gains in importance. As such, the state-of-the-art H.266/VVC video coding standard integrates dedicated tools for 360-degree video, and considerable efforts have been put into designing 360-degree projection formats with improved compression efficiency. For the fast-evolving field of neural video compression networks (NVCs), the effects of different 360-degree projection formats on the overall compression performance have not yet been investigated. It is thus unclear whether a resampling from the conventional equirectangular projection (ERP) to other projection formats yields similar gains for NVCs as for hybrid video codecs, and which formats perform best. In this paper, we analyze several generations of NVCs and an extensive set of 360-degree projection formats with respect to their compression performance for 360-degree video. Based on our analysis, we find that projection format resampling yields significant improvements in compression performance also for NVCs. The adjusted cubemap projection (ACP) and equatorial cylindrical projection (ECP) are shown to perform best and achieve rate savings of more than 55% compared to ERP based on WS-PSNR for the most recent NVC. Remarkably, the observed rate savings are higher than for H.266/VVC, emphasizing the importance of projection format resampling for NVCs.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 5 pages, 4 figures, 1 table, accepted for Picture Coding Symposium 2024 (PCS 2024)

arXiv:2402.09926 [pdf, other]

Predicting the Energy Demand of a Hardware Video Decoder with Unknown Design Using Software Profiling

Authors: Matthias Kränzler, Christian Herglotz, André Kaup

Abstract: Energy efficiency for video communications and video-on-demand streaming is essential for mobile devices with a limited battery capacity. Therefore, hardware decoder implementations are commonly used to significantly reduce the energetic load of video playback. The energy consumption of such a hardware implementation largely depends on a previously published recommendation document of a video coding standard that specifies which coding tools and methods are included. However, during the standardization of a video coding standard, the energy demand of a hardware implementation is unknown. Hence, the hardware complexity of coding tools is judged subjectively by experts from the field of hardware programming without using standardized assessment procedures. This can lead to suboptimal decisions on rejection or acceptance of a coding tool. To solve this problem, we propose a method that accurately models the energy demand of existing hardware decoders with an average error of 1.79% by exploiting information from software decoder profiling. Motivated by the low estimation error, we propose a hardware decoding energy metric that can predict and estimate the complexity of an unknown hardware implementation using information from existing hardware decoder implementations and available software implementations of the future video decoder. By using multiple video coding standards for model training, we can predict the complexity of an unknown hardware decoder with a minimum error of 4.54% without using the corresponding hardware decoder for training.

Submitted 4 July, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: Submitted to IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS), 13 Pages

arXiv:2402.09001 [pdf, other]

A Comprehensive Review of Software and Hardware Energy Efficiency of Video Decoders

Authors: Matthias Kränzler, Christian Herglotz, André Kaup

Abstract: Energy and compression efficiency are two essential parts of modern video decoder implementations that have to be considered. This work comprehensively studies the following six video coding formats regarding compression and decoding energy efficiency: AVC, VP9, HEVC, AV1, VVC, and AVM. We first evaluate the energy demand of reference and optimized software decoder implementations. Furthermore, we consider the influence of the usage of SIMD instructions on those decoder implementations. We find that AV1 is a sweet spot for optimized software decoder implementations with an additional energy demand of 16.55% and bitrate savings of -43.95% compared to VP9. We furthermore evaluate the hardware decoding energy demand of four video coding formats. Thereby, we show that the energy demand of AV1 increases by 117.50% compared to VP9. For HEVC, we found a sweet spot in terms of energy demand with an increase of 6.06% with respect to VP9. Relative to their optimized software counterparts, hardware video decoders reduce the energy consumption to less than 9%.

Submitted 14 February, 2024; originally announced February 2024.

Comments: accepted as a conference paper for Picture Coding Symposium (PCS) 2024

arXiv:2401.17246 [pdf, other]

SLIC: A Learned Image Codec Using Structure and Color

Authors: Srivatsa Prativadibhayankaram, Mahadev Prasad Panda, Thomas Richter, Heiko Sparenberg, Siegfried Fößel, André Kaup

Abstract: We propose the structure and color based learned image codec (SLIC) in which the task of compression is split into that of luminance and chrominance. The deep learning model is built with a novel multi-scale architecture for Y and UV channels in the encoder, where the features from various stages are combined to obtain the latent representation. An autoregressive context model is employed for backward adaptation and a hyperprior block for forward adaptation. Various experiments are carried out to study and analyze the performance of the proposed model, and to compare it with other image codecs. We also illustrate the advantages of our method through the visualization of channel impulse responses, latent channels and various ablation studies. The model achieves Bjøntegaard delta bitrate gains of 7.5% and 4.66% in terms of MS-SSIM and CIEDE2000 metrics with respect to other state-of-the-art reference codecs.

Submitted 30 January, 2024; originally announced January 2024.

Comments: Accepted paper for Data Compression Conference 2024

arXiv:2401.16067 [pdf, other]

Encoding Time and Energy Model for SVT-AV1 based on Video Complexity

Authors: Lena Eichermüller, Gaurang Chaudhari, Ioannis Katsavounidis, Zhijun Lei, Hassene Tmar, Christian Herglotz, André Kaup

Abstract: The share of online video traffic in global carbon dioxide emissions is growing steadily. To comply with the demand for video media, dedicated compression techniques are continuously optimized, but at the expense of increasingly higher computational demands and thus rising energy consumption at the video encoder side. In order to find the best trade-off between compression and energy consumption, modeling encoding energy for a wide range of encoding parameters is crucial. We propose an encoding time and energy model for SVT-AV1 based on empirical relations between the encoding time and video parameters as well as encoder configurations. Furthermore, we model the influence of video content by established content descriptors such as spatial and temporal information. We then use the predicted encoding time to estimate the required energy demand and achieve a prediction error of 19.6 % for encoding time and 20.9 % for encoding energy.

Submitted 30 January, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: 5 pages, 1 figure, accepted for IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024

arXiv:2312.14491 [pdf, ps, other]

Enhanced Color Palette Modeling for Lossless Screen Content Compression

Authors: Hannah Och, Shabhrish Reddy Uddehal, Tilo Strutz, André Kaup

Abstract: Soft context formation is a lossless image coding method for screen content. It encodes images pixel by pixel via arithmetic coding by collecting statistics for probability distribution estimation. Its main pipeline includes three stages, namely a context model based stage, a color palette stage and a residual coding stage. Each subsequent stage is only employed if the previous stage cannot be applied since necessary statistics, e.g. colors or contexts, have not been learned yet. We propose the following enhancements: First, information from previous stages is used to remove redundant color palette entries and prediction errors in subsequent stages. Additionally, implicitly known stage decision signals are no longer explicitly transmitted. These enhancements lead to an average bit rate decrease of 1.07% on the evaluated data. Compared to VVC and HEVC, the proposed method needs roughly 0.44 and 0.17 bits per pixel less on average for 24-bit screen content images, respectively.

Submitted 9 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: 5 pages, 3 figures, 2 tables; accepted for IEEE International Conference on Acoustics, Speech and Signal Processing 2024 (IEEE ICASSP 2024)

arXiv:2312.11209 [pdf, other]

Quantized Decoder in Learned Image Compression for Deterministic Reconstruction

Authors: Esin Koyuncu, Timofey Solovyev, Johannes Sauer, Elena Alshina, André Kaup

Abstract: Learned image compression has a problem of non-bit-exact reconstruction due to different calculations of floating point arithmetic on different devices. This paper shows a method to achieve a deterministic reconstructed image by quantizing only the decoder of the learned image compression model. From the implementation perspective of an image codec, it is beneficial to have the results reproducible when decoded on different devices. In this paper, we study quantization of weights and activations without overflow of accumulator in all decoder subnetworks. We show that the results are bit-exact at the output, and the resulting BD-rate loss of quantization of decoder is 0.5 % in the case of 16-bit weights and 16-bit activations, and 7.9 % in the case of 8-bit weights and 16-bit activations.

Submitted 11 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: 5 pages, 2 figures, 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

arXiv:2312.09266 [pdf, other]

Geometry-Corrected Geodesic Motion Modeling with Per-Frame Camera Motion for 360-Degree Video Compression

Authors: Andy Regensky, André Kaup

Abstract: The large amounts of data associated with 360-degree video require highly effective compression techniques for efficient storage and distribution. The development of improved motion models for 360-degree motion compensation has shown significant improvements in compression efficiency. A geodesic motion model representing translational camera motion proved to be one of the most effective models. In this paper, we propose an improved geometry-corrected geodesic motion model that outperforms the state of the art at reduced complexity. We additionally propose the transmission of per-frame camera motion information, where prior work assumed the same camera motion for all frames of a sequence. Our approach yields average Bjøntegaard Delta rate savings of 2.27% over H.266/VVC, outperforming the original geodesic motion model by 0.32 percentage points at reduced computational complexity.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 5 pages, 2 figures, 3 tables, accepted for IEEE International Conference on Acoustics, Speech and Signal Processing 2024 (IEEE ICASSP 2024)

arXiv:2312.08949 [pdf, other]

A Guided Upsampling Network for Short Wave Infrared Images Using Graph Regularization

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: Exploiting the infrared area of the spectrum for classification problems is getting increasingly popular, because many materials have characteristic absorption bands in this area. However, sensors in the short wave infrared (SWIR) area and even higher wavelengths have a very low spatial resolution in comparison to classical cameras that operate in the visible wavelength area. Thus, in this paper an upsampling method for SWIR images guided by a visible image is presented. For that, the proposed guided upsampling network (GUNet) uses a graph-regularized optimization problem based on learned affinities. The evaluation is based on a novel synthetic near-field visible-SWIR stereo database. Different guided upsampling methods are evaluated, which shows an improvement of nearly 1 dB on this database for the proposed upsampling method in comparison to the second best guided upsampling network. Furthermore, a visual example of an upsampled SWIR image of a real-world scene is depicted for showing real-world applicability.

Submitted 14 December, 2023; originally announced December 2023.

Journal ref: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2312.08946 [pdf, other]

Color Agnostic Cross-Spectral Disparity Estimation

Authors: Frank Sippel, Nils Genser, Hannah Och, Jürgen Seiler, André Kaup

Abstract: Since camera modules become more and more affordable, multispectral camera arrays have found their way from special applications to the mass market, e.g., in automotive systems, smartphones, or drones. Due to multiple modalities, the registration of different viewpoints and the required cross-spectral disparity estimation is up to the present extremely challenging. To overcome this problem, we introduce a novel spectral image synthesis in combination with a color agnostic transform. Thus, any recently published stereo matching network can be turned to a cross-spectral disparity estimator. Our novel algorithm requires only RGB stereo data to train a cross-spectral disparity estimator and a generalization from artificial training data to camera-captured images is obtained. The theoretical examination of the novel color agnostic method is completed by an extensive evaluation compared to state of the art including self-recorded multispectral data and a reference implementation. The novel color agnostic disparity estimation improves cross-spectral as well as conventional color stereo matching by reducing the average end-point error by 41% for cross-spectral and by 22% for mono-modal content, respectively.

Submitted 14 December, 2023; originally announced December 2023.

Journal ref: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2310.17346 [pdf, ps, other]

Extended Signaling Methods for Reduced Video Decoder Power Consumption Using Green Metadata

Authors: Christian Herglotz, Matthias Kränzler, Xixue Chu, Edouard Francois, Yong He, André Kaup

Abstract: In this paper, we discuss one aspect of the latest MPEG standard edition on energy-efficient media consumption, also known as Green Metadata (ISO/IEC 23001-11), which is the interactive signaling for remote decoder-power reduction for peer-to-peer video conferencing. In this scenario, the receiver of a video, e.g., a battery-driven portable device, can send a dedicated request to the sender which asks for a video bitstream representation that is less complex to decode and process. Consequently, the receiver saves energy and extends operating times. We provide an overview on latest studies from the literature dealing with energy-saving aspects, which motivate the extension of the legacy Green Metadata standard. Furthermore, we explain the newly introduced syntax elements and verify their effectiveness by performing dedicated experiments. We show that the integration of these syntax elements can lead to dynamic energy savings of up to 90% for software video decoding and 80% for hardware video decoding, respectively.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 5 pages, 2 figures

arXiv:2309.06945 [pdf, ps, other]

doi 10.1109/ISM.2018.00063

Improving HEVC Encoding of Rendered Video Data Using True Motion Information

Authors: Christian Herglotz, David Müller, Andreas Weinlich, Frank Bauer, Michael Ortner, Marc Stamminger, André Kaup

Abstract: This paper shows that motion vectors representing the true motion of an object in a scene can be exploited to improve the encoding process of computer generated video sequences. Therefore, a set of sequences is presented for which the true motion vectors of the corresponding objects were generated on a per-pixel basis during the rendering process. In addition to conventional motion estimation methods, it is proposed to exploit the computer generated motion vectors to enhance the rate-distortion performance. To this end, a motion vector mapping method including disocclusion handling is presented. It is shown that mean rate savings of 3.78% can be achieved.

Submitted 13 September, 2023; originally announced September 2023.

Comments: 4 pages, 4 figures

Journal ref: Proc. 2018 IEEE International Symposium on Multimedia (ISM)

arXiv:2308.06570 [pdf, other]

doi 10.1109/QoMEX48832.2020.9123140

On Versatile Video Coding at UHD with Machine-Learning-Based Super-Resolution

Authors: Kristian Fischer, Christian Herglotz, André Kaup

Abstract: Coding 4K data has become of vital interest in recent years, since the amount of 4K data is significantly increasing. We propose a coding chain with spatial down- and upscaling that combines the next-generation VVC codec with machine learning based single image super-resolution algorithms for 4K. The investigated coding chain, which spatially downscales the 4K data before coding, achieves higher quality than the conventional VVC reference software for low bitrate scenarios. Throughout several tests, we find that up to 12 % and 18 % Bjontegaard delta rate gains can be achieved on average when coding 4K sequences with VVC and QP values above 34 and 42, respectively. Additionally, the investigated scenario with up- and downscaling helps to reduce the loss of details and compression artifacts, as it is shown in a visual example.

Submitted 12 August, 2023; originally announced August 2023.

Comments: Originally published as conference paper at QoMEX 2020

arXiv:2307.14000 [pdf, ps, other]

doi 10.1109/ICIP.2017.8296731

Video Decoding Energy Estimation Using Processor Events

Authors: Christian Herglotz, André Kaup

Abstract: In this paper, we show that processor events like instruction counts or cache misses can be used to accurately estimate the processing energy of software video decoders. Therefore, we perform energy measurements on an ARM-based evaluation platform and count processor level events using a dedicated profiling software. Measurements are performed for various codecs and decoder implementations to prove the general viability of our observations. Using the estimation method proposed in this paper, the true decoding energy for various recent video coding standards including HEVC and VP9 can be estimated with a mean estimation error that is smaller than 6%.

Submitted 26 July, 2023; originally announced July 2023.

Comments: 5 pages, 2 figures

Journal ref: IEEE International Conference on Image Processing (ICIP), Beijing, China, 2017, pp. 2493-2497

arXiv:2307.12864 [pdf, other]

Conditional Residual Coding: A Remedy for Bottleneck Problems in Conditional Inter Frame Coding

Authors: Fabian Brand, Jürgen Seiler, André Kaup

Abstract: Conditional coding is a new video coding paradigm enabled by neural-network-based compression. It can be shown that conditional coding is in theory better than the traditional residual coding, which is widely used in video compression standards like HEVC or VVC. However, on closer inspection, it becomes clear that conditional coders can suffer from information bottlenecks in the prediction path, i.e., that due to the data processing inequality not all information from the prediction signal can be passed to the reconstructed signal, thereby impairing the coder performance. In this paper we propose the conditional residual coding concept, which we derive from information theoretical properties of the conditional coder. This coder significantly reduces the influence of bottlenecks, while maintaining the theoretical performance of the conditional coder. We provide a theoretical analysis of the coding paradigm and demonstrate the performance of the conditional residual coder in a practical example. We show that conditional residual coders alleviate the disadvantages of conditional coders while being able to maintain their advantages over residual coders. In the spectrum of residual and conditional coding, we can therefore consider them as "the best from both worlds".

Submitted 26 January, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

Comments: 12 pages, 8 figures, accepted for publication in TCSVT

arXiv:2307.08354 [pdf, ps, other]

doi 10.1109/TCE.2021.3122076

Component-wise Power Estimation of Electrical Devices Using Thermal Imaging

Authors: Christian Herglotz, Simon Grosche, Akarsh Bharadwaj, André Kaup

Abstract: This paper presents a novel method to estimate the power consumption of distinct active components on an electronic carrier board by using thermal imaging. The components and the board can be made of heterogeneous material such as plastic, coated microchips, and metal bonds or wires, where a special coating for high emissivity is not required. The thermal images are recorded when the components on the board are dissipating power. In order to enable reliable estimates, a segmentation of the thermal image must be available that can be obtained by manual labeling, object detection methods, or exploiting layout information. Evaluations show that with low-resolution consumer infrared cameras and dissipated powers larger than 300mW, mean estimation errors of 10% can be achieved.

Submitted 18 July, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: 10 pages, 8 figures

Journal ref: IEEE Transactions on Consumer Electronics, vol. 67, no. 4, pp. 383-392, Nov. 2021

arXiv:2307.08338 [pdf, ps, other]

doi 10.1109/ISCE.2019.8901018

Power Modeling for Virtual Reality Video Playback Applications

Authors: Christian Herglotz, Stéphane Coulombe, Ahmad Vakili, André Kaup

Abstract: This paper proposes a method to evaluate and model the power consumption of modern virtual reality playback and streaming applications on smartphones. Due to the high computational complexity of the virtual reality processing toolchain, the corresponding power consumption is very high, which reduces operating times of battery-powered devices. To tackle this problem, we analyze the power consumption in detail by performing power measurements. Furthermore, we construct a model to estimate the true power consumption with a mean error of less than 3.5%. The model can be used to save power at critical battery levels by changing the streaming video parameters. Particularly, the results show that the power consumption is significantly reduced by decreasing the input video resolution.

Submitted 17 July, 2023; originally announced July 2023.

Comments: 6 pages, 5 figures

Journal ref: 2019 IEEE 23rd International Symposium on Consumer Technologies (ISCT), Ancona, Italy, 2019, pp. 105-110

arXiv:2307.08337 [pdf, ps, other]

doi 10.1109/ICCE-Berlin47944.2019.8966177

Power-Efficient Video Streaming on Mobile Devices Using Optimal Spatial Scaling

Authors: Christian Herglotz, André Kaup, Stéphane Coulombe, Ahmad Vakili

Abstract: This paper derives optimal spatial scaling and rate control parameters for power-efficient wireless video streaming on portable devices. A video streaming application is studied, which receives a high-resolution and high-quality video stream from a remote server and displays the content to the end-user. We show that the resolution of the input video can be adjusted such that the quality-power trade-off is optimized. Making use of a power model from the literature and subjective quality evaluation using a perceptual metric, we derive optimal combinations of the scaling factor and the rate-control parameter for encoding. For HD sequences, up to 10% of power can be saved at negligible quality losses and up to 15% of power can be saved at tolerable distortions. To show general validity, the method was tested for Wi-Fi and a mobile network as well as for two different smartphones.

Submitted 17 July, 2023; originally announced July 2023.

Comments: 6 pages, 7 figures

Journal ref: Proc. IEEE 9th International Conference on Consumer Electronics (ICCE-Berlin), Berlin, Germany, 2019, pp. 233-238

arXiv:2307.06102 [pdf, other]

Spatially-Adaptive Learning-Based Image Compression with Hierarchical Multi-Scale Latent Spaces

Authors: Fabian Brand, Alexander Kopte, Kristian Fischer, André Kaup

Abstract: Adaptive block partitioning is responsible for large gains in current image and video compression systems. This method is able to compress large stationary image areas with only a few symbols, while maintaining a high level of quality in more detailed areas. Current state-of-the-art neural-network-based image compression systems however use only one scale to transmit the latent space. In previous publications, we proposed RDONet, a scheme to transmit the latent space in multiple spatial resolutions. Following this principle, we extend a state-of-the-art compression network by a second hierarchical latent-space level to enable multi-scale processing. We extend the existing rate variability capabilities of RDONet by a gain unit. With that we are able to outperform an equivalent traditional autoencoder by 7% rate savings. Furthermore, we show that even though we add an additional latent space, the complexity only increases marginally and the decoding time can potentially even be decreased.

Submitted 12 July, 2023; originally announced July 2023.

Comments: 5 pages, 3 figures, accepted for presentation at ICIP 2023

arXiv:2307.05208 [pdf, other]

Encoder Complexity Control in SVT-AV1 by Speed-Adaptive Preset Switching

Authors: Lena Eichermüller, Gaurang Chaudhari, Ioannis Katsavounidis, Zhijun Lei, Hassene Tmar, André Kaup, Christian Herglotz

Abstract: Current developments in video encoding technology lead to continuously improving compression performance but at the expense of increasingly higher computational demands. Given the increase in online video traffic in recent years and the concomitant need for video encoding, encoder complexity control mechanisms are required to restrict the processing time to a sufficient extent in order to find a reasonable trade-off between performance and complexity. We present a complexity control mechanism in SVT-AV1 by using speed-adaptive preset switching to comply with the remaining time budget. This method enables encoding with a user-defined time constraint within the complete preset range with an average precision of 8.9% without introducing any additional latencies.

Submitted 11 July, 2023; originally announced July 2023.

Comments: 5 pages, 2 figures, accepted for IEEE International Conference on Image Processing (ICIP) 2023

arXiv:2306.16755 [pdf, ps, other]

Processing Energy Modeling for Neural Network Based Image Compression

Authors: Christian Herglotz, Fabian Brand, Andy Regensky, Felix Rievel, André Kaup

Abstract: Nowadays, the compression performance of neural-network-based image compression algorithms outperforms state-of-the-art compression approaches such as JPEG or HEIC-based image compression. Unfortunately, most neural-network-based compression methods are executed on GPUs and consume a high amount of energy during execution. Therefore, this paper performs an in-depth analysis on the energy consumption of state-of-the-art neural-network-based compression methods on a GPU and shows that the energy consumption of compression networks can be estimated using the image size with mean estimation errors of less than 7%. Finally, using a correlation analysis, we find that the number of operations per pixel is the main driving force for energy consumption and deduce that the network layers up to the second downsampling step are consuming most energy.

Submitted 29 June, 2023; originally announced June 2023.

Comments: 5 pages, 3 figures, accepted for IEEE International Conference on Image Processing (ICIP) 2023

arXiv:2306.15237 [pdf, other]

doi 10.1109/ICIP49359.2023.10222159

Cross Spectral Image Reconstruction Using a Deep Guided Neural Network

Authors: Frank Sippel, Jürgen Seiler, André Kaup

Abstract: Cross spectral camera arrays, where each camera records different spectral content, are becoming increasingly popular for RGB, multispectral and hyperspectral imaging, since they are capable of a high resolution in every dimension using off-the-shelf hardware. For these, it is necessary to build an image processing pipeline to calculate a consistent image data cube, i.e., it should look as if every camera records the scene from the center camera. Since the cameras record the scene from a different angle, this pipeline needs a reconstruction component for pixels that are not visible to peripheral cameras. For that, a novel deep guided neural network (DGNet) is presented. Since only little cross spectral data is available for training, this neural network is highly regularized. Furthermore, a new data augmentation process is introduced to generate the cross spectral content. On synthetic and real multispectral camera array data, the proposed network outperforms the state of the art by up to 2 dB in terms of PSNR on average. Besides, DGNet also tops its best competitor in terms of SSIM as well as in runtime by a factor of nearly 12. Moreover, a qualitative evaluation reveals visually more appealing results for real camera array data.

Submitted 14 September, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Journal ref: 2023 IEEE International Conference on Image Processing (ICIP)

arXiv:2306.13694 [pdf, other]

doi 10.1109/ICIP49359.2023.10222661

Motion Plane Adaptive Motion Modeling for Spherical Video Coding in H.266/VVC

Authors: Andy Regensky, Christian Herglotz, André Kaup

Abstract: Motion compensation is one of the key technologies enabling the high compression efficiency of modern video coding standards. To allow compression of spherical video content, special mapping functions are required to project the video to the 2D image plane. Distortions inevitably occurring in these mappings impair the performance of classical motion models. In this paper, we propose a novel motion plane adaptive motion modeling technique (MPA) for spherical video that allows motion compensation to be performed on different motion planes in 3D space instead of having to work on the (in theory arbitrarily mapped) 2D image representation directly. The integration of MPA into the state-of-the-art H.266/VVC video coding standard shows average Bjøntegaard Delta rate savings of 1.72% with a peak of 3.37% based on PSNR and 1.55% with a peak of 2.92% based on WS-PSNR compared to VTM-14.2.

Submitted 23 June, 2023; originally announced June 2023.

Comments: 5 pages, 4 figures, 1 table, accepted for IEEE International Conference on Image Processing 2023 (IEEE ICIP 2023). arXiv admin note: substantial text overlap with arXiv:2202.03323

arXiv:2306.13692 [pdf, other]

doi 10.1109/ICIP49359.2023.10222645

Improving Spherical Image Resampling through Viewport-Adaptivity

Authors: Andy Regensky, Viktoria Heimann, Ruoyu Zhang, André Kaup

Abstract: The conversion between different spherical image and video projection formats requires highly accurate resampling techniques in order to minimize the inevitable loss of information. Suitable resampling algorithms such as nearest neighbor, linear or cubic resampling are readily available. However, no generally applicable resampling technique exploits the special properties of spherical images so far. Thus, we propose a novel viewport-adaptive resampling (VAR) technique that takes the spherical characteristics of the underlying resampling problem into account. VAR can be applied to any mesh-to-mesh capable resampling algorithm and shows significant gains across all tested techniques. In combination with frequency-selective resampling, VAR outperforms conventional cubic resampling by more than 2 dB in terms of WS-PSNR. A visual inspection and the evaluation of further metrics such as PSNR and SSIM support the positive results.

Submitted 23 June, 2023; originally announced June 2023.

Comments: 5 pages, 3 figures, 2 tables, accepted for IEEE International Conference on Image Processing 2023 (IEEE ICIP 2023)

arXiv:2306.06917 [pdf, ps, other]

doi 10.1145/3593908.3593948

Video Decoding Energy Reduction Using Temporal-Domain Filtering

Authors: Christian Herglotz, Matthias Kränzler, Robert Ludwig, André Kaup

Abstract: In this paper, we study decoding energy reduction opportunities using temporal-domain filtering and subsampling methods. In particular, we study spatiotemporal filtering using a contrast sensitivity function and temporal downscaling, i.e., frame rate reduction. We apply these concepts as a pre-filtering to the video before compression and evaluate the bitrate, the decoding energy, and the visual quality with a dedicated metric targeting temporally down-scaled sequences. We find that decoding energy savings reach 35% when halving the frame rate and that spatiotemporal filtering can lead to up to 5% of additional savings, depending on the content.

Submitted 12 June, 2023; originally announced June 2023.

Comments: 6 pages, 5 figures

arXiv:2305.16211 [pdf, other]

doi 10.1109/ACCESS.2023.3323873

Learned Wavelet Video Coding using Motion Compensated Temporal Filtering

Authors: Anna Meyer, Fabian Brand, André Kaup

Abstract: We present an end-to-end trainable wavelet video coder based on motion-compensated temporal filtering (MCTF). Thereby, we introduce a different coding scheme for learned video compression, which is currently dominated by residual and conditional coding approaches. By performing discrete wavelet transforms in temporal, horizontal, and vertical dimension, we obtain an explainable framework with spatial and temporal scalability. We focus on investigating a novel trainable MCTF module that is implemented using the lifting scheme. We show how multiple temporal decomposition levels in MCTF can be considered during training and how larger temporal displacements due to the MCTF coding order can be handled. Further, we present a content adaptive extension to MCTF which adapts to different motion strengths during inference. In our experiments, we compare our MCTF-based approach to learning-based conditional coders and traditional hybrid video coding. Especially at high rates, our approach has promising rate-distortion performance. Our method achieves average Bjøntegaard Delta savings of up to 21% over HEVC on the UVG data set and thereby outperforms state-of-the-art learned video coders.

Submitted 12 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: 14 pages, 14 figures, Accepted for IEEE Access 2023

arXiv:2305.15117 [pdf, other]

Power Reduction Opportunities on End-User Devices in Quality-Steady Video Streaming

Authors: Christian Herglotz, Werner Robitza, Alexander Raake, Tobias Hossfeld, André Kaup

Abstract: This paper uses a crowdsourced dataset of online video streaming sessions to investigate opportunities to reduce the power consumption while considering QoE. For this, we base our work on prior studies which model both the end-user's QoE and the end-user device's power consumption with the help of high-level video features such as the bitrate, the frame rate, and the resolution. On top of existing research, which focused on reducing the power consumption at the same QoE by optimizing video parameters, we investigate potential power savings by other means such as using a different playback device, a different codec, or a predefined maximum quality level. We find that based on the power consumption of the streaming sessions from the crowdsourcing dataset, devices could save more than 55% of power if all participants adhere to low-power settings.

Submitted 24 May, 2023; originally announced May 2023.

Comments: 4 pages, 3 figures

arXiv:2305.05996 [pdf, ps, other]

doi 10.1109/ICASSP49357.2023.10094983

Image Segmentation For Improved Lossless Screen Content Compression

Authors: Shabhrish Reddy Uddehal, Tilo Strutz, Hannah Och, André Kaup

Abstract: In recent years, it has been found that screen content images (SCI) can be effectively compressed based on appropriate probability modelling and suitable entropy coding methods such as arithmetic coding. The key objective is determining the best probability distribution for each pixel position. This strategy works particularly well for images with synthetic (textual) content. However, usually screen content images not only consist of synthetic but also pictorial (natural) regions. These images require diverse models of probability distributions to be optimally compressed. One way to achieve this goal is to separate synthetic and natural regions. This paper proposes a segmentation method that identifies natural regions enabling better adaptive treatment. It supplements a compression method known as Soft Context Formation (SCF) and operates as a pre-processing step. If at least one natural segment is found within the SCI, it is split into two sub images (natural and synthetic parts), and the process of modelling and coding is performed separately for both. For SCIs with natural regions, the proposed method achieves a bit-rate reduction of up to 11.6% and 1.52% with respect to HEVC and the previous version of the SCF.

Submitted 10 May, 2023; originally announced May 2023.

Comments: 5 Pages, 3 Figures

arXiv:2305.05451 [pdf, other]

doi 10.1109/ICASSP48485.2024.10446147

Multiscale Augmented Normalizing Flows for Image Compression

Authors: Marc Windsheimer, Fabian Brand, André Kaup

Abstract: Most learning-based image compression methods lack efficiency for high image quality due to their non-invertible design. The decoding function of the frequently applied compressive autoencoder architecture is only an approximated inverse of the encoding transform. This issue can be resolved by using invertible latent variable models, which allow a perfect reconstruction if no quantization is performed. Furthermore, many traditional image and video coders apply dynamic block partitioning to vary the compression of certain image regions depending on their content. Inspired by this approach, hierarchical latent spaces have been applied to learning-based compression networks. In this paper, we present a novel concept, which adapts the hierarchical latent space for augmented normalizing flows, an invertible latent variable model. Our best performing model achieved average rate savings of more than 7% over comparable single-scale models.

Submitted 22 May, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: 5 pages, 7 figures

arXiv:2305.05440 [pdf, other]

Improved Screen Content Coding in VVC Using Soft Context Formation

Authors: Hannah Och, Shabhrish Reddy Uddehal, Tilo Strutz, André Kaup

Abstract: Screen content images typically contain a mix of natural and synthetic image parts. Synthetic sections usually are comprised of uniformly colored areas and repeating colors and patterns. In the VVC standard, these properties are exploited using Intra Block Copy and Palette Mode. In this paper, we show that pixel-wise lossless coding can outperform lossy VVC coding in such areas. We propose an enhanced VVC coding approach for screen content images using the principle of soft context formation. First, the image is separated into two layers in a block-wise manner using a learning-based method with four block features. Synthetic image parts are coded losslessly using soft context formation, the rest with VVC. We modify the available soft context formation coder to incorporate information gained by the decoded VVC layer for improved coding efficiency. Using this approach, we achieve Bjontegaard-Delta-rate gains of 4.98% on the evaluated data sets compared to VVC.

Submitted 9 January, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: 5 pages, 5 figures, 2 tables; accepted for IEEE International Conference on Acoustics, Speech and Signal Processing 2024 (IEEE ICASSP 2024)

arXiv:2304.12852 [pdf, ps, other]

doi 10.1109/TIP.2023.3346695

The Bjøntegaard Bible -- Why your Way of Comparing Video Codecs May Be Wrong

Authors: Christian Herglotz, Hannah Och, Anna Meyer, Geetha Ramasubbu, Lena Eichermüller, Matthias Kränzler, Fabian Brand, Kristian Fischer, Dat Thanh Nguyen, Andy Regensky, André Kaup

Abstract: In this paper, we provide an in-depth assessment on the Bjøntegaard Delta. We construct a large data set of video compression performance comparisons using a diverse set of metrics including PSNR, VMAF, bitrate, and processing energies. These metrics are evaluated for visual data types such as classic perspective video, 360$^\circ$ video, point clouds, and screen content. As compression technology, we consider multiple hybrid video codecs as well as state-of-the-art neural network based compression methods. Using additional supporting points in between standard points defined by parameters such as the quantization parameter, we assess the interpolation error of the Bjøntegaard-Delta (BD) calculus and its impact on the final BD value. From the analysis, we find that the BD calculus is most accurate in the standard application of rate-distortion comparisons with mean errors below 0.5 percentage points. For other applications and special cases, e.g., VMAF quality, energy considerations, or inter-codec comparisons, the errors are higher (up to 5 percentage points), but can be halved by using a higher number of supporting points. We finally come up with recommendations on how to use the BD calculus such that the validity of the resulting BD-values is maximized. Main recommendations are as follows: First, relative curve differences should be plotted and analyzed. Second, the logarithmic domain should be used for saturating metrics such as SSIM and VMAF. Third, BD values below a certain threshold indicated by the subset error should not be used to draw recommendations. Fourth, using two supporting points is sufficient to obtain rough performance estimates.

Submitted 22 December, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: 21 pages, 14 figures

arXiv:2304.12412 [pdf, other]

End-to-End Lidar-Camera Self-Calibration for Autonomous Vehicles

Authors: Arya Rachman, Jürgen Seiler, André Kaup

Abstract: Autonomous vehicles are equipped with a multi-modal sensor setup to enable the car to drive safely. The initial calibration of such perception sensors is a highly matured topic and is routinely done in an automated factory environment. However, an intriguing question arises on how to maintain the calibration quality throughout the vehicle's operating duration. Another challenge is to calibrate multiple sensors jointly to ensure no propagation of systemic errors. In this paper, we propose CaLiCa, an end-to-end deep self-calibration network which addresses the automatic calibration problem for pinhole camera and Lidar. We jointly predict the camera intrinsic parameters (focal length and distortion) as well as Lidar-Camera extrinsic parameters (rotation and translation), by regressing feature correlation between the camera image and the Lidar point cloud. The network is arranged in a Siamese-twin structure to constrain the network features learning to a mutually shared feature in both point cloud and camera (Lidar-camera constraint). Evaluation using KITTI datasets shows that we achieve 0.154° and 0.059 m accuracy with a reprojection error of 0.028 pixel with a single-pass inference. We also provide an ablative study of how our end-to-end learning architecture offers lower terminal loss (21% decrease in rotation loss) compared to isolated calibration.

Submitted 27 April, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: Accepted for The 35th IEEE Intelligent Vehicles Symposium (IV 2023)

arXiv:2303.06519 [pdf, other]

doi 10.1109/TCSVT.2023.3239321

Lossless Point Cloud Geometry and Attribute Compression Using a Learned Conditional Probability Model

Authors: Dat Thanh Nguyen, Andre Kaup

Abstract: In recent years, we have witnessed the presence of point cloud data in many aspects of our life, from immersive media, autonomous driving to healthcare, although at the cost of a tremendous amount of data. In this paper, we present an efficient lossless point cloud compression method that uses sparse tensor-based deep neural networks to learn point cloud geometry and color probability distributions. Our method represents a point cloud with both occupancy feature and three attribute features at different bit depths in a unified sparse representation. This allows us to efficiently exploit feature-wise and point-wise dependencies within point clouds using a sparse tensor-based neural network and thus build an accurate auto-regressive context model for an arithmetic coder. To the best of our knowledge, this is the first learning-based lossless point cloud geometry and attribute compression approach. Compared with the state-of-the-art lossless point cloud compression method from Moving Picture Experts Group (MPEG), our method achieves 22.6% reduction in total bitrate on a diverse set of test point clouds while having 49.0% and 18.3% rate reduction on geometry and color attribute component, respectively.

Submitted 20 March, 2024; v1 submitted 11 March, 2023; originally announced March 2023.

Comments: 12 pages, accepted to IEEE Transactions on Circuits and Systems for Video Technology

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 8, pp. 4337-4348, Aug. 2023

arXiv:2303.06517 [pdf, other]

Deep probabilistic model for lossless scalable point cloud attribute compression

Authors: Dat Thanh Nguyen, Kamal Gopikrishnan Nambiar, Andre Kaup

Abstract: In recent years, several point cloud geometry compression methods that utilize advanced deep learning techniques have been proposed, but there are limited works on attribute compression, especially lossless compression. In this work, we build an end-to-end multiscale point cloud attribute coding method (MNeT) that progressively projects the attributes onto multiscale latent spaces. The multiscale architecture provides an accurate context for the attribute probability modeling and thus minimizes the coding bitrate with a single network prediction. Besides, our method allows scalable coding, so that lower quality versions can be easily extracted from the losslessly compressed bitstream. We validate our method on a set of point clouds from MVUB and MPEG and show that our method outperforms recently proposed methods and is on par with the latest G-PCC version 14. Besides, our coding time is substantially faster than G-PCC.

Submitted 11 March, 2023; originally announced March 2023.

Comments: 5 pages, accepted for presentation at ICASSP 2023

arXiv:2303.05132 [pdf, other]

doi 10.1109/MMSP48831.2020.9287132

Multispectral Image Compression Based on HEVC Using Pel-Recursive Inter-Band Prediction

Authors: Anna Meyer, Nils Genser, André Kaup

Abstract: Recent developments in optical sensors enable a wide range of applications for multispectral imaging, e.g., in surveillance, optical sorting, and life-science instrumentation. Increasing spatial and spectral resolution allows creating higher quality products, however, it poses challenges in handling such large amounts of data. Consequently, specialized compression techniques for multispectral images are required. High Efficiency Video Coding (HEVC) is known to be the state of the art in efficiency for both video coding and still image coding. In this paper, we propose a cross-spectral compression scheme for efficiently coding multispectral data based on HEVC. Extending intra picture prediction by a novel inter-band predictor, spectral as well as spatial redundancies can be effectively exploited. Dependencies among the current band and further spectral references are considered jointly by adaptive linear regression modeling. The proposed backward prediction scheme does not require additional side information for decoding. We show that our novel approach is able to outperform state-of-the-art lossy compression techniques in terms of rate-distortion performance. On different data sets, average Bjøntegaard delta rate savings of 82 % and 55 % compared to HEVC and a reference method from literature are achieved, respectively.

Submitted 9 March, 2023; originally announced March 2023.

Comments: 6 pages, 4 figures, 1 table; Originally published as conference paper at IEEE MMSP 2020

Journal ref: IEEE MMSP 2020

arXiv:2303.05121 [pdf, other]

A novel Cross-Component Context Model for End-to-End Wavelet Image Coding

Authors: Anna Meyer, André Kaup

Abstract: In contrast to traditional compression techniques performing linear transforms, the latent space of popular compressive autoencoders is obtained from a learned nonlinear mapping and hard to interpret. In this paper, we explore a promising alternative approach for neural compression, with an autoencoder whose latent space represents a nonlinear wavelet decomposition. Previous work has shown that neural wavelet image coding can outperform HEVC. However, the approach codes color components independently, thereby ignoring inter-component dependencies. Hence, we propose a novel cross-component context model (CCM). With CCM, the entropy model for the chroma latent space can be conditioned on previously coded components exploiting correlations in the learned wavelet space. The proposed CCM outperforms the baseline model with average Bjøntegaard delta rate savings of 2.6 % and 1.6 % for the Kodak and Tecnick image sets. Also, our method is competitive with VVC and learning-based methods.

Submitted 9 March, 2023; originally announced March 2023.

Comments: Accepted for publication at ICASSP 2023

arXiv:2303.00433 [pdf, ps, other]

Motion Estimation for Fisheye Video With an Application to Temporal Resolution Enhancement

Authors: Andrea Eichenseer, Michel Bätz, André Kaup

Abstract: Surveying wide areas with only one camera is a typical scenario in surveillance and automotive applications. Ultra wide-angle fisheye cameras employed to that end produce video data with characteristics that differ significantly from conventional rectilinear imagery as obtained by perspective pinhole cameras. Those characteristics are not considered in typical image and video processing algorithms such as motion estimation, where translation is assumed to be the predominant kind of motion. This contribution introduces an adapted technique for use in block-based motion estimation that takes into account the projection function of fisheye cameras and thus compensates for the non-perspective properties of fisheye videos. By including suitable projections, the translational motion model that would otherwise only hold for perspective material is exploited, leading to improved motion estimation results without altering the source material. In addition, we discuss extensions that allow for a better prediction of the peripheral image areas, where motion estimation falters due to spatial constraints, and further include calibration information to account for lens properties deviating from the theoretical function. Simulations and experiments are conducted on synthetic as well as real-world fisheye video sequences that are part of a data set created in the context of this paper. Average synthetic and real-world gains of 1.45 and 1.51 dB in luminance PSNR are achieved compared against conventional block matching. Furthermore, the proposed fisheye motion estimation method is successfully applied to motion compensated temporal resolution enhancement, where average gains amount to 0.79 and 0.76 dB.

Submitted 1 March, 2023; originally announced March 2023.

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 8, pp. 2376-2390, Aug. 2019

arXiv:2303.00379 [pdf, ps, other]

doi 10.1109/TIP.2017.2762586

Temporal Scalability of Dynamic Volume Data Using Mesh Compensated Wavelet Lifting

Authors: Wolfgang Schnurrer, Niklas Pallast, Thomas Richter, André Kaup

Abstract: Due to their high resolution, dynamic medical 2D+t and 3D+t volumes from computed tomography (CT) and magnetic resonance tomography (MR) reach a size which makes them very unhandy for teleradiologic applications. A lossless scalable representation offers the advantage of a down-scaled version which can be used for orientation or previewing, while the remaining information for reconstructing the full resolution is transmitted on demand. The wavelet transform offers the desired scalability. A very high quality of the lowpass sub-band is crucial in order to use it as a down-scaled representation. We propose an approach based on compensated wavelet lifting for obtaining a scalable representation of dynamic CT and MR volumes with very high quality. The mesh compensation is feasible to model the displacement in dynamic volumes which is mainly given by expansion and contraction of tissue over time. To achieve this, we propose an optimized estimation of the mesh compensation parameters to optimally fit for dynamic volumes. Within the lifting structure, the inversion of the motion compensation is crucial in the update step. We propose to take this inversion directly into account during the estimation step and can improve the quality of the lowpass sub-band by 0.63 and 0.43 dB on average for our tested dynamic CT and MR volumes at the cost of an increase of the rate by 2.4% and 1.2% on average.

Submitted 1 March, 2023; originally announced March 2023.

Journal ref: IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 419-431, Jan. 2018

arXiv:2302.13581 [pdf, other]

Saliency-Driven Hierarchical Learned Image Coding for Machines

Authors: Kristian Fischer, Fabian Brand, Christian Blum, André Kaup

Abstract: We propose to employ a saliency-driven hierarchical neural image compression network for a machine-to-machine communication scenario following the compress-then-analyze paradigm. By that, different areas of the image are coded at different qualities depending on whether salient objects are located in the corresponding area. Areas without saliency are transmitted in latent spaces of lower spatial resolution in order to reduce the bitrate. The saliency information is explicitly derived from the detections of an object detection network. Furthermore, we propose to add saliency information to the training process in order to further specialize the different latent spaces. All in all, our hierarchical model with all proposed optimizations achieves 77.1 % bitrate savings over the latest video coding standard VVC on the Cityscapes dataset and with Mask R-CNN as analysis network at the decoder side. Thereby, it also outperforms traditional, non-hierarchical compression networks.

Submitted 27 February, 2023; originally announced February 2023.

Comments: Accepted for publication in 2023 ICASSP

arXiv:2302.01594 [pdf, ps, other]

Analysis of mesh-based motion compensation in wavelet lifting of dynamical 3-D+t CT data

Authors: Wolfgang Schnurrer, Thomas Richter, Jürgen Seiler, André Kaup

Abstract: Factorized in the lifting structure, the wavelet transform can easily be extended by arbitrary compensation methods. Thereby, the transform can be adapted to displacements in the signal without losing the ability of perfect reconstruction. This leads to an improvement of scalability. In temporal direction of dynamic medical 3-D+t volumes from Computed Tomography, displacement is mainly given by expansion and compression of tissue. We show that these smooth movements can be well compensated with a mesh-based method. We compare the properties of triangle and quadrilateral meshes. We also show that with a mesh-based compensation approach coding results are comparable to the common slice wise coding with JPEG 2000 while a scalable representation in temporal direction can be achieved.

Submitted 3 February, 2023; originally announced February 2023.

Journal ref: IEEE 14th International Workshop on Multimedia Signal Processing (MMSP), Banff, AB, Canada, 2012, pp. 152-157

arXiv:2302.01592 [pdf, ps, other]

doi 10.1109/TIP.2019.2947138

Graph-Based Compensated Wavelet Lifting for Scalable Lossless Coding of Dynamic Medical Data

Authors: Daniela Lanz, André Kaup

Abstract: Lossless compression of dynamic 2D+t and 3D+t medical data is challenging regarding the huge amount of data, the characteristics of the inherent noise, and the high bit depth. Beyond that, a scalable representation is often required in telemedicine applications. Motion Compensated Temporal Filtering works well for lossless compression of medical volume data and additionally provides temporal, spatial, and quality scalability features. To achieve a high quality lowpass subband, which shall be used as a downscaled representative of the original data, graph-based motion compensation was recently introduced to this framework. However, encoding the motion information, which is stored in adjacency matrices, is not well investigated so far. This work focuses on coding these adjacency matrices to make the graph-based motion compensation feasible for data compression. We propose a novel coding scheme based on constructing so-called motion maps. This allows for the first time to compare the performance of graph-based motion compensation to traditional block- and mesh-based approaches. For high quality lowpass subbands our method is able to outperform the block- and mesh-based approaches by increasing the visual quality in terms of PSNR by 0.53dB and 0.28dB for CT data, as well as 1.04dB and 1.90dB for MR data, respectively, while the bit rate is reduced at the same time.

Submitted 3 February, 2023; originally announced February 2023.

Journal ref: IEEE Transactions on Image Processing, vol. 29, pp. 2439-2451, 2020

Showing 1–50 of 175 results for author: Kaup, A