-
3R-INN: How to be climate friendly while consuming/delivering videos?
Authors:
Zoubida Ameur,
Claire-Hélène Demarty,
Daniel Menard,
Olivier Le Meur
Abstract:
The consumption of a video requires a considerable amount of energy during the various stages of its life-cycle. With a billion hours of video consumed daily, this contributes significantly to greenhouse gas emissions. Therefore, reducing the end-to-end carbon footprint of the video chain, while preserving the quality of experience at the user side, is of high importance. To contribute in an impactful manner, we propose 3R-INN, a single light invertible network that does three tasks at once: given a high-resolution grainy image, it Rescales it to a lower resolution, Removes film grain and Reduces its power consumption when displayed. Providing such minimum viable quality content contributes to reducing the energy consumption during encoding, transmission, decoding and display. 3R-INN also offers the possibility to restore either the high-resolution grainy original image or a grain-free version, thanks to its invertibility and the disentanglement of the high-frequency information, and without transmitting auxiliary data. Experiments show that, while enabling significant energy savings for encoding (78%), decoding (77%) and rendering (5% to 20%), 3R-INN outperforms state-of-the-art film grain synthesis and energy-aware methods and achieves state-of-the-art performance on the rescaling task on different test sets.
Submitted 18 March, 2024;
originally announced March 2024.
-
Customizing Number Representation and Precision
Authors:
Olivier Sentieys,
Daniel Menard
Abstract:
There is a growing interest in the use of reduced-precision arithmetic, amplified by the recent interest in artificial intelligence, especially deep learning. Most architectures already provide reduced-precision capabilities (e.g., 8-bit integer, 16-bit floating point). In the context of FPGAs, any number format and bit-width can even be considered. In computer arithmetic, the representation of real numbers is a major issue. Fixed-point (FxP) and floating-point (FlP) are the main options to represent reals, each with its advantages and drawbacks. This chapter presents both FxP and FlP number representations, and draws a fair comparison between their cost, performance and energy, as well as their impact on accuracy during computations. It is shown that the choice between FxP and FlP is not obvious and strongly depends on the application considered. In some cases, low-precision floating-point arithmetic can be the most effective and provides some benefits over the classical fixed-point choice for energy-constrained applications.
Submitted 8 December, 2022;
originally announced December 2022.
-
Quality-Driven Dynamic VVC Frame Partitioning for Efficient Parallel Processing
Authors:
Thomas Amestoy,
Wassim Hamidouche,
Cyril Bergeron,
Daniel Menard
Abstract:
VVC is the next-generation video coding standard, offering coding capability beyond the HEVC standard. The high computational complexity of the latest video coding standards requires high-level parallelism techniques in order to achieve real-time and low-latency encoding and decoding. HEVC and VVC include tile grid partitioning, which allows rectangular regions of a frame to be processed simultaneously by independent threads. The tile grid may be further partitioned into a horizontal sub-grid of Rectangular Slices (RSs), increasing the partitioning flexibility. The dynamic Tile and Rectangular Slice (TRS) partitioning solution proposed in this paper benefits from this flexibility. The TRS partitioning is carried out at the frame level, taking into account both the spatial texture of the content and the encoding times of previously encoded frames. The proposed solution searches for the partitioning configuration that offers the best trade-off between multi-thread encoding time and encoding quality loss. Experiments show that the proposed solution, compared to uniform TRS partitioning, significantly decreases multi-thread encoding time, with slightly better encoding quality.
Submitted 29 December, 2020;
originally announced December 2020.
-
A Novel Loss Function Incorporating Imaging Acquisition Physics for PET Attenuation Map Generation using Deep Learning
Authors:
Luyao Shi,
John A. Onofrey,
Enette Mae Revilla,
Takuya Toyonaga,
David Menard,
Joseph Ankrah,
Richard E. Carson,
Chi Liu,
Yihuan Lu
Abstract:
In PET/CT imaging, CT is used for PET attenuation correction (AC). Mismatch between CT and PET due to patient body motion results in AC artifacts. In addition, artifacts caused by metal, beam-hardening and count-starving in the CT itself also introduce inaccurate AC for PET. Maximum likelihood reconstruction of activity and attenuation (MLAA) was proposed to solve those issues by simultaneously reconstructing the tracer activity ($λ$-MLAA) and the attenuation map ($μ$-MLAA) based on the PET raw data only. However, $μ$-MLAA suffers from high noise and $λ$-MLAA suffers from large bias compared to the reconstruction using the CT-based attenuation map ($μ$-CT). Recently, a convolutional neural network (CNN) was applied to predict the CT attenuation map ($μ$-CNN) from $λ$-MLAA and $μ$-MLAA, in which an image-domain loss (IM-loss) function between the $μ$-CNN and the ground truth $μ$-CT was used. However, IM-loss does not directly measure the AC errors according to the PET attenuation physics, where the line-integral projection of the attenuation map ($μ$) along the path of the two annihilation events, instead of the $μ$ itself, is used for AC. Therefore, a network trained with the IM-loss may yield suboptimal performance in the $μ$ generation. Here, we propose a novel line-integral projection loss (LIP-loss) function that incorporates the PET attenuation physics for $μ$ generation. Eighty training and twenty testing datasets of whole-body 18F-FDG PET and paired ground truth $μ$-CT were used. Quantitative evaluations showed that the model trained with the additional LIP-loss significantly outperformed the model trained solely with the IM-loss function.
Submitted 3 September, 2019;
originally announced September 2019.