
2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution

Authors: Kai Liu, Haotong Qin, Yong Guo, Xin Yuan, Linghe Kong, Guihai Chen, Yulun Zhang

Abstract: Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, which allows advanced SR models to enjoy compact low-bit parameters and efficient integer/bitwise constructions for storage compression and inference acceleration, respectively. However, it is well known that low-bit quantization degrades the accuracy of SR models compared to their full-precision (FP) counterparts. Despite several efforts to alleviate the degradation, the transformer-based SR model still suffers severe degradation due to its distinctive activation distribution. In this work, we present a dual-stage low-bit post-training quantization (PTQ) method for image super-resolution, namely 2DQuant, which achieves efficient and accurate SR under low-bit quantization. The proposed method first investigates the weights and activations and finds that their distributions are characterized by coexisting symmetry and asymmetry and by long tails. Specifically, we propose Distribution-Oriented Bound Initialization (DOBI), using different searching strategies to search a coarse bound for quantizers. To obtain refined quantizer parameters, we further propose Distillation Quantization Calibration (DQC), which employs a distillation approach to make the quantized model learn from its FP counterpart. Through extensive experiments on different bits and scaling factors, the performance of DOBI can reach the state-of-the-art (SOTA), while after stage two our method surpasses existing PTQ methods in both metrics and visual effects. 2DQuant gains an increase in PSNR as high as 4.52dB on Set5 (x2) compared with SOTA when quantized to 2-bit and enjoys a 3.60x compression ratio and 5.08x speedup ratio. The code and models will be available at https://github.com/Kai-Liu001/2DQuant.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 9 pages, 6 figures. The code and models will be available at https://github.com/Kai-Liu001/2DQuant
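
Note: the abstract above turns on choosing clipping bounds for a low-bit quantizer when the values have long tails. The following is a minimal sketch of that general idea only, not the paper's DOBI or DQC procedures. The names `fake_quantize` and `mse`, the toy data, and the hand-picked bounds are all assumptions made for illustration.

```python
def fake_quantize(values, lower, upper, bits):
    """Clip to [lower, upper], snap to 2**bits uniform levels, return floats."""
    levels = 2 ** bits - 1              # number of steps between the bounds
    step = (upper - lower) / levels
    out = []
    for v in values:
        clipped = min(max(v, lower), upper)
        index = round((clipped - lower) / step)
        out.append(lower + index * step)
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy long-tailed data (hypothetical): most values near zero, one outlier.
data = [-0.2, -0.1, 0.0, 0.05, 0.1, 0.15, 0.2, 4.0]

# Min/max bounds stretch the 2-bit grid to cover the outlier.
naive = fake_quantize(data, min(data), max(data), bits=2)
# Hand-picked tighter bounds (an assumption, not a searched result).
tight = fake_quantize(data, -0.2, 0.2, bits=2)

bulk = slice(0, 7)                      # everything except the outlier
naive_bulk_err = mse(data[bulk], naive[bulk])
tight_bulk_err = mse(data[bulk], tight[bulk])
```

With min/max bounds, the seven small values collapse onto a single level, so `naive_bulk_err` is much larger than `tight_bulk_err`. The price of the tighter bounds is that the outlier is clipped. Handling that trade-off is what a bound search has to do.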

arXiv:2404.04848 [pdf, other]

Task-Aware Encoder Control for Deep Video Compression

Authors: Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan He, Jing Geng, Yan Wang, Jun Zhang, Hongwei Qin

Abstract: Prior research on deep video compression (DVC) for machine tasks typically necessitates training a unique codec for each specific task, mandating a dedicated decoder per task. In contrast, traditional video codecs employ a flexible encoder controller, enabling the adaptation of a single codec to different tasks through mechanisms like mode prediction. Drawing inspiration from this, we introduce an innovative encoder controller for deep video compression for machines. This controller features a mode prediction and a Group of Pictures (GoP) selection module. Our approach centralizes control at the encoding stage, allowing for adaptable encoder adjustments across different tasks, such as detection and tracking, while maintaining compatibility with a standard pre-trained DVC decoder. Empirical evidence demonstrates that our method is applicable across multiple tasks with various existing pre-trained DVCs. Moreover, extensive experiments demonstrate that our method outperforms previous DVC by about 25% bitrate for different tasks, with only one pre-trained decoder.

Submitted 20 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.13030 [pdf, other]

Super-High-Fidelity Image Compression via Hierarchical-ROI and Adaptive Quantization

Authors: Jixiang Luo, Yan Wang, Hongwei Qin

Abstract: Learned Image Compression (LIC) has achieved dramatic progress regarding objective and subjective metrics. MSE-based models aim to improve objective metrics while generative models are leveraged to improve visual quality measured by subjective metrics. However, they all suffer from blurring or deformation at low bit rates, especially at below $0.2bpp$. Besides, deformation on human faces and text is unacceptable for visual quality assessment, and the problem becomes more prominent on small faces and text. To solve this problem, we combine the advantage of MSE-based models and generative models by utilizing region of interest (ROI). We propose Hierarchical-ROI (H-ROI), to split images into several foreground regions and one background region to improve the reconstruction of regions containing faces, text, and complex textures. Further, we propose adaptive quantization by non-linear mapping within the channel dimension to constrain the bit rate while maintaining the visual quality. Exhaustive experiments demonstrate that our methods achieve better visual quality on small faces and text with lower bit rates, e.g., $0.7X$ bits of HiFiC and $0.5X$ bits of BPG.

Submitted 20 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.08551 [pdf, other]

GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

Authors: Xinjie Zhang, Xingtong Ge, Tongda Xu, Dailan He, Yan Wang, Hongwei Qin, Guo Lu, Jing Geng, Jun Zhang

Abstract: Implicit neural representations (INRs) recently achieved great success in image representation and compression, offering high visual quality and fast rendering speeds with 10-1000 FPS, assuming sufficient GPU resources are available. However, this requirement often hinders their use on low-end devices with limited memory. In response, we propose a groundbreaking paradigm of image representation and compression by 2D Gaussian Splatting, named GaussianImage. We first introduce 2D Gaussian to represent the image, where each Gaussian has 8 parameters including position, covariance and color. Subsequently, we unveil a novel rendering algorithm based on accumulated summation. Remarkably, our method with a minimum of 3$\times$ lower GPU memory usage and 5$\times$ faster fitting time not only rivals INRs (e.g., WIRE, I-NGP) in representation performance, but also delivers a faster rendering speed of 1500-2000 FPS regardless of parameter size. Furthermore, we integrate existing vector quantization technique to build an image codec. Experimental results demonstrate that our codec attains rate-distortion performance comparable to compression-based INRs such as COIN and COIN++, while facilitating decoding speeds of approximately 1000 FPS. Additionally, preliminary proof of concept shows that our codec surpasses COIN and COIN++ in performance when using partial bits-back coding. Code will be available at https://github.com/Xinjie-Q/GaussianImage.

Submitted 10 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.01960 [pdf, other]

A robust audio deepfake detection system via multi-view feature

Authors: Yujie Yang, Haochen Qin, Hang Zhou, Chengcheng Wang, Tianyu Guo, Kai Han, Yunhe Wang

Abstract: With the advancement of generative modeling techniques, synthetic human speech becomes increasingly indistinguishable from real speech, posing tricky challenges for audio deepfake detection (ADD) systems. In this paper, we exploit audio features to improve the generalizability of ADD systems. Investigation of the ADD task performance is conducted over a broad range of audio features, including various handcrafted features and learning-based features. Experiments show that learning-based audio features pretrained on a large amount of data generalize better than handcrafted features on out-of-domain scenarios. Subsequently, we further improve the generalizability of the ADD system using proposed multi-feature approaches to incorporate complementary information from features of different views. The model trained on ASV2019 data achieves an equal error rate of 24.27% on the In-the-Wild dataset.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 5 pages, 2 figures
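
Note: the abstract above reports an equal error rate (EER), the operating point where the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how an EER can be approximated from detector scores. The function name `equal_error_rate`, the "higher score means more likely genuine" convention, and the toy scores are assumptions for illustration, not taken from the paper.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Assumed convention: a higher score means "more likely genuine".
    """
    best_gap, eer = None, None
    for threshold in sorted(set(genuine_scores) | set(spoof_scores)):
        # False rejection: genuine samples scored below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False acceptance: spoofed samples scored at or above the threshold.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores: one genuine sample scores low, one spoof scores high.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, spoof))  # 0.25
```

At threshold 0.6, one of four genuine samples is rejected and one of four spoofed samples is accepted, so both error rates are 0.25. Evaluation toolkits typically interpolate between thresholds instead of picking the closest one, so values from this sketch can differ slightly from reported figures.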

arXiv:2402.18152 [pdf, other]

Boosting Neural Representations for Videos with a Conditional Decoder

Authors: Xinjie Zhang, Ren Yang, Dailan He, Xingtong Ge, Tongda Xu, Yan Wang, Hongwei Qin, Jun Zhang

Abstract: Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing, showing remarkable versatility across various video tasks. However, existing methods often fail to fully leverage their representation capabilities, primarily due to inadequate alignment of intermediate features during target frame decoding. This paper introduces a universal boosting framework for current implicit video representation approaches. Specifically, we utilize a conditional decoder with a temporal-aware affine transform module, which uses the frame index as a prior condition to effectively align intermediate features with target frames. Besides, we introduce a sinusoidal NeRV-like block to generate diverse intermediate features and achieve a more balanced parameter distribution, thereby enhancing the model's capacity. With a high-frequency information-preserving reconstruction loss, our approach successfully boosts multiple baseline INRs in the reconstruction quality and convergence speed for video regression, and exhibits superior inpainting and interpolation results. Further, we integrate a consistent entropy minimization technique and develop video codecs based on these boosted INRs. Experiments on the UVG dataset confirm that our enhanced codecs significantly outperform baseline INRs and offer competitive rate-distortion performance compared to traditional and learning-based codecs. Code is available at https://github.com/Xinjie-Q/Boosting-NeRV.

Submitted 16 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: Accepted by CVPR 2024

arXiv:2402.09101 [pdf, other]

DestripeCycleGAN: Stripe Simulation CycleGAN for Unsupervised Infrared Image Destriping

Authors: Shiqi Yang, Hanlin Qin, Shuai Yuan, Xiang Yan, Hossein Rahmani

Abstract: CycleGAN has been proven to be an advanced approach for unsupervised image restoration. This framework consists of two generators: a denoising one for inference and an auxiliary one for modeling noise to fulfill cycle-consistency constraints. However, when applied to the infrared destriping task, it becomes challenging for the vanilla auxiliary generator to consistently produce vertical noise under unsupervised constraints. This poses a threat to the effectiveness of the cycle-consistency loss, leading to stripe noise residual in the denoised image. To address the above issue, we present a novel framework for single-frame infrared image destriping, named DestripeCycleGAN. In this model, the conventional auxiliary generator is replaced with a priori stripe generation model (SGM) to introduce vertical stripe noise in the clean data, and the gradient map is employed to re-establish cycle-consistency. Meanwhile, a Haar wavelet background guidance module (HBGM) has been designed to minimize the divergence of background details between the different domains. To preserve vertical edges, a multi-level wavelet U-Net (MWUNet) is proposed as the denoising generator, which utilizes the Haar wavelet transform as the sampler to reduce directional information loss. Moreover, it incorporates the group fusion block (GFB) into skip connections to fuse the multi-scale features and build the context of long-distance dependencies. Extensive experiments on real and synthetic data demonstrate that our DestripeCycleGAN surpasses the state-of-the-art methods in terms of visual quality and quantitative evaluation. Our code will be made public at https://github.com/0wuji/DestripeCycleGAN.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2401.08920 [pdf, other]

Idempotence and Perceptual Image Compression

Authors: Tongda Xu, Ziran Zhu, Dailan He, Yanghao Li, Lina Guo, Yuanyuan Wang, Zhe Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

Abstract: Idempotence is the stability of image codec to re-compression. At first glance, it is unrelated to perceptual image compression. However, we find that theoretically: 1) Conditional generative model-based perceptual codec satisfies idempotence; 2) Unconditional generative model with idempotence constraint is equivalent to conditional generative codec. Based on this newfound equivalence, we propose a new paradigm of perceptual image codec by inverting unconditional generative model with idempotence constraints. Our codec is theoretically equivalent to conditional generative codec, and it does not require training new models. Instead, it only requires a pre-trained mean-square-error codec and unconditional generative model. Empirically, we show that our proposed approach outperforms state-of-the-art methods such as HiFiC and ILLM, in terms of Fréchet Inception Distance (FID). The source code is provided in https://github.com/tongdaxu/Idempotence-and-Perceptual-Image-Compression.

Submitted 16 January, 2024; originally announced January 2024.

Comments: ICLR 2024
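
Note: the abstract above defines idempotence as stability under re-compression, that is, compressing an already compressed image changes nothing. The following is a minimal sketch of that property using toy stand-ins. `toy_codec`, `drifting_codec`, and `is_idempotent` are hypothetical names and are unrelated to the paper's codec.

```python
def toy_codec(pixels, step=16):
    """Encode and decode in one call: snap 8-bit values to multiples of `step`."""
    return [min(255, step * round(p / step)) for p in pixels]


def drifting_codec(pixels):
    """A deliberately non-idempotent stand-in: every pass darkens by one level."""
    return [max(0, p - 1) for p in pixels]


def is_idempotent(codec, pixels):
    """True if re-compressing the codec's own output leaves it unchanged."""
    once = codec(pixels)
    twice = codec(once)
    return once == twice


sample = [3, 100, 250]
print(is_idempotent(toy_codec, sample))       # True
print(is_idempotent(drifting_codec, sample))  # False
```

A scalar quantizer is idempotent because its outputs are already fixed points of the rounding step. A codec whose decoder output moves on every pass, like `drifting_codec`, is not.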

arXiv:2311.03046 [pdf, other]

Antenna Positioning and Beamforming Design for Fluid-Antenna Enabled Multi-user Downlink Communications

Authors: Haoran Qin, Wen Chen, Zhendong Li, Qingqing Wu, Nan Cheng, Fangjiong Chen

Abstract: This paper investigates a multiple input single output (MISO) downlink communication system in which users are equipped with fluid antennas (FAs). First, we adopt a field-response based channel model to characterize the downlink channel with respect to FAs' positions. Then, we aim to minimize the total transmit power by jointly optimizing the FAs' positions and beamforming matrix. To solve the resulting non-convex problem, we employ an alternating optimization (AO) algorithm based on penalty method and successive convex approximation (SCA) to obtain a sub-optimal solution. Numerical results demonstrate that the FA-assisted communication system performs better than conventional fixed-position antenna systems.

Submitted 13 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2309.00017 [pdf, other]

Physics-Based Trajectory Design for Cellular-Connected UAV in Rainy Environments Based on Deep Reinforcement Learning

Authors: Hao Qin, Zhaozhou Wu, Xingqi Zhang

Abstract: Cellular-connected unmanned aerial vehicles (UAVs) have gained increasing attention due to their potential to enhance conventional UAV capabilities by leveraging existing cellular infrastructure for reliable communications between UAVs and base stations. They have been used for various applications, including weather forecasting and search and rescue operations. However, under extreme weather conditions such as rainfall, it is challenging for the trajectory design of cellular UAVs, due to weak coverage regions in the sky, limitations of UAV flying time, and signal attenuation caused by raindrops. To this end, this paper proposes a physics-based trajectory design approach for cellular-connected UAVs in rainy environments. A physics-based electromagnetic simulator is utilized to take into account detailed environment information and the impact of rain on radio wave propagation. The trajectory optimization problem is formulated to jointly consider UAV flying time and signal-to-interference ratio, and is solved through a Markov decision process using deep reinforcement learning algorithms based on multi-step learning and double Q-learning. Optimal UAV trajectories are compared in examples with homogeneous atmosphere medium and rain medium. Additionally, a thorough study of varying weather conditions on trajectory design is provided, and the impact of weight coefficients in the problem formulation is discussed. The proposed approach has demonstrated great potential for UAV trajectory design under rainy weather conditions.

Submitted 31 August, 2023; originally announced September 2023.

arXiv:2308.13287 [pdf, other]

Efficient Learned Lossless JPEG Recompression

Authors: Lina Guo, Yuanyuan Wang, Tongda Xu, Jixiang Luo, Dailan He, Zhenjun Ji, Shanshan Wang, Yang Wang, Hongwei Qin

Abstract: JPEG is one of the most popular image compression methods. It is beneficial to compress those existing JPEG files without introducing additional distortion. In this paper, we propose a deep learning based method to further compress JPEG images losslessly. Specifically, we propose a Multi-Level Parallel Conditional Modeling (ML-PCM) architecture, which enables parallel decoding in different granularities. First, luma and chroma are processed independently to allow parallel coding. Second, we propose pipeline parallel context model (PPCM) and compressed checkerboard context model (CCCM) for the effective conditional modeling and efficient decoding within luma and chroma components. Our method has much lower latency while achieving a better compression ratio compared with previous SOTA. After proper software optimization, we can obtain a good throughput of 57 FPS for 1080P images on NVIDIA T4 GPU. Furthermore, combined with quantization, our approach can also act as a lossy JPEG codec which has an obvious advantage over SOTA lossy compression methods at high bit rates (bpp $>0.9$).

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.11864 [pdf, other]

Enhanced Residual SwinV2 Transformer for Learned Image Compression

Authors: Yongqiang Wang, Feng Liang, Haisheng Fu, Jie Liang, Haipeng Qin, Junzhe Liang

Abstract: Recently, deep learning technology has been successfully applied in the field of image compression, leading to superior rate-distortion performance. However, a challenge of many learning-based approaches is that they often achieve better performance at the cost of higher complexity, which makes practical deployment difficult. To alleviate this issue, in this paper, we propose an effective and efficient learned image compression framework based on an enhanced residual SwinV2 transformer. To enhance the nonlinear representation of images in our framework, we use a feature enhancement module that consists of three consecutive convolutional layers. In the subsequent coding and hyper coding steps, we utilize a SwinV2 transformer-based attention mechanism to process the input image. The SwinV2 model can help to reduce model complexity while maintaining high performance. Experimental results show that the proposed method achieves comparable performance compared to some recent learned image compression methods on Kodak and Tecnick datasets, and outperforms some traditional codecs including VVC. In particular, our method achieves comparable results while reducing model complexity by 56% compared to these recent methods.

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.08154 [pdf, other]

Conditional Perceptual Quality Preserving Image Compression

Authors: Tongda Xu, Qian Zhang, Yanghao Li, Dailan He, Zhe Wang, Yuanyuan Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

Abstract: We propose conditional perceptual quality, an extension of the perceptual quality defined in \citet{blau2018perception}, by conditioning it on user defined information. Specifically, we extend the original perceptual quality $d(p_{X},p_{\hat{X}})$ to the conditional perceptual quality $d(p_{X|Y},p_{\hat{X}|Y})$, where $X$ is the original image, $\hat{X}$ is the reconstructed image, $Y$ is side information defined by the user, and $d(.,.)$ is a divergence. We show that conditional perceptual quality has similar theoretical properties as rate-distortion-perception trade-off \citep{blau2019rethinking}. Based on these theoretical results, we propose an optimal framework for conditional perceptual quality preserving compression. Experimental results show that our codec successfully maintains high perceptual quality and semantic quality at all bitrates. Besides, by providing a lower bound of common randomness required, we settle the previous arguments on whether randomness should be incorporated into generator for (conditional) perceptual quality compression. The source code is provided in supplementary material.

Submitted 16 August, 2023; originally announced August 2023.

arXiv:2305.14049 [pdf, other]

Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding

Authors: Tian-Hao Zhang, Hai-Bo Qin, Zhi-Hao Lai, Song-Lu Chen, Qi Liu, Feng Chen, Xinyuan Qian, Xu-Cheng Yin

Abstract: Attention-based encoder-decoder (AED) models have shown impressive performance in ASR. However, most existing AED methods neglect to simultaneously leverage both acoustic and semantic features in decoder, which is crucial for generating more accurate and informative semantic states. In this paper, we propose an Acoustic and Semantic Cooperative Decoder (ASCD) for ASR. In particular, unlike vanilla decoders that process acoustic and semantic features in two separate stages, ASCD integrates them cooperatively. To prevent information leakage during training, we design a Causal Multimodal Mask. Moreover, a variant Semi-ASCD is proposed to balance accuracy and computational cost. Our proposal is evaluated on the publicly available AISHELL-1 and aidatatang_200zh datasets using Transformer, Conformer, and Branchformer as encoders, respectively. The experimental results show that ASCD significantly improves the performance by leveraging both the acoustic and semantic information cooperatively.

Submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted by Interspeech 2023

arXiv:2211.03885 [pdf, other]

Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Shuai Liu, Chaoyu Feng, Furui Bai, Xiaotao Wang, Lei Lei, Ziyao Yi, Yan Xiang, Zibin Liu, Shaoqing Li, Keming Shi, Dehui Kong, Ke Xu, Minsu Kwon, Yaqi Wu, Jiesi Zheng, Zhihao Fan, Xun Wu, Feng Zhang, Albert No, Minhyeok Cho, Zewen Chen, Xiaze Zhang, Ran Li, et al. (13 additional authors not shown)

Abstract: The role of mobile cameras increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline replacing the standard mobile ISPs that can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format FujiFilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon's 8 Gen 1 GPU that provides excellent acceleration results for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs, being able to process Full HD photos in less than 20-50 milliseconds while achieving high fidelity results. A detailed description of all models developed in this challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

arXiv:2209.09244 [pdf, other]

Flexible Neural Image Compression via Code Editing

Authors: Chenjian Gao, Tongda Xu, Dailan He, Hongwei Qin, Yan Wang

Abstract: Neural image compression (NIC) has outperformed traditional image codecs in rate-distortion (R-D) performance. However, it usually requires a dedicated encoder-decoder pair for each point on R-D curve, which greatly hinders its practical deployment. While some recent works have enabled bitrate control via conditional coding, they impose strong prior during training and provide limited flexibility. In this paper we propose Code Editing, a highly flexible coding method for NIC based on semi-amortized inference and adaptive quantization. Our work is a new paradigm for variable bitrate NIC. Furthermore, experimental results show that our method surpasses existing variable-rate methods, and achieves ROI coding and multi-distortion trade-off with a single decoder.

Submitted 19 September, 2022; originally announced September 2022.

Comments: NeurIPS 2022

arXiv:2207.14524 [pdf, other]

Evaluating the Practicality of Learned Image Compression

Authors: Hongjiu Yu, Qiancheng Sun, ** Hu, Xingyuan Xue, Jixiang Luo, Dailan He, Yilong Li, Pengbo Wang, Yuanyuan Wang, Yaxu Dai, Yan Wang, Hongwei Qin

Abstract: Learned image compression has achieved extraordinary rate-distortion performance in PSNR and MS-SSIM compared to traditional methods. However, it suffers from intensive computation, which is intolerable for real-world applications and leads to its limited industrial application for now. In this paper, we introduce neural architecture search (NAS) to designing more efficient networks with lower latency, and leverage quantization to accelerate the inference process. Meanwhile, efforts in engineering like multi-threading and SIMD have been made to improve efficiency. Optimized using a hybrid loss of PSNR and MS-SSIM for better visual quality, we obtain much higher MS-SSIM than JPEG, JPEG XL and AVIF over all bit rates, and PSNR between that of JPEG XL and AVIF. Our software implementation of LIC achieves comparable or even faster inference speed compared to jpeg-turbo while being multiple times faster than JPEG XL and AVIF. Besides, our implementation of LIC reaches stunning throughput of 145 fps for encoding and 208 fps for decoding on a Tesla T4 GPU for 1080p images. On CPU, the latency of our implementation is comparable with JPEG XL.

Submitted 29 July, 2022; originally announced July 2022.

arXiv:2207.02662 [pdf, other]

Reconfigurable Refractive Surfaces: An Energy-Efficient Way to Holographic MIMO

Authors: Shuhao Zeng, Hongliang Zhang, Boya Di, Haichao Qin, Xin Su, Lingyang Song

Abstract: Holographic Multiple Input Multiple Output (HMIMO), which integrates massive antenna elements into a compact space to achieve a spatially continuous aperture, plays an important role in future wireless networks. With numerous antenna elements, it is hard to implement the HMIMO via phased arrays due to unacceptable power consumption. To address this issue, reconfigurable refractive surface (RRS) is an energy efficient enabler of HMIMO since the surface is free of expensive phase shifters. Unlike traditional metasurfaces working as passive relays, the RRS is used as transmit antennas, where the far-field approximation does not hold anymore, urging a new performance analysis framework. In this letter, we first derive the data rate of an RRS-based single-user downlink system, and then compare its power consumption with the phased array. Simulation results verify our analysis and show that the RRS is an energy-efficient way to HMIMO.

Submitted 6 July, 2022; originally announced July 2022.

Comments: 5 pages, 4 figures

arXiv:2206.10810 [pdf, other]

A Simple Baseline for Video Restoration with Grouped Spatial-temporal Shift

Authors: Dasong Li, Xiaoyu Shi, Yi Zhang, Ka Chun Cheung, Simon See, Xiaogang Wang, Hongwei Qin, Hongsheng Li

Abstract: Video restoration, which aims to restore clear frames from degraded videos, has numerous important applications. The key to video restoration depends on utilizing inter-frame information. However, existing deep learning methods often rely on complicated network architectures, such as optical flow estimation, deformable convolution, and cross-frame self-attention layers, resulting in high computational costs. In this study, we propose a simple yet effective framework for video restoration. Our approach is based on grouped spatial-temporal shift, which is a lightweight and straightforward technique that can implicitly capture inter-frame correspondences for multi-frame aggregation. By introducing grouped spatial shift, we attain expansive effective receptive fields. Combined with basic 2D convolution, this simple framework can effectively aggregate inter-frame information. Extensive experiments demonstrate that our framework outperforms the previous state-of-the-art method, while using less than a quarter of its computational cost, on both video deblurring and video denoising tasks. These results indicate the potential for our approach to significantly reduce computational overhead while maintaining high-quality results. Code is available at https://github.com/dasongli1/Shift-Net.

Submitted 22 May, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

Comments: Accepted to CVPR2023

Journal ref: 2023 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

arXiv:2205.14501 [pdf, other]

PO-ELIC: Perception-Oriented Efficient Learned Image Coding

Authors: Dailan He, Ziming Yang, Hongjiu Yu, Tongda Xu, Jixiang Luo, Yuan Chen, Chenjian Gao, Xinjie Shi, Hongwei Qin, Yan Wang

Abstract: In the past years, learned image compression (LIC) has achieved remarkable performance. The recent LIC methods outperform VVC in both PSNR and MS-SSIM. However, the low bit-rate reconstructions of LIC suffer from artifacts such as blurring, color drifting and texture missing. Moreover, those varied artifacts make image quality metrics correlate badly with human perceptual quality. In this paper, we propose PO-ELIC, i.e., Perception-Oriented Efficient Learned Image Coding. To be specific, we adapt ELIC, one of the state-of-the-art LIC models, with adversarial training techniques. We apply a mixture of losses including hinge-form adversarial loss, Charbonnier loss, and style loss, to finetune the model towards better perceptual quality. Experimental results demonstrate that our method achieves comparable perceptual quality with HiFiC with much lower bitrate.

Submitted 28 May, 2022; originally announced May 2022.

Comments: CVPR2022 Workshop, 5-th CLIC Image Compression Track

arXiv:2205.04721 [pdf, other]

doi 10.1007/s11263-022-01627-3

Efficient Burst Raw Denoising with Variance Stabilization and Multi-frequency Denoising Network

Authors: Dasong Li, Yi Zhang, Ka Lung Law, Xiaogang Wang, Hongwei Qin, Hongsheng Li

Abstract: With the growing popularity of smartphones, capturing high-quality images is of vital importance to smartphones. The cameras of smartphones have small apertures and small sensor cells, which lead to noisy images in low-light environments. Denoising based on a burst of multiple frames generally outperforms single-frame denoising, but at a larger computational cost. In this paper, we propose an efficient yet effective burst denoising system. We adopt a three-stage design: noise prior integration, multi-frame alignment and multi-frame denoising. First, we integrate noise prior by pre-processing raw signals into a variance-stabilization space, which allows using a small-scale network to achieve competitive performance. Second, we observe that it is essential to adopt an explicit alignment for burst denoising, but it is not necessary to integrate a learning-based method to perform multi-frame alignment. Instead, we resort to a conventional and efficient alignment method and combine it with our multi-frame denoising network. At last, we propose a denoising strategy that processes multiple frames sequentially. Sequential denoising avoids filtering a large number of frames by decomposing multiple frames denoising into several efficient sub-network denoising. As for each sub-network, we propose an efficient multi-frequency denoising network to remove noise of different frequencies. Our three-stage design is efficient and shows strong performance on burst denoising. Experiments on synthetic and real raw datasets demonstrate that our method outperforms state-of-the-art methods, with less computational cost. Furthermore, the low complexity and high-quality performance make deployment on smartphones possible.

Submitted 10 May, 2022; originally announced May 2022.

Comments: Accepted for publication in International Journal of Computer Vision

Journal ref: IJCV 2022

arXiv:2203.16357 [pdf, other]

Practical Learned Lossless JPEG Recompression with Multi-Level Cross-Channel Entropy Model in the DCT Domain

Authors: Lina Guo, Xinjie Shi, Dailan He, Yuanyuan Wang, Rui Ma, Hongwei Qin, Yan Wang

Abstract: JPEG is a popular image compression method widely used by individuals, data centers, cloud storage and network filesystems. However, most recent progress on image compression mainly focuses on uncompressed images while ignoring trillions of already-existing JPEG images. To compress these JPEG images adequately and restore them back to JPEG format losslessly when needed, we propose a deep learning based JPEG recompression method that operates in the DCT domain, together with a Multi-Level Cross-Channel Entropy Model to compress the most informative Y component. Experiments show that our method achieves state-of-the-art performance compared with traditional JPEG recompression methods including Lepton, JPEG XL and CMIX. To the best of our knowledge, this is the first learned compression method that losslessly transcodes JPEG images to more storage-saving bitstreams.

Submitted 30 March, 2022; originally announced March 2022.

Comments: CVPR 2022

arXiv:2203.10886 [pdf, other]

ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding

Authors: Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, Yan Wang

Abstract: Recently, learned image compression techniques have achieved remarkable performance, even surpassing the best manually designed lossy image coders. They are promising candidates for large-scale adoption. For the sake of practicality, a thorough investigation of the architecture design of learned image compression, regarding both compression performance and running speed, is essential. In this paper, we first propose uneven channel-conditional adaptive coding, motivated by the observation of energy compaction in learned image compression. Combining the proposed uneven grouping model with existing context models, we obtain a spatial-channel contextual adaptive model to improve the coding performance without damage to running speed. Then we study the structure of the main transform and propose an efficient model, ELIC, to achieve state-of-the-art speed and compression ability. With superior performance, the proposed model also supports extremely fast preview decoding and progressive decoding, which makes the coming application of learning-based image compression more promising.

Submitted 29 March, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

Comments: accepted by CVPR 2022 (oral)

arXiv:2202.07513 [pdf, other]

Post-Training Quantization for Cross-Platform Learned Image Compression

Authors: Dailan He, Ziming Yang, Yuan Chen, Qi Zhang, Hongwei Qin, Yan Wang

Abstract: It has been witnessed that learned image compression has outperformed conventional image coding techniques and tends to be practical in industrial applications. One of the most critical issues that need to be considered is the non-deterministic calculation, which makes the probability prediction cross-platform inconsistent and frustrates successful decoding. We propose to solve this problem by introducing well-developed post-training quantization and making the model inference integer-arithmetic-only, which is much simpler than presently existing training and fine-tuning based approaches yet still keeps the superior rate-distortion performance of learned image compression. Based on that, we further improve the discretization of the entropy parameters and extend the deterministic inference to fit Gaussian mixture models. With our proposed methods, the current state-of-the-art image compression models can infer in a cross-platform consistent manner, which makes the further development and practice of learned image compression more promising.

Submitted 30 November, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

arXiv:2110.12859 [pdf, other]

Multi-vehicle experiment platform: A Digital Twin Realization Method

Authors: Chunying Yang, Jianghong Dong, Qing Xu, Mengchi Cai, Hongmao Qin, Jianqiang Wang, Keqiang Li

Abstract: With the development of V2X technology, cooperative control of multiple vehicles has been widely studied. However, field testing is rarely conducted due to financial and safety considerations. To solve this problem, this study proposes a digital twin method to carry out multi-vehicle experiments, which uses a combination of physical and virtual vehicles to perform coordination tasks. To confirm the effectiveness of this method, a prototype system is developed, which consists of a sand table testbed, its twin system and a cloud. Several aspects are quantified to describe system performance, including time delay and localization accuracy. Finally, a vehicle-level experiment in a platoon scenario is carried out, and the experimental results confirm the effectiveness of this method.

Submitted 25 October, 2021; originally announced October 2021.

arXiv:2110.04756 [pdf, other]

Rethinking Noise Synthesis and Modeling in Raw Denoising

Authors: Yi Zhang, Hongwei Qin, Xiaogang Wang, Hongsheng Li

Abstract: The lack of a large-scale real raw image denoising dataset gives rise to challenges in synthesizing realistic raw image noise for training denoising models. However, the real raw image noise is contributed by many noise sources and varies greatly among different sensors. Existing methods are unable to model all noise sources accurately, and building a noise model for each sensor is also laborious. In this paper, we introduce a new perspective to synthesize noise by directly sampling from the sensor's real noise. It inherently generates accurate raw image noise for different camera sensors. Two efficient and generic techniques, pattern-aligned patch sampling and high-bit reconstruction, help accurately synthesize spatially correlated noise and high-bit noise, respectively. We conduct systematic experiments on SIDD and ELD datasets. The results show that (1) our method outperforms existing methods and demonstrates wide generalization on different sensors and lighting conditions; (2) recent conclusions derived from DNN-based noise modeling methods are actually based on inaccurate noise parameters. The DNN-based methods still cannot outperform physics-based statistical methods.

Submitted 23 February, 2023; v1 submitted 10 October, 2021; originally announced October 2021.

Comments: ICCV2021

arXiv:2109.14863 [pdf, other]

HLIC: Harmonizing Optimization Metrics in Learned Image Compression by Reinforcement Learning

Authors: Baocheng Sun, Meng Gu, Dailan He, Tongda Xu, Yan Wang, Hongwei Qin

Abstract: Learned image compression is making good progress in recent years. Peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM) are the two most popular evaluation metrics. As different metrics only reflect certain aspects of human perception, works in this field normally optimize two models using PSNR and MS-SSIM as loss function separately, which is suboptimal and makes it difficult to select the model with best visual quality or overall performance. Towards solving this problem, we propose to Harmonize optimization metrics in Learned Image Compression (HLIC) using online loss function adaptation by reinforcement learning. By doing so, we are able to leverage the advantages of both PSNR and MS-SSIM, achieving better visual quality and higher VMAF score. To our knowledge, our work is the first to explore automatic loss function adaptation for harmonizing optimization metrics in low level vision tasks like learned image compression.

Submitted 30 September, 2021; originally announced September 2021.

Comments: working paper

arXiv:2103.15306 [pdf, other]

Checkerboard Context Model for Efficient Learned Image Compression

Authors: Dailan He, Yaoyan Zheng, Baocheng Sun, Yan Wang, Hongwei Qin

Abstract: For learned image compression, the autoregressive context model has proved effective in improving the rate-distortion (RD) performance because it helps remove spatial redundancies among latent representations. However, the decoding process must be done in a strict scan order, which breaks the parallelization. We propose a parallelizable checkerboard context model (CCM) to solve the problem. Our two-pass checkerboard context calculation eliminates such limitations on spatial locations by re-organizing the decoding order. Speeding up the decoding process more than 40 times in our experiments, it achieves significantly improved computational efficiency with almost the same rate-distortion performance. To the best of our knowledge, this is the first exploration of a parallelization-friendly spatial context model for learned image compression.

Submitted 1 April, 2021; v1 submitted 28 March, 2021; originally announced March 2021.

Comments: CVPR 2021

arXiv:2008.09103 [pdf, other]

A Plug-and-play Scheme to Adapt Image Saliency Deep Model for Video Data

Authors: Yunxiao Li, Shuai Li, Chenglizhao Chen, Aimin Hao, Hong Qin

Abstract: With the rapid development of deep learning techniques, image saliency deep models trained solely by spatial information have occasionally achieved detection performance for video data comparable to that of the models trained by both spatial and temporal information. However, due to the lesser consideration of temporal information, the image saliency deep models may become fragile in the video sequences dominated by temporal information. Thus, the most recent video saliency detection approaches have adopted the network architecture starting with a spatial deep model that is followed by an elaborately designed temporal deep model. However, such methods easily encounter the performance bottleneck arising from the single stream learning methodology, so the overall detection performance is largely determined by the spatial deep model. In sharp contrast to the current mainstream methods, this paper proposes a novel plug-and-play scheme to weakly retrain a pretrained image saliency deep model for video data by using the newly sensed and coded temporal information. Thus, the retrained image saliency deep model will be able to maintain temporal saliency awareness, achieving much improved detection performance. Moreover, our method is simple yet effective for adapting any off-the-shelf pre-trained image saliency deep model to obtain high-quality video saliency detection. Additionally, both the data and source code of our method are publicly available.

Submitted 2 August, 2020; originally announced August 2020.

Comments: 12 pages, 10 figures, and, this paper is currently in peer review in IEEE TCSVT

ACM Class: I.4

arXiv:1911.08414 [pdf, other]

Comparison of Deep learning models on time series forecasting : a case study of Dissolved Oxygen Prediction

Authors: Hongqian Qin

Abstract: Deep learning has achieved impressive prediction performance in the field of sequence learning recently. Dissolved oxygen prediction, as a kind of time-series forecasting, is suitable for this technique. Although many researchers have developed hybrid models or variant models based on deep learning techniques, there is currently no comprehensive and sound comparison among the deep learning models in this field. In addition, most previous studies focused on one-step forecasting using a small data set. Given convenient access to high-frequency data, this paper compares multi-step deep learning forecasting using walk-forward validation. Specifically, we test Convolutional Neural Network (CNN), Temporal Convolutional Network (TCN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bidirectional Recurrent Neural Network (BiRNN) models on real-time data recorded automatically at a fixed observation point in the Yangtze River from 2012 to 2016. By comparing the average accumulated statistical metrics of root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination in each time step, we find that for multi-step time series forecasting, the average performance of each time step does not decrease linearly. GRU outperforms other models with significant advantages.

Submitted 21 November, 2019; v1 submitted 16 November, 2019; originally announced November 2019.

arXiv:1910.04331 [pdf, other]

Agent with Warm Start and Active Termination for Plane Localization in 3D Ultrasound

Authors: Haoran Dou, Xin Yang, Jikuan Qian, Wufeng Xue, Hao Qin, Xu Wang, Lequan Yu, Shujun Wang, Yi Xiong, Pheng-Ann Heng, Dong Ni

Abstract: Standard plane localization is crucial for ultrasound (US) diagnosis. In prenatal US, dozens of standard planes are manually acquired with a 2D probe. It is time-consuming and operator-dependent. In comparison, 3D US containing multiple standard planes in one shot has the inherent advantages of less user-dependency and more efficiency. However, manual plane localization in US volume is challenging due to the huge search space and large fetal posture variation. In this study, we propose a novel reinforcement learning (RL) framework to automatically localize fetal brain standard planes in 3D US. Our contribution is two-fold. First, we equip the RL framework with a landmark-aware alignment module to provide warm start and strong spatial bounds for the agent actions, thus ensuring its effectiveness. Second, instead of passively and empirically terminating the agent inference, we propose a recurrent neural network based strategy for active termination of the agent's interaction procedure. This improves both the accuracy and efficiency of the localization system. Extensively validated on our in-house large dataset, our approach achieves the accuracy of 3.4mm/9.6° and 2.7mm/9.1° for the transcerebellar and transthalamic plane localization, respectively. Our proposed RL framework is general and has the potential to improve the efficiency and standardization of US scanning.

Submitted 3 March, 2024; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: 9 pages, 5 figures, 1 table. Accepted by MICCAI 2019 (oral)

arXiv:1909.07923 [pdf, other]

Lightfield Coordinates Adapted to Asgeirsson's Theorem

Authors: Haotian Li, He Qin, Todor Georgiev

Abstract: John's differential equation and its canonical form, the ultrahyperbolic equation, play an important role in lightfield imaging. The equation describes a local constraint on the lightfield that was first observed as a "dimensionality gap" in the frequency representation. Related to the ultrahyperbolic equation, Asgeirsson's theorems describe global properties. These indicate new, global constraints on the lightfield. In order to help validate those theorems on real captured images, we introduce a coordinate system for the lightfield that better suits the Asgeirsson theorems, and analyze behaviour in terms of the new coordinates.

Submitted 17 September, 2019; originally announced September 2019.

Comments: 11 pages, 9 figures

arXiv:1907.01186 [pdf, other]

John Transform and Ultrahyperbolic Equation for Lightfields

Authors: Todor Georgiev, He Qin, Haotian Li

Abstract: This paper explores possibilities for new uses of the Radon transform for imaging and analysis of lightfields. We show that the previously reported Dimensionality Gap can be derived from an ultrahyperbolic PDE, first proposed by F. John, which is satisfied by lightfields. Based on inverse John transform we demonstrate rigorous Focal Stack rendering and viewing from arbitrary angles. Based on Asgeirsson's theorems for the ultrahyperbolic PDE we derive new kernels for processing lightfields. Our kernels provide alternative methods for depth computation and other image processing in lightfields.

Submitted 2 July, 2019; originally announced July 2019.

Comments: 18 pages, 15 figures

Showing 1–33 of 33 results for author: Qin, H